AI has already surpassed average human performance in several areas:
- Image recognition: surpassed the human average around 2016.
- Speech recognition: surpassed the human average around 2017.
- Handwriting recognition: surpassed the human average around 2017-2018.
- Reading comprehension: surpassed the human average around 2018.
- Language understanding: surpassed the human average around 2020.
- Predictive reasoning: still approaching the human average.

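METR has specifically studied how well AI handles long-running tasks. Its approach:

- Use human performance as the baseline for comparison
- Measure AI's ability to complete tasks autonomously
- Track how performance evolves across multiple AI models

Key findings:

- Exponential scaling: AI capabilities are advancing far faster than traditional computing progress, letting AI perform tasks that previously required human cognition and effort.
- Sustained acceleration: this pattern of rapid progress has been observed consistently since 2019, which suggests a reliable trend rather than a short-lived burst.
- Performance milestone: today's AI models can autonomously complete tasks equivalent to nearly one hour of human work, a striking leap compared with a few years ago.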
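To make the exponential trend concrete, here is a minimal sketch of the extrapolation arithmetic. It assumes the roughly 7-month doubling time quoted in the METR summary listed in the references, and it rounds "nearly one hour" to a 60-minute starting point. The function names and the starting value are hypothetical and are not taken from METR's code.

```python
import math

# Assumption: doubling time of about 7 months, as quoted in the METR summary.
DOUBLING_MONTHS = 7.0


def projected_horizon_minutes(current_minutes: float, months_ahead: float,
                              doubling_months: float = DOUBLING_MONTHS) -> float:
    """Task length (in human minutes) after `months_ahead` months of steady doubling."""
    return current_minutes * 2.0 ** (months_ahead / doubling_months)


def months_until(current_minutes: float, target_minutes: float,
                 doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months needed for the task length to grow from current to target."""
    return doubling_months * math.log2(target_minutes / current_minutes)


if __name__ == "__main__":
    start = 60.0  # assumption: "nearly one hour" rounded to 60 minutes
    for months in (0, 7, 14, 28):
        print(months, "months:", projected_horizon_minutes(start, months), "minutes")
    # Time until a 40-hour work week of human effort, if the trend simply continues.
    print("months to 40 hours:", months_until(start, 40 * 60))
```

This is a pure extrapolation: it says nothing about whether the trend will hold, only what the stated doubling time would imply if it did.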
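METR reports that the time human experts need to complete a task is a strong predictor of a model's success rate on that task: current models succeed almost 100% of the time on tasks that take humans less than 4 minutes, but less than 10% of the time on tasks that take more than 4 hours. This makes it possible to characterize a given model's capability by "the length (measured in human time) of tasks the model can complete with x% probability".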
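One way to turn that idea into a number is to fit a logistic curve of success against the logarithm of human task time, then read off the task length at which the fitted success probability equals x%. The sketch below does this in plain Python with simple gradient ascent. The observations in `RUNS` are invented for illustration, and the function names and fitting procedure are assumptions, not METR's actual data or implementation.

```python
import math

# Hypothetical (human_minutes, model_succeeded) observations, for illustration only.
RUNS = [(1, 1), (2, 1), (4, 1), (8, 1), (15, 1), (30, 1), (30, 0),
        (60, 1), (60, 0), (120, 0), (240, 1), (240, 0), (480, 0), (960, 0)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(runs, lr: float = 0.1, steps: int = 5000):
    """Fit P(success) = sigmoid(a + b * (log2(minutes) - center)) by gradient ascent."""
    xs = [math.log2(minutes) for minutes, _ in runs]
    ys = [succeeded for _, succeeded in runs]
    center = sum(xs) / len(xs)  # centering the feature keeps the fit well conditioned
    n = len(runs)
    a = b = 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(a + b * (x - center))
            grad_a += err
            grad_b += err * (x - center)
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b, center


def time_horizon_minutes(params, p: float) -> float:
    """Human task length at which the fitted success probability equals p."""
    a, b, center = params
    logit = math.log(p / (1.0 - p))
    return 2.0 ** (center + (logit - a) / b)


if __name__ == "__main__":
    params = fit_logistic(RUNS)
    for p in (0.5, 0.8):
        print(f"{int(p * 100)}% time horizon: {time_horizon_minutes(params, p):.1f} minutes")
```

Because success falls as tasks get longer, the fitted slope is negative, so a stricter reliability requirement (80% instead of 50%) always yields a shorter time horizon.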

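Based on this analysis, AI developers and companies should focus on:
- Building on what AI is already good at today
- Designing scalable architectures (base-model capability will keep improving)
- Decomposing complex workflows into simpler steps
- Preparing for multi-agent collaboration

References: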


Measuring AI Ability to Complete Long Tasks
We propose measuring AI performance in terms of the *length* of tasks AI agents can complete. We show that this metric has been consistently exponentially increasing over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that, in under a decade, we will see AI agents that can independently complete a large fraction of software tasks that currently take humans days or weeks.


Measuring AI Ability to Complete Long Tasks
Despite rapid progress on AI benchmarks, the real-world meaning of benchmark performance remains unclear. To quantify the capabilities of AI systems in terms of human capabilities, we propose a...
AI Capabilities are Doubling Every 7 Months. Are You Keeping Up?
In the rapidly evolving landscape of artificial intelligence (AI), we are witnessing an unprecedented acceleration in AI's capabilities, fundamentally reshaping our understanding of technological progress. While Moore’s Law has historically guided expectations for computing advancements, recent rese