
Unpacking AGI: A Framework for Human-Level AI Assessment

TLDR: A new research paper proposes a quantifiable framework to define and evaluate Artificial General Intelligence (AGI) as matching the cognitive versatility and proficiency of a well-educated adult. Based on the Cattell-Horn-Carroll theory of human cognition, it assesses AI across ten core cognitive domains like reasoning, memory, and perception. Evaluations of current models like GPT-4 (27%) and GPT-5 (58%) reveal “jagged” profiles with significant deficits, particularly in long-term memory storage, highlighting the remaining gap to AGI despite rapid progress.

The quest for Artificial General Intelligence (AGI) has long been hampered by the lack of a clear, measurable definition. A recent research paper titled “A Definition of AGI” by a large collaborative group of authors, including Dan Hendrycks, Dawn Song, and Yoshua Bengio, aims to resolve this ambiguity by introducing a quantifiable framework for evaluating AGI.

The paper proposes a concise definition: AGI is an AI system that can match or exceed the cognitive versatility and proficiency of a well-educated adult. Rather than rewarding narrow, specialized performance, the definition emphasizes both the breadth (versatility) and the depth (proficiency) of skills characteristic of human intelligence.

To make this definition practical, the researchers turned to the most empirically validated model of human cognition: the Cattell-Horn-Carroll (CHC) theory. This theory breaks down general intelligence into distinct broad and narrow abilities, providing a comprehensive map of human cognitive functions. By adapting established human psychometric tests, the framework evaluates AI systems across these human-centric cognitive domains.

Ten Core Cognitive Domains

The framework identifies ten core cognitive domains, each contributing 10% to the overall AGI score, ensuring a balanced assessment of breadth:

  • General Knowledge (K): Factual understanding of the world, including commonsense, science, social science, history, and culture.
  • Reading and Writing Ability (RW): Proficiency in understanding and producing written language.
  • Mathematical Ability (M): Skills across arithmetic, algebra, geometry, probability, and calculus.
  • On-the-Spot Reasoning (R): Flexible problem-solving for novel situations, including deduction, induction, theory of mind, planning, and adaptation.
  • Working Memory (WM): The ability to hold and manipulate information in active attention across text, audio, and visual formats.
  • Long-Term Memory Storage (MS): The capacity to continually learn and store new information.
  • Long-Term Memory Retrieval (MR): The fluency and precision of accessing stored knowledge, crucially avoiding confabulation (hallucinations).
  • Visual Processing (V): The ability to perceive, analyze, reason about, and generate visual information.
  • Auditory Processing (A): The capacity to discriminate, recognize, and work creatively with auditory stimuli.
  • Speed (S): The ability to perform simple cognitive tasks quickly.
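Because every domain carries the same 10% weight, the overall score is a weighted sum of per-domain results. The sketch below shows that arithmetic under stated assumptions: each domain's result is already normalized to a proficiency between 0 and 1, and a missing domain counts as 0. The function names (`agi_score`, `weak_domains`) and the example profile are hypothetical illustrations, not the paper's code or any model's actual results.

```python
from typing import Mapping, Sequence

# The ten broad domains listed above (abbreviations as in the article).
DOMAINS: Sequence[str] = ("K", "RW", "M", "R", "WM", "MS", "MR", "V", "A", "S")
# Equal weighting: each domain contributes at most 10 percentage points.
WEIGHT = 1.0 / len(DOMAINS)


def agi_score(proficiency: Mapping[str, float]) -> float:
    """Combine per-domain proficiency (each in [0, 1]) into a score in percent.

    Assumptions (not taken from the paper): domains missing from the mapping
    count as 0, and each value is already normalized to that domain's maximum.
    """
    total = 0.0
    for domain in DOMAINS:
        p = proficiency.get(domain, 0.0)
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{domain}: proficiency must be in [0, 1], got {p}")
        total += WEIGHT * p
    return 100.0 * total


def weak_domains(proficiency: Mapping[str, float], threshold: float = 0.5) -> list[str]:
    """List domains scoring below `threshold`, i.e. the dips in a jagged profile."""
    return [d for d in DOMAINS if proficiency.get(d, 0.0) < threshold]


if __name__ == "__main__":
    # Hypothetical, made-up profile: strong knowledge and math, no long-term storage.
    profile = {"K": 0.9, "RW": 0.9, "M": 0.9, "R": 0.7, "WM": 0.5,
               "MS": 0.0, "MR": 0.4, "V": 0.3, "A": 0.3, "S": 0.5}
    print(round(agi_score(profile), 1))  # 54.0
    print(weak_domains(profile))         # ['MS', 'MR', 'V', 'A']
```

One consequence of equal weighting is that a single domain stuck at 0 caps the total at 90%, however strong the other nine are, so no amount of extra knowledge or math can make up for a missing capability.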

Applying this framework to contemporary AI models revealed a “jagged” cognitive profile. While models like GPT-4 and GPT-5 show impressive proficiency in knowledge-intensive areas, they exhibit significant weaknesses in foundational cognitive machinery. Overall, GPT-4 scored 27% and GPT-5 58% on this AGI scale, highlighting both rapid progress and the substantial distance still to cover.

A critical bottleneck identified is long-term memory storage, where current models score near 0%. This “amnesia” means AI systems struggle to retain information over long periods or across interactions, forcing them to “re-learn” context repeatedly. Deficits in visual reasoning also limit AI agents’ ability to interact with complex digital environments effectively.

The paper also discusses “capability contortions,” where AI systems use strengths in one area to mask weaknesses in another. Examples include relying on massive context windows (working memory) to compensate for a lack of true long-term memory storage, and using external search tools (Retrieval-Augmented Generation, RAG) to mitigate imprecise internal memory retrieval and hallucinations. These workarounds, while effective in some scenarios, are not substitutes for genuine, integrated cognitive abilities.
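The context-window workaround can be illustrated with a toy contrast. In the sketch below, both classes are hypothetical and greatly simplified: real models do not store facts as strings, and the paper does not describe this code. The point is only that a bounded window, however large, eventually drops what it saw first, whereas a store that actually commits observations keeps them.

```python
from collections import deque


class ContextWindowAgent:
    """Toy agent whose only memory is a bounded context window (hypothetical)."""

    def __init__(self, window_size: int) -> None:
        # A deque with maxlen silently discards the oldest item once it is full.
        self.context: deque[str] = deque(maxlen=window_size)

    def observe(self, fact: str) -> None:
        self.context.append(fact)

    def recalls(self, fact: str) -> bool:
        return fact in self.context


class PersistentMemoryAgent:
    """Toy agent that commits every observation to a lasting store (hypothetical)."""

    def __init__(self) -> None:
        self.store: set[str] = set()

    def observe(self, fact: str) -> None:
        self.store.add(fact)

    def recalls(self, fact: str) -> bool:
        return fact in self.store


if __name__ == "__main__":
    window, persistent = ContextWindowAgent(window_size=3), PersistentMemoryAgent()
    for i in range(5):
        fact = f"fact {i}"
        window.observe(fact)
        persistent.observe(fact)
    print(window.recalls("fact 0"), persistent.recalls("fact 0"))  # False True
```

Raising `window_size` only delays the point at which early facts fall out; it does not change the behavior, which is why the paper treats large context windows as a contortion rather than a substitute for long-term memory storage.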

The researchers emphasize that their definition targets human-level AI, measuring core cognitive abilities rather than specialized economic value or physical skills. The framework serves as a rigorous diagnostic tool, pinpointing specific strengths and profound weaknesses, and guiding the path toward true Artificial General Intelligence. For more in-depth information, read the full paper, “A Definition of AGI.”

Nikhil Patel
https://blogs.edgentiq.com
Nikhil Patel is a tech analyst and AI news reporter who brings a practitioner's perspective to every article. With prior experience at an AI startup, he decodes the business mechanics behind product innovations, funding trends, and partnerships in the GenAI space. Nikhil's insights are sharp, forward-looking, and trusted by insiders and newcomers alike. You can reach him at: [email protected]
