Unpacking AGI: A Framework for Human-Level AI Assessment

TLDR: A new research paper proposes a quantifiable framework to define and evaluate Artificial General Intelligence (AGI) as matching the cognitive versatility and proficiency of a well-educated adult. Based on the Cattell-Horn-Carroll theory of human cognition, it assesses AI across ten core cognitive domains like reasoning, memory, and perception. Evaluations of current models like GPT-4 (27%) and GPT-5 (58%) reveal “jagged” profiles with significant deficits, particularly in long-term memory storage, highlighting the remaining gap to AGI despite rapid progress.

The quest for Artificial General Intelligence (AGI) has long been hampered by the lack of a clear, measurable definition. A recent research paper, “A Definition of AGI,” by a large group of authors including Dan Hendrycks, Dawn Song, and Yoshua Bengio, aims to resolve this ambiguity by introducing a quantifiable framework for evaluating AGI.

The paper proposes a straightforward yet profound definition: AGI is an AI system that can match or exceed the cognitive versatility and proficiency of a well-educated adult. This definition moves beyond specialized AI performance, emphasizing the broad range and depth of skills characteristic of human intelligence.

To make this definition practical, the researchers turned to the most empirically validated model of human cognition: the Cattell-Horn-Carroll (CHC) theory. This theory breaks down general intelligence into distinct broad and narrow abilities, providing a comprehensive map of human cognitive functions. By adapting established human psychometric tests, the framework evaluates AI systems across these human-centric cognitive domains.

Ten Core Cognitive Domains

The framework identifies ten core cognitive domains, each contributing 10% to the overall AGI score, ensuring a balanced assessment of breadth:

General Knowledge (K): Factual understanding of the world, including commonsense, science, social science, history, and culture.
Reading and Writing Ability (RW): Proficiency in understanding and producing written language.
Mathematical Ability (M): Skills across arithmetic, algebra, geometry, probability, and calculus.
On-the-Spot Reasoning (R): Flexible problem-solving for novel situations, including deduction, induction, theory of mind, planning, and adaptation.
Working Memory (WM): The ability to hold and manipulate information in active attention across text, audio, and visual formats.
Long-Term Memory Storage (MS): The capacity to continually learn and store new information.
Long-Term Memory Retrieval (MR): The fluency and precision of accessing stored knowledge, crucially avoiding confabulation (hallucinations).
Visual Processing (V): The ability to perceive, analyze, reason about, and generate visual information.
Auditory Processing (A): The capacity to discriminate, recognize, and work creatively with auditory stimuli.
Speed (S): The ability to perform simple cognitive tasks quickly.
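
The equal weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's scoring code: it assumes each domain's proficiency has already been reduced to a fraction between 0 and 1, and the function names, the 0.2 deficit threshold, and the example profile are all hypothetical. The profile values are invented for illustration and do not represent any real model.

```python
# Domain codes as listed in the article; each domain carries equal weight.
DOMAINS = ["K", "RW", "M", "R", "WM", "MS", "MR", "V", "A", "S"]
POINTS_PER_DOMAIN = 10.0  # each domain contributes 10% of the total


def agi_score(proficiency: dict) -> float:
    """Return an overall score in [0, 100] from per-domain fractions in [0, 1].

    Assumption: how a domain's test results become a single fraction is
    outside the scope of this sketch.
    """
    missing = [d for d in DOMAINS if d not in proficiency]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    for domain in DOMAINS:
        if not 0.0 <= proficiency[domain] <= 1.0:
            raise ValueError(f"{domain} must be between 0 and 1")
    return sum(POINTS_PER_DOMAIN * proficiency[d] for d in DOMAINS)


def weakest_domains(proficiency: dict, threshold: float = 0.2) -> list:
    """List domains at or below a (hypothetical) deficit threshold."""
    return [d for d in DOMAINS if proficiency[d] <= threshold]


# Hypothetical profile, invented for illustration only.
example_profile = {
    "K": 0.9, "RW": 0.8, "M": 0.7, "R": 0.5, "WM": 0.4,
    "MS": 0.0, "MR": 0.4, "V": 0.3, "A": 0.3, "S": 0.5,
}

if __name__ == "__main__":
    print(f"Overall score: {agi_score(example_profile):.0f}%")
    print("Weakest domains:", weakest_domains(example_profile))
```

Because every domain is capped at 10 points, strength in one area cannot offset a near-zero score in another, which is why an uneven profile translates directly into a lower total.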

Applying this framework to contemporary AI models revealed a “jagged” cognitive profile. While models like GPT-4 and GPT-5 show impressive proficiency in knowledge-intensive areas, they exhibit significant weaknesses in foundational cognitive machinery. Overall, GPT-4 scored 27% and GPT-5 scored 58% on this AGI scale, highlighting both rapid progress and the substantial distance still to cover.

A critical bottleneck identified is long-term memory storage, where current models score near 0%. This “amnesia” means AI systems struggle to retain information over long periods or across interactions, forcing them to “re-learn” context repeatedly. Deficits in visual reasoning also limit AI agents’ ability to interact with complex digital environments effectively.

The paper also discusses “capability contortions,” where AI systems use strengths in one area to mask weaknesses in another. For example, models rely on massive context windows (working memory) to compensate for a lack of true long-term memory storage, or use external search tools (Retrieval-Augmented Generation, RAG) to mitigate imprecise internal memory retrieval and hallucinations. These workarounds, while effective in some scenarios, are not substitutes for genuine, integrated cognitive abilities.

The researchers emphasize that their definition focuses on human-level AI, measuring core cognitive abilities rather than specialized economic value or physical skills. This framework serves as a rigorous diagnostic tool, pinpointing specific strengths and profound weaknesses, and guiding the path toward achieving true Artificial General Intelligence. For more in-depth information, see the full paper, “A Definition of AGI.”
