TLDR: This research paper proposes a new objective function for AI agents that aims to softly maximize long-term, aggregate human power, defined as the ability to achieve diverse goals. Instead of learning human preferences, the AI focuses on structural empowerment, considering human bounded rationality and social norms. Experiments show an AI using this metric learns cooperative behaviors like unlocking doors and clearing paths, suggesting a safer and more beneficial alternative to traditional reward-based AI objectives.
Artificial intelligence systems are rapidly advancing, bringing both immense potential and significant concerns, particularly regarding AI safety. A central concept in this discussion is ‘power’ – not just in terms of AI seeking control, but also human power, which is essential for our well-being. A new research paper explores a novel approach to AI design, aiming to promote both safety and human well-being by explicitly tasking AI agents with empowering humans and managing the power balance between humans and AI in a beneficial way.
The paper, titled “Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power,” by Jobst Heitzig and Ram Potham, introduces a principled framework for an AI’s objective function. Unlike traditional AI objectives that maximize a specific utility or reward, this approach gives the agent an objective that is an inequality- and risk-averse, long-term aggregate of human power. In other words, the AI is designed to consider how its actions affect the ability of many humans to achieve a wide variety of potential goals over a long period, while remaining mindful of fairness and avoiding risky outcomes.
Understanding Human Power for AI
The core of this framework is a new metric for individual human power, termed “ICCEA power” (Informationally and Cognitively Constrained Effective Autonomous power). This metric measures how many diverse goals a human can effectively achieve, taking into account their own cognitive limitations, available information, and the behavior of other agents, including the AI. Crucially, the AI does not try to guess or learn a human’s specific, current goals, as these can be complex, changing, and hard to predict. Instead, it focuses on the structural ability to reach a wide range of possible states that could represent desirable outcomes.
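To make this concrete, here is a rough, hedged sketch of what such a structural power metric could look like: a human's power is estimated as the average probability of success across a diverse set of candidate goals, evaluated under a bounded-rational behavior model. The one-dimensional environment, the noisy-greedy human model, and all function names are illustrative assumptions, not the paper's actual definition of ICCEA power.

```python
# Illustrative sketch only: a toy stand-in for an "ICCEA-like" power metric.
# The environment, the bounded-rationality model, and all names are assumptions
# for illustration, not the paper's definitions.
import random

def noisy_greedy_step(pos: int, goal: int, rationality: float) -> int:
    """Bounded-rational human: moves toward the goal, but sometimes slips."""
    best = 1 if goal > pos else -1 if goal < pos else 0
    if random.random() < rationality:
        return pos + best                        # intended move
    return pos + random.choice([-1, 0, 1])       # slip / limited cognition

def achievement_prob(start: int, goal: int, horizon: int,
                     rationality: float, trials: int = 500) -> float:
    """Monte-Carlo estimate of reaching `goal` within `horizon` steps."""
    hits = 0
    for _ in range(trials):
        pos = start
        for _ in range(horizon):
            if pos == goal:
                break
            pos = noisy_greedy_step(pos, goal, rationality)
        hits += (pos == goal)
    return hits / trials

def human_power(start: int, candidate_goals: list[int], horizon: int,
                rationality: float) -> float:
    """Average achievability over a *diverse* goal set -- no single goal is assumed."""
    probs = [achievement_prob(start, g, horizon, rationality) for g in candidate_goals]
    return sum(probs) / len(probs)

if __name__ == "__main__":
    goals = list(range(-5, 6))   # many possible destinations on a line
    print(human_power(start=0, candidate_goals=goals, horizon=6, rationality=0.8))
```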
The researchers detail how this individual power metric is aggregated across multiple humans and over time to form the AI’s overall objective. They incorporate several “desiderata,” or desired properties, into the metric’s design. For instance, the AI is incentivized to reduce uncertainty for humans, prefer reliable outcomes, and avoid concentrating power in the hands of a few. The metric also encourages the AI to be “corrigible,” meaning it can be corrected or stopped, and to avoid irreversible changes that might disempower humans in the future.
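As a hedged illustration of what an inequality- and risk-averse aggregate could look like, the sketch below combines per-human power scores with a generalized mean that weights the worst-off humans more heavily, applies a concave transform so that reliable moderate power beats risky extremes, and discounts over time. The specific transform, exponent, and discount factor are arbitrary choices for illustration, not the paper's.

```python
# Illustrative sketch: inequality- and risk-averse aggregation of per-human power.
# The concave transform, the exponent, and the discounting scheme are assumptions
# chosen for illustration; the paper's actual aggregation may differ.
import math

def risk_averse(x: float) -> float:
    """Concave transform: reliable moderate power beats a risky gamble of the same mean."""
    return math.sqrt(x)

def inequality_averse_mean(values: list[float], p: float = -2.0) -> float:
    """Generalized mean with p < 1 weights the worst-off humans more heavily."""
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)

def aggregate_power(power_per_human_per_time: list[list[float]],
                    discount: float = 0.99) -> float:
    """Discounted sum over time of the risk-averse transform of the
    inequality-averse cross-human aggregate."""
    total = 0.0
    for t, powers_at_t in enumerate(power_per_human_per_time):
        total += (discount ** t) * risk_averse(inequality_averse_mean(powers_at_t))
    return total

if __name__ == "__main__":
    balanced   = [[0.5, 0.5], [0.6, 0.6]]   # two humans, two time steps
    unbalanced = [[0.9, 0.1], [1.0, 0.2]]
    # The balanced distribution of power scores higher than the unbalanced one.
    print(aggregate_power(balanced), aggregate_power(unbalanced))
```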
How the AI Learns to Empower
The paper proposes algorithms for an AI to compute and softly maximize this human power metric; “soft” maximization means favoring higher-scoring actions probabilistically rather than strictly optimizing, which helps avoid extreme behavior. In simpler environments, this can be done through backward induction, a form of dynamic programming. For more complex, multi-agent environments, the authors suggest a two-phase learning approach similar to reinforcement learning. In the first phase, the AI learns to model human behavior, including bounded rationality and social norms. In the second phase, building on this model, the AI learns its own policy to softly maximize aggregate human power.
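Here is a minimal sketch of the backward-induction variant with soft maximization, assuming a human-power score is already available for every state (the first learning phase, modeling human behavior, is omitted). The toy states, transitions, and numeric scores are invented for illustration; only the structure, a value recursion with a softmax in place of a hard argmax, reflects the idea described above.

```python
# Illustrative sketch of "soft maximization" by backward induction in a tiny,
# fully known MDP. States, transitions, and the per-state human-power scores
# are made-up placeholders, not from the paper.
import math

STATES  = ["locked", "unlocked", "blocked", "clear"]
ACTIONS = ["wait", "use_key", "step_aside"]

# Deterministic toy transitions: NEXT[state][action] -> next state.
NEXT = {
    "locked":   {"wait": "locked",   "use_key": "unlocked", "step_aside": "locked"},
    "unlocked": {"wait": "unlocked", "use_key": "unlocked", "step_aside": "clear"},
    "blocked":  {"wait": "blocked",  "use_key": "blocked",  "step_aside": "clear"},
    "clear":    {"wait": "clear",    "use_key": "clear",    "step_aside": "clear"},
}

# Placeholder: how much power the human has in each state (higher = more goals reachable).
HUMAN_POWER = {"locked": 0.2, "unlocked": 0.6, "blocked": 0.3, "clear": 1.0}

def soft_policy(horizon: int, temperature: float = 0.1):
    """Backward induction with a softmax ("soft maximization") over action values."""
    V = {s: 0.0 for s in STATES}                 # value at the final step
    policy = []                                  # one softmax policy per time step
    for _ in range(horizon):
        Q = {s: {a: HUMAN_POWER[NEXT[s][a]] + V[NEXT[s][a]] for a in ACTIONS}
             for s in STATES}
        pi = {}
        for s in STATES:
            weights = {a: math.exp(Q[s][a] / temperature) for a in ACTIONS}
            z = sum(weights.values())
            pi[s] = {a: w / z for a, w in weights.items()}
            V[s] = sum(pi[s][a] * Q[s][a] for a in ACTIONS)
        policy.insert(0, pi)
    return policy

if __name__ == "__main__":
    first_step = soft_policy(horizon=3)[0]
    print(first_step["locked"])   # most probability mass on "use_key"
```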
The implications of this objective were explored through analysis of various “paradigmatic situations” and a simulation in a small gridworld environment. The analysis suggests that an AI designed with this objective would:
- Act as a transparent, instruction-following assistant, making clear commitments and respecting social norms.
- Adapt to human limitations, offering a suitable number of options without overwhelming them.
- Be hesitant to cause irreversible changes, often asking for confirmation before executing commands.
- Manage resources fairly and sustainably.
- Protect its own existence and functionality, as these are instrumental to empowering humans.
Proof of Concept: The Gridworld Experiment
In the gridworld simulation, a robot agent, without any explicit goal-specific rewards, learned to cooperatively empower a human. The human’s unknown goal was to reach a green square, but they were blocked by a locked door. The robot, solely driven by the objective to maximize human power, learned to navigate to a key, pick it up, unlock the door, and then move out of the human’s way. This complex sequence of actions emerged naturally as the robot discovered that these steps significantly increased the human’s ability to reach various possible goals, thus increasing its intrinsic reward.
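A minimal sketch of why this behavior can emerge: if the robot's intrinsic reward is taken to be the number of cells the human could still reach within a short horizon, then unlocking the door and stepping aside directly raise that number. The tiny grid, the door mechanics, and the reachability-count reward below are simplifying assumptions, not the paper's actual environment or metric.

```python
# Illustrative sketch: a reachability-based intrinsic reward for the robot.
# The grid layout, the door mechanics, and the "count reachable cells" reward
# are simplifying assumptions to show why door-unlocking increases the metric.
from collections import deque

GRID = [
    "########",
    "#H...D.#",   # H = human start, D = door, . = free cell, # = wall
    "########",
]

def reachable_cells(grid: list[str], door_open: bool, horizon: int) -> int:
    """Count cells the human can reach within `horizon` steps (breadth-first search)."""
    rows, cols = len(grid), len(grid[0])
    start = next((r, c) for r in range(rows) for c in range(cols) if grid[r][c] == "H")

    def passable(r: int, c: int) -> bool:
        ch = grid[r][c]
        return ch in ".H" or (ch == "D" and door_open)

    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        (r, c), dist = frontier.popleft()
        if dist == horizon:
            continue
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and passable(nr, nc) and (nr, nc) not in seen:
                seen.add((nr, nc))
                frontier.append(((nr, nc), dist + 1))
    return len(seen)

if __name__ == "__main__":
    # The robot's intrinsic reward rises when it opens the door for the human.
    print("door locked :", reachable_cells(GRID, door_open=False, horizon=10))
    print("door opened :", reachable_cells(GRID, door_open=True,  horizon=10))
```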
This research offers a promising direction for developing highly capable general-purpose AI systems that are inherently safer and more beneficial. By focusing on the soft maximization of aggregate human power, such AI systems could provide a robust alternative to traditional utility-based objectives, potentially mitigating risks like power-seeking and misalignment. For more in-depth technical details, you can read the full paper available at arXiv:2508.00159.


