
Unlocking Better AI: The Power of Quantified Human Preferences

TLDR: Current AI language model alignment relies on simple “A is better than B” feedback, which is insufficient to capture the true importance of improvements. This paper introduces “cardinal feedback” using a “willingness-to-pay” approach to quantify how much better one AI response is than another. The authors prove that only cardinal feedback can systematically identify the best model, and demonstrate empirically that models trained with this richer data (via Cardinal Direct Preference Optimization, CDPO) significantly outperform those trained with traditional DPO on critical improvements and on benchmarks like Arena-Hard.

In the rapidly evolving world of artificial intelligence, particularly with Large Language Models (LLMs), a critical challenge is ensuring these models align with human preferences and values. This process, known as alignment, often relies on human feedback. Traditionally, this feedback has been ‘ordinal’ – meaning humans simply choose which of two AI responses is better, like saying ‘Response A is better than Response B’. However, new research from Parker Whitfill and Stewy Slocum at MIT suggests this common approach has a fundamental flaw: it collects the wrong kind of data. Their paper, *Beyond Ordinal Preferences: Why Alignment Needs Cardinal Human Feedback*, argues for a shift towards ‘cardinal’ human feedback to truly optimize LLM performance.

The Challenge of LLM Alignment

Current methods for fine-tuning LLMs, such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO), are designed to make models more helpful, harmless, and honest. Yet, studies have shown that these methods can sometimes lead to superficial improvements, like longer or more stylistically polished responses, without addressing deeper issues such as factual errors or safety concerns. The core issue, as identified by Whitfill and Slocum, is that binary ‘A is better than B’ choices don’t provide enough information to understand the *magnitude* of preference.
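To make this concrete, here is a minimal sketch of the standard DPO objective (an illustration in PyTorch-style Python, not the paper’s code). Notice that the loss consumes only *which* response was preferred, never *by how much* – every win counts the same, whether it fixes a factual error or a typo.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO: each preferred/dispreferred pair contributes equally,
    regardless of how important the improvement actually is."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Only the ordinal signal (which response won) enters the objective.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```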

The Problem with Current Feedback Methods

Imagine an AI model that fixes a critical medical error in one response versus another that merely corrects a spelling mistake. Ordinal feedback would simply register both as a ‘win’ for the improved response, without distinguishing which improvement is more important. The researchers prove an ‘impossibility result’: no algorithm relying solely on these binary comparisons can consistently identify the most preferred model. This is because ordinal data lacks the necessary information to make informed trade-offs across different types of improvements or prompts. For instance, it can’t tell if fixing a major safety flaw on one prompt is more valuable than improving the writing style on another.

Introducing Cardinal Feedback: A New Approach

To overcome this limitation, the paper proposes collecting ‘cardinal’ feedback directly from humans. Cardinal feedback quantifies the *strength* of a preference. The researchers adopted a well-established tool from experimental economics: Willingness-to-Pay (WTP) elicitations. In this context, annotators are asked how much they would ‘pay’ (conceptually, or within a fixed budget) for a proposed improvement to an LLM’s response. Money serves as a universally understood and cardinally meaningful scale, allowing for consistent comparisons across different prompts and labelers. This approach allows the system to understand that avoiding a medical error is significantly more valuable than a minor stylistic correction.
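As a rough illustration of what such feedback might look like as data, the sketch below uses a hypothetical record layout; the field names are assumptions for exposition, not the released dataset’s actual schema.

```python
from dataclasses import dataclass

@dataclass
class WTPJudgment:
    prompt: str
    response_a: str
    response_b: str
    preferred: str             # "a" or "b" (the ordinal part)
    willingness_to_pay: float  # dollars the annotator would pay for the improvement

# A medical-error fix should command a far higher WTP than a style tweak,
# letting downstream training weight the two comparisons very differently.
examples = [
    WTPJudgment("Dosage question", "...", "...", preferred="a", willingness_to_pay=5.00),
    WTPJudgment("Cover letter",    "...", "...", preferred="a", willingness_to_pay=0.10),
]
```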

The CARDINAL PREFS Dataset

To put their theory into practice, Whitfill and Slocum collected and publicly released a new dataset called CARDINAL PREFS. This dataset comprises over 25,000 human WTP judgments on LLM completions, sourced from conversations in ChatbotArena and Anthropic’s HHH dataset. Despite initial concerns about noise or calibration issues with cardinal data, their empirical analysis showed that the WTP scheme successfully elicited high-quality, meaningful cardinal data. They found that the cardinal data provided a significantly increased signal compared to traditional ordinal data, indicating its value in capturing true preference intensity.

Real-World Impact: CDPO Outperforms DPO

The researchers then integrated cardinal feedback into the fine-tuning process, introducing Cardinal Reinforcement Learning from Human Feedback (CRLHF) and Cardinal Direct Preference Optimization (CDPO). Their experiments demonstrated clear advantages:

  • In a simplified setting where model-level preferences could be directly measured, CDPO selected the optimal model significantly more often than DPO (90.27% vs. 83.29%).
  • Using simulated data, CDPO achieved 50% higher mean ground-truth reward compared to DPO, indicating that it produces more aligned models. Crucially, CDPO’s advantage grew with the strength of the preference, showing it successfully prioritizes high-impact improvements.
  • On real-world data, CDPO consistently outperformed DPO on ‘important’ cases. While both methods output preferred responses at similar rates overall, CDPO showed better performance when observations were weighted by WTP or importance (as determined by another AI model). This means CDPO applies more optimization pressure to critical issues, whereas DPO tends to waste effort on less important, stylistic improvements.
  • Perhaps most impressively, on Arena-Hard, a challenging benchmark measuring win-rates against GPT-4, CDPO won almost 55% more battles than DPO.
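The paper’s exact CDPO objective is not reproduced here, but a minimal sketch of the underlying idea, assuming WTP values act as per-pair weights on a DPO-style loss, might look like the following. The key design choice is that high-stakes comparisons contribute more gradient than low-stakes ones.

```python
import torch
import torch.nn.functional as F

def cardinal_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                      ref_chosen_logps, ref_rejected_logps,
                      wtp, beta=0.1):
    """Illustrative cardinal-weighted DPO loss; this is a hedged sketch of
    the idea, not the paper's published CDPO formulation."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    per_pair = -F.logsigmoid(chosen_rewards - rejected_rewards)
    # Pairs with high willingness-to-pay (e.g. a corrected medical error)
    # receive more optimization pressure than low-stakes stylistic wins.
    weights = wtp / wtp.mean()
    return (weights * per_pair).mean()
```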

In conclusion, this research highlights a fundamental limitation of current LLM alignment techniques and offers a robust solution. By moving beyond simple binary choices to incorporate richer, cardinal human feedback, AI models can be trained to prioritize truly important improvements, leading to more aligned, reliable, and ultimately, more valuable AI systems.

Karthik Mehta (https://blogs.edgentiq.com)
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
