TLDR: A new paper reveals that the Plackett-Luce model, foundational to AI alignment methods like Direct Preference Optimization, implicitly assumes the “proportional hazards” condition from the Cox model. This connection implies that current preference models may misestimate human preferences when underlying utilities violate this assumption, particularly with polarizing concepts, suggesting avenues for more robust AI alignment.
A recent research paper by Chirag Nagpal from Meta Superintelligence Labs (MSL) uncovers a significant and previously underappreciated connection between two fundamental statistical models: the Plackett-Luce model and the Cox Proportional Hazards model. This insight has direct implications for how we build and refine AI systems, particularly in AI alignment.
The Plackett-Luce model is a cornerstone of estimating preferences from human-annotated data, and it is widely used in modern AI alignment techniques such as reward modeling and Direct Preference Optimization (DPO). It captures human preferences by modeling ranked choices over a set of candidates, of which the pairwise comparison “A is preferred over B” is the simplest case.
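For concreteness, here is the model in standard notation (the symbols below are the usual ones from the ranking literature, not taken from the paper): a ranking of K items with latent scores s_1, …, s_K is assigned probability

$$
P\big(y_{\pi(1)} \succ \cdots \succ y_{\pi(K)}\big) \;=\; \prod_{k=1}^{K} \frac{\exp\!\big(s_{\pi(k)}\big)}{\sum_{j=k}^{K} \exp\!\big(s_{\pi(j)}\big)},
\qquad
P(A \succ B) \;=\; \frac{\exp(s_A)}{\exp(s_A) + \exp(s_B)}.
$$

The two-item case on the right is the Bradley-Terry form that reward models and DPO typically optimize.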
On the other hand, the Cox Proportional Hazards model is a well-established tool in fields like biostatistics and reliability engineering. It’s typically used to analyze “time-to-event” data, like patient survival times or the lifespan of a machine part. The core assumption of this model is “proportional hazards,” meaning the ratio of hazard rates between two different groups remains constant over time.
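In standard notation (again, not specific to the paper), the Cox model writes the hazard of an individual with covariates x as a baseline hazard scaled by a covariate-dependent factor:

$$
h(t \mid x) \;=\; h_0(t)\,\exp\!\big(\beta^\top x\big),
\qquad
\frac{h(t \mid x_1)}{h(t \mid x_2)} \;=\; \exp\!\big(\beta^\top (x_1 - x_2)\big).
$$

The ratio on the right does not depend on t at all, which is precisely the proportional hazards assumption.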
Nagpal’s paper shows that the Plackett-Luce likelihood has the same mathematical form as the partial likelihood used to fit the Cox Proportional Hazards model. This means that when we use Plackett-Luce to model preferences, we are implicitly assuming that the underlying human utility functions (the absolute value or quality of a choice) satisfy the proportional hazards assumption. The full paper is available here: Preference Models assume Proportional Hazards of Utilities.
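To make the correspondence concrete (standard notation; the paper’s exact framing may differ), recall the Cox partial likelihood, where δ_i indicates an observed event and R(t_i) is the set of individuals still at risk at time t_i:

$$
L(\beta) \;=\; \prod_{i:\,\delta_i = 1} \frac{\exp\!\big(\beta^\top x_i\big)}{\sum_{j \in R(t_i)} \exp\!\big(\beta^\top x_j\big)},
\qquad
R(t_i) = \{\, j : t_j \ge t_i \,\}.
$$

If items are ordered by their event times, so that the first item to experience an event is “ranked first” and the risk set plays the role of the not-yet-ranked items, this product has exactly the Plackett-Luce form above with scores s_i = β^⊤ x_i.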
Implications for AI Alignment
This connection is more than just a theoretical curiosity; it has practical consequences for AI alignment. If the underlying human preferences violate the proportional hazards assumption—for instance, when dealing with highly polarizing concepts where different groups might have vastly different utility scales—then models based on Plackett-Luce (like DPO) might misestimate human preferences. This could lead to AI systems that are not truly aligned with diverse human values.
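As a hypothetical illustration (not an experiment from the paper), consider two equally sized annotator groups with opposite utilities for two responses. Pooling their annotations and fitting a single pairwise Plackett-Luce (Bradley-Terry) model yields indifference, even though no individual annotator is indifferent:

```python
import numpy as np

# Hypothetical utilities for two polarized annotator groups (illustrative values only).
u_group1 = {"A": 2.0, "B": -2.0}   # group 1 strongly prefers response A
u_group2 = {"A": -2.0, "B": 2.0}   # group 2 strongly prefers response B

def bt_prob(u, a, b):
    """Bradley-Terry / pairwise Plackett-Luce probability that `a` is preferred to `b`."""
    return 1.0 / (1.0 + np.exp(-(u[a] - u[b])))

# Each group prefers its favourite with probability ~0.98.
print(bt_prob(u_group1, "A", "B"), bt_prob(u_group2, "B", "A"))

# Pool annotations 50/50: the empirical preference rate is ~0.5, so a single
# maximum-likelihood Bradley-Terry fit assigns A and B equal utility,
# reporting an indifference that no annotator actually holds.
p_pooled = 0.5 * bt_prob(u_group1, "A", "B") + 0.5 * bt_prob(u_group2, "A", "B")
print(p_pooled)
```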
The paper suggests that understanding this link can help the AI research community leverage existing knowledge from semi-parametric statistics to develop more robust models of human preference. For example, the Cox model offers ways to estimate the “baseline hazard rate,” which, in the context of preferences, could correspond to the absolute utility of an item. This could allow better estimation of both relative and absolute utilities, especially when absolute feedback (such as a 5-point rating scale) is available alongside relative rankings.
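As one concrete piece of that machinery, the cumulative baseline hazard in the Cox model is commonly recovered with the Breslow estimator (standard survival-analysis notation; this is an illustration of the kind of tool available, not a method proposed in the paper):

$$
\hat{H}_0(t) \;=\; \sum_{i:\; t_i \le t,\ \delta_i = 1} \frac{1}{\sum_{j \in R(t_i)} \exp\!\big(\hat{\beta}^\top x_j\big)}.
$$

In the preference analogy, an estimator of this kind is what would allow recovering an absolute, item-level quantity on top of the relative scores learned from rankings.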
In conclusion, this research highlights a crucial statistical underpinning of current AI alignment methods. By recognizing that preference models implicitly assume proportional hazards, researchers can identify potential limitations and explore new avenues for building more accurate and reliable AI systems that truly understand and reflect human preferences, even in complex and nuanced scenarios.


