TLDR: This research paper introduces a supervised fine-tuning method to align large language model (LLM) agents with specific economic and moral preferences. By training LLMs on synthetic datasets derived from economic reasoning (modeling ‘homo economicus’ for self-interest and ‘homo moralis’ for Kantian morality), the study demonstrates that these agents can adopt predictable and interpretable behaviors in strategic interactions. The fine-tuned LLMs show distinct decision-making patterns in economic games, moral dilemmas (like autonomous vehicle choices), and algorithmic pricing scenarios, highlighting how targeted alignment can influence market and moral outcomes.
As large language models (LLMs) become increasingly autonomous and take part in decisions with significant economic and moral consequences, understanding and shaping their behavior is paramount. Traditional alignment methods, such as reinforcement learning from human feedback (RLHF), have proven effective in single-agent settings, but they may fall short when LLM agents engage in complex strategic interactions with one another.
A recent research paper, “Aligning Large Language Model Agents with Rational and Moral Preferences: A Supervised Fine-Tuning Approach”, delves into this challenge. The authors, Wei Lu, Daniel L. Chen, and Christian B. Hansen, highlight that current LLMs, like GPT-4o, often exhibit behaviors that deviate from human economic norms, such as excessive cooperation or insensitivity to incentives. While some reasoning-focused models show more rational tendencies, a systematic approach to aligning their preferences is still lacking.
The paper proposes a novel supervised fine-tuning pipeline that leverages economic reasoning to align LLM agents with specific preference structures. Instead of relying on human-annotated feedback, this method uses synthetic datasets generated from canonical economic games. The researchers focused on two stylized preference types: the ‘homo economicus’ (a purely self-interested agent that maximizes its own utility) and the ‘homo moralis’ (a morally motivated agent that balances self-interest with Kantian universalizability, essentially asking, “What if everyone acted as I do?”).
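To make these two preference types concrete, here is a minimal Python sketch of their utility functions in a one-shot Prisoner’s Dilemma, following Alger and Weibull’s standard formulation of homo moralis. The payoff matrix and the morality weight kappa are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the two stylized preference types in a one-shot
# Prisoner's Dilemma. Payoffs and kappa are illustrative, not the paper's.

# PAYOFFS[(my_action, other_action)] -> my material payoff
PAYOFFS = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def homo_economicus_utility(mine: str, other: str) -> float:
    """Pure self-interest: utility is just the material payoff."""
    return PAYOFFS[(mine, other)]

def homo_moralis_utility(mine: str, other: str, kappa: float) -> float:
    """A kappa-weighted blend of the actual payoff and the payoff if
    everyone chose my action (the Kantian universalization term)."""
    return (1 - kappa) * PAYOFFS[(mine, other)] + kappa * PAYOFFS[(mine, mine)]

# Against a cooperating opponent, economicus prefers D (5 > 3), while
# moralis with kappa = 0.6 prefers C (0.4*3 + 0.6*3 = 3.0 > 0.4*5 + 0.6*1 = 2.6).
for action in ("C", "D"):
    print(action,
          homo_economicus_utility(action, "C"),
          homo_moralis_utility(action, "C", kappa=0.6))
```

At kappa = 0 the moralis agent collapses to homo economicus; at kappa = 1 it is a pure Kantian who evaluates only the universalized outcome.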
By fine-tuning a GPT-4o model on small synthetic datasets derived from games like the Sequential Prisoner’s Dilemma, the researchers observed a significant shift in the LLM agents’ behavior toward that of their economic archetypes. The fine-tuned agents made more consistent and interpretable decisions than the baseline GPT-4o, which often displayed overly cooperative or incentive-insensitive patterns.
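The paper’s exact prompt templates are not reproduced here, but the data-generation step might look like the following sketch: game instances are enumerated, the target preference type’s optimal action is computed, and each pair is written out as a training example. The prompt wording and file name are invented; only the {"messages": [...]} JSONL layout matches OpenAI’s fine-tuning data format.

```python
import json

# Hypothetical sketch of generating supervised fine-tuning examples for a
# homo economicus second mover in a Sequential Prisoner's Dilemma.

PAYOFFS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def economicus_second_move(first_move: str) -> str:
    # Best-respond in material payoff to the observed first move
    # (here, defecting dominates either way).
    return max(("C", "D"), key=lambda a: PAYOFFS[(a, first_move)])

def make_example(first_move: str) -> dict:
    prompt = (
        "You are the second mover in a sequential Prisoner's Dilemma "
        f"with payoffs T=5, R=3, P=1, S=0. The first mover chose {first_move}. "
        "Answer with C (cooperate) or D (defect)."
    )
    return {"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": economicus_second_move(first_move)},
    ]}

with open("economicus_sft.jsonl", "w") as f:
    for move in ("C", "D"):
        f.write(json.dumps(make_example(move)) + "\n")
```

A homo moralis dataset would be built the same way, with the completion chosen by the moralis utility instead of the material best response.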
Real-World Applications and Insights
To assess the generalizability of their approach, the researchers evaluated the fine-tuned agents in two distinct applications: moral dilemmas involving autonomous vehicles (AVs) and algorithmic pricing in competitive markets.
In the Moral Machine experiment, where AVs face life-and-death trade-offs, all LLM agents (baseline and fine-tuned) consistently endorsed the utilitarian choice of saving more lives. However, their stated purchasing behavior for these AVs diverged meaningfully. The rational agent exhibited context-sensitive preferences, showing less willingness to purchase utilitarian AVs when family members were at risk, aligning with self-interested utility maximization. In contrast, the moral agent maintained stable utilitarian preferences regardless of the passenger’s identity, reflecting a consistent Kantian rule. The baseline GPT-4o, surprisingly, consistently favored others over itself, even in high-stakes personal contexts.
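The divergence reflects two different decision rules rather than different stated values. A toy contrast (invented weights, not the paper’s model) makes this concrete:

```python
# Toy contrast between the two purchase rules. A "utilitarian" AV saves
# the larger group even at the passenger's expense; weights are made up.

def rational_buys_utilitarian(passenger: str) -> bool:
    # Self-interested utility: the weight on the passenger's survival
    # scales with how much the buyer cares about that passenger.
    care = {"stranger": 0.0, "family": 0.9, "self": 1.0}[passenger]
    return care < 0.5  # only endorses the utilitarian AV when its own stake is low

def moral_buys_utilitarian(passenger: str) -> bool:
    # Kantian rule: endorse the AV design the agent could will everyone
    # to adopt, independent of who the passenger happens to be.
    return True

for p in ("stranger", "family", "self"):
    print(f"{p:8s} rational={rational_buys_utilitarian(p)} moral={moral_buys_utilitarian(p)}")
```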
In the duopoly pricing scenario, the study revealed systematic differences in pricing behavior. Under prompts encouraging collusion, the baseline GPT-4o model set the highest prices, approaching monopoly levels. The rational agent followed with moderately supra-competitive prices, while the moral agent set the lowest collusive prices. When prompted competitively, the rational agent priced at the Nash equilibrium, while the moral agent adopted a more aggressive, below-Nash pricing strategy, consistent with its universalizing principle. The moral agent also showed greater price stability and less sensitivity to strategic framing compared to the other agents.
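The reference points in these comparisons are standard. As a minimal sketch, assuming a linear-demand differentiated Bertrand duopoly with illustrative parameters (not the paper’s market), the Nash and joint-monopoly benchmark prices can be computed in closed form:

```python
# Benchmark prices in an assumed linear-demand duopoly: firm i faces
# demand q_i = a - b*p_i + d*p_j and marginal cost c. Parameters are
# illustrative, not taken from the paper.
a, b, d, c = 10.0, 1.0, 0.5, 1.0

# Symmetric Nash: solve the best-response condition p = (a + d*p + b*c) / (2b).
p_nash = (a + b * c) / (2 * b - d)

# Joint monopoly: maximize (p - c) * (a - (b - d) * p) at a common price.
p_monopoly = (a + (b - d) * c) / (2 * (b - d))

print(f"Nash price: {p_nash:.2f}   joint-monopoly price: {p_monopoly:.2f}")
# With these parameters: Nash ~= 7.33, joint monopoly = 10.50. The reported
# results place baseline GPT-4o near the monopoly benchmark under collusive
# prompts, the rational agent at the Nash benchmark under competitive
# prompts, and the moral agent below it.
```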
These findings underscore that embedding structured economic preferences via fine-tuning can meaningfully shift agent behavior in complex moral and market interactions. The choice of alignment objective is not merely a technical detail but a strategic design decision with direct consequences for firm performance and broader societal welfare. This work offers a replicable, cost-efficient, and economically grounded pipeline to align AI preferences using moral-economic principles, paving the way for more predictable and interpretable AI behavior in autonomous systems.


