TLDR: A new study compares classical AI and large language models (LLMs) for aligning algorithmic decisions with human preferences in health insurance choices. Researchers evaluated both approaches on a health insurance dataset annotated for three risk-tolerance profiles and found comparable overall alignment accuracy. Classical AI showed slightly better and more stable alignment for the moderately risk-tolerant profile, while LLMs performed strongly at the extremes. The study highlights the strengths and limitations of each paradigm, emphasizing classical AI’s granular control and LLMs’ accessibility, alongside the challenges of representing nuanced human attributes through natural language.
In the evolving landscape of artificial intelligence, a critical challenge lies in ensuring that AI systems make decisions that align with human values and preferences, especially in sensitive areas like health insurance. A recent study delves into this complex issue, comparing two distinct approaches to building algorithmic decision-makers: classical AI methods and those powered by large language models (LLMs).
The research, titled “Classical AI vs. LLMs for Decision-Maker Alignment in Health Insurance Choices,” was conducted by a team of experts including Mallika Mainali, Harsha Sureshbabu, Anik Sen, Christopher B Rauch, Noah D Reifsnyder, John Meyer, JT Turner, Michael W. Floyd, Matthew Molineaux, and Rosina O. Weber. Their work highlights the nuances of Decision-Maker Alignment (DMA), which focuses on designing algorithms that reflect the reasoning processes and cognitive attributes of human decision-makers, particularly when there isn’t a single ‘correct’ answer.
Understanding Decision-Maker Alignment
Traditional AI alignment often aims for universal ethical principles. However, DMA recognizes that human choices are influenced by diverse cognitive attributes like risk tolerance, cognitive reflection, and biases. This means an AI system needs to adapt to individual preferences rather than apply a fixed set of rules. For instance, in health insurance, a highly risk-averse individual might prioritize lower deductibles, while a risk-tolerant person might opt for lower premiums with higher deductibles.
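As a rough illustration of what such a profile involves, the sketch below captures a decision-maker as a small structured record. The field names and the 0-to-1 scale are our own assumptions for exposition, not the paper’s data model.

```python
from dataclasses import dataclass

# Hypothetical decision-maker profile; the fields and scale are
# illustrative assumptions, not taken from the paper.
@dataclass
class DecisionMakerProfile:
    risk_tolerance: float        # 0.0 = highly risk-averse, 1.0 = highly risk-tolerant
    cognitive_reflection: float  # propensity to override an intuitive first answer

# A risk-averse profile should steer the system toward low-deductible plans;
# a risk-tolerant one toward low premiums with higher deductibles.
risk_averse = DecisionMakerProfile(risk_tolerance=0.0, cognitive_reflection=0.5)
risk_tolerant = DecisionMakerProfile(risk_tolerance=1.0, cognitive_reflection=0.5)
```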
Classical AI: Structured Reasoning
Classical AI approaches to DMA employ structured reasoning techniques to mimic human decision-making under uncertainty. These methods often integrate case-based reasoning, Bayesian inference, and naturalistic decision-making. They build a ‘case base’ from prior decisions, learning how different decision-maker attributes influence choices. When presented with a new scenario, the classical AI model searches for similar cases and selects an action that aligns with the specified human profile. This allows for a granular understanding and operationalization of attributes like risk tolerance.
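A minimal sketch of that retrieve-and-reuse loop, assuming a toy case layout and a hand-rolled similarity measure (both our own illustration, not the paper’s implementation):

```python
import math

# Each case pairs scenario features, the decision-maker's risk tolerance,
# and the action that was chosen. The layout is an illustrative assumption.
case_base = [
    {"features": [0.9, 0.2], "risk_tolerance": 0.0, "action": "low_deductible_plan"},
    {"features": [0.1, 0.8], "risk_tolerance": 1.0, "action": "low_premium_plan"},
    # ... further prior decisions from annotated data
]

def similarity(f1, f2, r1, r2, attr_weight=0.5):
    """Combine scenario similarity with decision-maker attribute similarity."""
    scenario_dist = math.dist(f1, f2)
    attr_dist = abs(r1 - r2)
    return -(scenario_dist + attr_weight * attr_dist)  # higher = more similar

def decide(new_features, target_risk_tolerance):
    """Retrieve the most similar prior case and reuse its action."""
    best = max(
        case_base,
        key=lambda c: similarity(new_features, c["features"],
                                 target_risk_tolerance, c["risk_tolerance"]),
    )
    return best["action"]

print(decide([0.85, 0.25], target_risk_tolerance=0.0))  # -> "low_deductible_plan"
```

Because the attribute enters the similarity computation as a plain number, the same machinery works for any target value, which is what gives the classical approach its granular control.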
LLMs: Contextual Reasoning and Prompt Engineering
Large language models offer a different pathway. They leverage their vast pre-training knowledge and linguistic inference capabilities to approximate human judgment. The study implemented an LLM-based decision-maker using a methodology that involves ‘zero-shot prompting’ and ‘weighted self-consistency.’ This means the LLM is given a scenario and a prompt describing the target decision-maker’s attributes (e.g., ‘highly risk-averse’). It then generates multiple responses, with a voting mechanism that emphasizes answers consistent with the target profile while down-weighting those from opposing profiles. The researchers used advanced models like GPT-5 and GPT-4 for this purpose.
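In outline, zero-shot prompting with weighted self-consistency can be sketched as follows. The prompt wording, vote weights, and model name are placeholders, and the snippet assumes the standard openai Python client rather than reproducing the authors’ exact pipeline.

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def sample_choices(profile_description, scenario, n=5, model="gpt-4o"):
    """Zero-shot prompt the model n times for a given decision-maker profile."""
    prompt = (
        f"You are a {profile_description} decision-maker.\n"
        f"Scenario: {scenario}\n"
        "Answer with exactly one option: A or B."
    )
    answers = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model=model,  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
            temperature=1.0,  # sampling diversity drives self-consistency
        )
        answers.append(resp.choices[0].message.content.strip())
    return answers

def weighted_self_consistency(scenario):
    """Up-weight votes from the target profile, down-weight the opposing one."""
    votes = Counter()
    for answer in sample_choices("highly risk-averse", scenario):
        votes[answer] += 1.0   # target profile: full weight (assumed)
    for answer in sample_choices("highly risk-tolerant", scenario):
        votes[answer] -= 0.5   # opposing profile: penalty weight (assumed)
    return votes.most_common(1)[0][0]
```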
The Health Insurance Experiment
To compare these two paradigms, the researchers evaluated both classical AI and LLM-based models on a health insurance dataset. This dataset simulated dilemmas individuals face when choosing plans, annotated for three target decision-makers with varying levels of risk tolerance: highly risk-averse (0.0), moderately risk-tolerant (0.5), and highly risk-tolerant (1.0). The goal was to see how well each AI system could align its decisions with these distinct human profiles.
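Under this setup, alignment accuracy reduces to agreement with the annotated choice for each target. A minimal evaluation loop, with a made-up dataset layout and a stand-in decision function, might look like:

```python
# Hypothetical dataset layout: each dilemma is annotated with the choice
# expected from each risk-tolerance target (0.0, 0.5, 1.0).
dataset = [
    {"scenario": "Plan A: low deductible, high premium. Plan B: the reverse.",
     "annotations": {0.0: "A", 0.5: "A", 1.0: "B"}},
    {"scenario": "Plan A: broad network, costly. Plan B: narrow network, cheap.",
     "annotations": {0.0: "A", 0.5: "B", 1.0: "B"}},
]

def toy_decide(scenario, target):
    """Stand-in decision-maker, included only to make the loop runnable."""
    return "A" if target < 0.5 else "B"

def alignment_accuracy(decide, target):
    """Fraction of dilemmas where the model's choice matches the annotation."""
    hits = sum(decide(d["scenario"], target) == d["annotations"][target]
               for d in dataset)
    return hits / len(dataset)

for target in (0.0, 0.5, 1.0):
    print(f"target={target}: accuracy={alignment_accuracy(toy_decide, target):.2f}")
```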
Key Findings and Insights
The study revealed that both classical AI and LLM-based models achieved comparable overall alignment accuracy with attribute-based targets. Interestingly, both approaches performed best for the highly risk-averse target, showing near-perfect alignment. However, the classical AI model demonstrated slightly better and more stable alignment for the moderately risk-tolerant profile, where LLMs saw a drop in accuracy. This suggests that while LLMs excel at capturing extreme preferences, they might struggle with the more ambiguous trade-offs inherent in intermediate risk scenarios.
From a methodological standpoint, LLMs offer accessibility, as their reasoning capabilities are built-in, simplifying experimental design. However, their effectiveness heavily relies on how well human language can represent nuanced cognitive constructs. Describing a ‘moderate’ risk profile without linguistic overlap with ‘extreme’ profiles proved challenging. Classical AI, in contrast, allows for finer-grained alignment, enabling researchers to define risk tolerance at very specific intervals, a flexibility LLMs currently struggle to replicate due to the limitations of natural language representation.
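The granularity gap is easy to see in code: a classical model accepts any numeric risk-tolerance setting directly, whereas an LLM prompt has to render it in words, and nearby values collapse into the same phrase. The mapping below is our own illustration, not the paper’s:

```python
# Classical AI: risk tolerance is a direct numeric parameter.
classical_target = 0.35  # any interval is expressible

# LLM prompting: the same value must be rendered in language, where phrases
# like "moderately risk-tolerant" blur into neighboring profiles.
def describe(risk_tolerance):
    if risk_tolerance <= 0.2:
        return "highly risk-averse"
    if risk_tolerance <= 0.6:
        return "moderately risk-tolerant"
    return "highly risk-tolerant"

print(describe(0.35))  # "moderately risk-tolerant" -- the 0.35 vs. 0.55 distinction is lost
```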
Challenges and Future Directions
Comparing these two fundamentally different systems presents challenges, primarily due to the disparate nature of their input data – structured numerical features for classical AI versus unstructured natural language prompts for LLMs. Despite these hurdles, the study validates the robustness of the LLM-based methodology and provides valuable insights into designing adaptive, cognitively grounded algorithmic decision-makers.
Future work aims to explore whether fine-tuning LLMs with high-quality, domain-specific data, especially for moderate-risk targets, can improve their performance. Researchers also plan to investigate prompting strategies that can better represent nuanced cognitive attributes beyond simple linguistic descriptors. This research is crucial for building AI systems that can truly understand and align with the diverse and complex ways humans make decisions in high-stakes environments. You can read the full research paper here: Classical AI vs. LLMs for Decision-Maker Alignment in Health Insurance Choices.


