NomicLaw: Unpacking How AI Models Collaborate in Crafting Laws

TLDR: NomicLaw is a new multi-agent simulation framework that studies how large language models (LLMs) engage in collaborative law-making. By having LLMs propose, justify, and vote on legal rules, the research reveals emergent behaviors like trust, reciprocity, and strategic persuasion. The study found that diverse groups of LLMs exhibit more varied argumentation and dynamic coalition-building, while homogeneous groups tend to amplify self-support and stick to a narrower set of legal rationales. The findings underscore the importance of human oversight when integrating AI into legal processes, suggesting LLMs serve best as assistive tools rather than replacements for human judgment.

Recent advancements in large language models (LLMs) have significantly expanded their capabilities beyond basic text processing to include complex reasoning tasks, such as legal interpretation, argumentation, and strategic interaction. However, a comprehensive understanding of how LLMs behave in open-ended, multi-agent environments, particularly those involving deliberation over legal and ethical dilemmas, has been limited. To address this gap, researchers Asutosh Hota and Jussi P.P. Jokinen from the University of Jyvaskyla introduced a novel framework called NomicLaw.

NomicLaw is a structured multi-agent simulation designed to observe LLMs engaging in collaborative law-making. Inspired by the self-amending game Nomic, this framework allows LLM agents to respond to complex legal scenarios by proposing rules, justifying their proposals, and voting on peer proposals. The simulation quantitatively measures aspects like trust and reciprocity through voting patterns and qualitatively assesses how agents use strategic language to justify their proposals and influence outcomes. The study involved both homogeneous (groups of the same LLM) and heterogeneous (groups of different LLMs) LLM configurations.

How NomicLaw Works

The NomicLaw framework operates on a flexible, turn-based lawmaking game that continues for five rounds per legal vignette. In each round, every LLM agent independently proposes a new legal rule to address the given dilemma, provides arguments to justify their proposal, and votes for exactly one proposal (including their own). Agents also briefly explain their rationale for voting. There are no preset ideologies for the agents; their primary incentive is a simple point system: 10 points for a winning proposal and 5 points for an undecided or tied vote. All agents have full visibility of prior proposals, votes, justifications, and cumulative scores, fostering an environment where strategic behavior can emerge.

The research utilized ten open-source LLMs, including phi4, gemma3, llama3, and deepseek-r1, orchestrated through the Ollama API with identical settings to ensure that observed differences were due to model architecture and training, not invocation parameters.

Key Findings: Homogeneous vs. Heterogeneous Groups

The experiments revealed significant differences in LLM behavior between homogeneous and heterogeneous groups:

Self-Support vs. Peer Engagement: In heterogeneous groups, LLMs showed widespread peer engagement with low self-vote rates, indicating a greater propensity for coalition-building. Conversely, homogeneous groups exhibited substantially higher self-voting, suggesting that models tend to support their own proposals more when interacting with identical counterparts.
Win Rate and Persuasive Success: In diverse groups, DeepSeek-R1 and Llama2 demonstrated the highest win rates, indicating their strong persuasive effectiveness. However, in homogeneous settings, weaker agents sometimes gained traction, suggesting that model diversity amplifies the edge of strong arguers, while homogeneity can level the playing field.
Reciprocity and Coalition Fluidity: Heterogeneous cohorts showed moderate reciprocity and dynamic coalition-switching. Homogeneous pairings, however, amplified tit-for-tat reciprocity but at the expense of coalition fluidity and long-term stability.
Vote Volatility and Persistence: Higher vote volatility was observed in heterogeneous groups, reflecting frequent opinion shifts among diverse models. In contrast, homogeneous groups exhibited lower volatility, indicating that once a consensus emerged, agents tended to stick with it.
Thematic Analysis: The study also analyzed the jurisprudential themes used by LLMs in their justifications. Heterogeneous assemblies produced a richer mix of themes, including justice, legality, harm, and accountability, showing a more context-sensitive approach. Homogeneous runs, however, concentrated heavily on a few dominant rationales, primarily justice and rule-of-law, suggesting a more uniform argumentative style.

The findings highlight that model diversity disrupts insular agreement and fosters more varied argumentative exchanges, while model uniformity leads to higher self-support and a narrower discourse.

Also Read:

Implications for AI in Law-Making

The authors emphasize that this research does not claim LLMs truly comprehend law. Instead, NomicLaw provides audit metrics to help practitioners identify when proposals might be based on surface patterns rather than principled reasoning. This is crucial for establishing future guardrails for deploying generative AI systems in high-stakes legal workflows. The study cautions against anthropomorphizing LLM “thinking,” reminding us that high win rates or coalitions do not guarantee sound statutory interpretation.

The research suggests that LLMs, if used in legal drafting, should function only as assistive tools, supporting human deliberation by surfacing diverse perspectives or flagging potential biases, rather than replacing human judgment. Robust human oversight remains essential at every stage to ensure legal validity and uphold due process. For more details, you can refer to the full research paper: NomicLaw: Emergent Trust and Strategic Argumentation in LLMs During Collaborative Law-Making.

Future work will involve increasing experimental runs, introducing more complex legislative features like amendment and appeal phases, and engaging legal experts in human-AI hybrid sessions to evaluate rule quality and real-world relevance. NomicLaw is positioned as a research framework to elucidate model limitations and reveal diverse perspectives in a controlled experimental setting, paving the way for responsible integration of generative AI into legal workflows.

Financial Sector Fortifies Against Surging AI-Powered Scams

Deloitte’s 2025 Outlook: Navigating Escalating AI Challenges in Human Capital

Salesforce Study Reveals Data Quality is Pivotal for Employee Trust in AI Adoption

Top Executives Sidestep Company AI Guidelines, Fueling Shadow AI Risks

Intel’s Evolving IP Strategy: A Calculated Shift Towards Core AI Innovation

Generative AI Prompts Increased Workforce Surveillance in Indian IT Sector

Financial Sector Fortifies Against Surging AI-Powered Scams

Deloitte’s 2025 Outlook: Navigating Escalating AI Challenges in Human Capital

Salesforce Study Reveals Data Quality is Pivotal for Employee Trust in AI Adoption

Top Executives Sidestep Company AI Guidelines, Fueling Shadow AI Risks

Intel’s Evolving IP Strategy: A Calculated Shift Towards Core AI Innovation

Generative AI Prompts Increased Workforce Surveillance in Indian IT Sector

NomicLaw: Unpacking How AI Models Collaborate in Crafting Laws

How NomicLaw Works

Key Findings: Homogeneous vs. Heterogeneous Groups

Implications for AI in Law-Making

Gen AI News and Updates

HKU Spearheads AI Integration in Hong Kong’s Digital Education Future

UNESCO’s 43rd General Conference Concludes with New Leadership and Landmark Ethics Frameworks for Technology

Vesl AI Recognized for AI Infrastructure Innovation with ASOCIO Digital Summit Award

Boosting Business Efficiency: A New AI and Big Data Model for Process Optimization

AlphaCast: A New Approach to Time Series Prediction Through Human-AI Collaboration

New Graph Neural Networks Improve Reasoning in Assumption-Based Argumentation

Enhancing AI Reasoning: How Recursive Refinement and Multi-Agent Systems Improve Language Model Performance

ARGUS: A Proactive Framework for Enhancing Autonomous Driving Safety

Generative AI Powers Next-Gen Autonomous Emergency Response

OR-R1: Advancing Automated Optimization with Smart, Data-Efficient AI

Enhancing GUI Agents with Memory: A New Framework for History-Aware Reasoning

ProBench: A Deeper Look into How We Evaluate AI Agents for Mobile Apps

Enhancing Large Language Model Reasoning with Concise Outputs

Ensuring Trust in Autonomous AI: A Two-Layered Monitoring Approach for Agentic Systems

MedFuse: A Multiplicative Approach to Understanding Irregular Clinical Time Series Data

HyperD: A New Framework for More Accurate and Robust Traffic Predictions

Beyond Training: Researchers Propose ‘Model Raising’ for AI with Intrinsic Values

Bridging the Divide: Why AI Needs a Qualitative Revolution

Language Models Enhance Safety Certificate Synthesis for Dynamic Systems

Subscribe to get the latest news and updates