TLDR: A research project called “ROBOPSY PL[AI]” used a public art installation and a role-playing game about the 1936 murder of philosopher Moritz Schlick to study how different Large Language Models (LLMs) present historical events and collective memory. Visitors interacted with various LLMs, and the study found significant differences in how these AIs narrated the history, their factual accuracy, and the emotional tone of their responses. The project highlights the potential of playful methods to engage the public in critically examining AI’s influence on our perception of the past.
A fascinating artistic research project, dubbed “ROBOPSY PL[AI]”, has delved into how Large Language Models (LLMs) curate and present collective memory. This initiative, showcased as a public installation in Vienna during 2025, invited visitors to engage with five different LLMs through a unique role-playing game.
The core of the experiment was a text-based role-playing game centered on the historical murder of Austrian philosopher Moritz Schlick in 1936. Players, cast as time-travelers from 2036, were tasked with investigating the reasons behind Schlick’s death. The LLMs involved included popular models such as ChatGPT (GPT-4o and GPT-4o mini), Mistral Large, DeepSeek-Chat, and a locally run Llama 3.1 model.
Interaction with the LLMs was intentionally simplified, using a custom-made input device with only four buttons for choices and a reset button. This design choice was crucial for two main reasons: firstly, to prevent the LLMs from generating overly fantastical or historically divergent narratives, which often happened with free text input; and secondly, to make the game accessible and easily playable for a diverse audience in an exhibition setting.
Each LLM was given the same prompt sheet, instructing it to act as a game master, adhere to historical facts as closely as possible, and incorporate political events of 1936 Vienna. The game was structured with a ten-turn limit, after which players received a summary of their success in uncovering the murder’s motivation. This limit was introduced to maintain narrative focus and create a sense of urgency for the players.
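The session structure described above — four buttons, a hard ten-turn limit, and a closing summary — can be sketched as a minimal game loop. Everything below is an illustrative assumption rather than the project's actual code: the button labels, the `game_master_reply` stub standing in for the real LLM call, and the summary wording are all invented.

```python
MAX_TURNS = 10                      # the ten-turn limit described above
BUTTONS = ("A", "B", "C", "D")      # four-choice input device (labels assumed)

def game_master_reply(turn: int, choice: str) -> str:
    """Stand-in for the LLM call; a real build would send the prompt
    sheet plus the running transcript to a chat model each turn."""
    return f"Turn {turn}: you chose {choice}; the investigation continues."

def play(button_presses: list[str]) -> list[str]:
    """Run one session, enforcing the button set and the turn limit."""
    transcript = []
    for turn, choice in enumerate(button_presses, start=1):
        if choice not in BUTTONS:
            raise ValueError(f"unknown button: {choice}")
        transcript.append(game_master_reply(turn, choice))
        if turn == MAX_TURNS:       # hard stop, then the closing summary
            transcript.append("Summary: how close did you get to the motive?")
            break
    return transcript
```

Capping the loop at a fixed turn count, rather than letting the model decide when to end, is what gives every visitor a bounded session and forces the narrative toward a resolution.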
Qualitative analysis of the gameplay revealed several intriguing aspects. Players experienced what the researchers termed “fluctuating agency,” where the scope and logic of their actions were constantly modified by the LLM, making it difficult to predict outcomes. While most LLMs correctly identified Johann Nelböck as Schlick’s murderer, they sometimes introduced historically inaccurate or entirely invented characters. More significantly, the LLMs differed in how they presented the motives for the murder. For instance, ChatGPT often emphasized the influence of right-wing ideology on Nelböck, whereas Grok and Mistral tended to downplay this, focusing more on Nelböck’s mental health and personal grievances.
This divergence highlighted a critical point: when prompted to act as critics, the LLMs adopted a fact-checking, positivistic approach to history, a method long questioned by academic historians for its lack of interpretation. Ironically, ChatGPT, in its role-playing, offered an implicit interpretation by stressing the political climate, moving beyond mere factual presentation.
User feedback from the exhibition was diverse; players fell into three main groups: those interested in differences of content and style between the LLMs, those focused on the political relevance of the play, and art lovers curious about AI in art. Many visitors, including those new to LLMs, found the comparison between different models particularly enlightening. One young woman reported a profound experience of inadvertently being led into a “fascist role” by the game, leading her to reflect on “false memories.” An elderly, initially skeptical visitor also changed their perspective on LLMs and critical media art after playing.
Quantitative analysis of 115 introductory texts generated by the LLMs further underscored these differences. Semantic similarity analysis showed that Llama 3.1’s introductions were distinctly different from other models. Named Entity Recognition revealed varying frequencies of historical figures mentioned; for example, “Schlick” appeared in 71 of 115 intros, but never in Gemini 2.5’s. Llama 3.1 also exhibited a tendency to hallucinate historical figures who were either dead or not present in Vienna at the time.
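Per-figure counts like “Schlick appeared in 71 of 115 intros” are document frequencies: in how many generated texts does each name occur at least once? The sketch below reproduces that counting step on invented toy intros and an assumed name list; the study itself used Named Entity Recognition to find the figures, whereas this sketch simplifies to plain string matching.

```python
from collections import Counter

# Toy stand-ins for the 115 generated introduction texts.
intros = [
    "Vienna, 1936. Moritz Schlick walks up the university staircase.",
    "You arrive in Vienna; the city hums with political tension.",
    "Schlick and the Vienna Circle await your investigation.",
]

# Historical figures of interest (a plausible subset, assumed here).
names = ["Schlick", "Nelböck", "Wittgenstein"]

# Count in how many intros each name appears at least once.
doc_freq = Counter()
for text in intros:
    for name in names:
        if name in text:
            doc_freq[name] += 1

print(dict(doc_freq))  # e.g. Schlick appears in 2 of the 3 toy intros
```

Running the same count separately per model is what exposes gaps like a model never mentioning Schlick at all.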
Sentiment analysis using VADER scores indicated that while most intros were neutral, DeepSeek and Claude conveyed a negative sentiment, contrasting with the positive scores from Mistral-Large and GPT 4o. This suggests inherent differences in the emotional tone of the narratives generated by different LLMs.
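VADER’s “compound” score maps summed per-token valence into the range [-1, 1] via a fixed normalization (the reference implementation uses alpha = 15). The sketch below reproduces just that normalization step on hand-made valence lists, rather than calling the vaderSentiment package; the example word valences are invented for illustration.

```python
import math

ALPHA = 15  # normalization constant from VADER's reference implementation

def compound(valences: list[float]) -> float:
    """Normalize a sum of per-token valence scores into [-1, 1],
    mirroring VADER's final score_valence step."""
    s = sum(valences)
    return s / math.sqrt(s * s + ALPHA)

# Illustrative token valences (made up): a darker intro vs. an upbeat one.
dark = [-1.9, -0.5, -2.1]    # e.g. "murder", "fear", "threat"
upbeat = [1.5, 2.0]          # e.g. "welcome", "fascinating"

print(compound(dark), compound(upbeat))  # negative vs. positive compound
```

The square-root normalization saturates gradually, so a single strongly negative word can pull an otherwise neutral intro toward a negative compound score — one reason short generated texts can diverge sharply in measured sentiment.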
The study concludes that this artistic role-playing game effectively demonstrated significant differences in how various LLMs present historical events, both in terms of semantic content and sentiment. These findings challenge the common perception of LLMs as uniformly biased and highlight the importance of understanding their diverse outputs. The project also successfully engaged a broad audience in critical discussions about AI’s impact on our understanding of history and collective memory. The next phase of this research will explore the broader societal implications of AI reshaping collective memory. For more details, you can read the full paper here.


