TLDR: This paper introduces a novel auditing framework and efficient algorithms based on “justified representation” from social choice theory to measure how well selected questions represent participants’ interests in online deliberative processes. It evaluates human-selected, extractive (algorithmically chosen from participant questions), and abstractive (LLM-generated summary questions) slates, finding that algorithmic approaches often yield more representative questions than human moderators, highlighting the potential of LLMs while emphasizing the need for robust auditing. The framework has been integrated into a widely used online deliberation platform.
In today’s increasingly digital world, online deliberative processes are becoming a popular way for communities and organizations to gather informed public opinion on important policy issues. These processes, like citizens’ assemblies or deliberative polls, often involve participants engaging directly with experts. A crucial step in these deliberations is selecting a small number of questions from a much larger pool proposed by participants to be posed to expert panels. This selection process is vital because the chosen questions must accurately represent the diverse interests and concerns of all participants.
A new research paper, titled “Question the Questions: Auditing Representation in Online Deliberative Processes,” by Soham De, Lodewijk Gelauff, Ashish Goel, Smitha Milli, Ariel Procaccia, and Alice Siu, tackles this challenge head-on. The authors introduce an innovative auditing framework designed to measure how well a chosen set of questions truly represents the interests of all participants. This framework is grounded in a concept from social choice theory known as “justified representation” (JR).
Understanding Justified Representation
At its core, justified representation (JR) is about fairness and proportionality. Imagine you have a large group of participants, and only a limited number of questions can be asked. Informally, JR says that if ‘k’ questions are selected for ‘n’ participants, then any group of at least ‘n/k’ participants who share similar preferences is large enough to ‘deserve’ one question under proportional allocation, and should therefore find at least one of the ‘k’ selected questions representative of them. The paper uses a quantitative variant of JR, allowing for a measurable degree of representativeness rather than just a binary ‘yes’ or ‘no’ answer.
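To make the informal definition concrete, here is a minimal sketch of a JR check in the simpler approval setting, where each participant either approves a question or does not. This is the classic binary definition from the social choice literature, not the paper's quantitative variant, and the function name `jr_violations` and the toy data are assumptions made for illustration.

```python
def jr_violations(approvals, slate):
    """Return the questions that witness a JR violation (approval setting).

    approvals: list of sets; approvals[i] holds the questions participant i approves.
    slate: set of selected questions (k = len(slate)).
    """
    n = len(approvals)
    k = len(slate)
    threshold = n / k  # size a group needs in order to "deserve" one question

    # A participant is unrepresented if they approve nothing in the slate.
    unrepresented = [a for a in approvals if not (a & slate)]

    # Count, for each question, how many unrepresented participants approve it.
    counts = {}
    for approved in unrepresented:
        for q in approved:
            counts[q] = counts.get(q, 0) + 1

    # JR is violated if some question unites at least n/k unrepresented participants.
    return sorted(q for q, c in counts.items() if c >= threshold)


# Toy example: 6 participants, slates of size 2, so a group of 3 deserves a question.
approvals = [{"a"}, {"a"}, {"a"}, {"b"}, {"b"}, {"c"}]
print(jr_violations(approvals, {"b", "c"}))  # ['a']: three participants are left out
print(jr_violations(approvals, {"a", "b"}))  # []: no deserving group is left out
```

An empty result means the slate satisfies JR for this toy profile; the quantitative variant described in the paper replaces this yes/no outcome with a degree of representativeness.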
The Role of AI and New Algorithms
The researchers explore how large language models (LLMs) could assist in this question selection process. LLMs have shown promise in summarizing public opinions and generating consensus statements. However, given the sensitive nature of deliberative processes, it’s essential to rigorously audit whether LLM-generated summary questions genuinely represent all participants.
The paper presents the first algorithms for auditing JR in a general utility setting, moving beyond simpler approval voting scenarios. Their most efficient algorithm can audit representation with a runtime of O(mn log n), where ‘m’ is the number of proposed questions and ‘n’ is the number of participants. This computational efficiency is crucial for real-world application in large-scale online deliberations.
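One plausible way to reach a runtime of this order is to sort the participants once per candidate question. The sketch below is an assumption-laden illustration, not the paper's algorithm: it uses a hypothetical violation measure (the utility gain that a group of at least n/k participants would all obtain from some question compared with their best question in the slate), and the name `audit_slate` and the toy utilities are invented for the example.

```python
import math


def audit_slate(utilities, slate):
    """Return (worst_gain, question) for a slate under a hypothetical gain measure.

    utilities: n x m list of lists; utilities[i][q] is participant i's utility for question q.
    slate: list of selected question indices (k = len(slate)).
    """
    n = len(utilities)
    m = len(utilities[0])
    group_size = math.ceil(n / len(slate))  # smallest group that "deserves" a question

    # Utility each participant gets from their favorite question in the slate.
    current = [max(row[w] for w in slate) for row in utilities]

    worst_gain, worst_question = -math.inf, None
    for q in range(m):  # m candidate questions
        # Sorting n gains costs O(n log n), giving O(mn log n) overall.
        gains = sorted((utilities[i][q] - current[i] for i in range(n)), reverse=True)
        # Every member of the best-off group of size group_size gains at least this much.
        guaranteed = gains[group_size - 1]
        if guaranteed > worst_gain:
            worst_gain, worst_question = guaranteed, q
    return worst_gain, worst_question


# Toy example: 4 participants, 3 proposed questions, slates of size 2.
U = [
    [1.0, 0.0, 0.25],
    [0.75, 0.25, 0.0],
    [0.0, 1.0, 0.25],
    [0.25, 0.75, 0.25],
]
print(audit_slate(U, [0, 2]))  # (0.5, 1): two participants would each gain >= 0.5 from question 1
print(audit_slate(U, [0, 1]))  # (0.0, 0): no deserving group gains from any question
```

A worst gain of zero or less means no sufficiently large group is unanimously better served by a question outside the slate under this illustrative measure.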
To infer a participant’s utility (or preference) for a given question, the framework uses a straightforward and transparent approach: it measures the cosine similarity between LLM-generated embeddings of the participant’s own proposed question and any other potential question. This method ensures that the utility estimates are interpretable and avoids the ‘black-box’ nature of more complex predictive models.
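The utility computation described above can be sketched in a few lines. The embedding vectors here are made-up placeholders; in practice they would come from an LLM embedding model, which the sketch does not call. The function names `cosine_similarity` and `utility_matrix` are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def utility_matrix(participant_embeddings, candidate_embeddings):
    """utilities[i][q]: similarity of participant i's own question to candidate q."""
    return [
        [cosine_similarity(p, c) for c in candidate_embeddings]
        for p in participant_embeddings
    ]


# Placeholder 2-dimensional "embeddings"; real ones have hundreds of dimensions.
participants = [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
candidates = [[1.0, 0.0], [3.0, 4.0]]
for row in utility_matrix(participants, candidates):
    print(row)
```

Because each utility is a single similarity score between two questions, any entry in the matrix can be traced back to the pair of texts that produced it.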
Evaluating Question Selection Methods
The authors applied their auditing methods to historical deliberation data from the Stanford Deliberative Democracy Lab, covering topics from democratic reform to AI agents. They compared the representativeness of:
- Actual questions posed to expert panels (chosen by a human moderator).
- Participant questions chosen via an integer programming (IP) approach, which aims to optimize representation.
- Summary questions generated by large language models (LLMs).
Both extractive (selecting from existing participant questions) and abstractive (generating new summary questions using LLMs) algorithmic methods generally produced question slates that were more representative than those selected by human moderators. Neither algorithmic approach consistently beat the other: LLM-generated summaries outperformed extractive selection in some deliberations and fell short of it in others. The overall trend suggests significant potential for algorithmic improvement in deliberative processes, and the variability underscores the importance of a robust auditing mechanism to determine when LLM-based methods truly add value.
Integration into a Live Platform
A key contribution of this work is the integration of these auditing algorithms and question selection/generation methods into an online deliberation platform. The platform has been used for hundreds of deliberations in over 50 countries, so the integration makes it easy for practitioners to audit and improve representation in future discussions. The platform can compute the JR value for any generated slate and display a heatmap illustrating the similarity between selected questions and participant contributions, enhancing transparency and trust.
This research highlights both the promise and current limitations of LLMs in supporting democratic processes. By providing tools to audit and enhance question representation, the paper offers a pathway to more inclusive and effective online deliberations. You can read the full paper for more details at arXiv:2511.04588.


