TLDR: A new research paper introduces FRAME, a modular pipeline for meeting summarization that extracts and organizes salient facts to create more accurate and coherent summaries, significantly reducing hallucinations and omissions. It also presents SCOPE, a personalization protocol that guides LLMs through a ‘reason-out-loud’ process to tailor summaries to individual reader needs. To evaluate personalization, the paper proposes P-MESA, a reference-free metric that strongly aligns with human judgment. The work advocates for rethinking summarization to improve control, faithfulness, and personalization.
Meetings are a cornerstone of modern work, but summarizing them effectively remains a significant challenge. Traditional methods, especially those relying on large language models (LLMs), often fall short, producing summaries that can hallucinate information, omit crucial details, or be irrelevant to a specific reader. A new research paper introduces an innovative approach to tackle these issues, proposing a modular pipeline called FRAME and a personalization protocol named SCOPE.
The core problem, as identified by the researchers, is that current LLM-based systems treat conversations like linear text, compressing information without truly reconstructing its underlying meaning. Meetings present unique challenges: important content is scattered across different speakers, utterances depend on long-range context, and the relevance of information can vary greatly depending on who is reading the summary.
FRAME: A Fact-Based Approach to Meeting Summarization
To address these challenges, the researchers developed FRAME, which stands for Fact-based Reconstruction and Abstractive Meeting Summarization. This framework redefines summarization as a semantic enrichment task, mimicking how humans summarize. It operates in four distinct stages:
- Fact Identification: The system first extracts self-contained, verifiable facts from the meeting transcript, filtering out filler content and ambiguous statements. These facts are represented as ‘statement-context tuples,’ ensuring that each claim is paired with the minimal global context needed for its interpretation.
- Note-Taking: Next, these facts are scored for relevance and grouped to eliminate redundancy. Facts are categorized (e.g., Decision, Action Item, Insight, Context) and assigned a relevance score, allowing the system to prioritize key information.
- Organization: The retained facts are then used to create a structured outline that reflects the conversation’s logic. High-relevance facts form major outline points, while mid-relevance facts provide background.
- Summarization: Finally, an LLM enriches this outline into an abstractive summary, strictly using the extracted facts. A quality assurance step reviews the draft for outline adherence, factual accuracy, information coverage, and formatting, initiating revisions if necessary.
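As a rough illustration of the first three stages, the sketch below models a statement-context tuple and shows how relevance scores could drive filtering and outlining. It is not the paper's implementation: the `Fact` class, the 0-to-1 relevance scale, both thresholds, and the exact-match deduplication are assumptions made for this example, and the LLM calls that would extract and score facts are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    statement: str    # self-contained, verifiable claim
    context: str      # minimal global context needed to interpret it
    category: str     # e.g. "Decision", "Action Item", "Insight", "Context"
    relevance: float  # assumed 0..1 scale; the article does not give the real one

def take_notes(facts, keep_threshold=0.3):
    """Drop low-relevance facts and repeated statements (hypothetical rule)."""
    seen, kept = set(), []
    # Visit the most relevant facts first so the best copy of a duplicate wins.
    for fact in sorted(facts, key=lambda f: f.relevance, reverse=True):
        key = fact.statement.strip().lower()
        if fact.relevance >= keep_threshold and key not in seen:
            seen.add(key)
            kept.append(fact)
    return kept

def organize(facts, major_threshold=0.7):
    """High-relevance facts become major points; the rest become background."""
    outline = {"major": [], "background": []}
    for fact in facts:
        bucket = "major" if fact.relevance >= major_threshold else "background"
        outline[bucket].append(fact)
    return outline

# Toy input, invented for illustration only.
ctx = "Q2 release planning"
facts = [
    Fact("The launch moves to May.", ctx, "Decision", 0.9),
    Fact("Ana will update the roadmap.", ctx, "Action Item", 0.8),
    Fact("the launch moves to May.", ctx, "Decision", 0.6),  # duplicate
    Fact("The old date was April.", ctx, "Context", 0.5),
    Fact("The coffee machine is broken.", ctx, "Context", 0.1),  # filler
]
outline = organize(take_notes(facts))
for bucket, items in outline.items():
    print(bucket, [f.statement for f in items])
```

In a full pipeline the outline would then be passed to an LLM for the Summarization stage, with the quality assurance check run against the same facts.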
Evaluations on datasets like QMSum and FAME show that FRAME significantly reduces issues like hallucination and omission. For instance, hallucination scores dropped by 2 to 3 points on a 5-point scale, and irrelevance and omission scores also saw substantial improvements. This indicates that by focusing on verifiable facts and a structured pipeline, FRAME produces more faithful and coherent summaries.
SCOPE: Personalization Through Reason-Out-Loud
Beyond general summarization, the paper also introduces SCOPE, a protocol designed for personalizing summaries. Recognizing that different readers have different needs and goals, SCOPE guides an LLM through an explicit ‘reason-out-loud’ approach before selecting content. Inspired by cognitive science, SCOPE has the model answer a nine-question questionnaire about the reader’s goals, expertise, and understanding. This process creates a detailed reasoning trace that grounds content selection, leading to summaries that are highly tailored to the individual reader.
SCOPE integrates into FRAME’s Note-Taking stage, acting as a filter for fact selection. According to the paper, it improves ‘knowledge fit’ and ‘goal alignment’ compared to prompt-only baselines, and it reduces oversimplification and ‘reader hallucination’, where the summary misinterprets the reader’s perspective.
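The idea of reasoning about the reader first and selecting content second might look like the sketch below. It is a loose approximation under stated assumptions: the three questions are placeholders (the article mentions nine but does not list them), and `ask` and `judge` stand in for LLM calls, replaced here by simple stub functions so the example runs on its own.

```python
from typing import Callable

# Placeholder questions, not the paper's actual nine-item questionnaire.
QUESTIONS = [
    "What is the reader trying to accomplish?",
    "How familiar is the reader with the topic?",
    "What does the reader already know about this meeting?",
]

def build_trace(reader: str, ask: Callable[[str], str]) -> list[tuple[str, str]]:
    """Answer every question about the reader before selecting any content."""
    return [(q, ask(f"Reader: {reader}\nQuestion: {q}")) for q in QUESTIONS]

def select_facts(
    facts: list[str],
    trace: list[tuple[str, str]],
    judge: Callable[[str, str], bool],
) -> list[str]:
    """Keep only the facts the judge accepts in light of the reasoning trace."""
    trace_text = "\n".join(f"Q: {q}\nA: {a}" for q, a in trace)
    return [fact for fact in facts if judge(trace_text, fact)]

# Stubs standing in for LLM calls (hypothetical behavior).
def fake_ask(prompt: str) -> str:
    return "The reader is an engineer who needs the technical deadlines."

def fake_judge(trace_text: str, fact: str) -> bool:
    return "engineer" in trace_text and "API" in fact

trace = build_trace("backend engineer", fake_ask)
selected = select_facts(
    ["The API freeze is on Friday.", "Marketing approved the new logo."],
    trace,
    fake_judge,
)
print(selected)
```

Keeping the trace as explicit question-answer pairs means the grounds for each selection can be inspected afterwards, which is the point of reasoning out loud.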
P-MESA: A New Metric for Personalized Summaries
To properly evaluate personalized summaries, the researchers also developed P-MESA (Personalized-MEeting Summary Assessor). This is a multi-dimensional, reference-free evaluation framework that assesses how well a summary fits a target reader across seven dimensions: factuality, completeness, relevance, goal alignment, priority structuring, knowledge-level fit, and contextual framing. P-MESA aligns strongly with human judgment, achieving over 89% balanced accuracy against human annotations and correlating well with human severity ratings.
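For readers unfamiliar with balanced accuracy, the sketch below computes it for binary labels as the mean of the recall on each class, which keeps a metric from scoring well merely by predicting the majority class. The labels are toy values invented for this example, and treating each P-MESA dimension as a binary "error present" decision is an assumption; the article does not say how the judgments were encoded.

```python
def balanced_accuracy(human, metric):
    """Mean of per-class recall for binary labels (1 = error present).

    Assumes both classes occur at least once in `human`;
    otherwise one of the divisions below would fail.
    """
    pairs = list(zip(human, metric))
    true_pos = sum(1 for h, m in pairs if h == 1 and m == 1)
    true_neg = sum(1 for h, m in pairs if h == 0 and m == 0)
    positives = sum(1 for h in human if h == 1)
    negatives = len(human) - positives
    recall_pos = true_pos / positives  # share of human-flagged errors caught
    recall_neg = true_neg / negatives  # share of clean summaries left unflagged
    return (recall_pos + recall_neg) / 2

# Toy labels, not data from the paper.
human_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
metric_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
score = balanced_accuracy(human_labels, metric_labels)
print(f"balanced accuracy: {score:.3f}")
```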
The findings advocate for a fundamental shift in how we approach summarization, emphasizing control, faithfulness, and personalization. The FRAME framework, SCOPE protocol, and P-MESA metric are all available as open-source resources, providing a toolkit for future research and development in this field. The full paper covers the methods and results in more detail.


