TLDR: RA–FSM is a GPT-based research assistant designed to curb hallucinations and mis-citations when large language models (LLMs) are applied to scientific literature. It wraps generation in a finite-state control loop (Relevance → Confidence → Knowledge) grounded in vector retrieval and a deterministic citation pipeline: the system filters out-of-scope queries, assesses its confidence, decomposes complex questions, and retrieves information only when needed, returning well-cited answers with confidence labels. Built on a dual-store knowledge base (vector and relational), RA–FSM was preferred by domain experts over LLM baselines for its improved factual alignment, fewer contradictions, and reliable evidence use, making it a practical tool for high-stakes technical work.
Large language models (LLMs) have significantly sped up how we synthesize information from vast amounts of literature. A major hurdle to their adoption in specialized fields, especially in expert workflows, has been their tendency to ‘hallucinate’ (fabricate facts) and to mis-cite sources. This unreliability becomes a critical flaw when accuracy is paramount.
Addressing these challenges, researchers have introduced a novel system called RA–FSM (Research Assistant – Finite State Machine). This modular, GPT-based research assistant is designed to be highly resistant to hallucinations and to provide accurate, domain-specific information, particularly in technical fields like photonics. The core innovation lies in wrapping the generation process within a structured, finite-state control loop: Relevance → Confidence → Knowledge. This loop is firmly grounded in vector retrieval and a precise, deterministic citation system.
How RA–FSM Works: A Structured Approach
The RA–FSM operates like a well-organized agent, guiding the LLM through a series of defined steps. It starts by filtering out queries that are outside its scope. Then, it assesses how confident it is in answering a question. If confidence is low, it breaks down complex questions into smaller, manageable parts and only retrieves external information when absolutely necessary. The system then provides answers with clear confidence labels and ensures all references are verified and come directly from its knowledge base.
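The gated flow described above can be sketched as a small state machine. This is an illustrative sketch only: the callables `in_scope`, `confidence`, `retrieve`, and `answer`, the 0.7 threshold, and the retry budget are hypothetical stand-ins, not the paper's actual components.

```python
from enum import Enum, auto

class State(Enum):
    RELEVANCE = auto()
    CONFIDENCE = auto()
    KNOWLEDGE = auto()
    ANSWER = auto()
    REJECT = auto()

def run_fsm(query, in_scope, confidence, retrieve, answer, max_retries=2):
    """Drive a query through Relevance -> Confidence -> Knowledge.

    in_scope, confidence, retrieve, answer are caller-supplied callables
    (hypothetical stand-ins for the system's real components).
    """
    state, retries, evidence = State.RELEVANCE, 0, []
    while True:
        if state is State.RELEVANCE:
            # Filter out queries outside the system's scope.
            state = State.CONFIDENCE if in_scope(query) else State.REJECT
        elif state is State.CONFIDENCE:
            # High confidence: answer directly; low: go gather knowledge.
            state = State.ANSWER if confidence(query, evidence) >= 0.7 else State.KNOWLEDGE
        elif state is State.KNOWLEDGE:
            if retries >= max_retries:   # termination bound / retry budget
                state = State.ANSWER
            else:
                evidence += retrieve(query)
                retries += 1
                state = State.CONFIDENCE
        elif state is State.ANSWER:
            return answer(query, evidence)
        else:  # REJECT: query declined as out of scope
            return None
```

The retry budget is what keeps cost aligned with need: retrieval only runs while confidence stays low and the budget is unspent.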
A key aspect of RA–FSM is its sophisticated knowledge ingestion process. It builds a comprehensive domain knowledge base by gathering information from various sources, including scientific journals, conference proceedings, preprints, and patents. This information is stored in two ways: a dense vector index for semantic understanding and a relational database for normalized numerical data. This dual-store approach allows for both prose grounding and quantitative checks, ensuring a robust and accurate knowledge foundation.
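A minimal sketch of the dual-store idea follows, with a toy bag-of-words embedding and an in-memory SQLite table standing in for a real dense encoder and production database; the schema and method names are assumptions for illustration.

```python
import math
import sqlite3
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding (stand-in for a dense encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class DualStore:
    """Prose passages -> vector index; numeric spec fields -> SQLite table."""
    def __init__(self):
        self.vectors = []                  # (doc_id, embedding, passage)
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE specs (doc_id TEXT, field TEXT, value REAL, unit TEXT)")

    def ingest(self, doc_id, passage, specs=()):
        self.vectors.append((doc_id, embed(passage), passage))
        self.db.executemany("INSERT INTO specs VALUES (?,?,?,?)",
                            [(doc_id, f, v, u) for f, v, u in specs])

    def search(self, query, k=3):
        """Semantic lookup over passages (prose grounding)."""
        q = embed(query)
        ranked = sorted(self.vectors, key=lambda r: cosine(q, r[1]), reverse=True)
        return [(d, p) for d, _, p in ranked[:k]]

    def lookup(self, field):
        """Exact lookup over normalized numbers (quantitative checks)."""
        return self.db.execute(
            "SELECT doc_id, value, unit FROM specs WHERE field=?",
            (field,)).fetchall()
```

The point of the split is that prose questions go through fuzzy semantic search while numerical claims can be checked against exact, unit-tagged values.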
Key Contributions of the Design
The researchers highlight several significant contributions:
- Finite-State Control: The FSM (Relevance → Confidence → Knowledge) includes termination bounds and retry budgets. This structured approach prevents the model from over-thinking, aligning computational cost with the actual need for information.
- Deterministic Citations: The system enforces a ‘closed-world’ citation policy. This means answers can only reference evidence that has been explicitly retrieved and verified during the session, significantly reducing the risk of fabricated citations.
- Dual-Store Ingestion: By using both a vector index for semantic passages and a relational table for numerical and specification fields, the system can handle both qualitative and quantitative information with high accuracy.
- Comprehensive Evaluation: The system was rigorously evaluated across six task categories, including analytical reasoning, numerical analysis, methodological critique, comparative synthesis, factual extraction, and application design.
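The closed-world citation policy from the list above can be illustrated with a short filter; the `verified` flag and the shape of the evidence records are hypothetical, not the paper's data model.

```python
def cite_closed_world(answer_refs, session_evidence):
    """Keep only citations whose IDs were retrieved and verified this session;
    anything else is dropped rather than emitted as a possible fabrication."""
    verified = {e["id"] for e in session_evidence if e.get("verified")}
    kept = [r for r in answer_refs if r in verified]
    dropped = [r for r in answer_refs if r not in verified]
    return kept, dropped
```

Under this policy a model simply cannot cite a source it never retrieved, which is why fabricated references are structurally excluded rather than merely discouraged.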
Performance and Expert Preference
In blind reviews, domain experts consistently preferred RA–FSM over other strong baselines, such as a Notebook LM and a standard (vanilla) GPT API call. Experts praised RA–FSM for its superior handling of boundary conditions and its more defensible use of evidence. Analysis also showed that RA–FSM explores a broader range of information than the Notebook LM, albeit with a tunable increase in latency and cost.
The system demonstrates significant improvements in factual alignment and response reliability. It achieves a high ‘YES’ rate (85% overall, 88.4% on high-confidence items) in capturing the meaning of gold responses, compared to 76–79% for ungated vanilla GPT. Crucially, contradiction rates dropped sharply from 20% in vanilla GPT to just 3% with RA–FSM. While confidence calibration remains a challenge, the system’s self-assessment capabilities were improved through techniques like isotonic regression.
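Isotonic regression fits a monotone map from raw confidence scores to observed correctness rates. A pure-Python pool-adjacent-violators sketch is below (a stand-in for library implementations such as scikit-learn's `IsotonicRegression`); the example data is invented.

```python
def fit_isotonic(scores, labels):
    """Fit isotonic regression via pool-adjacent-violators; returns a
    step function mapping a raw score to a calibrated probability."""
    pairs = sorted(zip(scores, labels))
    xs = [x for x, _ in pairs]
    blocks = []                            # each block: [mean, weight]
    for _, y in pairs:
        blocks.append([float(y), 1.0])
        # Merge adjacent blocks whenever monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    # Expand block means back to one fitted value per training point.
    fitted = []
    for mean, w in blocks:
        fitted += [mean] * int(w)

    def calibrate(score):
        # Step-function lookup: fitted value of the last point <= score.
        best = fitted[0]
        for x, f in zip(xs, fitted):
            if x <= score:
                best = f
        return best

    return calibrate
```

The fitted map is monotone by construction, so a higher raw self-assessment never yields a lower calibrated probability.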
Despite the added complexity, RA–FSM remains practical for interactive research assistance, with an average latency of 54.6 seconds and a mean cost of $0.017 per query. These overheads are primarily due to the question decomposition and online search states, which are only triggered when the system’s confidence is low, making them tunable based on budget and task requirements.
This research presents RA–FSM as a practical blueprint for building auditable, budget-aware retrieval-augmented generation (RAG) systems that can be generalized to various scientific domains. For more details, you can read the full paper here.


