
Building Reliable AI: Bridging Large Language Models and Expert Systems

TLDR: A research paper proposes a hybrid method to develop accurate and explainable expert systems by extracting knowledge from Large Language Models (LLMs) and encoding it into Prolog. This approach mitigates LLM hallucinations through human validation of the symbolic knowledge base, achieving over 99% factual accuracy and combining LLM recall with symbolic system precision for dependable AI applications.

In the evolving landscape of artificial intelligence, a fascinating convergence is taking place between traditional AI methods and the cutting-edge capabilities of Large Language Models (LLMs). A recent research paper, titled “GOFAI meets Generative AI: Development of Expert Systems by means of Large Language Models,” explores a novel approach to building expert systems that leverages the strengths of both worlds.

The Challenge with Large Language Models

Large Language Models have revolutionized how we interact with information, enabling systems to generate vast amounts of seemingly coherent text. They power everything from chatbots to content creation. However, these powerful models come with a significant drawback: the tendency to “hallucinate.” This means they can confidently produce incorrect, misleading, or unverifiable information. Such inaccuracies are particularly problematic in critical fields like medicine, law, or education, where reliable knowledge is paramount.

Hallucinations can stem from various issues, including outdated or biased training data, model architecture problems, or a focus on text fluency over factual accuracy. Detecting and mitigating these false responses is crucial for deploying LLMs in sensitive applications.

A Hybrid Solution: Combining LLMs with Expert Systems

The paper introduces a transparent and controlled method for developing expert systems using LLMs. The core idea is to limit the domain of knowledge and use a structured, prompt-based approach to extract information from LLMs. This extracted knowledge is then represented symbolically in Prolog, a logic programming language well-suited for expert systems.

Prolog has a long history in AI, known for its declarative nature and inference capabilities. It allows knowledge to be expressed as facts and rules, enabling the system to deduce new information. Landmark expert systems such as MYCIN (medical diagnosis) and XCON (computer configuration) demonstrated the power of this rule-based paradigm.

How the System Works

The proposed system operates through a carefully designed pipeline. First, an LLM (such as Claude 3.7 Sonnet or GPT-4.1) is queried using specific prompts designed by human experts. These prompts guide the LLM to extract structured information about a defined concept, limiting the scope to a smaller, more manageable knowledge domain.
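To make the prompt-design step concrete, here is a minimal sketch of what such a structured extraction prompt might look like. The schema, field names, and wording are illustrative assumptions, not taken from the paper:

```python
# Sketch of a structured extraction prompt for a single concept.
# The JSON schema and instructions below are hypothetical examples.

def build_extraction_prompt(concept: str) -> str:
    """Builds a prompt asking the LLM for structured facts about one concept."""
    return (
        "You are a knowledge extraction assistant.\n"
        f"For the concept '{concept}', return ONLY a JSON object with:\n"
        '  "concept": the concept name,\n'
        '  "relations": a list of {"predicate": ..., "object": ...} pairs,\n'
        '  "explanation": one sentence in natural language.\n'
        f"Restrict yourself to well-established facts about '{concept}'."
    )

prompt = build_extraction_prompt("theory_of_forms")
print(prompt)
```

Constraining the output to a fixed JSON schema is what makes the later translation into Prolog mechanical rather than ad hoc.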

The LLM’s responses, typically in a structured format like JSON, are then translated into Prolog facts and relations. For instance, a concept might be represented as concept(plato), and a relationship as developed_by(theory_of_forms, plato). The system also preserves natural language explanations from the LLM as comments within the Prolog code, enhancing transparency.
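The translation step described above can be sketched in a few lines of Python. The input schema and function name are assumptions for illustration; the paper's exact format may differ:

```python
import json

# Minimal sketch of the JSON-to-Prolog translation step. The schema of the
# LLM response is a hypothetical example.

def to_prolog(response_json: str) -> str:
    """Turns a structured LLM response into Prolog facts, preserving the
    natural-language explanation as a Prolog comment."""
    data = json.loads(response_json)
    lines = [f"% {data['explanation']}"]          # keep the LLM's explanation
    lines.append(f"concept({data['concept']}).")  # e.g. concept(plato).
    for rel in data["relations"]:
        lines.append(f"{rel['predicate']}({data['concept']}, {rel['object']}).")
    return "\n".join(lines)

example = json.dumps({
    "concept": "theory_of_forms",
    "explanation": "Plato's theory that abstract forms are the truest reality.",
    "relations": [{"predicate": "developed_by", "object": "plato"}],
})
print(to_prolog(example))
```

Because each fact is a plain Prolog term with its source explanation attached as a comment, a human reviewer can audit the knowledge base line by line.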

A key advantage of this approach is that the symbolic representation in Prolog can be easily validated and corrected by human experts. This human oversight ensures the veracity and reliability of the knowledge base, addressing the hallucination problem inherent in raw LLM outputs. The system also supports the construction of multi-layered conceptual graphs, allowing for recursive reasoning and logical querying.
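The recursive reasoning enabled by the layered concept graph can be illustrated in Python. The facts below are hypothetical, and the function mirrors the kind of recursive rule a Prolog engine would evaluate, shown in the docstring:

```python
# Illustrative sketch of recursive querying over a concept graph.
# The facts are invented examples, not data from the paper.

facts = {
    ("influenced", "socrates", "plato"),
    ("influenced", "plato", "aristotle"),
}

def influenced_transitively(a: str, b: str) -> bool:
    """Mirrors a recursive Prolog rule:
       influenced_t(A, B) :- influenced(A, B).
       influenced_t(A, B) :- influenced(A, X), influenced_t(X, B)."""
    if ("influenced", a, b) in facts:
        return True
    return any(
        influenced_transitively(x, b)
        for (pred, subj, x) in facts
        if pred == "influenced" and subj == a
    )

print(influenced_transitively("socrates", "aristotle"))  # True, via plato
```

In the actual system, such queries are posed directly to the Prolog inference engine, which resolves them against the validated fact base.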

Benefits and Validation

This hybrid methodology offers several significant advantages:

  • Explainability: The rule-based nature of Prolog makes the system’s reasoning transparent and understandable.
  • Greater Volume of Information: LLMs can quickly process and extract large amounts of information, which is then refined.
  • Veracity and Reliability: Human experts can easily identify and correct errors in the Prolog knowledge base, ensuring high accuracy.

The researchers conducted both quantitative and qualitative experiments to validate their approach. Quantitatively, they found that the extracted knowledge achieved over 99% factual accuracy when compared against established sources, significantly exceeding an 80% benchmark. Qualitatively, the system demonstrated its ability to generate meaningful semantic expansions and coherent knowledge graphs, which can be visualized for easier interpretation.

The generated expert systems were also successfully executed and queried using a Prolog inference engine, confirming their practical feasibility. This demonstrates that the approach is robust and can be applied across various topics and LLM models.

Looking Ahead

By combining the vast recall capacity of LLMs with the precision and interpretability of symbolic systems, this research lays the foundation for more dependable AI applications, especially in sensitive domains where accuracy and transparency are critical. This hybrid solution offers a promising path forward for building expert systems that are both powerful and trustworthy. For more in-depth details, you can read the full research paper here.

Dev Sundaram
https://blogs.edgentiq.com
Dev Sundaram is an investigative tech journalist with a nose for exclusives and leaks. With stints in cybersecurity and enterprise AI reporting, Dev thrives on breaking big stories—product launches, funding rounds, regulatory shifts—and giving them context. He believes journalism should push the AI industry toward transparency and accountability, especially as Generative AI becomes mainstream. You can reach out to him at: [email protected]
