
AI-Generated Data Enhances Accuracy in Health Fact-Checking

TLDR: A new research paper introduces an LLM-driven pipeline to generate synthetic training data for health fact-checking. This method summarizes documents, extracts atomic facts, builds sentence-fact tables, and creates synthetic text-claim pairs. When combined with original data, it significantly boosts the F1 scores of BERT-based fact-checking models on PubHealth and SciFact datasets, addressing the challenge of limited annotated data in the health domain and even showing potential for detecting LLM hallucinations.

In the critical field of health information, ensuring the accuracy of claims is paramount to public well-being. However, developing reliable fact-checking systems faces a significant hurdle: a scarcity of high-quality, labeled training data. Traditional annotation processes for health-related content demand specialized medical expertise, making them costly and time-consuming. This often leads to models struggling to generalize effectively to medical claims, as existing general-purpose datasets lack the necessary domain-specific knowledge.

A recent research paper, titled “Enhancing Health Fact-Checking with LLM-Generated Synthetic Data,” proposes an innovative solution to this data limitation. Authored by Jingze Zhang, Jiahe Qian, Yiliang Zhou, and Yifan Peng, the study introduces a novel pipeline that leverages large language models (LLMs) to create synthetic training data, thereby augmenting existing datasets and significantly improving the performance of health fact-checkers.

How the Synthetic Data Pipeline Works

The core of this research lies in its four-step synthetic data generation pipeline, designed to create a richer training set for fact-checking models:

1. Document Decomposition: The process begins by taking original source documents and generating concise summaries. Simultaneously, these summaries are broken down into ‘atomic facts’ – the most basic, indivisible pieces of information. This step ensures that facts are isolated and clearly defined.

2. Sentence-Fact Table Construction: An LLM is then employed to build a structured table. In this table, each sentence from the original document is mapped against each extracted atomic fact. The LLM determines and marks whether a given sentence supports a particular fact, establishing entailment relations.

3. Synthetic Data Generation: Using the meticulously constructed sentence-fact table, synthetic text-claim pairs are generated. A subset of sentences is randomly selected from the document and combined to form new text. An atomic fact is chosen as a synthetic claim, and its veracity (true or false) is automatically assigned by checking if any of the selected sentences support that fact in the table.

4. FACT CHECKER Development: Finally, these newly generated synthetic examples are merged with any original, manually annotated data. This augmented dataset is then used to fine-tune a BERT-based fact-checking model, referred to as FACT CHECKER. The model learns to classify whether a claim is supported or unsupported by a given document.
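
Steps 2 and 3 above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the sentence-fact entailment table has already been filled in (in the paper, by an LLM) and is passed in as a boolean matrix; the function names and parameters are hypothetical.

```python
import random

def build_synthetic_pairs(sentences, facts, entailment,
                          proportion=0.5, n_pairs=4, seed=0):
    """Generate synthetic text-claim pairs from a sentence-fact table.

    entailment[i][j] is True when sentence i supports atomic fact j
    (the paper fills this table with an LLM; here it is given).
    """
    rng = random.Random(seed)
    k = max(1, int(len(sentences) * proportion))  # the 'synthetic proportion'
    pairs = []
    for _ in range(n_pairs):
        chosen = rng.sample(range(len(sentences)), k)    # random sentence subset
        text = " ".join(sentences[i] for i in sorted(chosen))
        j = rng.randrange(len(facts))                    # pick one fact as the claim
        supported = any(entailment[i][j] for i in chosen)  # true iff any chosen sentence entails it
        pairs.append({"text": text,
                      "claim": facts[j],
                      "label": "supported" if supported else "unsupported"})
    return pairs
```

The key idea is that the label is assigned automatically: a claim is marked supported only if at least one of the randomly selected sentences entails it in the table, so no manual annotation is needed.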

Impressive Performance Improvements

The effectiveness of this LLM-driven approach was rigorously evaluated on two public benchmark datasets: PubHealth and SciFact. The results demonstrated significant improvements in fact-checking performance. On the PubHealth dataset, the pipeline led to an F1 score improvement of up to 0.019. Even more notably, on the SciFact dataset, the F1 score saw an impressive increase of up to 0.049 compared to models trained solely on the original data.

The study also explored the impact of varying the ‘synthetic proportion’ – the percentage of sentences selected from original documents to construct synthetic data. While the optimal proportion varied across data subsets, incorporating synthetic data at well-tuned proportions consistently outperformed the baselines. For instance, on a 1,500-instance subset of PubHealth, selecting just 10% of sentences yielded the highest F1 score of 0.831, surpassing the baseline of 0.812.

Detecting AI Hallucinations

Beyond enhancing fact-checking, the FACT CHECKER model also showed promise in a pilot study focused on detecting hallucinations in LLM-generated text summaries. By constructing sentence-fact tables for LLM-generated summaries and comparing them against original documents, the system could identify instances where facts in the summary were not supported by any sentence in the original document, indicating potential hallucinations or inferences not directly present in the source material.
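
The hallucination check described above reduces to a simple rule: a fact extracted from the summary with no supporting sentence anywhere in the source document is flagged. A minimal sketch, assuming the per-pair entailment judgment is supplied as a callable (in the paper this role is played by the LLM-built sentence-fact table); the names here are illustrative, not from the paper:

```python
def flag_unsupported_facts(summary_facts, doc_sentences, supports):
    """Return atomic facts from an LLM summary that no source sentence supports.

    supports(sentence, fact) -> bool stands in for the entailment check used
    to fill the sentence-fact table. A fact with no supporting sentence is a
    potential hallucination, or an inference absent from the source.
    """
    return [fact for fact in summary_facts
            if not any(supports(s, fact) for s in doc_sentences)]
```

In practice the `supports` predicate would be an entailment model or LLM call; the exact-match predicate used in testing is only a stand-in.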

Looking Ahead

This research underscores the immense potential of LLM-driven synthetic data augmentation in addressing the critical data scarcity issue in health fact-checking. By providing a scalable and efficient method to generate high-quality training examples, this pipeline offers a feasible solution for developing more robust and accurate fact-checking systems, ultimately contributing to a more informed public health landscape.

Karthik Mehta
https://blogs.edgentiq.com
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
