TLDR: TextMine is an AI-powered system that uses Large Language Models and a specialized ontology to extract structured knowledge from unstructured humanitarian mine action reports. It improves extraction accuracy, reduces hallucinations, and creates a valuable knowledge base for demining operations, validated on Cambodian reports and adaptable globally.
Humanitarian Mine Action (HMA) faces a significant challenge: a vast amount of valuable best-practice knowledge is trapped in unstructured reports. This makes it difficult to share, access, and learn from crucial demining experiences, ultimately hindering decision-making and operational efficiency. In 2022 alone, landmines caused 4,710 casualties globally, with 85% being civilians, underscoring the urgent need for improved mine action strategies.
To address this, researchers have introduced TextMine, an innovative, ontology-guided pipeline that leverages Large Language Models (LLMs) to extract structured knowledge from HMA texts. TextMine aims to transform these unstructured reports into actionable insights, providing a foundation for a comprehensive demining knowledge base.
How TextMine Works
TextMine operates as a multi-stage pipeline. First, it employs layout-aware document chunking to break PDF reports into semantically coherent, paragraph-level segments, ensuring the LLMs receive manageable, context-rich inputs. Next, in the ontology-guided knowledge extraction phase, TextMine uses a newly constructed HMA ontology to guide the LLMs in extracting knowledge triples (subject, relation, object). This ontology, developed in collaboration with domain experts, systematically categorizes the operational entities and relationships relevant to humanitarian demining.
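To make the two stages concrete, here is a minimal sketch in Python: a triple data structure, a simple paragraph-level chunker, and an ontology check. The relation names and the chunking heuristic are illustrative assumptions, not the paper's actual ontology or algorithm.

```python
# Sketch of TextMine-style scaffolding: chunk a report into paragraph
# segments, then keep only triples whose relation is in the ontology.
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    object: str

# Hypothetical slice of an HMA ontology: the allowed relation types.
HMA_RELATIONS = {"clearedBy", "locatedIn", "usedTechnique", "foundItem"}

def chunk_paragraphs(text: str, max_chars: int = 800) -> list[str]:
    """Layout-aware chunking stand-in: split on blank lines, then
    greedily merge short paragraphs up to a character budget."""
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paras:
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks

def validate(t: Triple) -> bool:
    """Ontology guidance at its simplest: reject any triple whose
    relation is not defined in the ontology."""
    return t.relation in HMA_RELATIONS
```

In the real system the extraction itself is done by an LLM; the ontology check above illustrates only the post-hoc filtering side of "ontology-guided."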
A crucial aspect of TextMine is its use of domain-aware prompting. The research demonstrates that prompts enriched with ontology-aligned examples significantly boost extraction accuracy by up to 44.2%, reduce hallucinations (fabricated information) by 22.5%, and improve format conformance by 20.9% compared to baseline prompts. This highlights the power of providing LLMs with contextually relevant guidance.
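A domain-aware prompt of this kind can be sketched as a few-shot template whose examples use ontology relations. The example sentences, triples, and instruction wording below are illustrative assumptions, not the paper's actual prompts.

```python
# Sketch of a domain-aware prompt builder: few-shot examples aligned
# with the HMA ontology are prepended to the extraction request.
ONTOLOGY_EXAMPLES = [
    ("The field in Battambang was cleared by CMAC Unit 3.",
     "(field in Battambang; clearedBy; CMAC Unit 3)"),
    ("Manual clearance was used on the northern sector.",
     "(northern sector; usedTechnique; manual clearance)"),
]

def build_prompt(paragraph: str, examples=ONTOLOGY_EXAMPLES) -> str:
    """Assemble an extraction prompt: instruction, ontology-aligned
    examples, then the target paragraph."""
    lines = [
        "Extract (subject; relation; object) triples, using only "
        "relations defined in the HMA ontology.",
        "",
    ]
    for text, triple in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Triples: {triple}")
        lines.append("")
    lines.append(f"Text: {paragraph}")
    lines.append("Triples:")
    return "\n".join(lines)
```

The point of the design is that the examples, not just the instruction, carry the ontology: the model sees the exact relation vocabulary and output format it is expected to reproduce.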
Unique Contributions and Evaluation
TextMine stands out from previous approaches by reasoning over entire paragraphs, which enables better coreference resolution and multi-step inference. This is a significant advancement over prior sentence-level methods that often struggle with complex, domain-specific documents. Furthermore, TextMine utilizes a practical operational HMA ontology that is substantially larger than those used in existing benchmarks, making it more applicable to real-world scenarios.
The project also introduces the first dedicated HMA ontology and a curated dataset of real-world demining reports, filling a critical resource gap in the domain. For evaluation, TextMine employs a multi-perspective approach, combining reference-based metrics (comparing extracted triples against a human-annotated dataset) with a novel reference-free LLM-as-a-Judge framework. This LLM-as-a-Judge method helps assess the quality of extracted triples even when ground-truth data is scarce, and experiments show that a “Randomized Fair Judge Prompt” with GPT-4o significantly enhances ranking consistency.
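A randomized judging scheme of this sort can be sketched as a pairwise comparison whose presentation order is shuffled to counter position bias, with the verdict mapped back afterwards. Here `judge_fn` is a stand-in for the LLM call (e.g. to GPT-4o) and is an assumption of this sketch, not the paper's actual judge prompt.

```python
# Sketch of a randomized fair pairwise judge: shuffle which candidate
# appears first, ask the judge, then map the verdict back to A/B.
import random

def fair_judge(candidate_a: str, candidate_b: str, judge_fn, rng=None) -> str:
    """Return 'A' or 'B' for the better candidate. judge_fn sees the
    two candidates in randomized order and returns 'first' or 'second'."""
    rng = rng or random.Random()
    swapped = rng.random() < 0.5
    first, second = (candidate_b, candidate_a) if swapped else (candidate_a, candidate_b)
    verdict = judge_fn(first, second)
    if verdict == "first":
        return "B" if swapped else "A"
    return "A" if swapped else "B"
```

Because the order is re-randomized on every call, a judge that systematically favors the first-listed answer no longer biases aggregate rankings; a consistent judge should pick the same underlying candidate regardless of the coin flip.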
Impact and Adaptability
Validated on Cambodian reports in collaboration with the Cambodian Mine Action Centre (CMAC), TextMine has the potential to convert their technical reports into a structured knowledge base. This framework serves as a proof of concept for LLM-driven demining knowledge extraction, transforming unstructured reports into structured insights that can directly inform and optimize future clearance planning. While initially focused on Cambodia, TextMine is designed to be adaptable to global demining efforts and even other domains facing similar challenges with unstructured data.
The research paper, titled “TextMine: LLM-Powered Knowledge Extraction for Humanitarian Mine Action,” was authored by Chenyue Zhou, Gürkan Solmaz, Flavio Cirillo, Kiril Gashteovski, and Jonathan Fürst. You can read the full paper for more technical details and experimental results here: TextMine Research Paper.


