CMOMgen: Automating Complex Ontology Alignment with Pattern-Guided AI

TLDR: CMOMgen is an end-to-end strategy for Complex Multi-Ontology Matching (CMOM) that automatically generates complete and semantically sound mappings between a source concept and concepts from multiple target ontologies. Unlike previous methods, it places no restrictions on the number of target ontologies or entities. It combines Retrieval-Augmented Generation (RAG), which selects relevant classes and examples, with In-Context Learning (ICL) using a language model. Across three biomedical tasks it outperforms baselines with a minimum F1-score of 63%, producing accurate complex logical expressions and reducing the need for extensive manual expert effort.

Knowledge Graphs (KGs) are powerful tools for organizing and understanding data, making complex information more accessible to humans. A key component of these graphs is ontologies, which provide a structured way to define concepts and their relationships within a specific domain. However, real-world data often spans multiple domains, requiring the integration of several ontologies. This process, known as ontology matching, aims to find equivalences between concepts across different ontologies to create a unified semantic layer.

While simple ontology matching, which finds one-to-one equivalences, is well-established, it often falls short when dealing with ontologies that have fundamentally different perspectives or cover complementary aspects of a domain. This is where Complex Multi-Ontology Matching (CMOM) becomes crucial. CMOM goes beyond simple equivalences by aligning a single concept from one ontology to a composite logical expression made up of multiple concepts from one or more target ontologies. This allows for more nuanced and precise semantic connections, capturing intricate relationships that simple mappings cannot.
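To make the idea of a complex mapping concrete, here is a minimal Python sketch of one as a data structure. All identifiers (`src:`, `tgtA:`, `tgtB:`, `rel:inheres_in`) and the `ComplexMapping` class are hypothetical placeholders chosen for illustration; they are not taken from the paper or from any real ontology.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ComplexMapping:
    """One source class aligned to an expression over target classes."""
    source: str                       # a single class from the source ontology
    target_expression: str            # class expression built from target classes
    target_classes: Tuple[str, ...]   # the target classes used in the expression

    def to_manchester(self) -> str:
        # Render the mapping as Manchester-style text for readability.
        return f"Class: {self.source}\n    EquivalentTo: {self.target_expression}"


# Hypothetical example: one source concept expressed with classes
# from two different target ontologies (tgtA and tgtB).
mapping = ComplexMapping(
    source="src:AbnormalHeartMorphology",
    target_expression="tgtA:AbnormalMorphology and (rel:inheres_in some tgtB:Heart)",
    target_classes=("tgtA:AbnormalMorphology", "tgtB:Heart"),
)
print(mapping.to_manchester())
```

A simple one-to-one mapping would have exactly one entry in `target_classes`; the complex case is the one where the expression combines several.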

Historically, creating these complex mappings has been a labor-intensive task, primarily relying on domain experts. This manual effort is time-consuming and costly, highlighting a significant need for automated solutions. Existing automated approaches for CMOM have often been limited by specific scenarios, fixed patterns, or restrictions on the number of target ontologies involved.

Introducing CMOMgen: A New Approach to Complex Multi-Ontology Alignment

A new research paper introduces CMOMgen, described as the first end-to-end strategy designed to generate complete and semantically sound complex multi-ontology mappings. CMOMgen stands out because it handles any number of target ontologies and entities without imposing predefined patterns, offering a flexible solution to a long-standing challenge. The full paper is titled “CMOMgen: Complex Multi-Ontology Alignment via Pattern-Guided In-Context Learning”.

CMOMgen leverages a combination of advanced AI techniques: Retrieval-Augmented Generation (RAG) and In-Context Learning (ICL). RAG helps by intelligently selecting relevant classes (concepts) from target ontologies and filtering existing reference mappings to serve as examples. These examples then guide the In-Context Learning process, where a language model generates the final complex mappings in OWL (Web Ontology Language) format.
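The retrieve-then-prompt idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `select_examples` and `build_prompt`, the word-overlap ranking, the prompt wording, and all labels and identifiers are hypothetical, and the paper's actual retrieval and prompt format may differ.

```python
def select_examples(source_label, reference_mappings, k=2):
    """Pick the k reference mappings whose source label shares the most words
    with the new source label (a stand-in for the retrieval step)."""
    source_tokens = set(source_label.lower().split())

    def overlap(item):
        example_label, _expression = item
        return len(source_tokens & set(example_label.lower().split()))

    # sorted() is stable, so ties keep their original order.
    return sorted(reference_mappings, key=overlap, reverse=True)[:k]


def build_prompt(source_label, candidate_classes, examples):
    """Assemble a few-shot prompt: examples first, then the new source class."""
    lines = ["Compose an OWL class expression equivalent to the source class.", ""]
    for example_label, expression in examples:
        lines += [f"Source: {example_label}", f"Mapping: {expression}", ""]
    lines.append(f"Source: {source_label}")
    lines.append("Candidate target classes: " + ", ".join(candidate_classes))
    lines.append("Mapping:")
    return "\n".join(lines)


# Hypothetical reference mappings (label, expression).
references = [
    ("decreased liver size", "tgtA:DecreasedSize and (rel:inheres_in some tgtB:Liver)"),
    ("increased kidney size", "tgtA:IncreasedSize and (rel:inheres_in some tgtB:Kidney)"),
    ("abnormal tail shape", "tgtA:AbnormalShape and (rel:inheres_in some tgtB:Tail)"),
]
examples = select_examples("increased heart size", references, k=2)
prompt = build_prompt(
    "increased heart size", ["tgtA:IncreasedSize", "tgtB:Heart"], examples
)
print(prompt)
```

The resulting string would then be sent to a language model; that call is left out here because the article does not say which model or API is used.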

The process within CMOMgen involves several key steps. First, it pre-processes ontology vocabularies to maximize the available names and synonyms. Then, a crucial “classes selection” step identifies potential target classes using two complementary recursive strategies: one based on lexical similarity (finding non-overlapping target labels that cover the source label) and another based on language model embeddings (finding the most similar target combination to the source concept). These selected classes are then aggregated and filtered. Next, “pattern extraction” identifies general patterns from existing complex mappings that match the selected classes, which are then used as examples. Finally, “mapping composition” uses a language model, guided by the source entity, selected classes, and extracted examples, to construct the complete OWL expression.
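The lexical strategy described above (covering the source label with non-overlapping target labels) can be illustrated with a small recursive function. This sketch assumes a simplification that the article does not state: target labels must match contiguous word sequences of the source label, and longer labels are tried first. The function name `cover_label`, the label index, and all identifiers are hypothetical.

```python
def cover_label(source_tokens, label_index, start=0):
    """Return target class ids whose labels tile source_tokens[start:]
    without overlap, or None if no complete cover exists."""
    if start == len(source_tokens):
        return []  # everything is covered
    # Try longer spans first so multi-word labels win over single words.
    for end in range(len(source_tokens), start, -1):
        span = " ".join(source_tokens[start:end])
        class_id = label_index.get(span)
        if class_id is None:
            continue
        rest = cover_label(source_tokens, label_index, end)
        if rest is not None:
            return [class_id] + rest
    return None  # no target label starts a valid cover at this position


# Hypothetical target labels from two target ontologies (tgtA, tgtB).
label_index = {
    "abnormal": "tgtA:Abnormal",
    "morphology": "tgtA:Morphology",
    "heart": "tgtB:Heart",
    "heart valve": "tgtB:HeartValve",
}

covered = cover_label("abnormal heart valve morphology".split(), label_index)
uncovered = cover_label("abnormal heart colour".split(), label_index)
print(covered)
print(uncovered)
```

In this toy run the two-word label "heart valve" is preferred over "heart" alone, and a source label with an unmatched word yields no cover. The embedding-based strategy would complement this by proposing candidates when labels are similar in meaning but do not match word for word.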

Performance and Impact

CMOMgen was rigorously evaluated across three biomedical tasks involving phenotype ontologies like the Human Phenotype Ontology (HP), Mammalian Phenotype Ontology (MP), and Worm Phenotype Ontology (WBP). These domains are particularly relevant because complex mappings can significantly impact real-life solutions, such as diagnosing phenotypes.

CMOMgen consistently outperformed baseline methods, achieving a minimum F1-score of 63% and a more than three-fold increase in performance over existing state-of-the-art methods in two of the three tasks. Ablation studies showed that providing mapping examples to the language model was the most critical component for improved results, highlighting the power of in-context learning guided by relevant examples.
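For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below computes it over sets of predicted and reference items. The article does not specify the unit over which the paper computes F1 (for example whole mappings or individual target classes), so the set-based comparison and the sample identifiers here are assumptions for illustration only.

```python
def precision_recall_f1(predicted, reference):
    """Standard precision, recall and F1 over two sets of items."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: two of three predicted items are in the reference.
precision, recall, f1 = precision_recall_f1(
    predicted={"tgtA:IncreasedSize", "tgtB:Heart", "tgtB:Aorta"},
    reference={"tgtA:IncreasedSize", "tgtB:Heart", "tgtB:Ventricle"},
)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```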

Beyond reference-based evaluations, a manual assessment by an OWL expert of non-reference mappings (mappings generated for concepts not present in existing logical definitions) further substantiated CMOMgen’s capabilities. Of these mappings, 46% achieved the maximum score for fidelity, meaning they accurately captured the semantics of the mapped concept, and an additional 24% were reasonable approximations. This indicates CMOMgen’s potential to extrapolate and create new, semantically sound complex mappings for entities not yet covered by manual definitions.

Looking Ahead

While CMOMgen represents a significant leap forward, the authors acknowledge inherent limitations, such as the computational and runtime costs associated with language models, which could be a barrier for larger problems. Future work includes further testing parameters, exploring its application in other ontology matching paradigms (like complex pairwise alignment), and investigating its performance in different domains beyond phenotype ontologies.

Ultimately, CMOMgen offers a powerful automated solution that can drastically reduce the manual effort currently required to construct complex multi-ontology mappings. By enabling the automatic creation of complete and semantically sound alignments, it paves the way for richer, more integrated knowledge bases that can connect disjoint but related domains more effectively.
