TLDR: CMOMgen is a novel, end-to-end strategy for Complex Multi-Ontology Matching (CMOM) that automatically generates complete and semantically sound mappings between concepts from multiple ontologies. Unlike previous methods, it places no restrictions on the number of target ontologies or entities. It combines Retrieval-Augmented Generation (RAG), which selects relevant classes and examples, with In-Context Learning (ICL) using a language model. In three biomedical tasks, CMOMgen outperforms baselines with F1-scores of at least 63% and produces accurate complex logical expressions, reducing the need for extensive manual expert effort.
Knowledge Graphs (KGs) are powerful tools for organizing and understanding data, making complex information more accessible to humans. Ontologies are a key component of these graphs: they provide a structured way to define concepts and their relationships within a specific domain. However, real-world data often spans multiple domains, requiring the integration of several ontologies. This process, known as ontology matching, aims to find equivalences between concepts across different ontologies to create a unified semantic layer.
While simple ontology matching, which finds one-to-one equivalences, is well-established, it often falls short when dealing with ontologies that have fundamentally different perspectives or cover complementary aspects of a domain. This is where Complex Multi-Ontology Matching (CMOM) becomes crucial. CMOM goes beyond simple equivalences by aligning a single concept from one ontology to a composite logical expression made up of multiple concepts from one or more target ontologies. This allows for more nuanced and precise semantic connections, capturing intricate relationships that simple mappings cannot.
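To make the idea concrete, here is a minimal Python sketch of what a complex mapping might look like as data. The class labels ("enlarged heart", "increased size", "heart"), the relation "inheres in", and the `ComplexMapping` type are all illustrative assumptions, not mappings or structures taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexMapping:
    """One source class aligned to a composite expression (hypothetical shape)."""
    source: str                   # class from the source ontology
    quality: str                  # assumed target class from a quality ontology
    bearer: str                   # assumed target class from an anatomy ontology
    relation: str = "inheres in"  # assumed object property linking the two

    def expression(self) -> str:
        # Render the target side as a Manchester-style class expression.
        return f"'{self.quality}' and ('{self.relation}' some '{self.bearer}')"


# A simple mapping would pair the source with a single target class;
# here the source is stated as equivalent to a composite of two target classes.
mapping = ComplexMapping(source="enlarged heart", quality="increased size", bearer="heart")
print(f"'{mapping.source}' is equivalent to: {mapping.expression()}")
```

The point of the sketch is only that the target side is an expression over several classes, possibly from different ontologies, rather than a single class.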
Historically, creating these complex mappings has been a labor-intensive task, primarily relying on domain experts. This manual effort is time-consuming and costly, highlighting a significant need for automated solutions. Existing automated approaches for CMOM have often been limited by specific scenarios, fixed patterns, or restrictions on the number of target ontologies involved.
Introducing CMOMgen: A New Approach to Complex Multi-Ontology Alignment
A new research paper introduces CMOMgen, the first end-to-end strategy designed to generate complete and semantically sound complex multi-ontology mappings. CMOMgen stands out because it handles any number of target ontologies and entities without imposing predefined patterns, making it a flexible answer to a long-standing challenge. The full paper is titled “CMOMgen: Complex Multi-Ontology Alignment via Pattern-Guided In-Context Learning”.
CMOMgen leverages a combination of advanced AI techniques: Retrieval-Augmented Generation (RAG) and In-Context Learning (ICL). RAG helps by intelligently selecting relevant classes (concepts) from target ontologies and filtering existing reference mappings to serve as examples. These examples then guide the In-Context Learning process, where a language model generates the final complex mappings in OWL (Web Ontology Language) format.
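As a rough illustration of how retrieved examples could guide in-context learning, the sketch below assembles a few-shot prompt from a source label, candidate target classes, and example mappings. The function name `build_prompt`, the prompt wording, and the sample labels are assumptions made for illustration; the paper's actual prompt is not reproduced here, and no language model is called.

```python
from typing import List, Tuple


def build_prompt(
    source_label: str,
    candidate_classes: List[str],
    examples: List[Tuple[str, str]],
) -> str:
    """Assemble a few-shot prompt (hypothetical format, not the paper's)."""
    lines = ["Write an OWL class expression equivalent to the source class.", ""]
    # Each retrieved example pairs a source label with its known expression.
    for example_source, example_expression in examples:
        lines.append(f"Source: {example_source}")
        lines.append(f"Expression: {example_expression}")
        lines.append("")
    # The query comes last, with the retrieved candidate classes listed.
    lines.append(f"Source: {source_label}")
    lines.append("Candidate classes: " + ", ".join(candidate_classes))
    lines.append("Expression:")
    return "\n".join(lines)


prompt = build_prompt(
    source_label="enlarged heart",
    candidate_classes=["increased size", "heart"],
    examples=[("small kidney", "'decreased size' and ('inheres in' some 'kidney')")],
)
print(prompt)
```

The resulting string would then be passed to whichever language model is in use; the model's completion after the final "Expression:" line is the candidate mapping.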
The process within CMOMgen involves several key steps. First, it pre-processes ontology vocabularies to maximize the available names and synonyms. Then, a crucial “classes selection” step identifies potential target classes using two complementary recursive strategies: one based on lexical similarity (finding non-overlapping target labels that cover the source label) and another based on language model embeddings (finding the most similar target combination to the source concept). These selected classes are then aggregated and filtered. Next, “pattern extraction” identifies general patterns from existing complex mappings that match the selected classes, which are then used as examples. Finally, “mapping composition” uses a language model, guided by the source entity, selected classes, and extracted examples, to construct the complete OWL expression.
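The lexical strategy in the classes selection step can be sketched as a small recursive search: pick target labels whose words do not overlap and that together cover every word of the source label. This is a simplified reading of the description above, assuming whitespace tokenization and exact word matches; the function name `lexical_cover` and the sample labels are hypothetical, and the paper's implementation may differ (for example in how synonyms are handled).

```python
from typing import FrozenSet, List, Optional, Tuple


def lexical_cover(source_label: str, target_labels: List[str]) -> Optional[List[str]]:
    """Return target labels that exactly cover the source label's words, or None."""
    source_tokens = frozenset(source_label.lower().split())
    candidates: List[Tuple[str, FrozenSet[str]]] = [
        (label, frozenset(label.lower().split())) for label in target_labels
    ]

    def search(remaining: FrozenSet[str], chosen: List[str]) -> Optional[List[str]]:
        if not remaining:
            return chosen  # every source word is covered
        for label, tokens in candidates:
            # A label is usable only if all of its words are still uncovered,
            # which also guarantees it does not overlap earlier choices.
            if tokens and tokens <= remaining:
                found = search(remaining - tokens, chosen + [label])
                if found is not None:
                    return found
        return None  # no combination covers the remaining words

    return search(source_tokens, [])


print(lexical_cover("increased heart size", ["heart", "increased size", "liver"]))
```

In this sketch the first cover found is returned; a fuller version might collect all covers and rank them, which is where the embedding-based strategy could complement the lexical one.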
Performance and Impact
CMOMgen was rigorously evaluated across three biomedical tasks involving phenotype ontologies like the Human Phenotype Ontology (HP), Mammalian Phenotype Ontology (MP), and Worm Phenotype Ontology (WBP). These domains are particularly relevant because complex mappings can significantly impact real-life solutions, such as diagnosing phenotypes.
The results were highly promising. CMOMgen consistently outperformed baseline methods, achieving a minimum F1-score of 63% and demonstrating a more than three-fold increase in performance over existing state-of-the-art methods in two out of three tasks. An important finding from ablation studies was that providing mapping examples to the language model was the most critical component for achieving improved results, highlighting the power of in-context learning guided by relevant examples.
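For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below computes it over sets of predicted and reference target entities; treating a mapping as a plain set of entity labels is a simplifying assumption, and the paper's evaluation may score mappings differently.

```python
from typing import Set, Tuple


def precision_recall_f1(predicted: Set[str], reference: Set[str]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 over two sets of entity labels."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: two of three predicted entities appear in a
# four-entity reference, so precision is 2/3 and recall is 1/2.
precision, recall, f1 = precision_recall_f1({"a", "b", "c"}, {"a", "b", "d", "e"})
print(round(precision, 3), round(recall, 3), round(f1, 3))
```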
Beyond reference-based evaluations, a manual assessment by an OWL expert on non-reference mappings (mappings generated for concepts not present in existing logical definitions) further substantiated CMOMgen’s capabilities. Of these mappings, 46% achieved the maximum score for fidelity, meaning they accurately captured the semantics of the mapped concept, and a further 24% were reasonable approximations. This indicates CMOMgen’s potential to extrapolate and create new, semantically sound complex mappings for entities not yet covered by manual definitions.
Looking Ahead
While CMOMgen represents a significant leap forward, the authors acknowledge inherent limitations, such as the computational and runtime costs associated with language models, which could be a barrier for larger problems. Future work includes further parameter testing, applying the approach to other ontology matching paradigms (such as complex pairwise alignment), and evaluating its performance in domains beyond phenotype ontologies.
Ultimately, CMOMgen offers a powerful automated solution that can drastically reduce the manual effort currently required to construct complex multi-ontology mappings. By enabling the automatic creation of complete and semantically sound alignments, it paves the way for richer, more integrated knowledge bases that can connect disjoint but related domains more effectively.


