Small Language Models Show Promise in Formal Logic Reasoning for Ontology Engineering

TLDR: This research investigates the ability of Small Language Models (SLMs) to process and represent formal knowledge for logical reasoning, aiming to assist ontology engineering. The study compares natural language with various compact formal languages like CLIF, TFL+, and MINIFOL across different SLMs and training methods (SFT, Zero-Shot, Few-Shot). Findings indicate that compact formal representations, particularly CLIF, can achieve competitive performance with natural language in first-order logic reasoning tasks, especially with Supervised Fine-Tuning. While techniques like tokenizer re-training show promise for smaller models, the overall results suggest that formal languages can be a viable alternative to natural language for enhancing SLM reasoning capabilities in knowledge representation.

Language models (LMs) have advanced rapidly across natural language processing tasks, from text generation to complex question answering. Their reasoning capabilities remain a persistent weakness, however, particularly in fields like ontology engineering, the process of creating structured knowledge representations. The limitation is most noticeable in tasks that require explicit or implicit logical inference.

A recent preliminary study by Hanna Abi Akl and her supervisors at Université Côte d’Azur delves into this very issue, focusing on Small Language Models (SLMs). The research aims to understand how incorporating formal methods can improve SLMs’ performance on reasoning tasks, with a long-term goal of using these models to kickstart the construction of ontologies. The core question guiding their work is: Is there a better formal representation for logical data than natural language?

Exploring Formal Languages for Logical Reasoning

To investigate this, the researchers developed a methodology called the Syllogistic Evaluation Framework (SEF) combined with the Common Logic Grammar Construction (CLGC) pipeline. The SEF classifies the logical reasoning problems in the FOLIO dataset by type: disjunctive, hypothetical, categorical, and complex syllogisms. The CLGC pipeline transforms logical data from its original First-Order Logic (FOL) form into several alternative formal languages: Common Logic Interchange Format (CLIF), Conceptual Graph Interchange Format (CGIF), Term Functor Logic (TFL), Term Functor Logic Plus (TFL+), and a custom language called Miniature First-Order Logic (MINIFOL).
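
To make the idea of re-encoding FOL concrete, here is a minimal Python sketch that renders a small formula tree as a CLIF-style s-expression. It is not the CLGC pipeline itself: the class names (Pred, Not, BinOp, Quant) and the to_clif function are hypothetical helpers introduced only for illustration, and the sketch covers just a small fragment of CLIF.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Pred:
    """Atomic formula such as Human(x)."""
    name: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Not:
    """Negation of a formula."""
    body: "Formula"


@dataclass(frozen=True)
class BinOp:
    """Binary connective; op is one of 'and', 'or', 'if', 'iff'."""
    op: str
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Quant:
    """Quantified formula; kind is 'forall' or 'exists'."""
    kind: str
    var: str
    body: "Formula"


Formula = Union[Pred, Not, BinOp, Quant]


def to_clif(f: Formula) -> str:
    """Render a formula tree as a CLIF-style s-expression."""
    if isinstance(f, Pred):
        return "(" + " ".join((f.name,) + f.args) + ")"
    if isinstance(f, Not):
        return f"(not {to_clif(f.body)})"
    if isinstance(f, BinOp):
        return f"({f.op} {to_clif(f.left)} {to_clif(f.right)})"
    if isinstance(f, Quant):
        return f"({f.kind} ({f.var}) {to_clif(f.body)})"
    raise TypeError(f"unsupported node: {f!r}")


# FOL: ∀x (Human(x) → Mortal(x))
rule = Quant("forall", "x",
             BinOp("if", Pred("Human", ("x",)), Pred("Mortal", ("x",))))
print(to_clif(rule))  # (forall (x) (if (Human x) (Mortal x)))
```

The prefix notation keeps every operator and its scope explicit in a short string, which is the kind of compactness the study compares against natural language.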

The study experimented with a range of SLMs, including Flan-T5-small, Flan-T5-base, Flan-T5-large, GPT-2, Phi-3.5-mini-instruct, and Gemma-2-2b-it. They tested these models across different learning methods: Supervised Fine-Tuning (SFT), Zero-Shot (ZS) Prompting, and Few-Shot (FS) Prompting. The objective was to determine the truth value of a logical conclusion (True, False, or Uncertain) based on a given set of premises, using the various formal language representations as input.
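
The task is a three-way classification, so scoring comes down to mapping each model output to one of the three labels and computing accuracy. The Python sketch below shows one plausible way to do that. The function names parse_label and accuracy, the parsing rule (first label word found wins), and the sample outputs are all assumptions made for illustration, not the paper's evaluation code or data.

```python
from typing import Optional, Sequence

LABELS = ("True", "False", "Uncertain")


def parse_label(output: str) -> Optional[str]:
    """Return the first label word found in the model output, else None."""
    # Replace punctuation with spaces so "false." still matches "false".
    cleaned = "".join(c if c.isalpha() else " " for c in output.lower())
    for word in cleaned.split():
        for label in LABELS:
            if word == label.lower():
                return label
    return None


def accuracy(outputs: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of outputs whose parsed label equals the gold label."""
    if len(outputs) != len(gold):
        raise ValueError("outputs and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(parse_label(o) == g for o, g in zip(outputs, gold))
    return correct / len(gold)


# Made-up outputs, for illustration only.
outputs = ["True", "The answer is false.", "uncertain", "I cannot tell"]
gold = ["True", "False", "True", "Uncertain"]
print(accuracy(outputs, gold))  # 0.5
```

Outputs that contain no recognizable label are counted as wrong here; a real evaluation would need to decide explicitly how to treat them.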

Key Findings: The Power of Compact Formalisms

The results revealed several interesting insights. In the Supervised Fine-Tuning (SFT) setting, formalizing premises and conclusions in CLIF proved highly effective, often tying or even outperforming Natural Language (NL) in accuracy. This suggests that SLMs can reason well with more compact and structured formalisms than verbose natural language. Languages with more complex syntaxes, like CGIF, generally showed weaker performance, indicating that simpler formal structures might be easier for SLMs to process.

Interestingly, the smallest model tested, Flan-T5-small, sometimes achieved the best performance, even surpassing larger fine-tuned models. This hints that a larger model is not always beneficial for this specific type of reasoning task. In Zero-Shot (ZS) prompting, compact languages like CLIF and TFL+ also demonstrated competitive results, and augmenting prompts with Backus-Naur Form (BNF) grammar rules generally improved SLM performance.
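
Grammar augmentation can be pictured as prepending the BNF rules of the input language to the prompt. The Python sketch below assumes a simple prompt layout and a toy grammar for a CLIF-like fragment. TOY_GRAMMAR, build_prompt, and the prompt wording are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence

# Toy BNF for a small CLIF-like fragment (illustrative assumption only;
# lexical rules for <name> and <variable> are omitted).
TOY_GRAMMAR = """\
<sentence>   ::= <atom>
               | "(not " <sentence> ")"
               | "(" <connective> " " <sentence> " " <sentence> ")"
               | "(" <quantifier> " (" <variable> ") " <sentence> ")"
<atom>       ::= "(" <names> ")"
<names>      ::= <name> | <name> " " <names>
<connective> ::= "and" | "or" | "if" | "iff"
<quantifier> ::= "forall" | "exists"
"""


def build_prompt(premises: Sequence[str], conclusion: str,
                 grammar: Optional[str] = None) -> str:
    """Assemble a zero-shot prompt, optionally prefixed with a BNF grammar."""
    parts = []
    if grammar is not None:
        parts.append("Grammar of the input language (BNF):\n" + grammar.strip())
    parts.append("Premises:\n" + "\n".join(premises))
    parts.append("Conclusion:\n" + conclusion)
    parts.append("Is the conclusion True, False, or Uncertain?")
    return "\n\n".join(parts)


premises = ["(forall (x) (if (Human x) (Mortal x)))", "(Human socrates)"]
print(build_prompt(premises, "(Mortal socrates)", grammar=TOY_GRAMMAR))
```

Calling build_prompt without the grammar argument yields the plain prompt, so the same function can produce both variants for a side-by-side comparison.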

The researchers also explored advanced techniques like In-Context Grammar Passing (ICGP) and Tokenizer Re-Training. ICGP, where the BNF grammar was provided as additional context during SFT, surprisingly hindered learning and degraded model performance. Tokenizer re-training, which adapts the model’s vocabulary to the specific grammar, showed promise for smaller models and very compact data representations (like Flan-T5-small with TFL+). However, this method did not scale well to larger models, potentially leading to overfitting and reduced generalization.
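
One intuition for why adapting the vocabulary can help with compact formal inputs is that domain symbols stop being split into many small pieces. The Python sketch below illustrates this with a toy greedy longest-match tokenizer. This is a deliberately simplified stand-in and not the subword tokenizer re-training used in the study: the functions tokenize and extend_vocab, the base vocabulary, and the three-line corpus are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Set


def tokenize(text: str, vocab: Set[str]) -> List[str]:
    """Greedy longest-match segmentation; unknown characters become single tokens."""
    max_len = max((len(v) for v in vocab), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always succeeds.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def extend_vocab(vocab: Set[str], corpus: Iterable[str],
                 min_count: int = 2) -> Set[str]:
    """Add symbols that occur at least min_count times in the corpus."""
    counts: Counter = Counter()
    for line in corpus:
        counts.update(line.replace("(", " ").replace(")", " ").split())
    return set(vocab) | {s for s, c in counts.items() if c >= min_count}


# Toy corpus of CLIF-style strings (made up for illustration).
corpus = [
    "(forall (x) (if (Human x) (Mortal x)))",
    "(Human socrates)",
    "(Mortal socrates)",
]
base_vocab = {"forall", "exists", "if", "and", "or", "not"}
adapted_vocab = extend_vocab(base_vocab, corpus)

before = tokenize(corpus[0], base_vocab)
after = tokenize(corpus[0], adapted_vocab)
print(len(before), len(after))
```

With the adapted vocabulary, predicate names such as Human and Mortal become single tokens, so the same formula is encoded in fewer tokens. The flip side matches the overfitting risk noted above: a vocabulary fitted this tightly to a small corpus offers no benefit on symbols it has never seen.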

An analysis using the Syllogistic Evaluation Framework (SEF) showed that models performed well on Disjunctive, Hypothetical, and Complex syllogisms. TFL+ even showed a slight edge over NL and CLIF for Complex syllogisms, possibly because its compact nature handles the ambiguity of multi-premise problems better. Categorical syllogisms, being under-represented in the dataset, yielded less conclusive results.

Implications and Future Directions

The study concludes that while natural language remains a strong baseline, compact and formal representations like CLIF can effectively challenge it for first-order reasoning tasks in SLMs. This is a significant finding, especially since these results were achieved with small, frugal language models (under 3 billion parameters), making them accessible for practical applications. The research confirms that Supervised Fine-Tuning is currently the most stable and effective training method for these tasks, and that controlled formal languages generally scale well with models.

Looking ahead, the PhD research will explore combining different input representations (e.g., NL + CLIF) to see whether a blend of expressiveness and formal structure can further enhance reasoning. Another direction involves injecting knowledge from high-level ontologies into SLMs using formal languages to facilitate ontology extension. For more details, see the full paper.

Karthik Mehta
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society.