TLDR: This paper introduces an ontology-grounded framework for automated skill decomposition using Large Language Models (LLMs). It proposes semantic and hierarchy-aware F1-scores to evaluate content accuracy and structural granularity. Comparing zero-shot and leakage-safe few-shot prompting on the ROME-ESCO-DecompSkill benchmark, the study finds that few-shot prompting consistently stabilizes phrasing and granularity, improving hierarchical alignment, particularly for medium-scale LLMs. The research highlights the value of symbolic ontologies as structural priors for guiding generative models towards appropriate skill granularity.
In today’s rapidly evolving world, understanding and categorizing skills accurately is crucial for everything from personalized learning to effective job matching. However, existing expert-created skill databases, like the European Commission’s ESCO ontology or the U.S. O*NET database, often struggle to keep pace with technological changes and can present skills at inconsistent levels of detail. This creates a “granularity gap,” where broad skills need to be broken down into finer, more actionable sub-skills for practical applications.
A recent paper, “Automated Skill Decomposition Meets Expert Ontologies: Bridging the Granularity Gap with LLMs,” by LE Ngoc Luyen and Marie-Hélène ABEL, examines how Large Language Models (LLMs) can address this challenge. It explores using LLMs to automatically decompose broad skills into more specific sub-skills while keeping the outputs verifiable and structurally consistent with expert knowledge.
A New Framework for Skill Decomposition
The researchers propose a rigorous, ontology-grounded evaluation framework. It standardizes the entire pipeline: how LLMs are prompted to generate sub-skills, how the generated skills are normalized, and how they are aligned with existing ontology nodes. Rather than serving as a source of answers for the model, the expert skill ontology is used as a “gold-standard ruler” for evaluation, so the LLM’s outputs are measured for accuracy and consistency against established knowledge.
To evaluate the quality of the decomposed skills, the paper introduces two innovative metrics:
- Semantic F1-score: This metric assesses the content accuracy of the generated sub-skills using embedding-based matching. In effect, it checks how closely the meaning of each generated sub-skill matches the meaning of a gold-standard sub-skill.
- Hierarchy-aware F1-score: This metric goes a step further by crediting structurally correct placements. It checks not only whether a sub-skill is semantically correct but also whether it sits at the right place in the skill hierarchy, which is how it captures granularity.
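To make the two metrics concrete, here is a minimal sketch of how they could be computed. It is not the paper's exact formulation, which this summary does not reproduce. The greedy one-to-one matching, the 0.8 similarity threshold, the 0.5 partial credit for a parent or child of a gold node, and the hand-written toy embeddings are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def f1(credit, n_pred, n_gold):
    """F1 from a (possibly fractional) amount of matched credit."""
    precision = credit / n_pred if n_pred else 0.0
    recall = credit / n_gold if n_gold else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def semantic_f1(pred, gold, embed, threshold=0.8):
    """Greedy one-to-one matching of predictions to gold labels (assumed scheme)."""
    pairs = sorted(
        ((cosine(embed[p], embed[g]), p, g) for p in pred for g in gold),
        reverse=True,
    )
    used_pred, used_gold = set(), set()
    for sim, p, g in pairs:
        if sim < threshold:
            break  # remaining pairs are even less similar
        if p not in used_pred and g not in used_gold:
            used_pred.add(p)
            used_gold.add(g)
    return f1(len(used_gold), len(pred), len(gold))

def hierarchy_aware_f1(pred, gold, embed, parent, threshold=0.8, partial=0.5):
    """Full credit for a gold node, partial credit one level above or below it."""
    nodes = list(parent)  # every ontology label
    credit, used_gold = 0.0, set()
    for p in pred:
        # Align the prediction to its nearest ontology node.
        sim, node = max((cosine(embed[p], embed[n]), n) for n in nodes)
        if sim < threshold:
            continue
        if node in gold and node not in used_gold:
            credit += 1.0
            used_gold.add(node)
            continue
        # Near miss: the aligned node is the parent or a child of a gold node.
        near = [g for g in gold if g not in used_gold
                and (parent.get(g) == node or parent.get(node) == g)]
        if near:
            credit += partial
            used_gold.add(near[0])
    return f1(credit, len(pred), len(gold))

# Toy ontology (child -> parent) and toy embeddings; all values are made up.
parent = {"data analysis": None, "clean data": "data analysis",
          "visualise data": "data analysis", "build charts": "visualise data"}
embed = {"data analysis": (0.5, 0.5, 0.1), "clean data": (1.0, 0.0, 0.0),
         "visualise data": (0.0, 1.0, 0.0), "build charts": (0.0, 0.6, 0.8),
         "clean datasets": (0.95, 0.05, 0.0), "build bar charts": (0.0, 0.55, 0.83)}
gold = ["clean data", "visualise data"]
pred = ["clean datasets", "build bar charts"]

print(semantic_f1(pred, gold, embed))
print(hierarchy_aware_f1(pred, gold, embed, parent))
```

In this toy case the second prediction is too fine-grained: it aligns with a child of a gold node rather than the gold node itself. The semantic score counts it as a miss (0.5 overall), while the hierarchy-aware score gives it partial credit (0.75 overall), which is the kind of structural distinction the second metric is meant to capture.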
Prompting Strategies and Performance
The study investigates two main strategies for prompting LLMs:
- Zero-shot prompting: In this approach, the LLM receives only the instruction to decompose a skill, without any prior examples. It relies solely on the model’s pre-trained knowledge. The research found that zero-shot prompting provides a strong baseline, showing that LLMs inherently possess useful decomposition capabilities. However, outputs can sometimes drift in depth or include overly broad items.
- Few-shot prompting: Here, the LLM is given a small, curated set of examples (exemplars) to guide its generation. These exemplars are carefully chosen to avoid “information leakage” – meaning they don’t directly reveal the answers for the target skill but rather steer the model’s style and specificity. Few-shot prompting consistently stabilized phrasing and granularity, leading to improved hierarchy-aware alignment, especially for medium-scale LLMs.
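The difference between the two strategies can be sketched as a small prompt builder. This is an illustrative sketch only: the instruction wording, the exemplar pool, and the leakage rule (rejecting any exemplar whose labels exactly match the target skill or one of its gold sub-skills) are assumptions, not the paper's actual prompts or filtering criteria.

```python
# Hypothetical instruction text; the paper's real prompt is not shown here.
INSTRUCTION = "Decompose the skill into specific sub-skills, one per line."

def is_leakage_safe(exemplar, target_skill, gold_subskills):
    """Reject exemplars that name the target skill or any of its gold sub-skills."""
    forbidden = {target_skill.lower()} | {s.lower() for s in gold_subskills}
    labels = {exemplar["skill"].lower()} | {s.lower() for s in exemplar["subskills"]}
    return forbidden.isdisjoint(labels)

def build_prompt(target_skill, exemplars=()):
    """Zero-shot when exemplars is empty, few-shot otherwise."""
    parts = [INSTRUCTION]
    for ex in exemplars:
        lines = "\n".join(f"- {s}" for s in ex["subskills"])
        parts.append(f"Skill: {ex['skill']}\nSub-skills:\n{lines}")
    parts.append(f"Skill: {target_skill}\nSub-skills:")
    return "\n\n".join(parts)

# Made-up exemplar pool; the second entry would reveal gold answers for the target.
pool = [
    {"skill": "manage budgets", "subskills": ["forecast costs", "track expenses"]},
    {"skill": "analyse data", "subskills": ["clean data", "visualise data"]},
]
target = "perform data analysis"
gold = ["clean data", "visualise data"]

safe = [ex for ex in pool if is_leakage_safe(ex, target, gold)]
zero_shot = build_prompt(target)
few_shot = build_prompt(target, safe)
print(few_shot)
```

An exact-label filter like this one is the simplest possible check. A stricter variant could also reject exemplars that are merely semantically close to the target, at the cost of a smaller exemplar pool.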
The experiments were conducted on a benchmark called ROME-ESCO-DecompSkill, a dataset curated from the ESCO and ROME ontologies. The findings suggest that while zero-shot methods are robust, few-shot prompting acts as a “structural prior,” helping LLMs produce more reliable and taxonomically coherent skill decompositions. For very large models, the choice of exemplars becomes critical, as poorly matched examples can sometimes limit the breadth of the generated skills.
Efficiency and Future Directions
The paper also includes a latency analysis, examining the time taken for different LLMs and prompting strategies to generate decompositions. It was observed that few-shot prompting isn’t always slower; in some cases, exemplar-guided prompts can lead to more concise and schema-compliant outputs, potentially reducing generation time. This highlights that efficiency is highly dependent on both the model and the specific prompt design.
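A latency comparison of this kind can be approximated with a small timing harness. The sketch below assumes a hypothetical `generate(prompt)` callable standing in for an LLM call; the stub used here simply sleeps, so its timings say nothing about real models and do not reproduce the paper's measurements.

```python
import statistics
import time

def measure_latency(generate, prompt, runs=5):
    """Median wall-clock seconds for generate(prompt) over several runs."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

def fake_generate(prompt):
    """Hypothetical stand-in for a model call; replace with a real client."""
    time.sleep(0.001)
    return "- sub-skill"

latency = measure_latency(fake_generate, "Skill: perform data analysis", runs=3)
print(f"median latency: {latency:.4f} s")
```

Using the median rather than the mean keeps a single slow run from dominating the result. Because total generation time depends on output length as well as prompt length, running the same harness on zero-shot and few-shot prompts for the same skill is one way to see whether shorter, schema-compliant outputs offset the longer prompt.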
In conclusion, this research provides a foundational framework for developing skill decomposition systems that are faithful to expert ontologies. It demonstrates the significant potential of LLMs in breaking down complex skills into actionable units, which can have profound implications for personalized learning, job matching, and workforce development. Future work will explore more advanced techniques like retrieval-augmented grounding and adaptive exemplar selection to further enhance these systems.


