TLDR: This research evaluates how Large Language Models (LLMs) can assist in FrameNet semantic annotation, a labor-intensive linguistic task. Comparing manual, automatic, and semi-automatic methods, the study found that a hybrid approach (LLM pre-annotation with human review) significantly increases the diversity and coverage of annotations while maintaining quality, though it offers only a modest improvement in annotation speed. Human annotators frequently refined or rejected LLM suggestions, underscoring the importance of human oversight.
Natural Language Processing (NLP) relies heavily on meticulously annotated datasets to train and evaluate models. Among these, FrameNet stands out as a crucial linguistic resource that implements the theory of Frame Semantics. FrameNet annotation involves identifying lexical units, the semantic frames they evoke, and the frame elements of those frames, providing a deep, perspectivized understanding of meaning in context. However, this process is notoriously labor-intensive: it demands highly trained linguists and significant time, which limits its scalability and its expansion across languages and domains.
Recent advancements in Large Language Models (LLMs) have presented new opportunities to alleviate the human workload in various annotation tasks. While LLMs show promise in zero- and few-shot annotation, concerns about their potential to introduce biases or errors, a phenomenon dubbed ‘LLM hacking,’ necessitate rigorous evaluation and human oversight.
Investigating LLM Assistance in FrameNet
A recent research paper, titled “Evaluating the Impact of LLM-Assisted Annotation in a Perspectivized Setting: the Case of FrameNet Annotation,” explores whether LLMs can facilitate, accelerate, and improve the quality of FrameNet annotation. The study, conducted by Frederico Belcavello, Ely Matos, Arthur Lorenzi, and a team of researchers, investigates a semi-automatic approach in which LLM-generated suggestions are integrated into the human annotation workflow, allowing annotators to validate, correct, refine, or delete automatically proposed labels.
The experiment compared three annotation settings: manual (human-only), automatic (machine-only using LOME, an LLM-based frame-semantic parser), and semi-automatic (machine-plus-human). LOME was chosen for its LLM foundation, its ability to process sentences without extensive preprocessing, and its adaptability to different languages. The evaluation focused on several key metrics, including annotation time, coverage (number of annotation sets), diversity (unique frames used), and adherence to FrameNet’s methodological guidelines, particularly regarding the presence of core frame elements.
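To make the metrics above concrete, here is a minimal Python sketch of how coverage, diversity, and core frame element presence could be computed from a list of annotation sets. The `AnnotationSet` record layout, the function names, and the compliance rule (a set counts as compliant when every core frame element listed for its frame is annotated) are assumptions made for illustration; the paper's actual data model and scoring rules may differ.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class AnnotationSet:
    """One annotated lexical unit (hypothetical record layout)."""
    doc_id: str
    frame: str
    core_fes: frozenset       # core frame elements defined for the frame
    annotated_fes: frozenset  # frame elements actually labeled


def coverage(sets):
    """Average number of annotation sets per document.

    Documents with no annotation sets do not appear in `sets`
    and are therefore not counted here.
    """
    per_doc = Counter(s.doc_id for s in sets)
    return mean(per_doc.values())


def diversity(sets):
    """Number of unique frames used across all annotation sets."""
    return len({s.frame for s in sets})


def core_fe_rate(sets):
    """Share of annotation sets in which every core FE is annotated.

    Assumption: this compliance rule is illustrative, not the paper's.
    """
    compliant = sum(1 for s in sets if s.core_fes <= s.annotated_fes)
    return compliant / len(sets)


# Toy data with placeholder frame and FE names (not real FrameNet entries).
toy = [
    AnnotationSet("doc1", "FrameA", frozenset({"X"}), frozenset({"X", "Z"})),
    AnnotationSet("doc1", "FrameB", frozenset({"X", "Y"}), frozenset({"X"})),
    AnnotationSet("doc2", "FrameA", frozenset({"X"}), frozenset({"X"})),
]

print(coverage(toy), diversity(toy), core_fe_rate(toy))
```

Running the same three functions over the manual, automatic, and semi-automatic outputs would give directly comparable numbers for each setting.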
Key Findings from the Experiment
The results revealed several important insights into the impact of LLM-assisted annotation:
- Increased Coverage and Diversity: The semi-automatic setting increased the average number of annotation sets (ASs) per document by 24% compared to human-only annotation. This hybrid approach also resulted in higher frame diversity, meaning annotators used a broader range of unique frames, which suggests that LLM suggestions can prompt annotators to consider more varied interpretations.
- Quality and Adherence to Guidelines: Human annotators demonstrated excellent adherence to FrameNet’s guidelines, particularly in ensuring the presence of minimal core frame elements. While LOME alone performed poorly in this aspect (due to its inability to infer null instantiations), the semi-automatic condition maintained a high percentage of core FEs, only slightly lower than human-only annotation. This indicates that human judgment and quality control are largely preserved in the assisted setting.
- Annotation Time: Contrary to expectations, LLM-assisted pre-annotation reduced the average annotation time per sentence only slightly (by approximately 1.99 minutes). This suggests that, while LLMs can help, a large speed-up is unlikely to be the primary benefit.
- Human Interaction with LLM Suggestions: Annotators did not blindly accept LLM suggestions. A substantial portion of automatic annotations were either completely discarded (19.68%) or partially updated and improved (65.45%). Only a small percentage (6.61%) were fully accepted without modification. This highlights the critical role of human oversight in refining and correcting LLM outputs.
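The breakdown of annotator behavior in the last finding can be sketched as a simple comparison between each machine suggestion and the annotation the human finally kept. The three categories and the representation of an annotation as a set of (label, span) pairs are simplifying assumptions for illustration, and the function names are hypothetical. The three percentages reported above sum to 91.74%, so the study presumably uses a finer-grained breakdown than this sketch.

```python
from collections import Counter
from typing import Optional


def classify(suggested: frozenset, final: Optional[frozenset]) -> str:
    """Label what the annotator did with one machine suggestion.

    `final` is None when the annotator deleted the suggestion entirely.
    """
    if final is None:
        return "discarded"
    if final == suggested:
        return "accepted"
    return "updated"


def outcome_shares(pairs):
    """Percentage of suggestions falling into each outcome category."""
    counts = Counter(classify(s, f) for s, f in pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {outcome: 100 * n / total for outcome, n in counts.items()}


# Toy data: each annotation is a set of (label, (start, end)) pairs.
a = frozenset({("X", (0, 3))})
b = frozenset({("X", (0, 3)), ("Y", (4, 9))})
pairs = [
    (a, a),     # kept unchanged
    (a, b),     # annotator added a label
    (b, a),     # annotator removed a label
    (a, None),  # annotator deleted the suggestion
]

print(outcome_shares(pairs))
```

A real pipeline would also need a rule for annotation sets that humans created from scratch, which this sketch ignores.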
In conclusion, the research indicates that LLM-based pre-annotation is a valuable strategy for enhancing the coverage and diversity of perspectivized FrameNet annotation. It effectively preserves human judgment and adherence to linguistic guidelines, even if it provides only a modest improvement in annotation speed. This semi-automatic approach offers a viable path for sustainable, large-scale FrameNet growth, combining the scalability of computational methods with the interpretive rigor of expert linguistic analysis.
Future work aims to explore the inclusion of other semantic role labelers and the adoption of stricter policies to ensure that the minimum number of core frame elements is annotated, including the automatic recording of null instantiations.


