TLDR: FarsiMCQGen is a new framework for automatically generating high-quality multiple-choice questions (MCQs) in Persian. Developed by researchers at Amirkabir University of Technology, it uses a fine-tuned Transformer model to generate questions and a pipeline of candidate generation, filtering, and ranking (the last drawing on knowledge graph embeddings) to create plausible distractors. The framework also introduces a novel 10,289-question Persian MCQ dataset. Evaluations, both automated with LLMs and human-based, support the validity and effectiveness of the generated questions and choices, making the work a useful resource for Persian language education and NLP research.
Creating effective multiple-choice questions (MCQs) is a cornerstone of educational assessment, offering an efficient way to gauge a learner’s understanding. However, this task becomes particularly challenging when dealing with low-resource languages like Persian, where specialized tools and datasets are scarce. Manual MCQ generation is also a time-consuming process that demands significant expertise.
Addressing this challenge, researchers Mohammad Heydari Rad, Rezvan Afari, and Saeedeh Momtazi from Amirkabir University of Technology have introduced FarsiMCQGen, an innovative framework designed to automatically generate high-quality Persian-language MCQs. This new approach aims to streamline the creation of educational content and support language learning in Persian.
FarsiMCQGen is a multi-stage pipeline that combines Transformer-based language models with rule-based filters. It generates not only the questions but also credible ‘distractors’, the incorrect answer choices designed to challenge test-takers.
The system’s architecture is divided into two main components: question generation and wrong choice (distractor) generation.
Generating Questions
For question generation, FarsiMCQGen uses an mT5-based model fine-tuned on PQuAD, a large-scale Persian question-answering dataset derived from Wikipedia. Given an answer and the passage it comes from, the model learns to formulate a contextually relevant question.
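To make the answer-plus-passage input concrete, here is a minimal sketch of how such a pair might be packed into a single string before being handed to a sequence-to-sequence model. The function name `build_qg_input` and the `answer: ... context: ...` layout are assumptions made for illustration; the article does not say which input format the authors used.

```python
def build_qg_input(answer: str, context: str) -> str:
    """Pack an answer and its passage into one answer-aware prompt string.

    The "answer: ... context: ..." layout is a hypothetical convention,
    not the format reported for FarsiMCQGen.
    """
    # Collapse runs of whitespace so the prompt is stable across inputs.
    answer = " ".join(answer.split())
    context = " ".join(context.split())
    # The answer should be a span of the passage, as in extractive QA data.
    if answer not in context:
        raise ValueError("answer span must appear in the context")
    return f"answer: {answer} context: {context}"


# "Tehran is the capital of Iran." with "Tehran" as the answer span.
prompt = build_qg_input("تهران", "تهران  پایتخت ایران است.")
print(prompt)
```

In a real setup, a string like this would be tokenized and passed to the fine-tuned model, which would then decode the question text.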
Crafting Distractors
The generation of wrong choices is a critical aspect of creating effective MCQs. FarsiMCQGen employs a three-step process for this:
1. Candidate Generation: This involves two methods. The first uses a ‘fill-mask’ technique with various Transformer-based language models (like ParsBERT and ALBERT-Persian). It takes a complete answer sentence, masks the correct answer, and then predicts plausible alternatives. The second method identifies words semantically similar to the correct answer using GloVe and Word2Vec embeddings, which are trained on a Persian Wikipedia corpus.
2. Filtering: To ensure quality and efficiency, unsuitable candidates are filtered out. This includes a Part-Of-Speech (POS) filter to ensure grammatical consistency, a Written Form filter to standardize numerical representations (e.g., converting ‘2’ to ‘two’ or vice versa), and a Named Entity Recognition (NER) filter to match entity types (e.g., keeping only person names as distractors when the correct answer is a person’s name).
3. Ranking and Selection: The filtered candidates are then ranked using two similarity approaches. A Knowledge Graph Embedding Similarity leverages FarsWikiKG, a Persian knowledge graph, to assess relationships between entities. A BERT Similarity module calculates the semantic similarity between the correct answer and each distractor within the given context. The top three candidates, based on a combined score from these two methods, are selected as the final wrong choices.
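The filtering and ranking steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: the `Candidate` class, the two-dimensional vectors standing in for knowledge graph and BERT embeddings, and the equal weighting of the two similarity scores are all hypothetical, since the article does not say how the scores are combined.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Candidate:
    text: str
    pos: str               # part-of-speech tag, e.g. "PROPN"
    entity_type: str       # NER label, e.g. "LOC"
    kg_vec: list[float]    # toy stand-in for a knowledge graph embedding
    ctx_vec: list[float]   # toy stand-in for a contextual (BERT-style) embedding


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_distractors(answer, candidates, k=3, kg_weight=0.5):
    """Filter candidates by POS and entity type, then keep the top k by score."""
    # Filtering: drop the answer itself and anything whose POS tag or
    # entity type differs from the correct answer's.
    kept = [
        c for c in candidates
        if c.text != answer.text
        and c.pos == answer.pos
        and c.entity_type == answer.entity_type
    ]

    # Ranking: weighted sum of the two similarities (weights are an assumption).
    def score(c):
        return (kg_weight * cosine(answer.kg_vec, c.kg_vec)
                + (1 - kg_weight) * cosine(answer.ctx_vec, c.ctx_vec))

    return sorted(kept, key=score, reverse=True)[:k]


answer = Candidate("Tehran", "PROPN", "LOC", [1.0, 0.0], [1.0, 0.0])
pool = [
    Candidate("Mashhad", "PROPN", "LOC", [0.1, 0.9], [0.2, 0.8]),
    Candidate("Hafez", "PROPN", "PER", [1.0, 0.0], [1.0, 0.0]),   # wrong entity type
    Candidate("Shiraz", "PROPN", "LOC", [0.7, 0.3], [0.9, 0.1]),
    Candidate("large", "ADJ", "O", [0.9, 0.1], [0.9, 0.1]),       # wrong POS
    Candidate("Isfahan", "PROPN", "LOC", [0.9, 0.1], [0.8, 0.2]),
    Candidate("Tabriz", "PROPN", "LOC", [0.5, 0.5], [0.6, 0.4]),
]
names = [c.text for c in select_distractors(answer, pool)]
print(names)
```

In this toy pool, the person name and the adjective never reach the ranking stage, however similar their vectors are to the answer's. That is the point of filtering before ranking: the similarity scores only need to order candidates that are already grammatically and semantically compatible with the answer.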
The research also introduces a new Persian MCQ dataset comprising 10,289 questions, categorized by both type (e.g., ‘What’, ‘When’, ‘Where’) and content (e.g., History, Technology, Science, Politics). This dataset serves as a valuable resource for further research and development in Persian NLP.
To validate the quality of the generated questions and distractors, both automated and human evaluations were conducted. Several state-of-the-art large language models (LLMs) were tested, with models like Qwen2.5-14B-Instruct and Meta-Llama-3.1-8B-Instruct showing strong performance. Human evaluators assessed a sample of 200 questions, confirming that 97.5% of the questions and options were logically valid, and 94.5% of the wrong choices were effectively distractive.
This work marks a significant step forward in automatic MCQ generation for the Persian language, offering a robust framework and a high-quality dataset that can inspire future advancements in educational technology and language processing. For more details, you can refer to the full research paper: FARSIMCQGEN: A PERSIAN MULTIPLE-CHOICE QUESTION GENERATION FRAMEWORK.


