AI-Driven Question Generation: A New Era for Text-Based Evaluation

TLDR: This research introduces an Automatic Question Answer Generation (AQAG) system using a fine-tuned Meta-Llama 2-7B Large Language Model. The system aims to simplify the creation of diverse text-based assessments for educators and provide self-evaluation tools for students. By leveraging prompt engineering and training on the RACE dataset, the model generates various question types (MCQ, conceptual, factual) and their answers, demonstrating its potential to save time and resources in educational and professional evaluation processes, despite facing hardware limitations and some biases in question generation.

In today’s fast-paced educational landscape, the process of creating effective and fair assessments for students is more crucial than ever. However, educators often face significant challenges in manually developing diverse sets of questions from extensive lecture materials. This time-consuming task can limit the variety of questions and delay valuable feedback for students. Recognizing this critical need, a recent research paper explores an innovative solution: Automatic Question Answer Generation (AQAG) powered by advanced Artificial Intelligence.

Authored by A.S.M Mehedi Hasan, Md. Alvee Ehsan, Kefaya Benta Shahnoor, and Syeda Sumaiya Tasneem from Brac University, this study introduces a system designed to streamline the assessment process. Their work, detailed in the paper titled “Automatic Question & Answer Generation Using Generative Large Language Model (LLM)”, aims to free up valuable time and resources for educators and individuals involved in text-based evaluations.

The Challenge of Manual Assessment

The traditional method of creating questions, whether multiple-choice, conceptual, or factual, requires instructors to sift through numerous lecture materials. This manual effort often leads to a lack of diversity in questions and can be a significant burden. From a student’s perspective, timely feedback is essential for self-evaluation and identifying areas for improvement before formal assessments. Beyond education, the corporate world also grapples with the challenge of designing unbiased and effective assessments for hiring, where prioritizing skills over CV information is key.

AI’s Role in Transforming Q&A Generation

The researchers propose leveraging the power of Large Language Models (LLMs) to automate this process. Their system utilizes a fine-tuned generative LLM, specifically the Meta-Llama 2-7B model, which has been adapted to understand and generate human-like text. The core idea is to train this model on a vast dataset of reading comprehensions and questions, enabling it to create new, contextually relevant questions and their corresponding answers.

A key technique employed in this research is Prompt Engineering. This involves carefully crafting instructions and examples for the AI model to guide it towards generating questions in a preferred style, such as multiple-choice questions (MCQs), or questions that test conceptual or factual understanding. This ensures that the generated questions align with the instructor’s specific requirements.
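To make the idea concrete, here is a minimal sketch of a prompt builder in Python. The instruction wording, the `QUESTION_STYLES` table, and the `build_prompt` helper are all hypothetical; this article does not show the prompts the researchers actually used.

```python
# Hypothetical prompt templates; the paper's actual prompts are not shown here.
QUESTION_STYLES = {
    "mcq": "Write one multiple-choice question with four options (A-D) "
           "and mark the correct option.",
    "conceptual": "Write one question that tests understanding of the main "
                  "idea, then give the answer.",
    "factual": "Write one question whose answer is stated explicitly in the "
               "passage, then give the answer.",
}


def build_prompt(passage: str, style: str) -> str:
    """Combine a style-specific instruction with the source passage."""
    if style not in QUESTION_STYLES:
        raise ValueError(f"unknown question style: {style!r}")
    return (
        "### Instruction:\n"
        f"{QUESTION_STYLES[style]}\n\n"
        "### Passage:\n"
        f"{passage.strip()}\n\n"
        "### Response:\n"
    )


if __name__ == "__main__":
    text = "Photosynthesis converts light energy into chemical energy."
    print(build_prompt(text, "mcq"))
```

Switching the `style` argument changes only the instruction, which is the lever an instructor would use to ask for MCQ, conceptual, or factual questions from the same passage.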

Behind the Scenes: How the Model Works

The system’s development involved several crucial steps. First, the Meta-Llama 2-7B model, a powerful AI with 7 billion parameters, was chosen as the foundation. This model was then fine-tuned using the RACE dataset, which comprises 10,000 reading comprehensions and 40,000 questions derived from English tests given to middle and high school students in China. This extensive dataset helped the model learn the nuances of question-answer relationships.
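Fine-tuning on a reading-comprehension dataset usually starts by flattening each record into one training string. The sketch below assumes records with `article`, `question`, `options`, and `answer` fields, which is how the public RACE release is commonly distributed; the output layout and the `format_race_example` helper are illustrative assumptions, not the authors' preprocessing.

```python
LETTERS = ["A", "B", "C", "D"]


def format_race_example(record: dict) -> str:
    """Turn one RACE-style record into a single training string."""
    options = record["options"]
    answer_letter = record["answer"]
    if len(options) != len(LETTERS):
        raise ValueError("expected exactly four options")
    if answer_letter not in LETTERS:
        raise ValueError(f"unexpected answer label: {answer_letter!r}")
    option_lines = "\n".join(
        f"{letter}. {text}" for letter, text in zip(LETTERS, options)
    )
    answer_text = options[LETTERS.index(answer_letter)]
    return (
        f"Passage:\n{record['article'].strip()}\n\n"
        f"Question: {record['question'].strip()}\n"
        f"{option_lines}\n"
        f"Answer: {answer_letter}. {answer_text}"
    )


# Made-up record for illustration; it is not taken from RACE.
sample = {
    "article": "Tom walks to school every day because he lives nearby.",
    "question": "Why does Tom walk to school?",
    "options": ["He likes buses.", "He lives nearby.",
                "He has no bike.", "He is late."],
    "answer": "B",
}

if __name__ == "__main__":
    print(format_race_example(sample))
```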

To make the model more efficient and capable of running on systems with limited resources, a technique called 4-bit quantization was applied. This process stores the model’s weights at much lower numerical precision (4 bits each rather than the usual 16 or 32), significantly reducing its memory footprint without a major loss in performance. Additionally, text was broken down into smaller units called “tokens” through tokenization, making it easier for the AI to process and understand language.
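As a rough sense of scale, 7 billion weights take about 14 GB at 16 bits each and about 3.5 GB at 4 bits each, before any bookkeeping overhead. The toy Python sketch below shows the basic idea with simple symmetric "absmax" rounding. It is an illustration only: the article does not say which quantization scheme or library the researchers used, and the `quantize_4bit` and `dequantize` helpers are hypothetical.

```python
def quantize_4bit(weights):
    """Map floats to integer codes in [-7, 7] plus one shared scale.

    Simple symmetric absmax scheme, chosen for clarity; it is not
    necessarily the scheme used in the paper.
    """
    absmax = max(abs(w) for w in weights)
    scale = absmax / 7 if absmax > 0 else 1.0
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.7, -0.35, 0.0, 0.1]
    codes, scale = quantize_4bit(weights)
    restored = dequantize(codes, scale)
    for w, c, r in zip(weights, codes, restored):
        print(f"original={w:+.3f} code={c:+d} restored={r:+.3f}")
```

Each restored value differs from the original by at most half the scale, which is the precision traded away for the smaller memory footprint.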

Evaluating the System’s Performance

The fine-tuned model’s performance was evaluated using various metrics. One important measure was “perplexity,” which assesses how well a language model predicts a text sample. A lower perplexity score means the model assigns higher probability to the text it is asked to predict. The custom fine-tuned Llama-2 model achieved a perplexity score of 6.43 on a standard test set, showing its capability in understanding and generating text.
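Perplexity is conventionally computed as the exponential of the average negative log-probability the model assigns to each token. The short Python sketch below implements that standard definition; the `perplexity` helper and its input format (a list of natural-log token probabilities) are assumptions made for illustration and do not reproduce the paper's evaluation code.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_neg_log_prob = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_neg_log_prob)


if __name__ == "__main__":
    # A model that gives every token probability 1/4 is as uncertain
    # as a uniform four-way guess at each step.
    uniform_guess = [math.log(0.25)] * 10
    print(perplexity(uniform_guess))
```

Read this way, a score of 6.43 corresponds to the model being about as uncertain as a uniform choice among six or seven tokens at each step.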

The researchers also measured the relevance of the generated questions to their source articles using a “cosine similarity” score, which quantifies how semantically similar two pieces of text are. Questions consistently showed good relevance scores, indicating they were pertinent to the context. Furthermore, the quality of multiple-choice options was assessed by measuring their similarity to the correct answer, ensuring that all options were plausible and well-categorized.
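Cosine similarity itself is simple to compute once each text is represented as a vector. The sketch below uses plain word counts as the vectors, which captures shared vocabulary rather than meaning; a semantic comparison like the one described would normally use embedding vectors instead, and the article does not specify which the researchers used. The `bag_of_words` and `cosine_similarity` helpers are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    vec_a, vec_b = bag_of_words(text_a), bag_of_words(text_b)
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up texts for illustration.
passage = "Tom walks to school every day because he lives nearby."
related = "Why does Tom walk to school?"
unrelated = "What is the capital of France?"

if __name__ == "__main__":
    print("related:", cosine_similarity(passage, related))
    print("unrelated:", cosine_similarity(passage, unrelated))
```

The same function can compare each multiple-choice option with the correct answer: distractors that score too low are implausible, while ones that score too high may be near-duplicates of the answer.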

Impact and Future Directions

This AQAG system holds considerable potential for various sectors. In education, it can help faculty members quickly generate diverse assessments and provide students with tools for self-practice and evaluation. In the corporate world, it could assist in creating unbiased skill-based assessments for job applicants. The research also contributes to a deeper understanding of the capabilities of newer LLMs like Llama-2, which was open-sourced recently.

While promising, the project encountered challenges, including hardware limitations that necessitated the use of a 4-bit quantized model, potentially affecting peak performance. The study also noted some limitations, such as a bias towards generating conceptual questions due to the training data and the current inability to process analytical questions or direct PDF documents. Future work aims to address these by exploring newer LLMs, investigating biases using Explainable AI (XAI) techniques, and expanding the model’s capabilities to include multilingual support and analytical question generation.

Ultimately, this research paves the way for a more dynamic and efficient approach to text-based evaluation, promising significant benefits across educational and professional domains.

Karthik Mehta
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society.