spot_img
HomeResearch & DevelopmentAutomating REST API Tests with Language Models and Test...

Automating REST API Tests with Language Models and Test Specifications

TLDR: A new research paper introduces RestTSLLM, an approach that combines Test Specification Language (TSL) with Large Language Models (LLMs) to automate the generation of integration tests for REST APIs. By using prompt engineering and an intermediate TSL step, LLMs are guided to create test scenarios from OpenAPI specifications and convert them into executable tests. An evaluation of various LLMs, including Claude 3.5 Sonnet, Deepseek R1, and Qwen 2.5 32b, demonstrated their effectiveness in generating high-quality tests with strong success rates, coverage, and mutation scores. Claude 3.5 Sonnet emerged as the top performer, highlighting the significant potential of LLMs in streamlining and enhancing REST API testing processes.

Testing plays a critical role in ensuring the quality and reliability of software systems. However, effectively testing REST APIs, which are widely used for communication between different services, presents significant challenges. The complexity of distributed systems, the vast number of possible scenarios, and limited time for test design often lead to incomplete testing, undetected failures, and high manual effort.

To address these persistent issues, researchers have introduced RestTSLLM, an innovative approach that combines Test Specification Language (TSL) with Large Language Models (LLMs) to automate the generation of test cases for REST APIs. This method specifically targets two core challenges: creating comprehensive test scenarios and defining appropriate input data.

The RestTSLLM approach integrates prompt engineering techniques with an automated pipeline to evaluate various LLMs. It works by first instructing the LLM to act as an experienced developer and tester, capable of understanding REST API specifications. Then, through a ‘few-shot’ and ‘decomposed prompting’ technique, the LLM is shown examples of how to convert an OpenAPI specification into structured test cases using TSL, and subsequently how to translate those TSL cases into executable integration tests, for instance, using .NET with xUnit.

The use of TSL as an intermediate step is crucial. It simplifies the problem for the LLM by allowing it to focus solely on understanding business rules and defining test scenarios in a human-readable, declarative format, without being burdened by code structure or syntax. Once these scenarios are clear in TSL, a second prompt guides the LLM to convert them into functional test code.

An extensive evaluation was conducted on eight prominent LLMs: Claude 3.5 Sonnet (Anthropic), Deepseek R1 (Deepseek), Qwen 2.5 32b (Alibaba), Sabiá 3 (Maritaca), LLaMA 3.2 90b (Meta), GPT 4o (OpenAI), Gemini 1.5 Pro (Google), and Mistral Large (Mistral). These models were tested against six open-source REST API projects. The evaluation focused on key metrics such as success rate, test coverage (specifically branch coverage), and mutation score, which assesses how well tests detect small changes in the system’s logic. A calculated score, using the TOPSIS technique, combined these metrics to determine overall performance.

The results were highly promising. All evaluated LLMs demonstrated effectiveness in generating integration tests that reflected the intended business logic and context, producing compilable code with high readability and adherence to test patterns. The average success rates across all models were above 95.5%, indicating that the generated tests were largely functional and stable.

Also Read:

Top Performing Models

Among the models, Claude 3.5 Sonnet emerged as the top performer, achieving the highest average calculated score and ranking first in all individual metrics. It was notably the only model that produced no failed tests during the evaluation. Deepseek R1, Qwen 2.5 32b, and Sabiá 3 also delivered strong results, closely following Claude 3.5 Sonnet in performance. Even models with lower average scores, such as Mistral Large, Gemini 1.5 Pro, GPT 4o, and LLaMA 3.2 90b, still showed solid performance, particularly in success rate and often in coverage or mutation score.

The study also highlighted the cost-effectiveness of using LLMs for test generation. The total cost of processing each project with any LLM remained very low, with several models delivering competitive results for less than $0.09 per execution, making this approach feasible even for budget-constrained environments.

While the approach showed significant potential, the researchers also identified areas for future improvement. These include addressing limitations related to LLM selection, the complexity of target projects, dependency on OpenAPI specifications, and the inherent subjectivity in qualitative analysis. Future work aims to expand the generalizability of the method to more complex architectures and different technologies, enhance automation with error correction techniques, and explore multilingual performance.

In conclusion, the RestTSLLM approach demonstrates that combining Test Specification Language with Large Language Models offers a viable and effective strategy for automating the generation of integration tests for REST APIs. This method not only streamlines the testing process but also enhances the quality and coverage of tests, marking a significant step forward in software testing automation. For more details, you can refer to the full research paper: Combining TSL and LLM to Automate REST API Testing: A Comparative Study.

Karthik Mehta
Karthik Mehtahttps://blogs.edgentiq.com
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him out at: [email protected]

- Advertisement -

spot_img

Gen AI News and Updates

spot_img

- Advertisement -