TLDR: A research paper details the experience of using Generative AI (Gen AI) tools across the entire software development lifecycle for a web-based healthcare system. While Gen AI significantly boosted productivity in tasks like requirements generation, design, and coding, the study concludes that full automation is not yet possible. Human oversight, meticulous prompt engineering, and manual validation were crucial for ensuring software quality, correcting AI-generated errors, and addressing complex logic and security issues. The paper emphasizes Gen AI’s role as a complementary tool, highlighting the need for evolving development processes and integrating ethical considerations.
A recent research paper explores the practical application of Generative Artificial Intelligence (AI) in the engineering and quality assurance of a web-based healthcare system. Conducted by researchers from the Universidade Federal do Rio de Janeiro (UFRJ), the study provides valuable insights into the benefits and challenges of integrating Gen AI tools throughout the software development lifecycle.
The team embarked on developing a web system for a thoracic diseases research group, aiming to replace manual clinical record-keeping for patients undergoing smoking cessation treatment. This project served as a real-world testbed to observe how emerging Gen AI technologies could enhance productivity and quality in software development processes. The central question guiding their work was whether a software system could be built entirely using Gen AI tools across all development phases.
Integrating Gen AI Across Development Phases
The researchers systematically incorporated various Gen AI tools into different stages of the project:
- Requirements Elicitation and Scenarios: Microsoft Copilot was used to refine the text of vision documents, while Grok helped generate initial use case scenarios.
- Requirements Specification: ChatGPT-4o assisted in defining, classifying, and prioritizing functional and non-functional requirements.
- Software Design: Claude-3.5 was employed to support the definition of the system’s architecture, components, and data flows.
- Project Management: ChatGPT-o4-mini was used to create a project plan, though this phase highlighted significant limitations.
- Coding: The Lovable platform was chosen for generating both frontend and backend code, utilizing technologies like React, Tailwind CSS, and Supabase.
- Verification, Validation, and Testing (VV&T): Gemini 2.5 Flash was used for generating test plans and associating test cases with requirements. Other models like ChatGPT, DeepSeek, and Claude were explored for automated test generation, with Claude showing promising results.
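The association of test cases with requirements mentioned in the last item can be pictured as a small traceability check. The following is a minimal sketch in Python, not the paper's method: the requirement IDs, test-case IDs, and function names are all hypothetical and chosen only for illustration.

```python
# Hypothetical requirement IDs and test cases; none of these come from the paper.
requirements = ["FR-01", "FR-02", "NFR-01"]
test_cases = {
    "TC-01": ["FR-01"],
    "TC-02": ["FR-01", "NFR-01"],
}


def build_traceability(requirements, test_cases):
    """Map each requirement ID to the sorted list of test cases that cover it."""
    matrix = {req: [] for req in requirements}
    for tc_id, covered in test_cases.items():
        for req in covered:
            # A generated test plan may cite a requirement that does not exist.
            if req not in matrix:
                raise ValueError(f"{tc_id} references unknown requirement {req}")
            matrix[req].append(tc_id)
    return {req: sorted(tcs) for req, tcs in matrix.items()}


def uncovered(matrix):
    """Return the requirements that no test case covers."""
    return [req for req, tcs in matrix.items() if not tcs]


matrix = build_traceability(requirements, test_cases)
print(uncovered(matrix))
```

A check of this kind is one way a reviewer might spot omissions in a generated test plan, such as a requirement with no test case or a test case pointing at a requirement that was never specified.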
Key Findings and Challenges
While Gen AI tools offered clear advantages in accelerating initial artifact creation and automating repetitive tasks, the study concluded that complete automation of software development is not yet feasible. Human intervention proved indispensable at every stage.
- Prompt Engineering is Paramount: The quality of outputs from Gen AI tools depended heavily on the clarity, specificity, and detail of the prompts provided. Generic or vague prompts often led to incomplete or inconsistent results.
- Human Oversight is Critical: Developers had to perform extensive manual reviews, inspections, and validations of all generated artifacts. This was crucial for correcting omissions, duplications, ambiguities, and “hallucinations” (incorrect or unfounded responses) from the AI.
- Context Management: Maintaining context across multiple interactions with Gen AI models was a recurring challenge, often requiring developers to “remind” the model of previous information.
- Limitations in Complex Logic: Tools struggled with interpreting complex business rules, maintaining state between components, and ensuring consistency over time, particularly in coding and testing phases.
- Security Concerns: In the coding phase, the Lovable tool initially exposed API keys directly in the source code, highlighting a critical security vulnerability that required manual correction.
- Project Management Disconnect: Gen AI tools tended to generate traditional project plans that did not adapt well to the dynamic and iterative nature of Gen AI-assisted development, necessitating significant manual adjustments.
- New Artifacts and Skills: The use of Gen AI introduced new essential artifacts like structured prompts and Markdown documents, requiring developers to adapt their technical writing and definition skills.
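The security concern in the list above, API keys written directly into source code, is the kind of issue a simple review script can help catch. The sketch below is illustrative only: the article does not describe how the team detected or fixed the problem, the regular expression is a rough heuristic, and the names `find_hardcoded_secrets`, `load_api_key`, and `EXAMPLE_API_KEY` are hypothetical. It is written in Python for brevity, although the project itself used a React and Supabase stack.

```python
import os
import re

# Rough heuristic (an assumption, not a complete detector): a name containing
# "key", "secret", or "token" assigned a quoted literal of 16+ characters.
HARDCODED_SECRET = re.compile(
    r"""\b\w*(?:key|secret|token)\w*\s*[:=]\s*["'][A-Za-z0-9_\-\.]{16,}["']""",
    re.IGNORECASE,
)


def find_hardcoded_secrets(source: str) -> list[int]:
    """Return 1-based line numbers that appear to embed a secret literal."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if HARDCODED_SECRET.search(line)
    ]


def load_api_key(name: str = "EXAMPLE_API_KEY") -> str:
    """Read the key from the environment instead of the source code."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value


# Hypothetical generated frontend code: line 1 embeds a literal key,
# line 2 reads it from build-time configuration instead.
sample = (
    'const supabaseKey = "abcdefghijklmnop1234";\n'
    "const supabaseKey = import.meta.env.VITE_SUPABASE_KEY;\n"
)
print(find_hardcoded_secrets(sample))
```

A scan like this only flags candidates for the manual inspection the study calls for. It does not replace that inspection, and it will miss secrets that do not follow the assumed naming pattern.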
Ethical Considerations
The research also touched upon ethical aspects, emphasizing that Gen AI outputs can reflect biases from training data, raise questions about intellectual authorship, and influence decisions without adequate explainability. The authors stressed the importance of integrating ethical principles like responsibility, transparency, and fairness from the outset of the development process.
Conclusion: A Complementary Role for Gen AI
The study unequivocally states that Generative AI should be viewed as a complementary tool, not a replacement for human expertise. While it can significantly boost productivity and accelerate initial development cycles, the ultimate quality, compliance, and trustworthiness of the software still heavily rely on continuous human involvement. This includes skilled prompt engineering, thorough technical inspection, and diligent validation of all AI-generated artifacts. The findings suggest that current software development process models need to evolve to effectively integrate these new technologies and the associated new roles and artifacts.


