Integrating Generative AI in Healthcare Software Development: Lessons from a Real-World Project

TLDR: A research paper details the experience of using Generative AI (Gen AI) tools across the entire software development lifecycle for a web-based healthcare system. While Gen AI significantly boosted productivity in tasks like requirements generation, design, and coding, the study concludes that full automation is not yet possible. Human oversight, meticulous prompt engineering, and manual validation were crucial for ensuring software quality, correcting AI-generated errors, and addressing complex logic and security issues. The paper emphasizes Gen AI’s role as a complementary tool, highlighting the need for evolving development processes and integrating ethical considerations.

A recent research paper explores the practical application of Generative Artificial Intelligence (Gen AI) in the engineering and quality assurance of a web-based healthcare system. Conducted by researchers from the Universidade Federal do Rio de Janeiro (UFRJ), the study reports the benefits and challenges of integrating Gen AI tools throughout the software development lifecycle.

The team embarked on developing a web system for a thoracic diseases research group, aiming to replace manual clinical record-keeping for patients undergoing smoking cessation treatment. This project served as a real-world testbed to observe how emerging Gen AI technologies could enhance productivity and quality in software development processes. The central question guiding their work was whether a software system could be built entirely using Gen AI tools across all development phases.

Integrating Gen AI Across Development Phases

The researchers systematically incorporated various Gen AI tools into different stages of the project:

Requirements Elicitation and Scenarios: Microsoft Copilot was used to refine the text of vision documents, while Grok helped generate initial use case scenarios.
Requirements Specification: ChatGPT-4o assisted in defining, classifying, and prioritizing functional and non-functional requirements.
Software Design: Claude-3.5 was employed to support the definition of the system’s architecture, components, and data flows.
Project Management: ChatGPT-o4-mini was used to create a project plan, though this phase highlighted significant limitations.
Coding: The Lovable platform was chosen for generating both frontend and backend code, utilizing technologies like React, Tailwind CSS, and Supabase.
Verification, Validation, and Testing (VV&T): Gemini 2.5 Flash was used for generating test plans and associating test cases with requirements. Other models like ChatGPT, DeepSeek, and Claude were explored for automated test generation, with Claude showing promising results.
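
Associating test cases with requirements, as described in the VV&T step, is the kind of output that still needs a human or scripted check. The following is a minimal sketch of such a check, not the method used in the paper: the identifiers (`RF-01`, `TC-01`) and both function names are hypothetical and chosen only for illustration. It reports requirements that no test covers and test cases that reference a requirement that does not exist, which is one way an AI-generated mapping can go wrong.

```python
def uncovered_requirements(requirement_ids, test_cases):
    """Return requirement IDs that no test case references, in input order."""
    covered = {req for reqs in test_cases.values() for req in reqs}
    return [req for req in requirement_ids if req not in covered]


def unknown_references(requirement_ids, test_cases):
    """Return (test ID, requirement ID) pairs pointing at undefined requirements."""
    known = set(requirement_ids)
    return [
        (test_id, req)
        for test_id, reqs in test_cases.items()
        for req in reqs
        if req not in known
    ]


# Hypothetical data: three requirements and an AI-generated test mapping.
requirements = ["RF-01", "RF-02", "RNF-01"]
generated_tests = {
    "TC-01": ["RF-01"],
    "TC-02": ["RF-01", "RNF-01"],
    "TC-03": ["RF-09"],  # refers to a requirement that was never defined
}

print(uncovered_requirements(requirements, generated_tests))
print(unknown_references(requirements, generated_tests))
```

A check like this only confirms that the mapping is complete and consistent. Whether each test actually exercises its requirement still calls for manual review.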

Key Findings and Challenges

While Gen AI tools offered clear advantages in accelerating initial artifact creation and automating repetitive tasks, the study concluded that complete automation of software development is not yet feasible. Human intervention proved indispensable at every stage.

Prompt Engineering is Paramount: The quality of Gen AI output depended heavily on the clarity, specificity, and detail of the prompts provided. Generic or vague prompts often led to incomplete or inconsistent results.
Human Oversight is Critical: Developers had to perform extensive manual reviews, inspections, and validations of all generated artifacts. This was crucial for correcting omissions, duplications, ambiguities, and “hallucinations” (incorrect or unfounded responses) from the AI.
Context Management: Maintaining context across multiple interactions with Gen AI models was a recurring challenge, often requiring developers to “remind” the model of previous information.
Limitations in Complex Logic: Tools struggled with interpreting complex business rules, maintaining state between components, and ensuring consistency over time, particularly in coding and testing phases.
Security Concerns: In the coding phase, the Lovable tool initially exposed API keys directly in the source code, highlighting a critical security vulnerability that required manual correction.
Project Management Disconnect: Gen AI tools tended to generate traditional project plans that did not adapt well to the dynamic and iterative nature of Gen AI-assisted development, necessitating significant manual adjustments.
New Artifacts and Skills: The use of Gen AI introduced new essential artifacts like structured prompts and Markdown documents, requiring developers to adapt their technical writing and definition skills.
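
The security finding above, API keys written directly into generated source code, is one of the few checks in this list that can be partly scripted. The sketch below is a simple heuristic and an assumption of this article, not the procedure the researchers followed. It flags lines where an identifier containing `key`, `secret`, `token`, or `password` is assigned a long string literal. The pattern, the function name, and the sample code are all hypothetical.

```python
import re

# Heuristic (assumed, not from the paper): a suspicious identifier followed by
# "=" or ":" and a quoted literal of 20 or more key-like characters.
SECRET_PATTERN = re.compile(
    r"""\b(\w*(?:key|secret|token|password)\w*)\s*[:=]\s*["'`]([\w\-.]{20,})["'`]""",
    re.IGNORECASE,
)


def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line number, identifier) for each likely hard-coded secret."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in SECRET_PATTERN.finditer(line):
            findings.append((line_number, match.group(1)))
    return findings


# Hypothetical generated frontend code; the key value is a placeholder.
sample = "\n".join([
    'const serviceUrl = "https://example.invalid";',
    'const SUPABASE_API_KEY = "abcdefghijklmnopqrstuvwxyz123456";',
    "const fromEnv = process.env.SERVICE_KEY;",  # read from the environment
])

for line_number, name in find_hardcoded_secrets(sample):
    print(f"line {line_number}: possible hard-coded secret in {name}")
```

A scan of this kind will miss secrets stored under neutral names and may flag harmless constants, so it supplements the manual inspection the study describes and does not replace it.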

Ethical Considerations

The research also touched upon ethical aspects, emphasizing that Gen AI outputs can reflect biases from training data, raise questions about intellectual authorship, and influence decisions without adequate explainability. The authors stressed the importance of integrating ethical principles like responsibility, transparency, and fairness from the outset of the development process.

Conclusion: A Complementary Role for Gen AI

The study states unequivocally that Generative AI should be viewed as a complementary tool, not a replacement for human expertise. While it can significantly boost productivity and accelerate initial development cycles, the quality, compliance, and trustworthiness of the software still rely heavily on continuous human involvement. This includes skilled prompt engineering, thorough technical inspection, and diligent validation of all AI-generated artifacts. The findings suggest that current software development process models need to evolve to integrate these new technologies, along with the new roles and artifacts they bring. The full paper has further details.


