TLDR: PowerGPT is an AI-powered system that integrates large language models with statistical engines to automate and improve sample size calculations and statistical test selection in clinical trial design. A randomized trial showed it significantly increased task completion rates and accuracy while reducing completion time for both statisticians and non-statisticians, effectively bridging expertise gaps and making complex power analysis more accessible. The system is freely available and already deployed in multiple institutions.
A new AI-powered system called PowerGPT is set to transform how clinical trials are designed, making complex statistical power analysis more accessible and efficient for researchers. Clinical trials are crucial for medical advancements, but accurately determining sample sizes and selecting appropriate statistical methods can be a significant hurdle, especially for those without extensive statistical expertise.
PowerGPT addresses these challenges by integrating large language models (LLMs) with specialized statistical engines. The system automates two steps that determine whether a study is adequately powered to detect a meaningful effect: selecting the statistical test and estimating the sample size.
How PowerGPT Works
PowerGPT operates as an agent-based, end-to-end system. Researchers interact with it through a user-friendly interface, describing their study objectives in natural language. The system then interprets these queries, identifies the most suitable statistical methods, and guides the user through the necessary inputs, such as effect sizes and desired power. It provides explanations in plain language, making complex statistical concepts understandable.
Once the parameters are defined, PowerGPT connects with external APIs and statistical engines to perform the calculations. It supports a wide array of statistical tests, including t-tests, ANOVA, z-tests for proportions, Chi-square tests, Cox proportional hazards models, log-rank tests, and various regression and non-parametric methods. The results are then presented in an actionable format, and users can easily explore alternative scenarios by adjusting parameters.
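To make the inputs mentioned above concrete, here is a minimal sketch of one such calculation: the per-group sample size for a two-sided, two-sample comparison of means, using the standard normal-approximation formula. The article does not describe PowerGPT's internal engines or API, so this is not its implementation; the function name n_per_group and its defaults are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Hypothetical helper, not part of PowerGPT. Uses the normal approximation
        n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    where d is the standardized effect size (Cohen's d). An exact t-test
    calculation typically gives a slightly larger n.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = std_normal.inv_cdf(power)          # quantile matching the target power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Explore alternative scenarios by varying effect size and target power.
    for d in (0.2, 0.5, 0.8):
        for target_power in (0.80, 0.90):
            n = n_per_group(d, power=target_power)
            print(f"d={d:.1f}, power={target_power:.2f}: n per group = {n}")
```

With a medium effect size (d = 0.5), a 5% two-sided significance level, and 80% power, this approximation gives 63 participants per group. Halving the effect size roughly quadruples the required sample, which is why the effect-size input deserves the most scrutiny.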
Randomized Evaluation Shows Significant Improvements
To evaluate its effectiveness, a randomized trial was conducted with 36 participants from the University of Pennsylvania and the University of Texas Health Science Center at Houston. Participants included both statisticians and non-statisticians. One group used PowerGPT, while the other relied on traditional methods like textbooks and standard statistical software.
The results were striking: PowerGPT significantly improved task completion rates and accuracy while drastically reducing the time required. For test selection, the PowerGPT group achieved a 99.3% completion rate compared to 88.9% in the reference group, with an accuracy of 95.6% versus 83.6%. For sample size calculation, PowerGPT users had a 99.3% completion rate against 77.8% for the reference group, and an impressive 94.1% accuracy compared to 55.4%.
On average, PowerGPT users completed each question in 4.0 minutes, while the reference group took 9.3 minutes, demonstrating a substantial gain in efficiency.
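The size of these gaps is easier to compare when expressed as percentage-point differences and a time ratio. The short sketch below only re-derives those quantities from the figures reported above; the variable and function names are illustrative, and no new data is introduced.

```python
# Figures as reported in the article: (PowerGPT group, reference group).
reported_percentages = {
    "test selection, completion": (99.3, 88.9),
    "test selection, accuracy": (95.6, 83.6),
    "sample size, completion": (99.3, 77.8),
    "sample size, accuracy": (94.1, 55.4),
}
minutes_per_question = (4.0, 9.3)  # (PowerGPT group, reference group)


def point_gap(powergpt: float, reference: float) -> float:
    """Difference in percentage points, rounded to one decimal place."""
    return round(powergpt - reference, 1)


def time_summary(powergpt: float, reference: float) -> tuple[float, float]:
    """Return (speed-up ratio, percent reduction in time), each rounded to one decimal."""
    ratio = round(reference / powergpt, 1)
    reduction = round(100 * (reference - powergpt) / reference, 1)
    return ratio, reduction


if __name__ == "__main__":
    for label, (powergpt, reference) in reported_percentages.items():
        print(f"{label}: +{point_gap(powergpt, reference)} percentage points")
    ratio, reduction = time_summary(*minutes_per_question)
    print(f"time per question: {ratio}x faster ({reduction}% less time)")
```

The largest gap is in sample size accuracy (38.7 percentage points), and the reference group needed roughly 2.3 times as long per question.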
Bridging the Expertise Gap
One of the trial's most notable findings was that PowerGPT narrowed the performance gap between statisticians and non-statisticians. In the traditional methods group, non-statisticians performed significantly worse in both completion rates and accuracy. With PowerGPT, however, non-statisticians achieved completion rates and accuracy comparable to those of participants with formal statistical training. This highlights PowerGPT’s potential to democratize access to rigorous study planning, especially in settings where statistical expertise is limited.
Deployment and Accessibility
PowerGPT is freely available to researchers and institutions at power-gpt.net. It has been piloted at multiple academic institutions and is actively being deployed within Clinical and Translational Science Award (CTSA) programs. The system is built on a cloud-native infrastructure, ensuring scalability and robust performance for concurrent processing, making it suitable for industrial-scale deployment.
The study provides evidence that PowerGPT improves the accuracy, efficiency, and accessibility of statistical power analysis. By integrating AI-driven tools into research workflows, clinical investigators can make better-informed methodological choices, which in turn can strengthen the quality and reproducibility of biomedical studies.


