TLDR: An empirical study analyzed 24,800 open-source prompts from 92 GitHub repositories to investigate current management practices and quality. It revealed significant challenges including inconsistent formatting, widespread internal and external prompt duplication, and frequent readability and spelling issues. The research provides actionable recommendations for developers to standardize practices, improve discoverability, mitigate duplication, and integrate automated quality assessment to enhance the usability and maintainability of prompts within the rapidly evolving promptware ecosystem.
The rise of powerful foundation models, such as GPT and Llama, has ushered in a new era of software development known as promptware. This innovative approach allows individuals with minimal coding or AI experience to build intelligent applications using natural language prompts. A prime example is ChatGPT, which relies on built-in prompts to define its behavior and interactions.
However, as promptware gains traction, the effective management of these prompts—including their organization, versioning, and quality assurance—has become a significant challenge. While specialized prompt stores exist, GitHub has emerged as a popular platform for open-source prompt management due to its collaborative nature.
A recent study, titled Understanding Prompt Management in GitHub Repositories: A Call for Best Practices, delves into how developers currently manage open-source prompts on GitHub and evaluates their quality. Conducted by Hao Li, Hicham Masri, Filipe R. Cogo, Abdul Ali Bangash, Bram Adams, and Ahmed E. Hassan, the research analyzed a substantial dataset of 24,800 prompts from 92 GitHub repositories.
The Challenges of Prompt Management on GitHub
The study highlights several fundamental limitations of GitHub for managing prompt assets. Unlike traditional source code, prompts are often unstructured or semi-structured, creating a mismatch with GitHub’s focus on source files and lines as units of work. Furthermore, GitHub offers no dedicated tooling for ensuring prompt quality comparable to the gatekeeping mechanisms, such as linting and code review, available for source code.
Key Findings from the Empirical Analysis
The researchers uncovered critical challenges and patterns in prompt management:
- Diverse Use Cases: Topic analysis revealed that open-source prompts primarily serve marketing-related tasks (e.g., marketing campaign strategies, email marketing), content generation (e.g., summarization, SEO writing), and surprisingly, software engineering tasks like code debugging, translation, and web design guidance.
- Repository Categories: GitHub repositories storing prompts were categorized into three types: Prompt Collections (72.8%), which store large sets of prompts; Prompt Applications (21.7%), containing prompts for specific applications; and Prompt Courseware (5.4%), serving as educational resources.
- Storage Formats and Organization: Markdown is the most popular format (72.8%), followed by TXT (16.3%). Repositories showed mixed preferences for single-prompt files (each file contains one prompt) versus multi-prompt files (each file contains multiple prompts), with application repositories favoring single-prompt files for modularity.
- Uneven Distribution: A significant finding was the highly skewed distribution of prompts, with just 8.7% of repositories containing over 90% of all collected prompts. The six largest repositories, all prompt collections, accounted for 88.8% of the data.
- Prompt Duplication: Duplication is a widespread issue, affecting 23.9% of repositories. Overall, 10.1% of analyzed prompts were identical duplicates, occurring both within (internal duplication) and across (external duplication) repositories. This leads to maintenance inefficiencies and potential error propagation.
- Prompt Quality Issues: The study assessed prompt quality along three dimensions: length, readability, and syntax correctness.
  - Length: Most prompts are short, with about 75% having fewer than 92 words. However, prompts in application repositories are significantly longer, often functioning as natural language programs.
  - Readability: A majority of prompts (80.1%) were found to be relatively difficult to read, with low Flesch Reading Ease (FRE) scores. This can hinder reuse and adaptation, and may even confuse foundation models.
  - Spelling Errors: More than half (55.2%) of GitHub prompts contained at least one spelling mistake. Application repositories had the highest prevalence of errors (96.7%), likely due to the difficulty of maintaining longer prompts without adequate quality assurance.
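To make the readability finding concrete, the FRE metric the study relies on can be computed from word, sentence, and syllable counts. The sketch below is a minimal, self-contained approximation; the syllable counter is a rough vowel-group heuristic, and the study's exact tooling is not specified:

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count vowel groups; every word has at least one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    """Standard FRE formula: higher scores mean easier-to-read text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

prompt = "You are a helpful assistant. Summarize the following article in three bullet points."
print(round(flesch_reading_ease(prompt), 1))
```

Scores below roughly 50 are generally considered difficult to read, which is the regime most analyzed prompts fell into.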
Actionable Recommendations for Best Practices
Based on these findings, the researchers provide several recommendations to enhance the usability and maintainability of open-source prompts:
- Standardize Formats and Organization: Establish clear guidelines and standardized formats for prompt management, emphasizing machine-readable metadata (e.g., authorship, use-cases) and human-readable documentation.
- Improve Discoverability and Reuse: Encourage developers to use structured directories or file formats (like CSV) to categorize prompts based on their use cases, making them easier to find and reuse.
- Mitigate Prompt Duplication: Integrate automated duplicate detection tools into workflows and regularly audit prompts to reduce redundancy and document their origins.
- Integrate Automated Quality Assessment: Implement CI/CD-style pipelines for prompts, using automated tools for readability assessment, spell-checking, and metadata validation to ensure consistent quality.
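As an illustration of the duplicate-detection recommendation, exact duplicates (the 10.1% of identical prompts the study found) can be flagged by hashing normalized prompt contents. This is a minimal sketch with hypothetical file paths, not the study's tooling:

```python
import hashlib
from collections import defaultdict

def normalize(prompt: str) -> str:
    # Collapse whitespace and lowercase so trivial formatting differences
    # don't mask otherwise identical prompts.
    return " ".join(prompt.lower().split())

def find_duplicates(prompts: dict[str, str]) -> list[list[str]]:
    """Group prompt file paths whose normalized content is identical."""
    groups = defaultdict(list)
    for path, text in prompts.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        groups[digest].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]

# Hypothetical corpus: two files differ only in whitespace.
corpus = {
    "repo_a/summarize.md": "Summarize the text below in one paragraph.",
    "repo_b/summary.txt": "Summarize the  text below in one paragraph.",
    "repo_a/translate.md": "Translate the text below into French.",
}
print(find_duplicates(corpus))  # [['repo_a/summarize.md', 'repo_b/summary.txt']]
```

Run as a CI step, such a check can block merges that introduce redundant prompts within a repository, though near-duplicates would need fuzzier matching (e.g., similarity thresholds).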
These recommendations aim to address the current chaos in prompt management, fostering a more robust and sustainable promptware ecosystem for developers and users alike.