TLDR: A new research paper introduces a comparative learning framework to make agile story point estimation more efficient. Instead of assigning specific numerical story points, developers compare pairs of tasks, indicating which requires more effort. This approach, based on the ‘law of comparative judgments,’ reduces human cognitive burden. The proposed SBERT-Comparative model, trained on these pairwise comparisons, achieves performance comparable to or better than traditional regression models that rely on direct story point estimates, suggesting a significant reduction in the manual effort required for project planning.
In agile software development, accurately estimating the effort each task requires, usually expressed in “story points,” is crucial for effective sprint planning and resource allocation. Traditionally, teams rely on collaborative methods like Planning Poker, where developers discuss and assign numerical story points to each backlog item. While these discussions foster understanding and knowledge transfer, the process can become time-consuming and labor-intensive, especially as projects progress.
Even with the advent of machine learning models designed to predict story points, a significant challenge persists: these models typically require extensive historical data from the *same* project to make accurate predictions. This means that for every new software project, human experts are still needed to provide a large volume of initial estimates, limiting the scalability and efficiency benefits of automation.
A New Approach: Comparative Learning
A recent research paper, titled “Efficient Story Point Estimation With Comparative Learning,” proposes a framework to streamline this process. Instead of asking developers to assign a specific story point value to every task, the new method relies on “comparative judgments”: developers are shown pairs of backlog items and indicate which item they believe requires more effort. The approach is rooted in the “law of comparative judgments,” which suggests that comparing two items is often much less cognitively demanding for humans than assigning an absolute value to each.
The core idea is to replace direct, numerical estimates with these simpler, pairwise comparisons. A machine learning model is then trained on these comparative judgments. For instance, if a developer indicates that “Item A” requires more effort than “Item B,” the model learns this relative relationship. Over time, by processing many such comparisons, the model develops an understanding of the relative effort levels across all tasks.
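For background, the link between a hidden effort score and a pairwise choice can be written down explicitly. The formulas below are the standard textbook forms (Thurstone's Case V and its logistic counterpart, the Bradley-Terry model), offered as a sketch of the general idea rather than the exact objective used in the paper. The symbols are notation introduced here: s_A and s_B are latent effort scores, σ is the judgment noise, f_θ is a scoring model applied to item features x, and D is the set of judged pairs in which A was deemed harder than B.

```latex
% Probability that item A is judged to need more effort than item B
P(A \succ B) = \Phi\!\left(\frac{s_A - s_B}{\sqrt{2}\,\sigma}\right)
\quad \text{(Thurstone, Case V)}
\qquad
P(A \succ B) = \frac{1}{1 + e^{-(s_A - s_B)}}
\quad \text{(Bradley-Terry, logistic)}

% Pairwise training loss under the logistic form
\mathcal{L}(\theta) = \sum_{(A,B) \in D}
\log\!\left(1 + e^{-\left(f_\theta(x_A) - f_\theta(x_B)\right)}\right)
```

Minimizing this loss pushes the score of the item judged harder above the score of the item judged easier, which is how many individual comparisons add up to a consistent ordering.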
How It Works
The framework uses SBERT (Sentence-BERT) embeddings to represent the text descriptions of backlog items as vectors that capture the semantic meaning of each task. A machine learning model, referred to as SBERT-Comparative, then uses these vectors and the human-provided comparative judgments to learn a ranking function. Once trained, the model can predict a score for any new backlog item, ranking it against others by estimated effort without further human intervention. The score is relative: it orders items by effort rather than reproducing a story point value.
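The pipeline described above (text to vector, pairwise training, scoring unseen items) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the `embed` function is a toy bag-of-words stand-in for an SBERT encoder so the example runs without extra dependencies, the scorer is a plain linear model trained with a logistic pairwise loss, and the backlog texts and comparisons are made up for the example.

```python
import math

def embed(text, vocab):
    """Toy stand-in for an SBERT encoder: a bag-of-words count vector.
    With the sentence-transformers library one would instead call
    SentenceTransformer(model_name).encode(texts); which model the
    paper used is not stated here."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def score(w, x):
    """Relative effort score: higher means more effort."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_ranker(vectors, comparisons, epochs=300, lr=0.1):
    """Learn weights so the 'harder' item of each pair scores higher.
    comparisons holds (harder_index, easier_index) tuples."""
    w = [0.0] * len(vectors[0])
    for _ in range(epochs):
        for harder, easier in comparisons:
            diff = [h - e for h, e in zip(vectors[harder], vectors[easier])]
            # Gradient step on log(1 + exp(-w.diff))
            step = lr * (1.0 - sigmoid(score(w, diff)))
            w = [wi + step * di for wi, di in zip(w, diff)]
    return w

# Hypothetical backlog items and human judgments (harder, easier)
backlog = [
    "fix typo in footer",
    "migrate database schema",
    "add login button",
    "migrate payment service database",
]
vocab = sorted({word for text in backlog for word in text.lower().split()})
vectors = [embed(text, vocab) for text in backlog]
comparisons = [(1, 0), (1, 2), (3, 0), (3, 2), (2, 0)]

w = train_ranker(vectors, comparisons)

# Rank unseen items by predicted effort, highest first
new_items = ["fix typo in login button", "migrate login database"]
ranked = sorted(new_items, key=lambda t: score(w, embed(t, vocab)), reverse=True)
print(ranked)  # ['migrate login database', 'fix typo in login button']
```

No story point values appear anywhere in the training data; the model only ever sees which item of a pair was judged harder.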
Promising Results
The researchers evaluated their technique on a dataset of over 23,000 manual estimates across 16 software projects. The SBERT-Comparative model, trained on comparative judgments, achieved an average Spearman’s rank correlation coefficient of 0.34 between its predictions and the actual story points. (Spearman’s coefficient measures how closely two orderings agree: 1 means identical rankings and 0 means no monotonic relationship.) This performance is similar to, and in many cases better than, that of state-of-the-art regression models like GPT2SP and FastText-SVM, which are trained on direct story point values.
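To make the evaluation metric concrete, here is a small pure-Python sketch of Spearman's rank correlation: rank both lists (averaging ranks for ties, which matter because story points repeat often), then take the Pearson correlation of the ranks. The story points and model scores below are invented for illustration and are not results from the paper.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical data: human story points and a model's relative scores
story_points = [1, 2, 3, 5, 8]
model_scores = [-0.4, 0.1, -0.1, 0.9, 0.7]

rho = spearman(model_scores, story_points)
print(round(rho, 3))  # 0.8
```

Because only the ordering matters, the model's scores do not need to be on the story point scale for the metric to apply, which is why it suits a model trained purely on comparisons.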
Crucially, because providing comparative judgments is less burdensome than direct estimation, this new framework promises a significant reduction in the human effort required for story point estimation. The study also explored whether more comparative judgments lead to better performance, finding that while some increase can be beneficial, there are diminishing returns, suggesting that even a minimal set of comparisons can yield effective results.
This work represents a significant step towards making agile planning more efficient and less taxing for development teams. By shifting from absolute estimations to relative comparisons, the proposed comparative learning framework offers a practical path to streamline a critical aspect of software development. The code and data used in this study are publicly available, encouraging further research and adoption of this approach. More details can be found in the full research paper.


