TLDR: The Collapsing Receiver Operating Characteristic (CROC) approach is a new statistical method that improves disease risk prediction by effectively integrating both common and rare genetic variants. It extends the FROC approach by using a multistage collapsing procedure to group rare variants into “pseudo-common variants,” making them more amenable to analysis. Evaluations show CROC achieves higher prediction accuracy, especially when rare variants are considered, and outperforms FROC in such scenarios, while also being more computationally efficient.
Predicting an individual’s risk for developing a disease is a crucial aspect of public health and clinical care. While significant strides have been made through genetic research, particularly with common genetic variations identified in large-scale studies, the accuracy of these predictions for clinical use has often fallen short. A key reason for this limitation is that many existing prediction models primarily focus on common genetic variants, overlooking the potential contribution of rare variants.
A new research paper by Changshuai Wei and Qing Lu introduces a novel statistical method called the Collapsing Receiver Operating Characteristic (CROC) approach. This method aims to significantly improve disease risk prediction by comprehensively accounting for both common and rare genetic variants. The paper, titled “Collapsing ROC approach for risk prediction research on both common and rare variants,” highlights the importance of these often-overlooked rare genetic factors in enhancing predictive accuracy.
The Challenge of Rare Variants
For many years, genome-wide association studies (GWAS) have successfully identified numerous common genetic variants linked to various diseases. However, these common variants often explain only a small fraction of a disease’s heritability. Scientists believe that additional genetic factors, including rare variants and complex gene-gene or gene-environment interactions, hold the key to uncovering the “missing heritability.” Rare variants, defined as those with a minor allele frequency (MAF) less than 5%, have historically been difficult to study due to their low occurrence in populations. Yet, emerging research suggests that these rare variants can play a significant role in complex diseases like obesity, schizophrenia, and colorectal cancer.
The challenge lies in developing statistical methods that can effectively integrate these rare variants with known common variants and other clinical risk factors to achieve more accurate disease predictions. Traditional approaches often struggle to incorporate rare variants due to their low frequency, which can lead to them being overlooked in prediction models.
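To make the frequency threshold above concrete, here is a minimal sketch of how variants might be split into common and rare by MAF. The function names, the 0/1/2 genotype coding, and the toy data are assumptions for illustration and do not come from the paper.

```python
def minor_allele_frequency(genotypes):
    """Return the MAF for one variant.

    genotypes: per-individual counts (0, 1, or 2) of the alternate allele,
    assuming diploid, biallelic data with no missing values.
    """
    total_alleles = 2 * len(genotypes)
    alt_freq = sum(genotypes) / total_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def split_by_frequency(variants, threshold=0.05):
    """Split a {name: genotypes} dict into (common, rare) lists of names."""
    common, rare = [], []
    for name, genotypes in variants.items():
        if minor_allele_frequency(genotypes) < threshold:
            rare.append(name)
        else:
            common.append(name)
    return common, rare


# Toy data (made up for illustration).
variants = {
    "snp_a": [0, 1, 2, 1, 0, 1, 0, 2, 1, 0],  # 8 alternate alleles out of 20
    "snp_b": [0] * 19 + [1],                  # 1 alternate allele out of 40
}
common, rare = split_by_frequency(variants)
print(common, rare)
```

With only one carrier among 20 individuals, `snp_b` carries very little information on its own, which is the practical difficulty the paragraph above describes.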
Introducing the CROC Approach
The CROC approach is an innovative extension of a previously developed method known as the Forward Receiver Operating Characteristic (FROC) approach. While FROC was effective for predicting risk based on a large number of common genetic variants, it was not designed to handle the complexities of rare variants. CROC addresses this by incorporating a unique “multistage collapsing procedure.”
This collapsing procedure works by grouping rare variants into “pseudo-common variants.” Essentially, it identifies and combines rare variants that collectively contribute to disease risk, transforming them into a form that the prediction model can more easily analyze. Once these pseudo-common variants are created, the CROC approach then uses the forward selection algorithm, similar to FROC, to identify the best combination of both common and these newly formed pseudo-common variants for an optimal risk prediction model. Because the pseudo-common variants have a higher effective frequency, they are more likely to be selected by the algorithm, leading to improved prediction accuracy.
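The idea can be sketched in a few lines of Python. This is a simplified illustration, not the authors’ implementation: the collapsing rule used here (a single carrier indicator for any rare allele) stands in for the paper’s multistage procedure, the summed score assumes every selected allele raises risk, and all names and data are hypothetical.

```python
def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def collapse(rare_variants):
    """Assumed collapsing rule: 1 if a person carries any rare allele, else 0."""
    n = len(next(iter(rare_variants.values())))
    return [int(any(g[i] > 0 for g in rare_variants.values())) for i in range(n)]


def forward_select(predictors, labels):
    """Greedily add the predictor that most improves the AUC of a summed score."""
    selected, best_auc = [], 0.5
    remaining = list(predictors)
    n = len(labels)
    base = [0] * n  # running score built from the predictors selected so far
    while remaining:
        trial = {}
        for name in remaining:
            score = [base[i] + predictors[name][i] for i in range(n)]
            trial[name] = auc(score, labels)
        winner = max(trial, key=trial.get)
        if trial[winner] <= best_auc:
            break  # no candidate improves the model, so stop
        selected.append(winner)
        best_auc = trial[winner]
        base = [base[i] + predictors[winner][i] for i in range(n)]
        remaining.remove(winner)
    return selected, best_auc


# Toy data (made up): 4 cases followed by 4 controls.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
common_snp = [2, 1, 1, 0, 1, 0, 0, 0]
rare = {
    "rare_1": [1, 0, 0, 0, 0, 0, 0, 0],
    "rare_2": [0, 1, 0, 0, 0, 0, 0, 0],
    "rare_3": [0, 0, 0, 1, 0, 0, 0, 0],
}

pseudo_common = collapse(rare)
selected, best_auc = forward_select(
    {"common_snp": common_snp, "pseudo_common": pseudo_common}, labels
)
print(pseudo_common, selected, best_auc)
```

In this toy data each rare variant on its own gives an AUC of 0.625, while the collapsed indicator gives 0.875, so the selection step picks it first. A real analysis would also need to handle protective alleles and evaluate the model on held-out data rather than on the sample used for selection.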
Evaluation and Promising Results
The researchers evaluated the CROC approach using simulated data from the Genetic Analysis Workshop 17 (GAW17), which included 697 individuals and 533 single-nucleotide polymorphisms (SNPs) across 37 genes; 400 of the 533 SNPs were rare variants. The study compared the performance of CROC against FROC and also assessed the impact of including rare variants.
A prediction model built with the CROC approach on both common and rare variants achieved a higher accuracy (AUC = 0.605) than a model built solely on common variants (AUC = 0.585). CROC and FROC performed similarly when only common variants were considered, but CROC outperformed FROC when rare variants were included. The gap was widest when the analysis was restricted to rare variants: CROC still reached an AUC of 0.603, whereas FROC’s dropped to 0.524, not far above the 0.5 expected from random guessing. CROC was also more computationally efficient, taking 1058 seconds against FROC’s 1911 seconds, or roughly 45% less time.
These results underscore the potential of the CROC approach to leverage the information contained within rare genetic variants, leading to more robust and accurate disease risk predictions, especially in scenarios where rare variants play a more prominent role or when the influence of common variants is less pronounced.
Future Implications
The development of the CROC approach represents a significant step forward in genetic risk prediction. As next-generation sequencing technologies continue to uncover millions of rare variants, methods like CROC will be essential for translating this vast amount of genetic information into actionable clinical insights. The authors suggest that future research could further enhance prediction accuracy by incorporating environmental risk factors and gene-environment interactions. While the current study focused on candidate genes, the principles of CROC could be adapted for high-dimensional whole-genome sequencing data, though more sophisticated selection algorithms might be needed to handle the immense number of variants.
This research paves the way for more comprehensive and accurate disease risk prediction models, ultimately holding great promise for improving public health and personalized clinical care.


