TLDR: The paper introduces Trees Assembling Mann-Whitney (TAMW), a new computational method designed to identify joint associations among numerous low-marginal-effect (LME) genetic variants in complex diseases like Crohn’s disease. TAMW overcomes limitations of existing methods like MDR and LRMW by efficiently analyzing high-dimensional genome-wide data, demonstrating superior power in simulations with multiple interacting LME loci. Its application to Crohn’s disease GWAS data successfully identified significant joint associations and new LME genes, offering a more comprehensive understanding of disease etiology.
Common complex diseases, such as Crohn’s disease, are often influenced by a vast number of genetic variations, many of which have only a small individual impact. These ‘low-marginal-effect’ (LME) genetic variants are challenging to detect using traditional methods, yet their combined interplay is believed to contribute significantly to disease development. Uncovering these hidden genetic connections is crucial for a deeper understanding of disease origins and for advancing personalized medicine.
Existing statistical approaches, such as Multifactor Dimensionality Reduction (MDR) and the likelihood-ratio-based Mann-Whitney (LRMW) approach, have made strides in identifying joint genetic associations, but both have limitations. MDR can find interactions, yet its exhaustive search makes it computationally impractical for large-scale genome-wide data. LRMW tends to be conservative and primarily detects associations among a limited number of strong-marginal-effect variants, making it poorly suited to the hundreds or thousands of LME variants thought to be involved in complex diseases.
Introducing TAMW: A Powerful New Tool
To address these challenges, researchers have developed a novel and computationally efficient method called Trees Assembling Mann-Whitney (TAMW). This approach is specifically designed to facilitate joint association analysis among a large ensemble of LME genetic variants, even when considering their complex interactions.
At its core, TAMW utilizes a ‘trees-assembling’ technique. Imagine building many small decision trees, each from a different subset of the genetic data. Each tree identifies potential disease-susceptibility genetic variants. TAMW then combines the insights from these numerous trees into a comprehensive model. This ensemble approach allows it to simultaneously consider a vast number of LME genetic variants and their interactions, which might be missed by methods focusing on individual strong effects. The method then uses a Mann-Whitney test to evaluate the overall significance of the joint association identified by this assembled model.
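The idea can be illustrated with a toy sketch. This is not the authors' implementation: the depth-1 trees, the bootstrap sampling, the out-of-bag scoring, the Gini split criterion, the normal approximation for the p-value, and every function name and parameter below are assumptions chosen to keep the example short. It shows only the general pattern of assembling many small trees into one risk score and then testing that score with a Mann-Whitney statistic.

```python
import random
from statistics import NormalDist

def fit_stump(genotypes, labels, rows, snps):
    """Fit a depth-1 tree: pick the SNP in `snps` whose genotype groups
    best separate cases from controls on the training `rows`."""
    overall = sum(labels[i] for i in rows) / len(rows)
    best = None
    for s in snps:
        counts = {}  # genotype -> [n_individuals, n_cases]
        for i in rows:
            c = counts.setdefault(genotypes[i][s], [0, 0])
            c[0] += 1
            c[1] += labels[i]
        # Weighted Gini impurity of the split (lower is better).
        gini = sum(n * 2 * (k / n) * (1 - k / n) for n, k in counts.values()) / len(rows)
        if best is None or gini < best[0]:
            best = (gini, s, {g: k / n for g, (n, k) in counts.items()})
    _, snp, leaf = best
    return snp, leaf, overall

def assemble_scores(genotypes, labels, n_trees=200, snps_per_tree=3, seed=0):
    """Average tree predictions per individual, using only trees whose
    bootstrap sample did not contain that individual (out-of-bag)."""
    rng = random.Random(seed)
    n, m = len(genotypes), len(genotypes[0])
    total, votes = [0.0] * n, [0] * n
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(rows)
        snps = rng.sample(range(m), min(snps_per_tree, m))  # random SNP subset
        snp, leaf, overall = fit_stump(genotypes, labels, rows, snps)
        for i in range(n):
            if i not in in_bag:
                # Fall back to the overall case rate for unseen genotypes.
                total[i] += leaf.get(genotypes[i][snp], overall)
                votes[i] += 1
    return [total[i] / votes[i] if votes[i] else None for i in range(n)]

def mann_whitney_test(case_scores, control_scores):
    """U statistic (ties count 0.5) and a one-sided p-value from the
    normal approximation, without tie correction."""
    n1, n2 = len(case_scores), len(control_scores)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0
            for a in case_scores for b in control_scores)
    mean = n1 * n2 / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    return u, 1 - NormalDist().cdf((u - mean) / sd)

def simulate(n=400, m=30, causal=(0, 1, 2, 3, 4), seed=1):
    """Toy data: each causal SNP adds a small amount to disease risk."""
    rng = random.Random(seed)
    genotypes = [[rng.choice((0, 1, 2)) for _ in range(m)] for _ in range(n)]
    labels = [1 if rng.random() < 0.3 + 0.04 * sum(g[s] for s in causal) else 0
              for g in genotypes]
    return genotypes, labels

genotypes, labels = simulate()
scores = assemble_scores(genotypes, labels)
cases = [s for s, y in zip(scores, labels) if y == 1 and s is not None]
controls = [s for s, y in zip(scores, labels) if y == 0 and s is not None]
u, p = mann_whitney_test(cases, controls)
print(f"U = {u:.1f}, one-sided p = {p:.4g}")
```

Scoring each individual only with trees that never saw that individual is one way to keep the final test from being inflated by overfitting. Whether TAMW handles this in the same way is not stated in this summary.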
Outperforming Existing Methods
The effectiveness of TAMW was tested through extensive simulation studies and real-world data applications. In simulations mimicking complex disease scenarios with multiple LME loci and their interactions, TAMW consistently showed higher power than MDR and LRMW. For instance, in a simulation involving 20 interacting LME loci, TAMW achieved a power of 0.931, compared with 0.599 for MDR and 0.704 for LRMW. This highlights TAMW’s ability to detect associations that are difficult for other methods to uncover, especially as the complexity of the genetic model increases and individual gene effects become smaller.
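Power in a simulation study is typically estimated as the fraction of simulated datasets in which a method declares a significant association. A minimal sketch of that calculation follows; the function name, the 0.05 threshold, and the p-values are made up for illustration, since this summary does not state the significance level used in the paper.

```python
def empirical_power(p_values, alpha=0.05):
    """Fraction of simulation replicates whose p-value falls below alpha."""
    if not p_values:
        raise ValueError("need at least one replicate")
    return sum(p < alpha for p in p_values) / len(p_values)

# Hypothetical p-values from four simulated datasets (not from the paper).
replicate_p_values = [0.001, 0.20, 0.03, 0.60]
print(empirical_power(replicate_p_values))  # 2 of 4 replicates fall below 0.05
```

Read this way, a power of 0.931 means the method detected the simulated joint association in about 93% of replicates.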
Application to Crohn’s Disease
Beyond simulations, TAMW was applied to a large-scale Wellcome Trust Crohn’s Disease (CD) genome-wide association study (GWAS) dataset, comprising nearly half a million single nucleotide polymorphisms (SNPs). The analysis was completed efficiently, taking approximately 40 hours using parallel computing, demonstrating its capability to handle high-dimensional data. The genome-wide analysis revealed a highly significant joint association predisposing to CD. Further investigation of the top-ranked SNPs identified thirteen genes, including well-known CD-associated genes like NOD2, IL23R, and ATG16L1, as well as six LME genes such as LACC1, ZGPAT, and TNFSF15, which were only moderately associated in single-locus analyses but showed significant joint contributions. These findings suggest that these LME genes, through their interactions, play an important role in CD’s pathophysiological and etiological processes.
This research introduces a powerful new statistical tool that can accelerate the discovery of genetic variants contributing to complex diseases. By effectively identifying joint associations among numerous low-marginal-effect loci, TAMW offers a more comprehensive understanding of disease etiology, paving the way for new insights into disease mechanisms and potential therapeutic targets. For more details, you can refer to the original research paper: Trees Assembling Mann-Whitney Approach for Detecting Genome-wide Joint Association among Low-Marginal-Effect loci.


