Gut Microbiome Wellness Index 2 enhances health status prediction from gut microbiome taxonomic profiles

Abstract:

Recent advancements in translational gut microbiome research have revealed its crucial role in shaping predictive healthcare applications. Herein, we introduce the Gut Microbiome Wellness Index 2 (GMWI2), an enhanced version of our original GMWI prototype, designed as a standardized disease-agnostic health status indicator based on gut microbiome taxonomic profiles.

Our analysis involves pooling existing 8069 stool shotgun metagenomes from 54 published studies across a global demographic landscape (spanning 26 countries and six continents) to identify gut taxonomic signals linked to disease presence or absence. GMWI2 achieves a cross-validation balanced accuracy of 80% in distinguishing healthy (no disease) from non-healthy (diseased) individuals and surpasses 90% accuracy for samples with higher confidence (i.e., outside the “reject option”).
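
As a rough illustration of how balanced accuracy and a "reject option" interact, the sketch below scores a small made-up set of samples. The sign convention (a positive score predicts healthy), the cutoff value, and the toy data are all assumptions for illustration and are not taken from the GMWI2 paper or its command-line tool.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels
    (1 = healthy, 0 = non-healthy). Needs at least one sample of each class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = sum(1 for t in y_true if t == 0)
    return 0.5 * (tp / pos + tn / neg)


def evaluate_with_reject(y_true, scores, cutoff):
    """Assumed convention: score > 0 predicts healthy, and samples with
    |score| < cutoff fall inside the reject option and are not classified.
    Returns (balanced accuracy on retained samples, fraction retained)."""
    kept = [(t, s) for t, s in zip(y_true, scores) if abs(s) >= cutoff]
    labels = [t for t, _ in kept]
    preds = [1 if s > 0 else 0 for _, s in kept]
    return balanced_accuracy(labels, preds), len(kept) / len(y_true)


# Hypothetical labels and index scores, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [2.0, 1.5, 0.2, -0.3, -1.8, -2.2, 0.4, -0.1]

print(evaluate_with_reject(y_true, scores, cutoff=0.0))  # every sample classified
print(evaluate_with_reject(y_true, scores, cutoff=1.0))  # low-confidence samples rejected
```

In this toy data the misclassified samples all have scores near zero, so rejecting them raises accuracy at the cost of classifying fewer samples, which is the general trade-off a reject option makes.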

This performance exceeds that of the original GMWI model and traditional species-level α-diversity indices, indicating a more robust gut microbiome signature for differentiating between healthy and non-healthy phenotypes across multiple diseases. When assessed through inter-study validation and external validation cohorts, GMWI2 maintains an average accuracy of nearly 75%.
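
For readers unfamiliar with α-diversity, the sketch below computes the Shannon index from a list of species abundances. The abstract does not say which α-diversity indices were compared, so Shannon is used here only as one common example, and the abundance values are made up.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over species with
    non-zero abundance, where p_i is the relative abundance of species i."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical species abundance profiles, for illustration only.
even_community = [25, 25, 25, 25]   # four equally abundant species
skewed_community = [97, 1, 1, 1]    # one dominant species

print(shannon_index(even_community))
print(shannon_index(skewed_community))
```

An evenly distributed community scores higher than one dominated by a single species, which is why such indices are often used as a simple proxy for gut ecosystem health.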

Furthermore, by reevaluating previously published datasets, GMWI2 offers new insights into the effects of diet, antibiotic exposure, and fecal microbiota transplantation on gut health. Available as an open-source command-line tool, GMWI2 represents a timely, pivotal resource for evaluating health using an individual’s unique gut microbial composition.

Source: Chang, D., Gupta, V.K., Hur, B. et al. Gut Microbiome Wellness Index 2 enhances health status prediction from gut microbiome taxonomic profiles. Nat Commun 15, 7447 (2024). https://doi.org/10.1038/s41467-024-51651-9 https://www.nature.com/articles/s41467-024-51651-9 (Full text)

Integrated weighted gene co-expression network analysis with an application to chronic fatigue syndrome

Abstract:

BACKGROUND: Systems biologic approaches such as Weighted Gene Co-expression Network Analysis (WGCNA) can effectively integrate gene expression and trait data to identify pathways and candidate biomarkers. Here we show that the additional inclusion of genetic marker data allows one to characterize network relationships as causal or reactive in a chronic fatigue syndrome (CFS) data set.

RESULTS: We combine WGCNA with genetic marker data to identify a disease-related pathway and its causal drivers, an analysis which we refer to as “Integrated WGCNA” or IWGCNA. Specifically, we present the following IWGCNA approach: 1) construct a co-expression network, 2) identify trait-related modules within the network, 3) use a trait-related genetic marker to prioritize genes within the module, 4) apply an integrated gene screening strategy to identify candidate genes and 5) carry out causality testing to verify and/or prioritize results. By applying this strategy to a CFS data set consisting of microarray, SNP and clinical trait data, we identify a module of 299 highly correlated genes that is associated with CFS severity. Our integrated gene screening strategy results in 20 candidate genes. We show that our approach yields biologically interesting genes that function in the same pathway and are causal drivers for their parent module. We use a separate data set to replicate findings and use Ingenuity Pathways Analysis software to functionally annotate the candidate gene pathways.
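
Step 1 of the approach above, constructing a co-expression network, can be sketched as follows. This is a minimal pure-Python illustration of the general weighted-network idea (absolute Pearson correlation raised to a soft-thresholding power), not the authors' implementation. The gene names, expression values, and the power beta = 6 are assumptions chosen for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency a_ij = |cor(x_i, x_j)| ** beta.
    expr maps each gene name to its expression values across samples;
    beta is an assumed soft-thresholding power."""
    genes = list(expr)
    return {
        gi: {
            gj: 1.0 if gi == gj else abs(pearson(expr[gi], expr[gj])) ** beta
            for gj in genes
        }
        for gi in genes
    }


def connectivity(adj):
    """Sum of each gene's connection strengths to all other genes."""
    return {g: sum(w for h, w in row.items() if h != g) for g, row in adj.items()}


# Hypothetical expression profiles across four samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with geneA
    "geneC": [4.0, 1.0, 3.0, 2.0],  # weakly anti-correlated with geneA
}
adj = soft_threshold_adjacency(expr)
print(adj["geneA"]["geneB"], adj["geneA"]["geneC"])
print(connectivity(adj))
```

Raising correlations to a power keeps strong co-expression relationships while pushing weak ones toward zero, so modules of tightly correlated genes stand out without imposing a hard cutoff.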

CONCLUSION: We show how WGCNA can be combined with genetic marker data to identify disease-related pathways and the causal drivers within them. The systems genetics approach described here can easily be used to generate testable genetic hypotheses in other complex disease studies.

Source: Presson AP, Sobel EM, Papp JC, Suarez CJ, Whistler T, Rajeevan MS, Vernon SD, Horvath S. Integrated weighted gene co-expression network analysis with an application to chronic fatigue syndrome. BMC Syst Biol. 2008 Nov 6;2:95. doi: 10.1186/1752-0509-2-95. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2625353/ (Full article)

Interpreter of maladies: redescription mining applied to biomedical data analysis

Abstract:

Comprehensive, systematic and integrated data-centric statistical approaches to disease modeling can provide powerful frameworks for understanding disease etiology. Here, one such computational framework based on redescription mining in both its incarnations, static and dynamic, is discussed.

The static framework provides bioinformatic tools applicable to multifaceted datasets, containing genetic, transcriptomic, proteomic, and clinical data for diseased patients and normal subjects. The dynamic redescription framework provides systems biology tools to model complex sets of regulatory, metabolic and signaling pathways in the initiation and progression of a disease.

As an example, the case of chronic fatigue syndrome (CFS) is considered, which has so far remained intractable and unpredictable in its etiology and nosology. The redescription mining approaches can be applied to the Centers for Disease Control and Prevention’s Wichita (KS, USA) dataset, integrating transcriptomic, epidemiological and clinical data, and can also be used to study how pathways in the hypothalamic-pituitary-adrenal axis affect CFS patients.

Source: Waltman P, Pearlman A, Mishra B. Interpreter of maladies: redescription mining applied to biomedical data analysis. Pharmacogenomics. 2006 Apr;7(3):503-9. https://www.ncbi.nlm.nih.gov/pubmed/16610960

Analysis of clinical, epidemiologic, and laboratory data on chronic fatigue syndrome

Abstract:

Much of the research conducted on chronic fatigue syndrome (CFS) is exploratory. The researchers’ overall goal is to use clinical, epidemiologic, and laboratory data to provide clues about the etiology of this syndrome. In preparation for this symposium, a review of numerous publications on CFS has indicated that the literature generally does not reflect the application of optimal statistical methods for exploration of data.

Whenever the researchers’ aim is to generate hypotheses, modern methods designed specifically for exploratory data analysis are likely to provide greater insights into any patterns of data than are the traditional approaches to hypothesis testing. In addition, the use of formal methods of data synthesis for ongoing and future research on CFS is a means of strengthening collaborative efforts and of improving the ability of researchers to interpret the evidence available that relates to specific etiologic factors. The inclusion on the research team of experienced biostatisticians, who would oversee the statistical methods and the development of innovative analyses, is recommended.

Source: Redmond CK. Analysis of clinical, epidemiologic, and laboratory data on chronic fatigue syndrome. Rev Infect Dis. 1991 Jan-Feb;13 Suppl 1:S90-3. http://www.ncbi.nlm.nih.gov/pubmed/1826967