A comparison of classification methods for predicting Chronic Fatigue Syndrome based on genetic data

Abstract:

BACKGROUND: In genomic studies of disease susceptibility, it is essential to select a small number of genes that are more significant than the others. In this work, our goal was to compare computational tools with and without feature selection for predicting chronic fatigue syndrome (CFS) from genetic factors such as single nucleotide polymorphisms (SNPs).

METHODS: We used the dataset originally collected in a previous study by the CDC Chronic Fatigue Syndrome Research Group. To uncover relationships between CFS and SNPs, we applied three classification algorithms: naive Bayes, the support vector machine, and the C4.5 decision tree. We also used two feature selection methods to identify a subset of influential SNPs: a hybrid approach combining the chi-squared and information-gain methods, and a wrapper-based method.
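As a rough illustration of the hybrid filter idea described in the methods, the sketch below scores each SNP with a chi-squared statistic and with information gain, then keeps the SNPs that rank in the top k under both scores. The abstract does not say how the two scores were combined, so the intersection rule, the function names, the SNP names, and the toy genotype data are all assumptions made for this example.

```python
import math
from collections import Counter

def chi_squared(feature, labels):
    """Pearson chi-squared statistic for a genotype-by-label contingency table."""
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    joint = Counter(zip(feature, labels))
    stat = 0.0
    for f, nf in feature_counts.items():
        for l, nl in label_counts.items():
            expected = nf * nl / n
            observed = joint.get((f, l), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(feature, labels):
    """Reduction in label entropy after splitting on the genotype."""
    n = len(labels)
    conditional = 0.0
    for f, nf in Counter(feature).items():
        subset = [l for x, l in zip(feature, labels) if x == f]
        conditional += (nf / n) * entropy(subset)
    return entropy(labels) - conditional

def hybrid_select(snps, labels, k):
    """Keep SNPs ranked in the top k by BOTH scores (assumed combination rule)."""
    def top_k(score):
        ranked = sorted(snps, key=lambda name: score(snps[name], labels), reverse=True)
        return set(ranked[:k])
    return sorted(top_k(chi_squared) & top_k(information_gain))

# Hypothetical toy data: genotypes coded 0/1/2, labels 1 = CFS, 0 = control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
snps = {
    "snp_a": [2, 2, 2, 2, 0, 0, 0, 0],  # separates the two groups perfectly
    "snp_b": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the label
    "snp_c": [2, 2, 2, 1, 1, 0, 0, 0],  # partially informative
}
selected = hybrid_select(snps, labels, k=2)
print(selected)
```

A wrapper-based method would instead evaluate candidate SNP subsets by training and testing the classifier itself, which is more expensive but tailored to the chosen model.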

RESULTS: The naive Bayes model with the wrapper-based approach performed best among the predictive models for inferring disease susceptibility from the complex relationship between CFS and SNPs.

CONCLUSION: We demonstrated that our approach is a promising method to assess the associations between CFS and SNPs.

 

Source: Huang LC, Hsu SY, Lin E. A comparison of classification methods for predicting Chronic Fatigue Syndrome based on genetic data. J Transl Med. 2009 Sep 22;7:81. doi: 10.1186/1479-5876-7-81. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2765429/ (Full article)

 

Bayesian biomarker identification based on marker-expression proteomics data

Abstract:

We are studying variable selection in multiple regression models in which molecular markers and/or gene-expression measurements as well as intensity measurements from protein spectra serve as predictors for the outcome variable (i.e., trait or disease state).

Finding genetic biomarkers and searching genetic-epidemiological factors can be formulated as a statistical problem of variable selection, in which, from a large set of candidates, a small number of trait-associated predictors are identified. We illustrate our approach by analyzing the data available for chronic fatigue syndrome (CFS).

CFS is a complex disease in several respects; for example, it is difficult to diagnose and difficult to quantify. To identify biomarkers we used microarray data and SELDI-TOF-based proteomics data. We also analyzed genetic marker information for a large number of SNPs for an overlapping set of individuals. The objectives of the analyses were to identify markers specific to fatigue that are also possibly exclusive to CFS. The use of such models can be motivated, for example, by the search for new biomarkers for the diagnosis and prognosis of cancer and measures of response to therapy. In general, we use Bayesian hierarchical modeling and Markov Chain Monte Carlo computation for this purpose.

 

Source: Bhattacharjee M, Botting CH, Sillanpää MJ. Bayesian biomarker identification based on marker-expression proteomics data. Genomics. 2008 Dec;92(6):384-92. doi: 10.1016/j.ygeno.2008.06.006. Epub 2008 Aug 15. http://www.sciencedirect.com/science/article/pii/S0888754308001420 (Full article)

 

Glucocorticoid receptor polymorphisms and haplotypes associated with chronic fatigue syndrome

Abstract:

Chronic fatigue syndrome (CFS) is a significant public health problem of unknown etiology; its pathophysiology has not been elucidated, and there are no characteristic physical signs or laboratory abnormalities. Some studies have indicated an association of CFS with deregulation of immune functions and hypothalamic-pituitary-adrenal (HPA) axis activity.

In this study, we examined the association of sequence variations in the glucocorticoid receptor gene (NR3C1) with CFS because NR3C1 is a major effector of the HPA axis. There were 137 study participants (40 with CFS, 55 with insufficient symptoms or fatigue, termed ISF, and 42 non-fatigued controls) who were clinically evaluated and identified from the general population of Wichita, KS. Nine single nucleotide polymorphisms (SNPs) in NR3C1 were tested for association of polymorphisms and haplotypes with CFS.

We observed an association of multiple SNPs with chronic fatigue compared to non-fatigued (NF) subjects (P < 0.05) and found similar associations with quantitative assessments of functional impairment (by the SF-36), with fatigue (by the Multidimensional Fatigue Inventory) and with symptoms (assessed by the Centers for Disease Control Symptom Inventory).

Subjects homozygous for the major allele of all associated SNPs were at increased risk for CFS with odds ratios ranging from 2.61 (CI 1.05-6.45) to 3.00 (CI 1.12-8.05). Five SNPs, covering a region of approximately 80 kb, demonstrated high linkage disequilibrium (LD) in CFS, but LD gradually declined in ISF to NF subjects. Furthermore, haplotype analysis of the region in LD identified two associated haplotypes with opposite alleles: one protective and the other conferring risk of CFS.
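For readers unfamiliar with how an odds ratio and its confidence interval are obtained, the sketch below computes the sample odds ratio from a 2x2 table together with a 95% interval based on the standard error of the log odds ratio (the Woolf method). The abstract does not state which interval method the authors used or give the underlying genotype counts, so the method choice and the counts here are hypothetical and serve only to illustrate the arithmetic.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for a 2x2 table.

    a = cases with the genotype,    b = controls with the genotype,
    c = cases without the genotype, d = controls without the genotype.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf (logit) approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts, NOT taken from the study.
odds_ratio, lower, upper = odds_ratio_ci(a=24, b=15, c=16, d=25)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval whose lower bound stays above 1, as in the ranges quoted above, indicates that the association is statistically significant at roughly the 5% level.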

These results demonstrate NR3C1 as a potential mediator of chronic fatigue, and implicate variations in the 5′ region of NR3C1 as a possible mechanism through which the alterations in HPA axis regulation and behavioural characteristics of CFS may manifest.

 

Source: Rajeevan MS, Smith AK, Dimulescu I, Unger ER, Vernon SD, Heim C, Reeves WC. Glucocorticoid receptor polymorphisms and haplotypes associated with chronic fatigue syndrome. Genes Brain Behav. 2007 Mar;6(2):167-76. http://onlinelibrary.wiley.com/doi/10.1111/j.1601-183X.2006.00244.x/full (Full article)

 

Combinations of single nucleotide polymorphisms in neuroendocrine effector and receptor genes predict chronic fatigue syndrome

Abstract:

OBJECTIVE: This paper asks whether the presence of chronic fatigue syndrome (CFS) can be more accurately predicted from single nucleotide polymorphism (SNP) profiles than would occur by chance.

METHODS: Specifically, given SNP profiles for 43 CFS patients, together with 58 controls, we used an enumerative search to identify an ensemble of conjunctive rules that predict whether a patient has CFS.
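To make the idea of an enumerative search over conjunctive rules concrete, here is a minimal sketch. It lists every conjunction of up to two (SNP, genotype) conditions, scores each rule by how accurately "rule matches" predicts CFS, and ranks the rules. The rule size limit, the scoring by plain accuracy, the SNP names, and the toy profiles are assumptions for illustration; the abstract does not describe the authors' search at this level of detail.

```python
from itertools import combinations

def rule_matches(rule, profile):
    """A conjunctive rule fires only if every (snp, genotype) condition holds."""
    return all(profile[snp] == genotype for snp, genotype in rule)

def accuracy(rule, profiles, labels):
    """Fraction of subjects for which 'rule fires' agrees with the CFS label."""
    hits = sum(rule_matches(rule, p) == bool(y) for p, y in zip(profiles, labels))
    return hits / len(labels)

def enumerate_rules(profiles, labels, max_terms=2):
    """Score every conjunction of up to max_terms conditions, best first."""
    conditions = sorted({(snp, g) for p in profiles for snp, g in p.items()})
    scored = []
    for size in range(1, max_terms + 1):
        for rule in combinations(conditions, size):
            # Skip rules that place two different genotypes on the same SNP.
            if len({snp for snp, _ in rule}) < size:
                continue
            scored.append((accuracy(rule, profiles, labels), rule))
    # Highest accuracy first; ties broken by shorter rule, then by name.
    scored.sort(key=lambda item: (-item[0], len(item[1]), item[1]))
    return scored

# Hypothetical toy profiles: label 1 = CFS, 0 = control.
profiles = [
    {"snp_x": "AA", "snp_y": "CT"},
    {"snp_x": "AA", "snp_y": "CT"},
    {"snp_x": "AA", "snp_y": "CC"},
    {"snp_x": "AG", "snp_y": "CT"},
    {"snp_x": "AG", "snp_y": "CC"},
    {"snp_x": "GG", "snp_y": "CT"},
]
labels = [1, 1, 0, 0, 0, 0]

ranked = enumerate_rules(profiles, labels)
best_accuracy, best_rule = ranked[0]
print(best_accuracy, best_rule)
```

An ensemble could then be formed by keeping the top-ranked rules and combining their votes. Because accuracy is measured here on the same data used for the search, a real analysis would need held-out data or cross-validation to compare against chance.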

RESULTS: The accuracy of the rules reached 76.3%, with the highest accuracy rules yielding 49 true negatives, 15 false negatives, 28 true positives and 9 false positives (odds ratio [OR] 8.94, p < 0.0001). Analysis of the SNPs used most frequently in the overall ensemble of rules gave rise to a list of ‘most important SNPs’, which was not identical to the list of ‘most differentiating SNPs’ that one would calculate via studying each SNP independently. The top three genes containing the SNPs accounting for the highest accumulated importance were neuronal tryptophan hydroxylase (TPH2), catechol-O-methyltransferase (COMT) and nuclear receptor subfamily 3, group C, member 1 glucocorticoid receptor (NR3C1).
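The confusion-matrix counts quoted above can be turned into summary measures with a few lines of arithmetic. The sketch below uses only the four counts from the abstract; the function name is made up for this example, and sensitivity and specificity are derived quantities that the abstract itself does not report.

```python
def summarize(tp, fn, tn, fp):
    """Basic classification measures from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "total": total,
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of CFS patients flagged
        "specificity": tn / (tn + fp),  # share of controls cleared
    }

# Counts as quoted in the abstract.
summary = summarize(tp=28, fn=15, tn=49, fp=9)
for name, value in summary.items():
    print(name, round(value, 3) if isinstance(value, float) else value)
```

The positives (28 + 15) and negatives (49 + 9) add up to the 43 patients and 58 controls given in the methods, and the computed accuracy should land close to the figure reported above.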

CONCLUSION: The fact that only 28 out of several million possible SNPs predict whether a person has CFS with 76% accuracy indicates that CFS has a genetic component that may help to explain some aspects of the illness.

 

Source: Goertzel BN, Pennachin C, de Souza Coelho L, Gurbaxani B, Maloney EM, Jones JF. Combinations of single nucleotide polymorphisms in neuroendocrine effector and receptor genes predict chronic fatigue syndrome. Pharmacogenomics. 2006 Apr;7(3):475-83. https://www.ncbi.nlm.nih.gov/pubmed/16610957