Linear data mining the Wichita clinical matrix suggests sleep and allostatic load involvement in chronic fatigue syndrome

Abstract:

OBJECTIVES: To provide a mathematical introduction to the Wichita (KS, USA) clinical dataset, which is all of the nongenetic data (no microarray or single nucleotide polymorphism data) from the 2-day clinical evaluation, and to show the preliminary findings and limitations of popular matrix algebra-based data mining techniques.

METHODS: An initial matrix of 440 variables by 227 human subjects was reduced to 183 variables by 164 subjects. Variables were excluded that strongly correlated with chronic fatigue syndrome (CFS) case classification by design (for example, the multidimensional fatigue inventory [MFI] data), that were otherwise self-reported in nature and also tended to correlate strongly with CFS classification, or were sparse or nonvarying between case and control. Subjects were excluded if they did not clearly fall into well-defined CFS classifications, had comorbid depression with melancholic features, or other medical or psychiatric exclusions. The popular data mining techniques, principal components analysis (PCA) and linear discriminant analysis (LDA), were used to determine how well the data separated into groups. Two different feature selection methods helped identify the most discriminating parameters.

RESULTS: Although purely biological features (variables) were found to separate CFS cases from controls, including many allostatic load and sleep-related variables, most parameters were not statistically significant individually. However, biological correlates of CFS, such as heart rate and heart rate variability, require further investigation.

CONCLUSIONS: Feature selection of a limited number of variables from the purely biological dataset produced better separation between groups than a PCA of the entire dataset. Feature selection highlighted the importance of many of the allostatic load variables studied in more detail by Maloney and colleagues in this issue [1], as well as some sleep-related variables. Nonetheless, matrix linear algebra-based data mining approaches appeared to be of limited utility when compared with more sophisticated nonlinear analyses on richer data types, such as those found in Maloney and colleagues [1] and Goertzel and colleagues [2] in this issue.
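
Illustrative sketch (not part of the published abstract): the abstract above describes projecting a subjects-by-variables matrix onto principal components to see how well groups separate. The Python below shows that idea on a tiny made-up matrix. The data values, the function names and the use of power iteration to find only the first component are all assumptions chosen to keep the example dependency-free; the authors' actual software and preprocessing are not described in the abstract.

```python
import math

# Hypothetical toy matrix: rows are subjects, columns are three made-up
# biological variables. These numbers are NOT from the Wichita dataset.
controls = [[1.0, 2.0, 0.5], [1.2, 1.9, 0.4], [0.8, 2.1, 0.6], [1.1, 2.2, 0.5]]
cases = [[3.0, 2.0, 0.5], [3.2, 2.1, 0.6], [2.9, 1.8, 0.4], [3.1, 2.0, 0.5]]


def center_columns(rows):
    """Subtract each column mean so every variable has mean zero."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (p x p) of already-centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, direction):
    """Score of each subject along the given unit direction."""
    return [sum(a * b for a, b in zip(row, direction)) for row in centered]


centered = center_columns(controls + cases)
pc1 = first_component(covariance(centered))
scores = project(centered, pc1)
control_scores = scores[:len(controls)]
case_scores = scores[len(controls):]

print("PC1 loadings:", [round(x, 3) for x in pc1])
print("control scores:", [round(s, 2) for s in control_scores])
print("case scores:", [round(s, 2) for s in case_scores])
```

In this toy example the first variable carries almost all the variance, so the first component separates the two groups. On real clinical data the leading components need not align with case status, which is consistent with the abstract's finding that feature selection separated groups better than a PCA of the entire dataset.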

 

Source: Gurbaxani BM, Jones JF, Goertzel BN, Maloney EM. Linear data mining the Wichita clinical matrix suggests sleep and allostatic load involvement in chronic fatigue syndrome. Pharmacogenomics. 2006 Apr;7(3):455-65. https://www.ncbi.nlm.nih.gov/pubmed/16610955

 

Gene expression profile exploration of a large dataset on chronic fatigue syndrome

Abstract:

OBJECTIVE: To gain understanding of the molecular basis of chronic fatigue syndrome (CFS) through gene expression analysis using a large microarray data set in conjunction with clinically administered questionnaires.

METHOD: Data from the Wichita (KS, USA) CFS Surveillance Study was used, comprising 167 participants with two self-report questionnaires (multidimensional fatigue inventory [MFI] and Zung depression scale [Zung]), microarray data, empiric classification, and others. Microarray data was analyzed using bioinformatics tools from ArrayTrack.

RESULTS: Correspondence analysis was applied to the MFI questionnaire to select the 23 samples having either the most or the least fatigue, and to the Zung questionnaire to select the 26 samples having either the most or least depression; ten samples were common, resulting in a total of 39 samples. The MFI and Zung-based CFS/non-CFS (NF) classifications on the 39 samples were consistent with the empiric classification. Two differentially-expressed gene lists were determined, 188 fatigue-related genes and 164 depression-related genes, which shared 24 common genes and involved 11 common pathways. Principal component analysis based on 24 genes clearly separates 39 samples with respect to their likelihood to be CFS. Most of the 24 genes are not previously reported for CFS, yet their functions are consistent with the prevailing model of CFS, such as immune response, apoptosis, ion channel activity, signal transduction, cell-cell signaling, regulation of cell growth and neuronal activity. Hierarchical cluster analysis was performed based on 24 genes to classify 128 (=167-39) unassigned samples. Several of the 11 identified common pathways are supported by earlier findings for CFS, such as cytokine-cytokine receptor interaction and neuroactive ligand-receptor interaction. Importantly, most of the 11 common pathways are interrelated, suggesting complex biological mechanisms associated with CFS.

CONCLUSION: Bioinformatics is critical in this study to select definitive sample groups, analyze gene expression data and gain insight into biological mechanisms. The 24 identified common genes and 11 common pathways could be important in future studies of CFS at the molecular level.

 

Source: Fang H, Xie Q, Boneva R, Fostel J, Perkins R, Tong W. Gene expression profile exploration of a large dataset on chronic fatigue syndrome. Pharmacogenomics. 2006 Apr;7(3):429-40. https://www.ncbi.nlm.nih.gov/pubmed/16610953

 

Exploration of statistical dependence between illness parameters using the entropy correlation coefficient

Abstract:

The entropy correlation coefficient (ECC) is a useful tool for measuring statistical dependence between variables. We employed this tool to search for pairs of variables that correlated in the chronic fatigue syndrome (CFS) Computational Challenge dataset. Highly related variables are candidates for data reduction, and novel relationships could lead to hypotheses regarding the pathogenesis of CFS.

METHODS: Data for 130 female participants in the Wichita (KS, USA) clinical study [1] was coded into numerical values. Metric data was grouped using Gaussian mixture models; the number of groups was chosen using the Bayesian information criterion. The pair-wise correlation between all variables was computed using the ECC. Significance was estimated from 1000 iterations of a permutation test and a threshold of 0.01 was used to identify significantly correlated variables.

RESULTS: The five dimensions of the multidimensional fatigue inventory (MFI) were all highly correlated with each other. Seven Short Form (SF)-36 measures, four CFS case-defining symptoms and the Zung self-rating depression scale all correlated with all MFI dimensions. No physiological variables correlated with more than one MFI dimension. MFI, SF-36, CDC symptom inventory, the Zung self-rating depression scale and three Cambridge Neuropsychological Test Automated Battery (CANTAB) measures were highly correlated with CFS disease status.

DISCUSSION: Correlations between the five dimensions of MFI are expected since they are measured from the same instrument. The relationship between MFI and Zung depression index has been previously reported. MFI, SF-36, and Centers for Disease Control and Prevention (CDC) symptom inventory are used to classify CFS; it is not surprising that they are correlated with disease status. Only one of the three CANTAB measures that correlate with disease status has been previously found, indicating the ECC identifies relationships not found with other statistical tools.

CONCLUSION: The ECC is a useful tool for measuring statistical dependence between variables in clinical and laboratory datasets. The ECC needs to be further studied to gain a better understanding of its meaning for clinical data.
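
Illustrative sketch (not part of the published abstract): the methods above combine an entropy-based dependence measure with a permutation test. The Python below shows one way this could look for two already-discretized variables. The formula used, ECC = sqrt(2 I(X;Y) / (H(X) + H(Y))), is one common definition and is an assumption here, since the abstract does not state the formula. The function names, the toy labels and the "+1" correction in the p-value are also assumptions, and the Gaussian mixture discretization step is not reproduced.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def ecc(x, y):
    """Entropy correlation coefficient under the assumed definition:
    sqrt(2 * I(X;Y) / (H(X) + H(Y))), ranging from 0 to 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables constant: no dependence measurable
    mutual_information = hx + hy - entropy(list(zip(x, y)))
    return math.sqrt(max(0.0, 2.0 * mutual_information / (hx + hy)))


def permutation_p_value(x, y, iterations=1000, seed=0):
    """Fraction of label shuffles whose ECC reaches the observed ECC.
    The +1 terms keep the estimate away from an exact zero."""
    rng = random.Random(seed)
    observed = ecc(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)
        if ecc(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (iterations + 1)


# Hypothetical discretized variables for 20 subjects (made-up labels).
fatigue_group = [0] * 10 + [1] * 10
mood_group = [0] * 9 + [1] * 11

print("ECC:", round(ecc(fatigue_group, mood_group), 3))
print("p-value:", permutation_p_value(fatigue_group, mood_group))
```

With 1000 shuffles the smallest attainable p-value under this estimator is 1/1001, which sits below the 0.01 threshold quoted in the abstract.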

 

Source: Craddock RC, Taylor R, Broderick G, Whistler T, Klimas N, Unger ER. Exploration of statistical dependence between illness parameters using the entropy correlation coefficient. Pharmacogenomics. 2006 Apr;7(3):421-8. https://www.ncbi.nlm.nih.gov/pubmed/16610952

 

The validity of an empirical delineation of heterogeneity in chronic unexplained fatigue

Abstract:

OBJECTIVES: To validate a latent class structure derived empirically from a clinical data set obtained from persons with chronic medically unexplained fatigue.

METHODS: The strategies utilized in this validation study included: recalculating latent class analysis (LCA) results varying random seeds and the number of initial random starting sets; recalculating LCA results by substituting alternate variables to demonstrate a robust solution; determining the statistical significance of between-class differences on disability, fatigue and demographic measures omitted from the data set used for LCA; cross-classifying class membership using established Centers for Disease Control and Prevention (CDC) research criteria for chronic fatigue syndrome (CFS) to compare the relative proportions of subjects designated CFS, chronic fatigue (not CFS) or healthy controls captured by the latent classes.

RESULTS: Recalculation of results and substitution of variables for low-loading variables demonstrated a robust LCA result. Highly significant between-class differences were confirmed between Class 2 (well) and those interpreted as ill/fatigued. Analysis of between-class differences for the fatigue groups revealed significant differences for all disability and fatigue variables, but with equivalent levels of reported activity and reduction in motivation. Cross-classification against established CDC criteria demonstrated that 89% of subjects constituting Class 2 (well) were indeed nonfatigued controls. A general tendency for grouping CFS cases in the multiple symptomatic classes was noted.

CONCLUSION: This study established reasonably good validity for an empirically-derived latent class solution reflecting considerable heterogeneity among subjects with medically unexplained chronic fatigue. This work strengthens the growing understanding of CFS as a heterogeneous entity comprised of several conditions with different underlying pathophysiological mechanisms.

 

Source: Aslakson E, Vollmer-Conna U, White PD. The validity of an empirical delineation of heterogeneity in chronic unexplained fatigue. Pharmacogenomics. 2006 Apr;7(3):365-73. https://www.ncbi.nlm.nih.gov/pubmed/16610947

 

The challenge of integrating disparate high-content data: epidemiological, clinical and laboratory data collected during an in-hospital study of chronic fatigue syndrome

Abstract:

Chronic fatigue syndrome (CFS) is a debilitating illness characterized by multiple unexplained symptoms including fatigue, cognitive impairment and pain. People with CFS have no characteristic physical signs or diagnostic laboratory abnormalities, and the etiology and pathophysiology remain unknown. CFS represents a complex illness that includes alterations in homeostatic systems, involves multiple body systems and results from the combined action of many genes, environmental factors and risk-conferring behavior. In order to achieve understanding of complex illnesses, such as CFS, studies must collect relevant epidemiological, clinical and laboratory data and then integrate, analyze and interpret the information so as to obtain meaningful clinical and biological insight. This issue of Pharmacogenomics represents such an approach to CFS.

Data was collected during a 2-day in-hospital study of persons with CFS, other medically and psychiatrically unexplained fatiguing illnesses and nonfatigued controls identified from the general population of Wichita, KS, USA. While in the hospital, the participants’ psychiatric status, sleep characteristics and cognitive functioning were evaluated, and biological samples were collected to measure neuroendocrine status, autonomic nervous system function, systemic cytokines and peripheral blood gene expression. The data generated from these assessments was made available to a multidisciplinary group of 20 investigators from around the world who were challenged with revealing new insight and algorithms for integration of this complex, high-content data and, if possible, identifying molecular markers and elucidating pathophysiology of chronic fatigue. The group was divided into four teams with representation from the disciplines of medicine, mathematics, biology, engineering and computer science. The papers in this issue are the culmination of this 6-month challenge, and demonstrate that data integration and multidisciplinary collaboration can indeed yield novel approaches for handling large, complex datasets, and reveal new insight and relevance to a complex illness such as CFS.

Comment in: The postgenomic era and complex disease. [Pharmacogenomics. 2006]

 

Source: Vernon SD, Reeves WC. The challenge of integrating disparate high-content data: epidemiological, clinical and laboratory data collected during an in-hospital study of chronic fatigue syndrome. Pharmacogenomics. 2006 Apr;7(3):345-54. https://www.ncbi.nlm.nih.gov/pubmed/16610945

 

The Spanish version of the FibroFatigue Scale: validation of a questionnaire for the observer’s assessment of fibromyalgia and chronic fatigue syndrome

Abstract:

OBJECTIVE: To examine some of the psychometric properties of the Spanish version of the FibroFatigue Scale (FFS).

METHODS: FFS was administered to 120 patients diagnosed with fibromyalgia and chronic fatigue syndrome. Internal consistency was evaluated by using Cronbach’s alpha, test-retest reliability with weighted kappa and construct validity by correlations among FFS, the Fibromyalgia Impact Questionnaire (FIQ), the EuroQol 5D (EQ-5D) and the Hospital Anxiety and Depression Scale (HADS). The interrater reliability was tested using analysis of variance with patients and raters as independent factors.

RESULTS: Internal consistency (alpha) was .88, test-retest reliability was .91, and interrater reliability was .93. Significant correlations were obtained between overall FFS and the FIQ (.55, P<.01), the EQ-5D (-.48, P<.01) and the HADS depression subscale (.25, P<.01), but not with the HADS anxiety subscale.

CONCLUSION: These results support the reliability and validity of the data obtained with the Spanish version of the FFS.
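
Illustrative sketch (not part of the published abstract): internal consistency in the study above was summarized with Cronbach's alpha. The Python below computes alpha from its standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the response matrix are made up for illustration and are not FFS data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for items and for total scores.
    """
    k = len(responses[0])
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: five respondents scored on four items (made-up values).
ratings = [
    [3, 4, 3, 4],
    [1, 2, 1, 1],
    [5, 5, 4, 5],
    [2, 2, 3, 2],
    [4, 3, 4, 4],
]

print("alpha:", round(cronbach_alpha(ratings), 3))
```

Alpha approaches 1 when items rise and fall together across respondents, and drops toward 0 when item scores are unrelated.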

 

Source: García-Campayo J, Pascual A, Alda M, Marzo J, Magallon R, Fortes S. The Spanish version of the FibroFatigue Scale: validation of a questionnaire for the observer’s assessment of fibromyalgia and chronic fatigue syndrome. Gen Hosp Psychiatry. 2006 Mar-Apr;28(2):154-60. https://www.ncbi.nlm.nih.gov/pubmed/16516066

 

Use of depression rating scales in chronic fatigue syndrome

Abstract:

OBJECTIVE: The aim of this study was to examine the performance of three commonly used depression rating scales in a hospital sample of patients with chronic fatigue syndrome (CFS).

METHODS: Sixty-one patients with CDC criteria for CFS completed the General Health Questionnaire (GHQ), the Hamilton Depression Scale (HAM-D) and the depression subscale of the Hospital Anxiety and Depression Scale (HADS-D). Current psychiatric status was assessed using the Structured Clinical Interview for DSM-III-R Disorders: Patient version (SCID-P). Receiver operating curves were drawn for each of the depression rating scales.

RESULTS: Thirty-one percent of the patients were depressed according to the SCID-P. Using the standard cut-offs, both GHQ and HAM-D overestimated the number of depressed patients, whilst the HADS-D underestimated the number. The receiver operating curves suggest that the optimum cut-offs for GHQ, HAM-D and HADS-D in this population are 7/8, 13/14 and 8/9, respectively.

CONCLUSIONS: Standard cut-offs may not be appropriate when using depression rating scales in CFS patients in a tertiary care setting.
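
Illustrative sketch (not part of the published abstract): the study above compared rating-scale scores against a diagnostic interview and read optimum cut-offs from receiver operating curves. The Python below shows one way to choose a cut-off, by maximizing Youden's J (sensitivity + specificity - 1). That criterion is an assumption, since the abstract does not say how the optimum was defined, and the scores, labels and function names are made up.

```python
def sensitivity_specificity(scores, depressed, cutoff):
    """Treat score >= cutoff as a positive screen and compare with the
    reference diagnosis (True = depressed on interview)."""
    tp = sum(1 for s, d in zip(scores, depressed) if d and s >= cutoff)
    fn = sum(1 for s, d in zip(scores, depressed) if d and s < cutoff)
    tn = sum(1 for s, d in zip(scores, depressed) if not d and s < cutoff)
    fp = sum(1 for s, d in zip(scores, depressed) if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, depressed):
    """Return (cutoff, J, sensitivity, specificity) with the largest Youden's J.
    Ties are resolved in favor of the lowest cutoff."""
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, depressed, cutoff)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Hypothetical scale scores and interview diagnoses for ten patients.
scale_scores = [3, 5, 6, 7, 8, 9, 10, 11, 12, 14]
interview_depressed = [False, False, False, False, True,
                       False, True, True, True, True]

cutoff, j, sens, spec = best_cutoff(scale_scores, interview_depressed)
print("cut-off:", cutoff, "J:", round(j, 2),
      "sensitivity:", round(sens, 2), "specificity:", round(spec, 2))
```

Under the usual reading of the abstract's notation, a cut-off written as 8/9 would correspond to a threshold of 9 in this sketch: scores of 8 or below screen negative and scores of 9 or above screen positive.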

 

Source: Henderson M, Tannock C. Use of depression rating scales in chronic fatigue syndrome. J Psychosom Res. 2005 Sep;59(3):181-4. http://www.ncbi.nlm.nih.gov/pubmed/16198192

 

The Chronic Fatigue Syndrome Activities and Participation Questionnaire (CFS-APQ): an overview

Abstract:

Chronic fatigue syndrome (CFS) is characterized by severe fatigue and a reduction in activity levels. The purpose of this study was to provide an overview of design, reliability, and validity of the CFS Activities and Participation Questionnaire (CFS-APQ).

The CFS-APQ was constructed based on a retrospective analysis of the Karnofsky Performance Status Questionnaire and the Activities of Daily Living Questionnaire (n = 141). In a reliability study of 34 participants the test-retest reliability coefficient of the CFS-APQ was 0.95. In two different studies, the Cronbach alpha coefficient for internal consistency varied between 0.87 (n = 88) and 0.94 (n = 47). The CFS-APQ was administered to 47 patients who listed 183 activities that had become difficult due to their chronic symptoms, and 157 (85.8%) answers matched the content of the CFS-APQ.

The outcome of a cross-sectional study (n = 88) studying the correlations between the Medical Outcomes Short Form 36 Health Status Survey subscale scores and the CFS-APQ supported the validity of the CFS-APQ. The CFS-APQ scores correlated with a behavioural assessment of the patients’ performance of activities encompassed by the questionnaire (r = 0.29-0.55; n = 63), and correlated with exercise capacity parameters (r = 0.26-0.39; n = 77) obtained during a maximal exercise capacity stress test. Finally, the CFS-APQ correlated with visual analogue scales for pain (r = 0.51) and fatigue (r = 0.50; n = 47).

It is concluded that the CFS-APQ generates reliable and valid data, and can be used as a clinical measure of disease severity in patients with CFS. Future studies should aim at examining the sensitivity of the CFS-APQ.

 

Source: Nijs J, Vaes P, De Meirleir K. The Chronic Fatigue Syndrome Activities and Participation Questionnaire (CFS-APQ): an overview. Occup Ther Int. 2005;12(2):107-21. http://www.ncbi.nlm.nih.gov/pubmed/16136868

 

CFSUM1 and CFSUM2 in urine from patients with chronic fatigue syndrome are methodological artefacts

Abstract:

McGregor et al. reported increased levels of an unidentified urinary compound (CFSUM1) in patients with chronic fatigue syndrome (CFS), with reduced excretion of another unidentified compound (CFSUM2), and suggested the possibility of chemical or metabolic ‘markers’ for CFS. The identity of CFSUM1 as reported was erroneous and the identities of these compounds have remained unknown until now.

Urine samples were obtained from 30 patients with ME/CFS, 30 age- and sex-matched healthy controls, 20 control patients with depression and 22 control patients with rheumatoid arthritis. Samples were prepared using the published methods of McGregor et al. to produce heptafluorobutyryl-isobutyl derivatives of urinary metabolites. Alternative preparations utilised isopropyl, n-butyl and trifluoroacetyl derivatives. These were separated and identified using gas chromatography-mass spectrometry.

CFSUM2 was identified as being partially derivatised [isobutyl ester-mono-heptafluorobutyryl (HFB)] serine. CFSUM1 was identified as partially derivatised pyroglutamic acid, being the isobutyl ester without formation of a HFB derivative.

Both CFSUM1 and CFSUM2 are artefacts of the sample preparation procedure and previously reported quantitative abnormalities of CFSUM1 and CFSUM2 in urine from patients with ME/CFS are also artefactual. Pyroglutamic acid may be of primarily dietary origin. The methods used cannot provide reliable qualitative or quantitative data on urinary metabolites. No clinical or biochemical significance can be drawn between these compounds in ME/CFS or any other clinical conditions.

 

Source: Chalmers RA, Jones MG, Goodwin CS, Amjad S. CFSUM1 and CFSUM2 in urine from patients with chronic fatigue syndrome are methodological artefacts. Clin Chim Acta. 2006 Feb;364(1-2):148-58. Epub 2005 Aug 10. http://www.ncbi.nlm.nih.gov/pubmed/16095585

 

Psychometric properties of the CDC Symptom Inventory for assessment of chronic fatigue syndrome

Abstract:

OBJECTIVES: Validated or standardized self-report questionnaires used in research studies and clinical evaluation of chronic fatigue syndrome (CFS) generally focus on the assessment of fatigue. There are relatively few published questionnaires that evaluate case-defining and other accompanying symptoms in CFS. This paper introduces the self-report CDC CFS Symptom Inventory and analyzes its psychometric properties.

METHODS: One hundred sixty-four subjects (with CFS, other fatiguing illnesses and non-fatigued controls) identified from the general population of Wichita, Kansas were enrolled. Evaluation included a physical examination, a standardized psychiatric interview, three previously validated self-report questionnaires measuring fatigue and illness impact (Medical Outcomes Survey Short-Form-36 [MOS SF-36], Multidimensional Fatigue Inventory [MFI], Chalder Fatigue Scale), and the CDC CFS Symptom Inventory. Based on theoretical assumptions and statistical analyses, we developed several different Symptom Inventory scores and evaluated them on their ability to differentiate between participants with CFS and non-fatigued controls.

RESULTS: The Symptom Inventory had good internal consistency and excellent convergent validity. A Total score (all symptoms), Case Definition score (CFS case defining symptoms) and Short Form score (6 symptoms with minimal correlation) differentiated CFS cases from controls. Furthermore, both the Case Definition and Short Form scores distinguished people with CFS from fatigued subjects who did not meet criteria for CFS.

CONCLUSION: The Symptom Inventory appears to be a reliable and valid instrument to assess symptoms that accompany CFS. It is a positive addition to existing instruments measuring fatigue because it allows other dimensions of the illness to be assessed. Further research is needed to confirm and replicate the current findings in a normative population.

 

Source: Wagner D, Nisenbaum R, Heim C, Jones JF, Unger ER, Reeves WC. Psychometric properties of the CDC Symptom Inventory for assessment of chronic fatigue syndrome. Popul Health Metr. 2005 Jul 22;3:8. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1183246/ (Full article)