Unsupervised cluster analysis reveals distinct subtypes of ME/CFS patients based on peak oxygen consumption and SF-36 scores

Abstract:

Purpose: Myalgic encephalomyelitis, commonly referred to as chronic fatigue syndrome (ME/CFS), is a severe, disabling chronic disease, and an objective assessment of prognosis is crucial to evaluate the efficacy of future drugs. Attempts are ongoing to find a biomarker that objectively assesses the health status of ME/CFS patients. This study therefore aims to demonstrate that oxygen consumption is a biomarker of ME/CFS and to provide a method for classifying patients diagnosed with ME/CFS based on their responses to the Short Form-36 (SF-36) questionnaire, which can predict oxygen consumption measured by cardiopulmonary exercise testing (CPET).

Methods: Two datasets were used in the study. The first contained SF-36 responses from 2,347 validated records of participants diagnosed with ME/CFS, and an unsupervised machine learning model was developed to cluster these data. The second dataset served as a validation set and included the CPET results of 239 participants diagnosed with ME/CFS. Participants in this dataset were grouped by peak oxygen consumption according to Weber’s classification. Only 92 of these patients completed the SF-36 questionnaire correctly, and they were clustered using the machine learning model. Two categorical variables were then entered into a contingency table: the assigned cluster, with values {0, 1}, and the Weber class, with values {A, B, C, D}. Finally, the chi-square test of independence was used to assess the statistical significance of the relationship between the two variables.
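The Weber grouping and the cluster-by-class chi-square test described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the peak VO2 cut-offs (A > 20, B 16-20, C 10-16, D < 10 mL/kg/min) are the commonly cited Weber thresholds rather than values given in this abstract, the boundary handling is our choice, the counts are invented, and the p-value comes from a hand-written incomplete-gamma series instead of a statistics library.

```python
import math

def weber_class(peak_vo2):
    """Map peak VO2 (mL/kg/min) to a Weber class; boundary handling is assumed."""
    if peak_vo2 > 20:
        return "A"
    if peak_vo2 >= 16:
        return "B"
    if peak_vo2 >= 10:
        return "C"
    return "D"

def contingency(pairs):
    """Build a 2 x 4 table of counts from (cluster, peak VO2) pairs."""
    table = [[0] * 4 for _ in range(2)]
    for cluster, vo2 in pairs:
        table[cluster]["ABCD".index(weber_class(vo2))] += 1
    return table

def chi2_sf(stat, df):
    """Upper-tail chi-square probability from the regularized lower
    incomplete gamma series; adequate for moderate statistics."""
    a, x = df / 2.0, stat / 2.0
    if x <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15 and n < 10_000:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi2_independence(table):
    """Pearson chi-square test of independence on a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence (assumes no empty row or column).
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)

# Rows: cluster 0 and cluster 1; columns: Weber classes A-D (invented counts).
table = [[12, 10, 15, 5],
         [3, 6, 21, 20]]
stat, df, p = chi2_independence(table)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.4f}")
```

Where SciPy is available, `scipy.stats.chi2_contingency` is the usual choice; its continuity correction applies only when there is one degree of freedom, so it would not alter a 2 x 4 table. With a small sample spread over eight cells, some expected counts can fall below 5, where the chi-square approximation is less reliable.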

Findings: The results indicate that the Weber classification is directly linked to the SF-36 questionnaire score. Furthermore, using the matrix of all 36 item responses in the machine learning model gave more reliable results than using the subscale matrix (p < 0.05) for classifying patients with ME/CFS.

Implications: Low oxygen consumption on CPET can be considered a biomarker in patients with ME/CFS. Our analysis showed a close relationship between the cluster assigned from patients' SF-36 responses and the Weber classification, which is based on peak oxygen consumption during CPET. The training dataset comprised raw responses to the SF-36 questionnaire, which proved to better preserve the original information and thus improved the quality of the model.
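The abstract does not name the clustering algorithm, so the sketch below uses plain k-means with k = 2 as an assumed stand-in, applied to raw item-level responses rather than subscale scores. The six-item vectors are hypothetical toy data (a real SF-36 record has 36 items).

```python
import random

def kmeans(points, k=2, iters=100, seed=0):
    """Lloyd's k-means on equal-length numeric vectors; returns one label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical raw item responses (1-5), six items per respondent for brevity.
responses = [
    [5, 5, 4, 5, 4, 5], [4, 5, 5, 4, 5, 5], [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 1, 2, 1], [2, 1, 1, 2, 1, 1], [1, 1, 2, 1, 1, 2],
]
print(kmeans(responses))
```

Clustering the full item vectors keeps differences that summing items into subscales would erase: two respondents answering [1, 5] and [5, 1] on a two-item subscale share the same subscale mean but remain distinct points here.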

Source: Lacasa M, Launois P, Prados F, Alegre J, Casas-Roma J. Unsupervised cluster analysis reveals distinct subtypes of ME/CFS patients based on peak oxygen consumption and SF-36 scores. Clin Ther. 2023 Oct 4:S0149-2918(23)00352-1. doi: 10.1016/j.clinthera.2023.09.007. Epub ahead of print. PMID: 37802746. https://www.clinicaltherapeutics.com/article/S0149-2918(23)00352-1/fulltext (Full text)

A retrospective cohort analysis leveraging augmented intelligence to characterize long COVID in the electronic health record: A precision medicine framework

Abstract:

Physical and psychological symptoms lasting months following an acute COVID-19 infection are now recognized as post-acute sequelae of COVID-19 (PASC). Accurate tools for identifying such patients could enhance screening for recruitment into clinical trials, improve the reliability of disease estimates, and allow more accurate downstream cohort analysis.

In this retrospective cohort study, we analyzed the electronic health records (EHRs) of hospitalized COVID-19 patients across three healthcare systems to develop a pipeline for better identifying patients with persistent PASC symptoms (dyspnea, fatigue, or joint pain) after SARS-CoV-2 infection. We implemented distributed representation learning powered by the Machine Learning for modeling Health Outcomes (MLHO) framework to identify novel EHR features that could suggest PASC symptoms beyond typical diagnosis codes. MLHO applies entropy-based feature selection and boosting algorithms for representation mining. The resulting improved PASC definitions were then used to estimate PASC among hospitalized patients.
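MLHO's own feature selection is not reproduced here. As a simplified, assumed illustration of the entropy-based idea, the sketch ranks binary EHR features by their mutual information with a PASC label, I(X;Y) = H(X) + H(Y) - H(X,Y). The feature names and patients are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(feature, label):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(feature) + entropy(label) - entropy(list(zip(feature, label)))

def rank_features(features, label, top_k):
    """Names of the top_k features with the highest mutual information."""
    scores = {name: mutual_information(col, label) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical binary EHR codes (1 = present) for 8 patients and a PASC label.
label = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "code_a": [1, 1, 1, 1, 0, 0, 0, 0],  # tracks the label exactly
    "code_b": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
    "code_c": [1, 1, 1, 0, 0, 0, 0, 1],  # partly informative
}
print(rank_features(features, label, top_k=2))  # ['code_a', 'code_c']
```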

A total of 30,422 hospitalized patients were diagnosed with COVID-19 across the three healthcare systems between March 13, 2020 and February 28, 2021. The mean age of the population was 62.3 years (SD, 21.0 years), and 15,124 (49.7%) were female.

We implemented the distributed representation learning technique to augment PASC definitions. These definitions were found to have positive predictive values of 0.73, 0.74, and 0.91 for dyspnea, fatigue, and joint pain, respectively.
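Positive predictive value is the share of patients flagged by a definition who truly have the symptom, PPV = TP / (TP + FP). The sketch assumes a reviewed reference label (for example, from chart review) exists for each flagged patient; the data are hypothetical.

```python
def positive_predictive_value(flagged, confirmed):
    """PPV = TP / (TP + FP), computed over patients the definition flags."""
    outcomes = [truth for flag, truth in zip(flagged, confirmed) if flag]
    if not outcomes:
        raise ValueError("the definition flagged no patients")
    return sum(outcomes) / len(outcomes)

# Hypothetical: 1 = flagged by the PASC definition / confirmed on review.
flagged = [1, 1, 1, 1, 0, 0]
confirmed = [1, 1, 1, 0, 0, 1]
print(positive_predictive_value(flagged, confirmed))  # 0.75
```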

We estimated that 25% (95% CI: 6-48%), 11% (95% CI: 6-15%), and 13% (95% CI: 8-17%) of hospitalized COVID-19 patients will have dyspnea, fatigue, and joint pain, respectively, 3 months or longer after a COVID-19 diagnosis. We present a validated framework for screening and identifying patients with PASC in the EHR and then use the tool to estimate PASC prevalence among hospitalized COVID-19 patients.

Source: Strasser ZH, Dagliati A, Shakeri Hossein Abad Z, Klann JG, Wagholikar KB, Mesa R, Visweswaran S, Morris M, Luo Y, Henderson DW, Samayamuthu MJ; Consortium for Clinical Characterization of COVID-19 by EHR (4CE); Omenn GS, Xia Z, Holmes JH, Estiri H, Murphy SN. A retrospective cohort analysis leveraging augmented intelligence to characterize long COVID in the electronic health record: A precision medicine framework. PLOS Digit Health. 2023 Jul 25;2(7):e0000301. doi: 10.1371/journal.pdig.0000301. PMID: 37490472; PMCID: PMC10368277. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10368277/ (Full text)

A Proposed Explainable Artificial Intelligence-Based Machine Learning Model for Discriminative Metabolites for Myalgic Encephalomyelitis/Chronic Fatigue Syndrome

Abstract:

Background: Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a complex and debilitating disease estimated to affect over 65 million individuals worldwide. It affects various systems, including the immune, neurological, gastrointestinal, and circulatory systems. Studies have shown abnormalities in immune cell types, increased inflammatory cytokines, and brain abnormalities. Further research is needed to identify consistent biomarkers and develop targeted therapies. A multidisciplinary approach is essential for diagnosing, treating, and managing this complex disease.

The current study aims at employing explainable artificial intelligence (XAI) and machine learning (ML) techniques to identify discriminative metabolites for ME/CFS.

Material and Methods: The present study used a metabolomics dataset of 26 ME/CFS patients and 26 healthy controls aged 22-72 years. The dataset encapsulated 768 metabolites, classified into nine metabolic super-pathways: amino acids, carbohydrates, cofactors, vitamins, energy, lipids, nucleotides, peptides, and xenobiotics.

Random forest-based feature selection and Bayesian hyperparameter optimization were applied to the target data. Four ML algorithms [Gaussian Naive Bayes (GNB), Gradient Boosting Classifier (GBC), Logistic Regression (LR), and Random Forest Classifier (RFC)] were used to classify individuals as ME/CFS patients or healthy controls. XAI approaches were applied to clinically explain the prediction decisions of the optimal model. Performance was evaluated using accuracy, precision, recall, F1 score, Brier score, and AUC.
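The evaluation indices listed above can be computed from true labels and predicted probabilities as follows. This is a dependency-free sketch; the 0.5 decision threshold and the toy predictions are assumptions, and AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen control (ties counted as one half), which equals the area under the ROC curve.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, F1, Brier score and AUC for 0/1 labels."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp, tn = pairs.count((1, 1)), pairs.count((0, 0))
    fp, fn = pairs.count((0, 1)), pairs.count((1, 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Brier score: mean squared error of the predicted probabilities.
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)
    # AUC: chance a random case outscores a random control (ties count 1/2).
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "brier": brier,
        "auc": wins / (len(pos) * len(neg)),
    }

# Toy example: 1 = ME/CFS, 0 = healthy control; probabilities are invented.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
print(binary_metrics(y_true, y_prob))
```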

Results: The metabolites C-glycosyltryptophan, oleoylcholine, cortisone, and 3-hydroxydecanoate were determined to be crucial for ME/CFS diagnosis.

The RFC model outperformed GNB, GBC, and LR in ME/CFS prediction using 1,000-iteration bootstrapping, achieving 98% accuracy, precision, recall, and F1 score, a Brier score of 0.01, and an AUC of 99%.
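The abstract does not detail the bootstrap protocol. One common reading, assumed here, is to resample the evaluation set with replacement 1,000 times, recompute the metric on each resample, and report the mean with a 2.5th-97.5th percentile interval. The helper names and data are hypothetical.

```python
import random
import statistics

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def bootstrap_metric(y_true, y_pred, metric, n_boot=1000, seed=42):
    """Resample (label, prediction) pairs with replacement and recompute the metric."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    # Percentile interval from the sorted bootstrap scores.
    low = scores[int(0.025 * n_boot)]
    high = scores[int(0.975 * n_boot) - 1]
    return statistics.fmean(scores), (low, high)

# Hypothetical labels and predictions for 20 subjects (2 errors).
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 9 + [0] + [0] * 9 + [1]
mean_acc, (low, high) = bootstrap_metric(y_true, y_pred, accuracy)
print(f"accuracy {mean_acc:.3f}, 95% interval [{low:.2f}, {high:.2f}]")
```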

Conclusion: The RFC model proposed in this study correctly classified and evaluated ME/CFS patients using the selected candidate biomarker metabolites. The methodology combining ML and XAI can provide a clear interpretation of risk estimation for ME/CFS, helping physicians intuitively understand the impact of key metabolomic features in the model.

Source: Yagin, F.H., Alkhateeb, A., Raza, A., Samee, N.A., Mahmoud, N.F., Colak, C., & Yagin, B. (2023). A Proposed Explainable Artificial Intelligence-Based Machine Learning Model for Discriminative Metabolites for Myalgic Encephalomyelitis/Chronic Fatigue Syndrome. Preprints. https://doi.org/10.20944/preprints202307.1585.v1 https://www.preprints.org/manuscript/202307.1585/v1 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10706650/ (Full text of completed study)

De-black-boxing health AI: demonstrating reproducible machine learning computable phenotypes using the N3C-RECOVER Long COVID model in the All of Us data repository

Abstract:

Machine learning (ML)-driven computable phenotypes are among the most challenging to share and reproduce. Despite this difficulty, the urgent public health considerations around Long COVID make it especially important to ensure the rigor and reproducibility of Long COVID phenotyping algorithms so that they can be made available to a broad audience of researchers. As part of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, researchers with the National COVID Cohort Collaborative (N3C) devised and trained an ML-based phenotype to identify patients with a high probability of having Long COVID. Supported by RECOVER, N3C and NIH’s All of Us study partnered to reproduce the output of N3C’s trained model in the All of Us data enclave, demonstrating model extensibility in multiple environments. This case study in ML-based phenotype reuse illustrates how open-source software best practices and cross-site collaboration can de-black-box phenotyping algorithms, prevent unnecessary rework, and promote open science in informatics.

Source: Pfaff ER, Girvin AT, Crosskey M, Gangireddy S, Master H, Wei WQ, Kerchberger VE, Weiner M, Harris PA, Basford M, Lunt C, Chute CG, Moffitt RA, Haendel M; N3C and RECOVER Consortia. De-black-boxing health AI: demonstrating reproducible machine learning computable phenotypes using the N3C-RECOVER Long COVID model in the All of Us data repository. J Am Med Inform Assoc. 2023 May 22:ocad077. doi: 10.1093/jamia/ocad077. Epub ahead of print. PMID: 37218289. https://pubmed.ncbi.nlm.nih.gov/37218289/

Proteomics and cytokine analyses distinguish myalgic encephalomyelitis/chronic fatigue syndrome cases from controls

Abstract:

Background: Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a complex, heterogeneous disease characterized by unexplained persistent fatigue and other features including cognitive impairment, myalgias, post-exertional malaise, and immune system dysfunction. Cytokines are present in plasma and encapsulated in extracellular vesicles (EVs), but there have been only a few reports of EV characteristics and cargo in ME/CFS. Several small studies have previously described plasma proteins or protein pathways that are associated with ME/CFS.

Methods: We prepared EVs from frozen plasma samples from a cohort of ME/CFS cases and controls for which plasma cytokine and plasma proteomics data had previously been published. The cytokine content of the plasma-derived EVs was determined by a multiplex assay, and differences between patients and controls were assessed. We then performed multi-omic statistical analyses that considered not only these new data but also extensive clinical data describing the health of the subjects.

Results: ME/CFS cases exhibited greater size and concentration of EVs in plasma. Assays of cytokine content in EVs revealed that IL2 was significantly higher in cases. We observed numerous correlations among EV cytokines, among plasma cytokines, and among plasma proteins measured by mass spectrometry proteomics. Significant correlations between clinical data and protein levels suggest roles for particular proteins and pathways in the disease. For example, higher levels of the pro-inflammatory cytokines Granulocyte-Macrophage Colony-Stimulating Factor (CSF2) and Tumor Necrosis Factor (TNFα) were correlated with greater physical and fatigue symptoms in ME/CFS cases. Higher levels of the serine protease SERPINA5, which is involved in hemostasis, were correlated with higher SF-36 general health scores in ME/CFS. Machine learning classifiers identified a list of 20 proteins that could discriminate between cases and controls, with XGBoost providing the best classification (86.1% accuracy and a cross-validated AUROC of 0.947). Random Forest distinguished cases from controls with 79.1% accuracy and an AUROC of 0.891 using only 7 proteins.
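The abstract does not state which correlation coefficient was used for the clinical-protein associations. Spearman's rank correlation, assumed here, is a common choice for skewed cytokine levels and ordinal symptom scores; the sketch computes it as the Pearson correlation of average ranks. The values are invented.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation (raises ZeroDivisionError if either input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented example: a cytokine level vs a fatigue score in six cases.
cytokine = [2.1, 3.4, 1.8, 5.0, 4.2, 3.4]
fatigue = [40, 55, 35, 80, 70, 50]
print(round(spearman(cytokine, fatigue), 3))
```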

Conclusions: These findings add to the substantial number of objective differences in biomolecules that have been identified in individuals with ME/CFS. The observed correlations of proteins important in immune responses and hemostasis with clinical data further implicate a disturbance of these functions in ME/CFS.

Source: Giloteaux L, Li J, Hornig M, Lipkin WI, Ruppert D, Hanson MR. Proteomics and cytokine analyses distinguish myalgic encephalomyelitis/chronic fatigue syndrome cases from controls. J Transl Med. 2023 May 13;21(1):322. doi: 10.1186/s12967-023-04179-3. PMID: 37179299. https://translational-medicine.biomedcentral.com/articles/10.1186/s12967-023-04179-3 (Full text)

Developing a blood cell-based diagnostic test for myalgic encephalomyelitis/chronic fatigue syndrome using peripheral blood mononuclear cells

Abstract:

A blood-based diagnostic test for myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) and multiple sclerosis (MS) would be of great value in both conditions, facilitating more accurate and earlier diagnosis, helping with current treatment delivery, and supporting the development of new therapeutics.

Here we use Raman micro-spectroscopy to examine differences between the spectral profiles of blood cells of ME/CFS, MS and healthy controls.

We were able to discriminate among the three groups using ensemble classification models with high accuracy (91%) and, additionally, to distinguish mild, moderate, and severe ME/CFS patients from one another (84%).

To our knowledge, this is the first research using Raman micro-spectroscopy to discriminate specific subgroups of ME/CFS patients on the basis of their symptom severity. Specific Raman peaks were linked with the different disease types and, with further investigation, could provide insights into the biological changes associated with each condition.

Source: Jiabao Xu, Tiffany Lodge, Caroline Claire Kingdon, James W L Strong, John Maclennan, Eliana Lacerda, Slawomir Kujawski, Pawel Zalewski, Wei Huang, Karl J. Morten. Developing a blood cell-based diagnostic test for myalgic encephalomyelitis/chronic fatigue syndrome using peripheral blood mononuclear cells. medRxiv [Preprint] medRxiv 2023.03.18.23286575; doi: https://doi.org/10.1101/2023.03.18.23286575 https://www.medrxiv.org/content/10.1101/2023.03.18.23286575v1.full-text (Full text)

Investigating brain cortical activity in patients with post-COVID-19 brain fog

Abstract:

Brain fog is a mental problem, similar to chronic fatigue syndrome, that appears about 3 months after COVID-19 infection and lasts up to 9 months. The third wave of COVID-19 in Poland peaked in April 2021.

The research reported here carried out an electrophysiological analysis of patients who had COVID-19 and symptoms of brain fog (sub-cohort A), patients who had COVID-19 without symptoms of brain fog (sub-cohort B), and a control group with neither COVID-19 nor symptoms (sub-cohort C). The aim of this article was to examine whether the brain cortical activity of these three sub-cohorts differs and, if possible, to differentiate and classify them using machine-learning tools. Recordings were made with a dense-array electroencephalographic amplifier with 256 electrodes.

Event-related potentials were chosen because we expected to find differences in the patients’ responses to three mental tasks arranged as experiments well known in experimental psychology: face recognition, digit span, and task switching. These potentials were plotted for all three sub-cohorts and all three experiments. The cross-correlation method was used to find differences, and such differences did manifest themselves in the shape of the event-related potentials on the cognitive electrodes.
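The exact cross-correlation variant used in the study is not given in the abstract. The sketch assumes a normalized (Pearson) cross-correlation evaluated over a range of sample lags, which reports both how similar two equal-length ERP waveforms are in shape and the lag at which they align best. The waveforms are synthetic Gaussian bumps.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / norm

def best_lag(x, y, max_lag):
    """Normalized cross-correlation over lags; at lag L, x[t + L] is paired with y[t]."""
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[lag:], y[:len(y) - lag]
        else:
            a, b = x[:lag], y[-lag:]
        scores[lag] = pearson(a, b)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]

# Synthetic ERP-like bumps: x peaks 3 samples later than y.
y = [math.exp(-((t - 20) / 4) ** 2) for t in range(50)]
x = [math.exp(-((t - 23) / 4) ** 2) for t in range(50)]
print(best_lag(x, y, max_lag=10))
```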

These differences are discussed; however, explaining them would require recruiting a much larger cohort. For the classification problem, avalanche analysis was used to extract features from the resting-state signal, and linear discriminant analysis was used for classification. Differences between the sub-cohorts were expected in these signals. Machine-learning tools were used because finding the differences by eye seemed impossible. The A&B vs. C, B&C vs. A, A vs. B, A vs. C, and B vs. C classification tasks were performed, and an efficiency of around 60-70% was achieved.
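The avalanche features are only named in the abstract. Below is a minimal sketch of the common neuronal-avalanche recipe, with every parameter an assumption: z-score each channel, mark the first sample of each excursion beyond a threshold as an event, count events in fixed time bins across channels, and treat each run of non-empty bins bounded by empty bins as one avalanche whose size is its total event count. Summary statistics of these sizes could then be passed to a classifier such as linear discriminant analysis, which is not shown.

```python
import statistics

def channel_events(signal, threshold=3.0):
    """Sample indices where |z-score| first exceeds the threshold (one event per excursion)."""
    mu = statistics.fmean(signal)
    sd = statistics.pstdev(signal)
    if sd == 0:
        return []
    z = [(v - mu) / sd for v in signal]
    return [t for t, v in enumerate(z)
            if abs(v) > threshold and (t == 0 or abs(z[t - 1]) <= threshold)]

def avalanche_sizes(channels, bin_width, threshold=3.0):
    """Sizes of avalanches: runs of non-empty time bins bounded by empty bins."""
    n_bins = -(-len(channels[0]) // bin_width)  # ceiling division
    counts = [0] * n_bins
    for signal in channels:
        for t in channel_events(signal, threshold):
            counts[t // bin_width] += 1
    sizes, current = [], 0
    for c in counts:
        if c:
            current += c
        elif current:
            sizes.append(current)
            current = 0
    if current:
        sizes.append(current)
    return sizes

def avalanche_features(sizes):
    """Hypothetical summary features: count, mean size, maximum size."""
    if not sizes:
        return (0, 0.0, 0)
    return (len(sizes), statistics.fmean(sizes), max(sizes))

# Two synthetic channels of flat signal with a few spikes.
ch1 = [0.0] * 100
ch2 = [0.0] * 100
ch1[10] = ch1[50] = 10.0
ch2[11] = ch2[52] = 10.0
sizes = avalanche_sizes([ch1, ch2], bin_width=4)
print(sizes, avalanche_features(sizes))
```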

In the future, further pandemics are likely because of imbalance in the natural environment, resulting in declining numbers of species, rising temperatures, and climate-change-driven migrations. This research may help predict brain fog after recovery from COVID-19 and prepare patients for better convalescence. Shortening the time to recover from brain fog will benefit not only the patients but also society.

Source: Wojcik GM, Shriki O, Kwasniewicz L, Kawiak A, Ben-Horin Y, Furman S, Wróbel K, Bartosik B, Panas E. Investigating brain cortical activity in patients with post-COVID-19 brain fog. Front Neurosci. 2023 Feb 9;17:1019778. doi: 10.3389/fnins.2023.1019778. PMID: 36845422; PMCID: PMC9947499. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9947499/ (Full text)

Organ and cell-specific biomarkers of Long-COVID identified with targeted proteomics and machine learning

Abstract:

Background: Survivors of acute COVID-19 often suffer prolonged, diffuse symptoms post-infection, referred to as “Long-COVID”. A lack of Long-COVID biomarkers and pathophysiological mechanisms limits effective diagnosis, treatment and disease surveillance. We performed targeted proteomics and machine learning analyses to identify novel blood biomarkers of Long-COVID.

Methods: We conducted a case-control study comparing the expression of 2,925 unique blood proteins in Long-COVID outpatients versus COVID-19 inpatients and healthy control subjects. Targeted proteomics was performed with proximity extension assays, and machine learning was used to identify the proteins most important for identifying Long-COVID patients. Organ system and cell type expression patterns were identified with natural language processing (NLP) of the UniProt Knowledgebase.

Results: Machine learning analysis identified 119 proteins relevant for differentiating Long-COVID outpatients (Bonferroni-corrected P < 0.01). Protein combinations were narrowed down to two optimal models, with nine and five proteins, respectively, both having excellent sensitivity and specificity for Long-COVID status (AUC = 1.00, F1 = 1.00). NLP expression analysis highlighted the diffuse organ system involvement in Long-COVID, as well as the involved cell types, including leukocytes and platelets, as key components associated with Long-COVID.
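The Bonferroni correction multiplies each raw p-value by the number of comparisons m, capping the result at 1. Assuming one test per measured protein, m = 2,925, so a protein stays significant at corrected P < 0.01 only if its raw p-value is below 0.01 / 2,925, about 3.4e-6. The raw p-values in the sketch are hypothetical.

```python
def bonferroni(p_values, m=None):
    """Bonferroni-adjusted p-values: min(1, p * m), m = number of comparisons."""
    m = len(p_values) if m is None else m
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw p-values for three proteins, out of 2,925 proteins tested.
raw = [1e-7, 2e-6, 5e-6]
adjusted = bonferroni(raw, m=2925)
print([round(p, 6) for p in adjusted], [p < 0.01 for p in adjusted])
```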

Conclusions: Proteomic analysis of plasma from Long-COVID patients identified 119 highly relevant proteins and two optimal models with nine and five proteins, respectively. The identified proteins reflected widespread organ and cell type expression. Optimal protein models, as well as individual proteins, hold the potential for accurate diagnosis of Long-COVID and targeted therapeutics.

Source: Patel MA, Knauer MJ, Nicholson M, Daley M, Van Nynatten LR, Cepinskas G, Fraser DD. Organ and cell-specific biomarkers of Long-COVID identified with targeted proteomics and machine learning. Mol Med. 2023 Feb 21;29(1):26. doi: 10.1186/s10020-023-00610-z. PMID: 36809921; PMCID: PMC9942653. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9942653/ (Full text)

A machine learning approach identifies distinct early-symptom cluster phenotypes which correlate with hospitalization, failure to return to activities, and prolonged COVID-19 symptoms

Abstract:

Background: Accurate COVID-19 prognosis is a critical aspect of acute and long-term clinical management. We identified discrete clusters of early-stage symptoms that may delineate groups with distinct disease severity phenotypes, including risk of developing long-term symptoms and associated inflammatory profiles.

Methods: A total of 1,273 SARS-CoV-2-positive U.S. Military Health System beneficiaries with quantitative symptom scores (FLU-PRO Plus) were included in this analysis. We employed machine-learning approaches to identify symptom clusters and compared the risk of hospitalization and long-term symptoms, as well as peak CRP and IL-6 concentrations, across clusters.

Results: We identified three distinct clusters of participants based on their FLU-PRO Plus symptoms: cluster 1 (“Nasal cluster”) is highly correlated with reporting runny/stuffy nose and sneezing, cluster 2 (“Sensory cluster”) is highly correlated with loss of smell or taste, and cluster 3 (“Respiratory/Systemic cluster”) is highly correlated with the respiratory (cough, trouble breathing, among others) and systemic (body aches, chills, among others) domain symptoms. Participants in the Respiratory/Systemic cluster were twice as likely as those in the Nasal cluster to have been hospitalized, and 1.5 times as likely to report that they had not returned to activities, which remained significant after controlling for confounding covariates (P < 0.01). Participants in the Respiratory/Systemic and Sensory clusters were more likely to have symptoms six months after symptom onset (P = 0.03). We observed higher peak CRP and IL-6 in the Respiratory/Systemic cluster (P < 0.01).
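"Twice as likely" is a ratio of risks. The sketch computes an unadjusted risk ratio with a Wald-type 95% interval on the log scale; the study's covariate-adjusted estimates would need a regression model instead, and the counts below are invented.

```python
import math

def risk_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Unadjusted risk ratio of group A vs group B with a log-scale Wald interval.
    Requires at least one event in each group."""
    rr = (events_a / total_a) / (events_b / total_b)
    se = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    return rr, (rr * math.exp(-z * se), rr * math.exp(z * se))

# Invented counts: hospitalizations in two symptom clusters.
rr, (low, high) = risk_ratio(40, 400, 20, 400)
print(f"RR = {rr:.2f}, 95% CI {low:.2f}-{high:.2f}")
```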

Conclusions: We identified early symptom profiles potentially associated with hospitalization, return-to-activities, long-term symptoms, and inflammatory profiles. These findings may assist in patient prognosis, including prediction of long COVID risk.

Source: Epsi NJ, Powers JH, Lindholm DA, Mende K, Malloy A, Ganesan A, Huprikar N, Lalani T, Smith A, Mody RM, Jones MU, Bazan SE, Colombo RE, Colombo CJ, Ewers EC, Larson DT, Berjohn CM, Maldonado CJ, Blair PW, Chenoweth J, Saunders DL, Livezey J, Maves RC, Sanchez Edwards M, Rozman JS, Simons MP, Tribble DR, Agan BK, Burgess TH, Pollett SD; EPICC COVID-19 Cohort Study Group. A machine learning approach identifies distinct early-symptom cluster phenotypes which correlate with hospitalization, failure to return to activities, and prolonged COVID-19 symptoms. PLoS One. 2023 Feb 9;18(2):e0281272. doi: 10.1371/journal.pone.0281272. PMID: 36757946; PMCID: PMC9910657. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9910657/ (Full text)

Generalisable long COVID subtypes: Findings from the NIH N3C and RECOVER programmes

Abstract:

Background: Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterised by a wide range of manifestations that are difficult to analyse computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested.

Methods: We present a method for computationally modelling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning.

Findings: We found six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was a significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centres to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems.
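The abstract describes a similarity function over phenotypic abnormalities but not its exact form. The sketch assumes one standard choice from ontology-based phenotype matching: Resnik similarity (the information content of the most informative shared ancestor term) combined by a symmetric best-match average, then used to assign a new patient to the cluster of its most similar cohort patient. The toy ontology, patients, and cluster labels are hypothetical, and the clustering step itself is not shown.

```python
import math

# Hypothetical toy ontology: term -> parent terms ("root" has none).
PARENTS = {
    "root": set(), "lung": {"root"}, "neuro": {"root"},
    "dyspnea": {"lung"}, "cough": {"lung"},
    "brain_fog": {"neuro"}, "headache": {"neuro"},
}

def ancestors(term):
    """The term itself plus all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def information_content(cohort):
    """IC(t) = -ln(share of patients annotated with t or a descendant of t)."""
    counts = dict.fromkeys(PARENTS, 0)
    for terms in cohort.values():
        for t in set().union(*(ancestors(term) for term in terms)):
            counts[t] += 1
    # Terms never seen in the cohort get IC 0 here (a simplifying assumption).
    return {t: -math.log(c / len(cohort)) if c else 0.0 for t, c in counts.items()}

def resnik(t1, t2, ic):
    """IC of the most informative common ancestor."""
    return max(ic[t] for t in ancestors(t1) & ancestors(t2))

def patient_similarity(a, b, ic):
    """Symmetric best-match average of term-level Resnik similarities."""
    def one_way(x, y):
        return sum(max(resnik(s, t, ic) for t in y) for s in x) / len(x)
    return (one_way(a, b) + one_way(b, a)) / 2

def assign(new_terms, cohort, labels, ic):
    """Give a new patient the cluster label of its most similar cohort patient."""
    best = max(cohort, key=lambda pid: patient_similarity(new_terms, cohort[pid], ic))
    return labels[best]

cohort = {
    "p1": {"dyspnea", "cough"}, "p2": {"dyspnea"},
    "p3": {"brain_fog", "headache"}, "p4": {"brain_fog"},
}
labels = {"p1": "pulmonary", "p2": "pulmonary", "p3": "neuro", "p4": "neuro"}
ic = information_content(cohort)
ids = sorted(cohort)
# Pairwise similarity matrix that a clustering step could consume.
matrix = [[patient_similarity(cohort[i], cohort[j], ic) for j in ids] for i in ids]
print(assign({"cough"}, cohort, labels, ic))  # pulmonary
```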

Interpretation: Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC.

Source: Reese JT, Blau H, Casiraghi E, Bergquist T, Loomba JJ, Callahan TJ, Laraway B, Antonescu C, Coleman B, Gargano M, Wilkins KJ, Cappelletti L, Fontana T, Ammar N, Antony B, Murali TM, Caufield JH, Karlebach G, McMurry JA, Williams A, Moffitt R, Banerjee J, Solomonides AE, Davis H, Kostka K, Valentini G, Sahner D, Chute CG, Madlock-Brown C, Haendel MA, Robinson PN; N3C Consortium; RECOVER Consortium. Generalisable long COVID subtypes: Findings from the NIH N3C and RECOVER programmes. EBioMedicine. 2022 Dec 21;87:104413. doi: 10.1016/j.ebiom.2022.104413. Epub ahead of print. PMID: 36563487; PMCID: PMC9769411. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9769411/ (Full text)