A machine learning approach identifies distinct early-symptom cluster phenotypes which correlate with hospitalization, failure to return to activities, and prolonged COVID-19 symptoms

Abstract:

Background: Accurate COVID-19 prognosis is a critical aspect of acute and long-term clinical management. We identified discrete clusters of early-stage symptoms which may delineate groups with distinct disease severity phenotypes, including risk of developing long-term symptoms and associated inflammatory profiles.

Methods: 1,273 SARS-CoV-2 positive U.S. Military Health System beneficiaries with quantitative symptom scores (FLU-PRO Plus) were included in this analysis. We employed machine-learning approaches to identify symptom clusters and compared risk of hospitalization, long-term symptoms, as well as peak CRP and IL-6 concentrations.

Results: We identified three distinct clusters of participants based on their FLU-PRO Plus symptoms: cluster 1 (“Nasal cluster”) is highly correlated with reporting runny/stuffy nose and sneezing, cluster 2 (“Sensory cluster”) is highly correlated with loss of smell or taste, and cluster 3 (“Respiratory/Systemic cluster”) is highly correlated with the respiratory (cough, trouble breathing, among others) and systemic (body aches, chills, among others) domain symptoms. Participants in the Respiratory/Systemic cluster were twice as likely as those in the Nasal cluster to have been hospitalized, and 1.5 times as likely to report that they had not returned-to-activities, which remained significant after controlling for confounding covariates (P < 0.01). Respiratory/Systemic and Sensory clusters were more likely to have symptoms at six-months post-symptom-onset (P = 0.03). We observed higher peak CRP and IL-6 in the Respiratory/Systemic cluster (P < 0.01).

Conclusions: We identified early symptom profiles potentially associated with hospitalization, return-to-activities, long-term symptoms, and inflammatory profiles. These findings may assist in patient prognosis, including prediction of long COVID risk.

Source: Epsi NJ, Powers JH, Lindholm DA, Mende K, Malloy A, Ganesan A, Huprikar N, Lalani T, Smith A, Mody RM, Jones MU, Bazan SE, Colombo RE, Colombo CJ, Ewers EC, Larson DT, Berjohn CM, Maldonado CJ, Blair PW, Chenoweth J, Saunders DL, Livezey J, Maves RC, Sanchez Edwards M, Rozman JS, Simons MP, Tribble DR, Agan BK, Burgess TH, Pollett SD; EPICC COVID-19 Cohort Study Group. A machine learning approach identifies distinct early-symptom cluster phenotypes which correlate with hospitalization, failure to return to activities, and prolonged COVID-19 symptoms. PLoS One. 2023 Feb 9;18(2):e0281272. doi: 10.1371/journal.pone.0281272. PMID: 36757946; PMCID: PMC9910657. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9910657/ (Full text)

Generalisable long COVID subtypes: Findings from the NIH N3C and RECOVER programmes

Abstract:

Background: Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterised by a wide range of manifestations that are difficult to analyse computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested.

Methods: We present a method for computationally modelling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning.

Findings: We found six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centres to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems.
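The assignment step described above (placing a new patient in the cluster whose original patients are most similar) can be sketched as follows. This is a minimal illustration, not the authors' implementation: Jaccard overlap of phenotype term sets stands in for ontology-based semantic similarity, "most similar" is taken to mean highest mean similarity to a cluster's members, and the cluster names and phenotype terms are invented.

```python
def jaccard(a, b):
    """Set-overlap similarity; a simple stand-in for ontology-based semantic similarity."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def assign_cluster(new_patient, clusters):
    """Return the cluster whose original patients are, on average, most similar."""
    def mean_similarity(members):
        return sum(jaccard(new_patient, m) for m in members) / len(members)

    return max(clusters, key=lambda name: mean_similarity(clusters[name]))


# Hypothetical phenotype terms and cluster labels, for illustration only.
original_clusters = {
    "pulmonary": [{"dyspnea", "cough"}, {"dyspnea", "hypoxemia"}],
    "neuropsychiatric": [{"anxiety", "insomnia"}, {"headache", "insomnia"}],
}
assigned = assign_cluster({"cough", "dyspnea", "fatigue"}, original_clusters)
```

A real pipeline would replace `jaccard` with a similarity measure that accounts for how closely related two phenotype terms are in an ontology, so that near-synonymous terms still count as matches.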

Interpretation: Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC.

Source: Reese JT, Blau H, Casiraghi E, Bergquist T, Loomba JJ, Callahan TJ, Laraway B, Antonescu C, Coleman B, Gargano M, Wilkins KJ, Cappelletti L, Fontana T, Ammar N, Antony B, Murali TM, Caufield JH, Karlebach G, McMurry JA, Williams A, Moffitt R, Banerjee J, Solomonides AE, Davis H, Kostka K, Valentini G, Sahner D, Chute CG, Madlock-Brown C, Haendel MA, Robinson PN; N3C Consortium; RECOVER Consortium. Generalisable long COVID subtypes: Findings from the NIH N3C and RECOVER programmes. EBioMedicine. 2022 Dec 21;87:104413. doi: 10.1016/j.ebiom.2022.104413. Epub ahead of print. PMID: 36563487; PMCID: PMC9769411. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9769411/ (Full text)

Doctors’ attitudes toward specific medical conditions

Abstract:

This study uses machine learning and natural language processing tools to examine the language used by healthcare professionals on a global online forum. It contributes to an underdeveloped area of knowledge, that of physician attitudes toward their patients. Using comments left by physicians on Reddit’s “Medicine” subreddit (r/medicine), we test if the language from online discussions can reveal doctors’ attitudes toward specific medical conditions. We focus on a set of chronic conditions that usually are more stigmatized and compare them to ones well accepted by the medical community.

We discovered that when comparing diseases with similar traits, doctors discussed some conditions with more negative attitudes. These results show bias does not occur only along the dimensions traditionally analyzed in the economics literature of gender and race, but also along the dimension of disease type. This is meaningful because the emotions associated with beliefs impact physicians’ decision making, prescribing behavior, and quality of care. First, we run a binomial LASSO-logistic regression to compare a range of 21 diseases against myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS), depression, and the autoimmune diseases multiple sclerosis and rheumatoid arthritis.

Next, we use dictionary methods to compare five more chronic diseases: Lyme disease, Ehlers-Danlos syndrome (EDS), Alzheimer’s disease, osteoporosis, and lupus. The results show physicians discuss ME/CFS, depression, and Lyme disease with more negative language than the other diseases in the set. The results for ME/CFS included over four times more negative words than the results for depression.
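A dictionary method of the kind mentioned above counts how often words from a fixed sentiment lexicon appear in each group of comments. The sketch below is illustrative only: the lexicon, the condition labels, and the comments are invented, and the study's actual dictionary and normalisation are not given in the abstract.

```python
import re

# Hypothetical mini-lexicon; a real analysis would use an established sentiment dictionary.
NEGATIVE_WORDS = {"fake", "lazy", "difficult", "annoying", "worst"}


def negative_rate(comments):
    """Return the number of negative-lexicon words per 1,000 tokens."""
    tokens = [t for c in comments for t in re.findall(r"[a-z']+", c.lower())]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in NEGATIVE_WORDS)
    return 1000.0 * hits / len(tokens)


# Invented example comments, grouped by the (placeholder) condition they mention.
comments_by_condition = {
    "condition_a": ["These patients are difficult and the visits are the worst."],
    "condition_b": ["Standard workup, then refer to the clinic."],
}
rates = {name: negative_rate(texts) for name, texts in comments_by_condition.items()}
```

Normalising by token count matters because conditions that are discussed more often would otherwise appear more negative simply from volume.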

Source: Brooke Scoles, Catia Nicodemo. Doctors’ attitudes toward specific medical conditions. Journal of Economic Behavior & Organization, Volume 204, December 2022, Pages 182-199. https://www.sciencedirect.com/science/article/pii/S016726812200347X (Full text)

Elevated vascular transformation blood biomarkers in Long-COVID indicate angiogenesis as a key pathophysiological mechanism

Abstract:

Background: Long-COVID is characterized by prolonged, diffuse symptoms months after acute COVID-19. Accurate diagnosis and targeted therapies for Long-COVID are lacking. We investigated vascular transformation biomarkers in Long-COVID patients.

Methods: A case–control study utilizing Long-COVID patients, one to six months (median 98.5 days) post-infection, with multiplex immunoassay measurement of sixteen blood biomarkers of vascular transformation, including ANG-1, P-SEL, MMP-1, VE-Cad, Syn-1, Endoglin, PECAM-1, VEGF-A, ICAM-1, VLA-4, E-SEL, thrombomodulin, VEGF-R2, VEGF-R3, VCAM-1 and VEGF-D.

Results: Fourteen vasculature transformation blood biomarkers were significantly elevated in Long-COVID outpatients, versus acutely ill COVID-19 inpatients and healthy control subjects (P < 0.05). A unique two-biomarker profile consisting of ANG-1/P-SEL was developed with machine learning, providing a classification accuracy for Long-COVID status of 96%. Individually, ANG-1 and P-SEL had excellent sensitivity and specificity for Long-COVID status (AUC = 1.00, P < 0.0001; validated in a secondary cohort). Specific to Long-COVID, ANG-1 levels were associated with female sex and a lack of disease interventions at follow-up (P < 0.05).

Conclusions: Long-COVID patients suffer prolonged, diffuse symptoms and poorer health. Vascular transformation blood biomarkers were significantly elevated in Long-COVID, with angiogenesis markers (ANG-1/P-SEL) providing classification accuracy of 96%. Vascular transformation blood biomarkers hold potential for diagnostics, and modulators of angiogenesis may have therapeutic efficacy.

Source: Patel, M.A., Knauer, M.J., Nicholson, M. et al. Elevated vascular transformation blood biomarkers in Long-COVID indicate angiogenesis as a key pathophysiological mechanism. Mol Med 28, 122 (2022). https://doi.org/10.1186/s10020-022-00548-8 https://molmed.biomedcentral.com/articles/10.1186/s10020-022-00548-8 (Full text)

Multimodal MRI of myalgic encephalomyelitis/chronic fatigue syndrome: A cross-sectional neuroimaging study toward its neuropathophysiology and diagnosis

Abstract:

Introduction: Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a debilitating illness affecting up to 24 million people worldwide, yet there is no known mechanism for ME/CFS and no objective test for diagnosis. A series of our neuroimaging findings in ME/CFS, including functional MRI (fMRI) signal characteristics and structural changes in brain regions particularly sensitive to hypoxia, has informed the hypothesis that abnormal neurovascular coupling (NVC) may be the neurobiological origin of ME/CFS. NVC is a critical process for normal brain function, in which glutamate from an active neuron stimulates Ca2+ influx in adjacent neurons and astrocytes. In turn, increased Ca2+ concentrations in both astrocytes and neurons trigger the synthesis of vascular dilator factors to increase local blood flow, ensuring that activated neurons are supplied with their energy needs.

This study investigates NVC using multimodal MRI: (1) the hemodynamic response function (HRF), which represents regional brain blood flow changes in response to neural activity, will be modeled from a cognitive task fMRI; (2) the respiration response function (RRF), which represents autoregulation of regional blood flow due to carbon dioxide, will be modeled from breath-holding fMRI; (3) neural activity-associated glutamate changes will be modeled from cognitive task functional magnetic resonance spectroscopy. We also aim to develop a neuromarker for ME/CFS diagnosis by integrating the multimodal MRIs with a deep machine learning framework.

Methods and analysis: This cross-sectional study will recruit 288 participants (91 ME/CFS, 61 individuals with chronic fatigue, 91 healthy controls with sedentary lifestyles, 45 fibromyalgia). ME/CFS will be diagnosed by the consensus of two clinicians using the 2003 Canadian Consensus Criteria. Symptoms, vital signs, and activity measures will be collected alongside multimodal MRI.

The HRF, RRF, and glutamate changes will be compared among four groups using one-way analysis of covariance (ANCOVA). Equivalent non-parametric methods will be used for measures that do not exhibit a normal distribution. The activity measure, body mass index, sex, age, depression, and anxiety will be included as covariates for all statistical analyses with the false discovery rate used to correct for multiple comparisons.
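The false discovery rate correction mentioned above is commonly carried out with the Benjamini-Hochberg procedure. The protocol does not name a specific procedure, so the choice here is an assumption, and the p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values from four hypothetical group comparisons.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
```

A comparison is then declared significant when its adjusted p-value falls below the chosen FDR level (for example 0.05).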

The data will be randomly divided into a training (N = 188) and a validation (N = 100) group. Each MRI measure will be entered as input for a least absolute shrinkage and selection operator—regularized principal components regression to generate a brain pattern of distributed clusters that predict disease severity. The identified brain pattern will be integrated using multimodal deep Boltzmann machines as a neuromarker for predicting ME/CFS fatigue conditions. The receiver operating characteristic curve of the identified neuromarker will be determined using data from the validation group.
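The random division into training and validation groups can be sketched as below. The group sizes come from the protocol; the integer participant IDs, the seed, and the function name are placeholders, and the protocol does not say whether the split is stratified by diagnostic group.

```python
import random


def split_participants(ids, n_train, seed=0):
    """Shuffle participant IDs and split them into training and validation lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Group sizes from the protocol: 91 + 61 + 91 + 45 = 288 participants.
participant_ids = list(range(91 + 61 + 91 + 45))
train_ids, validation_ids = split_participants(participant_ids, n_train=188)
```

Fixing the seed makes the split reproducible, which keeps the validation group untouched across repeated model-fitting runs.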

Ethics and study registry: This study was reviewed and approved by University of the Sunshine Coast University Ethics committee (A191288) and has been registered with The Australian New Zealand Clinical Trials Registry (ACTRN12622001095752).

Dissemination of results: The results will be disseminated through peer reviewed scientific manuscripts and conferences and to patients through social media and active engagement with ME/CFS associations.

Source: Shan ZY, Mohamed AZ, Andersen T, Rendall S, Kwiatek RA, Fante PD, Calhoun VD, Bhuta S, Lagopoulos J. Multimodal MRI of myalgic encephalomyelitis/chronic fatigue syndrome: A cross-sectional neuroimaging study toward its neuropathophysiology and diagnosis. Front Neurol. 2022 Sep 16;13:954142. doi: 10.3389/fneur.2022.954142. PMID: 36188362; PMCID: PMC9523103. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9523103/ (Full text)

Generalizable Long COVID Subtypes: Findings from the NIH N3C and RECOVER Programs

Abstract:

Accurate stratification of patients with Post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies and could enable more focused investigation of the molecular pathogenetic mechanisms of this disease. However, the natural history of long COVID is incompletely understood and characterized by an extremely wide range of manifestations that are difficult to analyze computationally. In addition, the generalizability of machine learning classification of COVID-19 clinical outcomes has rarely been tested.

We present a method for computationally modeling long COVID phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Using unsupervised machine learning (k-means clustering), we found six distinct clusters of long COVID patients, each with distinct profiles of phenotypic abnormalities with enrichments in pulmonary, cardiovascular, neuropsychiatric, and constitutional symptoms such as fatigue and fever.

There was a highly significant association of cluster membership with a range of pre-existing conditions and with measures of severity during acute COVID-19. We show that the clusters we identified in one hospital system were generalizable across different hospital systems. Semantic phenotypic clustering can provide a foundation for assigning patients to stratified subgroups for natural history or therapy studies on long COVID.

Source: Reese J, Blau H, Bergquist T, Loomba JJ, Callahan T, Laraway B, Antonescu C, Casiraghi E, Coleman B, Gargano M, Wilkins K, Cappelletti L, Fontana T, Ammar N, Antony B, Murali TM, Karlebach G, McMurry JA, Williams A, Moffitt R, Banerjee J, Solomonides AE, Davis H, Kostka K, Valentini G, Sahner D, Chute CG, Madlock-Brown C, Haendel MA, Robinson PN; and the RECOVER Consortium. Generalizable Long COVID Subtypes: Findings from the NIH N3C and RECOVER Programs. medRxiv [Preprint]. 2022 May 25:2022.05.24.22275398. doi: 10.1101/2022.05.24.22275398. PMID: 35665012; PMCID: PMC9164456. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9164456/ (Full text)

Evidence for Peroxisomal Dysfunction and Dysregulation of the CDP-Choline Pathway in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome

Abstract:

Background: Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a chronic and debilitating disease that is characterized by unexplained physical fatigue unrelieved by rest. Symptoms also include cognitive and sensory dysfunction, sleeping disturbances, orthostatic intolerance, and gastrointestinal problems. A syndrome clinically similar to ME/CFS has been reported following well-documented infections with the coronaviruses SARS-CoV and MERS-CoV. At least 10% of COVID-19 survivors develop post-acute sequelae of SARS-CoV-2 infection (PASC). Although many individuals with PASC have evidence of structural organ damage, a subset have symptoms consistent with ME/CFS including fatigue, post-exertional malaise, cognitive dysfunction, gastrointestinal disturbances, and postural orthostatic intolerance. These common features in ME/CFS and PASC suggest that insights into the pathogenesis of either may enrich our understanding of both syndromes, and could expedite the development of strategies for identifying those at risk and interventions that prevent or mitigate disease.

Methods: Using regression, Bayesian and enrichment analyses, we conducted targeted and untargeted metabolomic analysis of 888 metabolic analytes in plasma samples of 106 ME/CFS cases and 91 frequency-matched healthy controls.

Results: In ME/CFS cases, regression, Bayesian and enrichment analyses revealed evidence of peroxisomal dysfunction with decreased levels of plasmalogens. Other findings included decreased levels of several membrane lipids, including phosphatidylcholines and sphingomyelins, that may indicate dysregulation of the cytidine-5’-diphosphocholine pathway. Enrichment analyses revealed decreased levels of choline, ceramides and carnitines, and increased levels of long chain triglycerides (TG) and hydroxy-eicosapentaenoic acid. Elevated levels of dicarboxylic acids were consistent with abnormalities in the tricarboxylic acid cycle. Using machine learning algorithms with selected metabolites as predictors, we were able to differentiate female ME/CFS cases from female controls (highest AUC=0.794) and ME/CFS cases without self-reported irritable bowel syndrome (sr-IBS) from controls without sr-IBS (highest AUC=0.873).

Conclusion: Our findings are consistent with earlier ME/CFS work indicating compromised energy metabolism and redox imbalance, and highlight new abnormalities that may provide insights into the pathogenesis of ME/CFS.

One sentence summary: Plasma levels of plasmalogens are decreased in patients with myalgic encephalomyelitis/chronic fatigue syndrome suggesting peroxisome dysfunction.

Source: Che X, Brydges CR, Yu Y, Price A, Joshi S, Roy A, Lee B, Barupal DK, Cheng A, Palmer DM, Levine S, Peterson DL, Vernon SD, Bateman L, Hornig M, Montoya JG, Komaroff AL, Fiehn O, Lipkin WI. Evidence for Peroxisomal Dysfunction and Dysregulation of the CDP-Choline Pathway in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome. medRxiv [Preprint]. 2022 Jan 11:2021.06.14.21258895. doi: 10.1101/2021.06.14.21258895. PMID: 35043127; PMCID: PMC8764736. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8764736/ (Full text)

Who has long-COVID? A big data approach

Abstract:

Background: Post-acute sequelae of SARS-CoV-2 infection (PASC), otherwise known as long-COVID, have severely impacted recovery from the pandemic for patients and society alike. This new disease is characterized by evolving, heterogeneous symptoms, making it challenging to derive an unambiguous long-COVID definition. Electronic health record (EHR) studies are a critical element of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, which is addressing the urgent need to understand PASC, accurately identify who has PASC, and identify treatments.

Methods: Using the National COVID Cohort Collaborative’s (N3C) EHR repository, we developed XGBoost machine learning (ML) models to identify potential long-COVID patients. We examined demographics, healthcare utilization, diagnoses, and medications for 97,995 adult COVID-19 patients. We used these features and 597 long-COVID clinic patients to train three ML models to identify potential long-COVID patients among (1) all COVID-19 patients, (2) patients hospitalized with COVID-19, and (3) patients who had COVID-19 but were not hospitalized.

Findings: Our models identified potential long-COVID patients with high accuracy, achieving areas under the receiver operating characteristic curve of 0.91 (all patients), 0.90 (hospitalized), and 0.85 (non-hospitalized). Important features include rate of healthcare utilization, patient age, dyspnea, and other diagnosis and medication information available within the EHR. Applying the “all patients” model to the larger N3C cohort identified 100,263 potential long-COVID patients.
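For readers unfamiliar with the metric reported above, the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes it directly from that definition; the labels and scores are invented toy values, not study data.

```python
def roc_auc(labels, scores):
    """AUROC as the chance a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented toy labels (1 = flagged clinic patient) and model scores.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(toy_labels, toy_scores)
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.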

Interpretation: Patients flagged by our models can be interpreted as “patients likely to be referred to or seek care at a long-COVID specialty clinic,” an essential proxy for long-COVID diagnosis in the current absence of a definition. We also achieve the urgent goal of identifying potential long-COVID patients for clinical trials. As more data sources are identified, the models can be retrained and tuned based on study needs.

Source: Pfaff ER, Girvin AT, Bennett TD, Bhatia A, Brooks IM, Deer RR, Dekermanjian JP, Jolley SE, Kahn MG, Kostka K, McMurry JA, Moffitt R, Walden A, Chute CG, Haendel MA, The N3C Consortium. (2021). Who has long-COVID? A big data approach [preprint]. UMass Center for Clinical and Translational Science Supported Publications. https://doi.org/10.1101/2021.10.18.21265168. Retrieved from https://escholarship.umassmed.edu/umccts_pubs/253

Plasma proteomic profiling suggests an association between antigen driven clonal B cell expansion and ME/CFS

Abstract:

Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) is an unexplained chronic, debilitating illness characterized by fatigue, sleep disturbances, cognitive dysfunction, orthostatic intolerance and gastrointestinal problems.

Using ultra performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS), we analyzed the plasma proteomes of 39 ME/CFS patients and 41 healthy controls. Logistic regression models, with both linear and quadratic terms of the protein levels as independent variables, revealed a significant association between ME/CFS and the immunoglobulin heavy variable (IGHV) region 3-23/30.
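A logistic regression with linear and quadratic terms, as described above, can be written in its simplest single-protein form as follows. The symbols are introduced here for illustration (x is the level of one protein, p the probability of ME/CFS), and any covariates the authors adjusted for are omitted.

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1 x + \beta_2 x^{2}
```

A significant quadratic coefficient (the term in x squared) indicates that the association between the protein level and ME/CFS status is not monotone on the log-odds scale.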

Stratifying the ME/CFS group based on self-reported irritable bowel syndrome (sr-IBS) status revealed a significant quadratic effect of immunoglobulin lambda constant region 7 on its association with ME/CFS with sr-IBS whilst IGHV3-23/30 and immunoglobulin kappa variable region 3-11 were significantly associated with ME/CFS without sr-IBS.

In addition, we were able to predict ME/CFS status with a high degree of accuracy (AUC = 0.774-0.838) using a panel of proteins selected by 3 different machine learning algorithms: Lasso, Random Forests, and XGBoost. These algorithms also identified proteomic profiles that predicted the status of ME/CFS patients with sr-IBS (AUC = 0.806-0.846) and ME/CFS without sr-IBS (AUC = 0.754-0.780).

Our findings are consistent with a significant association of ME/CFS with immune dysregulation and highlight the potential use of the plasma proteome as a source of biomarkers for disease.

Source: Milivojevic M, Che X, Bateman L, et al. Plasma proteomic profiling suggests an association between antigen driven clonal B cell expansion and ME/CFS. PLoS One. 2020;15(7):e0236148. Published 2020 Jul 21. doi:10.1371/journal.pone.0236148 https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0236148 (Full text)

A Machine Learning Approach to the Differentiation of Functional Magnetic Resonance Imaging Data of Chronic Fatigue Syndrome (CFS) From a Sedentary Control

Abstract:

Chronic Fatigue Syndrome (CFS) is a debilitating condition estimated to impact at least 1 million individuals in the United States; however, controversy about its existence persists. Machine learning algorithms have become a powerful methodology for evaluating multi-regional areas of fMRI activation that can classify disease phenotype from sedentary control. Uncovering objective biomarkers such as an fMRI pattern is important for lending credibility to diagnosis of CFS.

fMRI scans were evaluated for 69 patients (38 CFS and 31 Control) taken before (Day 1) and after (Day 2) a submaximal exercise test while undergoing the n-back memory paradigm. A predictive model was created by grouping fMRI voxels into the Automated Anatomical Labeling (AAL) atlas, splitting the data into a training and testing dataset, and feeding these inputs into a logistic regression to evaluate differences between CFS and control. Model results were cross-validated 10 times to ensure accuracy. Model results were able to differentiate CFS from sedentary controls with 80% accuracy on Day 1 and 76% accuracy on Day 2 (Table 3).
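The general pattern described above (a logistic regression evaluated by repeated train/test splits) can be sketched as below. This is a rough illustration under several assumptions: "cross-validated 10 times" is read as 10-fold cross-validation, the features are synthetic numbers rather than AAL region values, and the plain gradient-descent fit stands in for whatever software the authors used.

```python
import math
import random


def train_logistic(X, y, epochs=200, lr=0.1):
    """Fit logistic regression weights (bias first) by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w, xi):
    """Predict class 1 when the linear score is non-negative."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1 if z >= 0 else 0


def cross_validate(X, y, k=10, seed=0):
    """Return the mean held-out accuracy over k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = train_logistic([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(w, X[i]) == y[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k


# Synthetic stand-in data: 38 "CFS" and 31 "control" rows with 3 made-up features.
rng = random.Random(42)
X, y = [], []
for label, n, mean in ((1, 38, 1.0), (0, 31, -1.0)):
    for _ in range(n):
        X.append([rng.gauss(mean, 1.0) for _ in range(3)])
        y.append(label)
mean_accuracy = cross_validate(X, y, k=10)
```

Because each scan is held out exactly once, the averaged accuracy estimates performance on unseen data rather than on the data used for fitting.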

Recursive feature selection identified 29 ROIs that significantly distinguished CFS from control on Day 1 and 28 ROIs on Day 2, with 10 regions of overlap shared with Day 1 (Figure 3). These 10 shared regions included the putamen; inferior frontal gyrus, orbital (F3O); supramarginal gyrus (SMG); temporal pole: superior temporal gyrus (T1P); and caudate ROIs. This study was able to uncover a pattern of activated neurological regions that differentiated CFS from Control.

This pattern provides a first step toward developing fMRI as a diagnostic biomarker and suggests this methodology could be emulated for other disorders. We concluded that a logistic regression model performed on fMRI data significantly differentiated CFS from Control.

Source: Provenzano D, Washington SD, Baraniuk JN. A Machine Learning Approach to the Differentiation of Functional Magnetic Resonance Imaging Data of Chronic Fatigue Syndrome (CFS) From a Sedentary Control. Front Comput Neurosci. 2020 Jan 29;14:2. doi: 10.3389/fncom.2020.00002. eCollection 2020. https://www.ncbi.nlm.nih.gov/pubmed/32063839