Long read sequencing characterises a novel structural variant, revealing underactive AKR1C1 with overactive AKR1C2 as a possible cause of unexplained severe fatigue

Abstract

Background: Causative genetic variants cannot yet be found for many disorders with a clear heritable component, including chronic fatigue disorders like myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). These conditions may involve genes in difficult-to-align genomic regions that are refractory to short read approaches. Structural variants in these regions can be particularly hard to detect or define with short reads, yet may account for a significant number of cases. Long read sequencing can overcome these difficulties but so far little data is available regarding the specific analytical challenges inherent in such regions, which need to be taken into account to ensure that variants are correctly identified.

Research into chronic fatigue disorders faces the additional challenge that the heterogeneous patient population likely encompasses multiple aetiologies with overlapping symptoms, rather than a single disease entity, such that each individual abnormality may lack statistical significance within a larger sample. Better delineation of patient subgroups is needed to target research and treatment.

Methods: We use nanopore sequencing in a case of unexplained severe fatigue to identify and fully characterise a large inversion in a highly homologous region spanning the AKR1C gene locus, which was indicated but could not be resolved by short-read sequencing. We then use GC-MS/MS serum steroid analysis to investigate the functional consequences.

Results: Several commonly used bioinformatics tools are confounded by the homology but a combined approach including visual inspection allows the variant to be accurately resolved. The DNA inversion appears to increase the expression of AKR1C2 while limiting AKR1C1 activity, resulting in a relative increase of inhibitory neurosteroids and impaired progesterone metabolism.

Conclusions: This study provides an example of how long read sequencing can improve diagnostic yield in research and clinical care, and highlights some of the analytical challenges presented by regions containing tandem arrays of genes. It also proposes a novel gene associated with a specific disease aetiology that may be an underlying cause of complex chronic fatigue and possibly other conditions too. It reveals biomarkers that could be assessed in a larger cohort, potentially identifying a subset of patients who might respond to treatments suggested by the aetiology.

Source: Julia Oakley, Martin Hill, Adam Giess, Mélanie Tanguy, Greg Elgar. Long read sequencing characterises a novel structural variant, revealing underactive AKR1C1 with overactive AKR1C2 as a possible cause of unexplained severe fatigue. ResearchSquare [Preprint] https://www.researchsquare.com/article/rs-3218228/v2 (Full text)

Bayesian biomarker identification based on marker-expression proteomics data

Abstract

We are studying variable selection in multiple regression models in which molecular markers and/or gene-expression measurements as well as intensity measurements from protein spectra serve as predictors for the outcome variable (i.e., trait or disease state).

Finding genetic biomarkers and searching for genetic-epidemiological factors can be formulated as a statistical problem of variable selection, in which a small number of trait-associated predictors are identified from a large set of candidates. We illustrate our approach by analyzing the data available for chronic fatigue syndrome (CFS).

CFS is a complex disease in several respects; for example, it is difficult both to diagnose and to quantify. To identify biomarkers we used microarray data and SELDI-TOF-based proteomics data. We also analyzed genetic marker information for a large number of SNPs for an overlapping set of individuals. The objectives of the analyses were to identify markers specific to fatigue that are also possibly exclusive to CFS. The use of such models can be motivated, for example, by the search for new biomarkers for the diagnosis and prognosis of cancer and for measures of response to therapy. For this purpose we generally use Bayesian hierarchical modeling and Markov chain Monte Carlo computation.
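
As a rough illustration of what Bayesian variable selection by Markov chain Monte Carlo can look like, the sketch below runs a Metropolis sampler over inclusion indicators for a linear regression. It is not the authors' model: the abstract does not specify their hierarchy, so the Zellner g-prior, the independent Bernoulli inclusion prior, the function names (`log_bayes_factor`, `inclusion_probabilities`), the parameter defaults, and the simulated data are all assumptions chosen for brevity.

```python
import numpy as np

def log_bayes_factor(X, y, gamma, g):
    """Log Bayes factor of model `gamma` against the empty model.

    Assumes centered X and y and a Zellner g-prior on the coefficients
    (an illustrative choice, not taken from the paper).
    """
    n = len(y)
    k = int(gamma.sum())
    if k == 0:
        return 0.0
    Xg = X[:, gamma]
    beta, *_ = np.linalg.lstsq(Xg, y, rcond=None)
    resid = y - Xg @ beta
    r2 = 1.0 - (resid @ resid) / (y @ y)
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

def inclusion_probabilities(X, y, n_iter=5000, burn_in=1000,
                            g=None, prior_incl=0.1, seed=0):
    """Estimate posterior inclusion probabilities with a single-flip Metropolis sampler."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)          # centering removes the intercept
    y = y - y.mean()
    n, p = X.shape
    g = float(n) if g is None else g  # unit-information choice g = n (assumption)
    gamma = np.zeros(p, dtype=bool)   # start from the empty model
    log_prior_odds = np.log(prior_incl) - np.log1p(-prior_incl)
    current = log_bayes_factor(X, y, gamma, g)
    counts = np.zeros(p)
    for it in range(n_iter):
        j = rng.integers(p)           # propose flipping one indicator
        proposal = gamma.copy()
        proposal[j] = ~proposal[j]
        candidate = log_bayes_factor(X, y, proposal, g)
        prior_term = log_prior_odds if proposal[j] else -log_prior_odds
        if np.log(rng.random()) < candidate - current + prior_term:
            gamma, current = proposal, candidate
        if it >= burn_in:
            counts += gamma           # accumulate visits after burn-in
    return counts / (n_iter - burn_in)

# Simulated data (hypothetical): 15 candidate predictors, two truly associated.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 15))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)
pip = inclusion_probabilities(X, y)
print(np.round(pip, 2))
```

In this toy setting the predictors with high estimated inclusion probability play the role of candidate biomarkers; real marker, expression, and proteomics data would need a model suited to their scale and correlation structure.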

Source: Bhattacharjee M, Botting CH, Sillanpää MJ. Bayesian biomarker identification based on marker-expression proteomics data. Genomics. 2008 Dec;92(6):384-92. doi: 10.1016/j.ygeno.2008.06.006. Epub 2008 Aug 15. http://www.sciencedirect.com/science/article/pii/S0888754308001420 (Full article)