Abstract:
We study variable selection in multiple regression models in which molecular markers and/or gene-expression measurements, as well as intensity measurements from protein spectra, serve as predictors of the outcome variable (i.e., trait or disease state).
Finding genetic biomarkers and searching for genetic-epidemiological factors can be formulated as a statistical problem of variable selection, in which a small number of trait-associated predictors are identified from a large set of candidates. We illustrate our approach by analyzing the data available for chronic fatigue syndrome (CFS).
CFS is a complex disease in several respects; for example, it is difficult to diagnose and difficult to quantify. To identify biomarkers, we used microarray data and SELDI-TOF-based proteomics data. We also analyzed genetic marker information for a large number of SNPs for an overlapping set of individuals. The objective of the analyses was to identify markers that are specific to fatigue and possibly also exclusive to CFS. The use of such models can be motivated, for example, by the search for new biomarkers for the diagnosis and prognosis of cancer and for measures of response to therapy. Throughout, we use Bayesian hierarchical modeling and Markov chain Monte Carlo computation.
Source: Bhattacharjee M, Botting CH, Sillanpää MJ. Bayesian biomarker identification based on marker-expression proteomics data. Genomics. 2008 Dec;92(6):384-92. doi: 10.1016/j.ygeno.2008.06.006. Epub 2008 Aug 15. http://www.sciencedirect.com/science/article/pii/S0888754308001420 (Full article)