PulmonDB: A Gene Expression Lung Diseases

Abstract

RATIONALE A massive amount of transcriptomic data from microarray and RNA-seq experiments has accumulated since these technologies were developed. Analyzing these data and integrating them to study a complex disease can be an overwhelming task, mainly because it requires combining data from different technologies and platforms. Moreover, the lack of uniformity in experimental annotations in public databases such as Gene Expression Omnibus (GEO) adds to the challenge. By integrating transcriptomic datasets from different sources together with their curated annotations, we developed an online web resource to facilitate the exploration of gene expression profiles of two respiratory diseases: Idiopathic Pulmonary Fibrosis (IPF) and Chronic Obstructive Pulmonary Disease (COPD). This resource can be used to identify differentially expressed genes that replicate across experiments. This project lays the foundation for integrating transcriptomic data from other respiratory diseases and smoker phenotypes, facilitating the identification of common and divergent pathways that lead to a pathological state.

METHODS To build the compendium we used COMMAND (Engelen K. et al., 2011), software that allows the comparison and integration of transcriptomic data from different sources and platforms into a compendium. COMMAND has been used successfully to build transcriptome data compendia in bacteria (Engelen K. et al., 2011) and grapevine (Moretto M. et al., 2016). Using COMMAND we created PulmonDB, a human lung database that allows us to integrate, analyze and explore gene expression data from different sources by contrasting controls and patients using the available clinical phenotypes (e.g. age, gender, disease status, FEV1). We selected transcriptome experiments for IPF and COPD by querying GEO and ArrayExpress with chosen keywords. Each experiment was downloaded and imported into COMMAND; the experimental conditions were then annotated, the contrast group was selected, and the data were normalized homogeneously to create PulmonDB.

RESULTS PulmonDB is an exploratory web interface that contains IPF and COPD gene expression data; the platform will be expanded to include other abnormal lung phenotypes. This resource facilitates the exploration of gene expression profiles under different pathological conditions and allows the identification of co-expression patterns.

CONCLUSIONS PulmonDB can help the scientific community to study which genes have a distinct expression profile related to a disease, explore reproducibility across technologies and platforms, identify interesting co-expression patterns across diseases, and find relationships among distinct clinical or experimental variables.

This abstract is funded by the Fronteras 15 project, CONACYT.
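The idea of contrasting patients against controls and then looking for genes that replicate across experiments can be illustrated with a minimal sketch. This is not COMMAND's implementation or PulmonDB's normalization: the function names, the gene labels (`GENE_A`, etc.), the intensity values and the threshold of 1.0 are all hypothetical, chosen only to show the logic of a log2 contrast and a simple replication check.

```python
import math


def log2_contrast(test, reference):
    """Per-gene log2 ratio of a test (patient) sample over a reference (control).

    Assumes strictly positive intensities; genes missing from either sample are skipped.
    """
    return {g: math.log2(test[g] / reference[g]) for g in test if g in reference}


def replicated_genes(contrasts, threshold=1.0):
    """Genes with |log2 ratio| >= threshold and the same direction in every contrast."""
    shared = set.intersection(*(set(c) for c in contrasts))
    result = {}
    for gene in sorted(shared):
        values = [c[gene] for c in contrasts]
        if all(v >= threshold for v in values):
            result[gene] = "up"
        elif all(v <= -threshold for v in values):
            result[gene] = "down"
    return result


# Hypothetical toy data: two experiments, each with one control and one patient sample.
exp1 = log2_contrast(
    test={"GENE_A": 40.0, "GENE_B": 5.0, "GENE_C": 9.0},
    reference={"GENE_A": 10.0, "GENE_B": 20.0, "GENE_C": 8.0},
)
exp2 = log2_contrast(
    test={"GENE_A": 20.0, "GENE_B": 4.0, "GENE_C": 16.0},
    reference={"GENE_A": 5.0, "GENE_B": 16.0, "GENE_C": 4.0},
)

replicated = replicated_genes([exp1, exp2])
print(replicated)
```

In this toy example, `GENE_C` changes strongly in only one of the two experiments, so it is not reported as replicated. Expressing each sample relative to a reference from the same experiment is one way to make values from different platforms comparable, which is the motivation for the contrast step described in the methods.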

Publication
In A71. THE EPIGENOME, GENOME AND NON-CODING RNAs IN LUNG DISEASE
Ana Beatriz Villaseñor Altamiano
Postdoctoral research fellow at Brigham and Women’s Hospital in the Pulmonary and Critical Care Medicine Division.

My research interests include computational biology, bioinformatics, human diseases, transcriptomics, and single-cell technologies.
