David Roos, Ph.D.

E. Otis Kendall Professor of Biology
University of Pennsylvania



Cell and Developmental Biology

Genetics, Epigenetics, Genomics

Computational Biology

Title of Presentation: 
Designing and Mining (Pathogen) Omics Database Resources
Abstract:

Biomedical research is increasingly driven by large-scale datasets: genome sequences, RNA and protein expression results (generated on diverse experimental platforms), population-level data on genetic polymorphisms and epidemiology, information on protein structure, interactions and subcellular localization, metabolic pathways and signaling networks, phenotypic descriptions of laboratory mutants/treatments and field/clinical samples, etc.  Infectious disease studies are further complicated by the interplay between pathogen, host, and vector species.  The challenge is: how can we effectively collect, store, maintain, integrate, and mine this information, so as to advance biological understanding, and define targets for further investigation in the lab, field and clinic?

The Eukaryotic Pathogen Genome Database (EuPathDB.org) provides researchers working on diverse eukaryotic microbes (now including fungi, in addition to parasitic protozoa) with convenient access to genome-scale datasets, in a phylogenetic framework that expedites discovery research.  In addition to offering gene- and genome-centric views, workspaces for the analysis of user-supplied data, and a mechanism for capturing expert annotation from the scientific community, the resource provides graphical user interfaces that simplify the formulation and optimization of complex queries.  For example, investigators seeking to identify factors likely to modulate host responses to pathogen infection might wish to search for genes that are conserved in pathogenic but not non-pathogenic species, expressed in relevant strains (and during appropriate life cycle stages), secreted by the infectious agent, harboring domains suggestive of interaction with host factors, and displaying signatures of evolutionary selection.  Similar strategies might be employed to identify diagnostic markers of infection, therapeutic targets, etc.  Such queries can be shared with colleagues or stored for future use, refinement, or modification, enabling systems-level analysis of biologically and clinically relevant problems.  Beyond direct application to pathogen Omics datasets, this platform has also proved useful for integrating and interrogating clinical (meta)data from longitudinal field studies.
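
The multi-criteria query described above can be thought of as a combination of individual search results using set operations. The following is only a minimal sketch of that idea in Python, not the EuPathDB implementation or its API: the function name `combine_searches`, the variable names, and the gene identifiers are all hypothetical and stand in for the results of separate searches.

```python
from functools import reduce


def combine_searches(required, excluded=()):
    """Intersect every result set in `required`, then subtract each set in `excluded`."""
    required = [set(s) for s in required]  # copy so caller data is never mutated
    if not required:
        return set()
    hits = reduce(set.intersection, required)
    for s in excluded:
        hits -= set(s)
    return hits


# Hypothetical gene IDs standing in for the results of individual searches.
conserved_in_pathogens = {"g1", "g2", "g3", "g4", "g5"}
present_in_nonpathogens = {"g2", "g3"}
expressed_in_relevant_stage = {"g1", "g3", "g4", "g5"}
predicted_secreted = {"g1", "g3", "g5", "g6"}
host_interaction_domain = {"g1", "g3", "g7"}
under_selection = {"g1", "g3", "g5"}

candidates = combine_searches(
    required=[
        conserved_in_pathogens,
        expressed_in_relevant_stage,
        predicted_secreted,
        host_interaction_domain,
        under_selection,
    ],
    excluded=[present_in_nonpathogens],
)
print(sorted(candidates))
```

Each criterion stays a separate, reusable result set, so a step can be swapped or removed (for example, dropping the selection filter) without rebuilding the rest of the strategy, which mirrors the refinement and modification of stored queries mentioned in the abstract.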