Antibiotics Discovery: From Genome Sequencing to Genome Mining to Spectral Networks
Pavel Pevzner, Ph.D. (Margaret O. Dayhoff Lecturer) - University of California, San Diego
Genomics studies have revealed numerous antibiotics-encoding genes across a wide range of bacterial and fungal species, including various species in the human microbiome. However, little is known about the hundreds of secondary metabolites (including antibiotics!) produced by microorganisms in the gut, despite the fact that humans are chronically exposed to them. Deep exploration of this meta-antibiome critically depends on a transition from the current one-off process of antibiotics analysis to high-throughput antibiotics sequencing. I will discuss recent advances in computational antibiotics discovery that span bioinformatics techniques ranging from genome sequencing to genome mining to spectral networks.
The Microevolution Processes in Human Populations: The Emerging Portrait of Global Gene Pool Structure
Oleg Balanovsky, Ph.D. - Vavilov Institute of General Genetics
The studies of genetic variation in human populations started almost 100 years ago: at the time of World War I, the pronounced differences in frequencies of blood groups were revealed for the first time. During the following century-long history of intensive research, the arsenal of population geneticists has changed six times.
The immunological markers or blood groups (1) were the only available genetic systems for decades until biochemical markers (2) were widely introduced in the 1960s. Both types are known as “classical markers”. The datasets on their variation in human populations worldwide are large and have been summarized by both Western (Cavalli-Sforza et al., 1994) and Russian (Gene pool and gene geography of fUSSR) scientific schools in the field of gene geography. The classical markers are virtually out of experimental use today. But because their variation has been well described and analyzed, these generalized conclusions are widely used as a background for current research.
Since the 1990s, mitochondrial DNA (3) and the Y-chromosome (4) have become the most popular genetic systems in population studies. Hundreds of papers have been dedicated to their variation, and the accumulated datasets include hundreds of thousands of samples from thousands of populations worldwide. The genome-wide (5) and full genome (6) markers are becoming the new favorite tools in the arsenal of researchers, but data on these genetic systems are not abundant yet. Thus, the first task is to summarize the accumulated data on mitochondrial DNA and Y-chromosomal variation, to extract the generalized patterns and to draw overall conclusions about the global gene pool structure from these two kinds of genetic data. The general trends in human variation revealed by these two systems will be valuable for decades, even when no living researcher remembers the experimental methods used for their analysis. The second task is to compare these trends with the picture drawn by genome-wide markers in recent years and with the picture, emerging this year, drawn by full genome sequencing.
The talk will present:
- a short reminder of the global trends in human variation revealed by the classical genetic markers;
- the largest databases on mitochondrial DNA and Y-chromosomal variation worldwide;
- the cartographic atlases summing up patterns of human variation revealed by these two systems, including major directions of human colonization of the Earth, (sub)continental genetic continuums and boundaries between them, changes in the effective population size and fantastic geographic precision of human identification by using the Y-chromosomal lineages;
- the genetic structuring of the world populations as revealed by genome-wide markers and a more detailed picture of the genetic history of Europeans;
- the history of humans decoded from their full genomes and currently submitted to Nature;
- the promising approach of parallel analysis of the host and the pathogen genetic variation to trace migrations of both species.
Deep Sequencing for Human and Animal Viral Discovery and Diagnostics
Eric Delwart, Ph.D. - University of California, San Francisco
The identification of known and previously uncharacterized viruses using in silico sequence similarity searches will be described together with common pitfalls. The viromes of wild and domesticated animals and the possible association of “new” viruses in human and animal samples with different clinical symptoms will be shown. Challenges in the bioinformatic analyses of metagenomics data will be discussed together with opportunities in research and diagnostics.
High throughput genomic analysis to rewind the clock on the evolution of drug resistant tuberculosis
Ashlee Earl, Ph.D. - The Broad Institute of MIT & Harvard
Drug resistant tuberculosis (DR-TB) is an urgent and growing threat as multi-, extensively- and even totally-drug resistant (MDR, XDR and TDR) cases of TB are increasingly reported. Incomplete knowledge of the mutations that give rise to drug resistance in the causative agent of TB, Mycobacterium tuberculosis, has hampered development of point-of-care molecular diagnostics that would enable effective TB patient management and decrease DR-TB emergence. With our partners, we have sequenced and analyzed geographically and phenotypically diverse collections of M. tuberculosis to analyze the evolution of DR-TB and to create a more comprehensive catalog of DR-associated mutations. I will discuss findings from this work, which include insights into the step-wise evolution of XDR-TB within an epidemic region and its relevance for global TB control.
Reconstructing Outbreaks with Genomics: Pathogen Evolution Giveth and it Taketh Away
Jennifer Gardy, Ph.D. - British Columbia Centre for Disease Control
By sequencing the complete genomes of pathogen isolates from a given outbreak, molecular epidemiologists are now able to identify the small handful of mutations that distinguish isolates from each other and that can be used as markers of transmission. This “genomic epidemiology” approach has been used to identify person-to-person transmission of an infectious disease, trace hospital outbreaks back to unique environmental sources, and explore the dynamics of regional epidemics including Ebolavirus. However, the evolutionary processes that allow us to describe outbreak dynamics at such high resolution also introduce complications when interpreting genomic data through an epidemiological lens, particularly for pathogens with long periods of latency or chronic carriage. This talk will cover opportunities and challenges for genomic epidemiology, using a long outbreak of tuberculosis amongst the homeless population as a guiding example.
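The idea of counting the mutations that distinguish isolates can be illustrated with a minimal sketch. It assumes the isolate genomes have already been aligned to equal length; the function names and the choice to skip ambiguous bases are illustrative assumptions, not the speaker's pipeline.

```python
from itertools import combinations

def snp_distance(a, b):
    """Count aligned positions where two sequences carry different
    unambiguous bases (positions with N or gaps are skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x in "ACGT" and y in "ACGT"
    )

def pairwise_distances(isolates):
    """Map each pair of isolate names (sorted) to its SNP distance.
    `isolates` is a hypothetical dict of name -> aligned sequence."""
    return {
        (p, q): snp_distance(isolates[p], isolates[q])
        for p, q in combinations(sorted(isolates), 2)
    }
```

Small distances between isolates are compatible with recent transmission but do not prove it, which is the kind of complication the abstract raises for pathogens with long latency.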
Getting the Flu: Exploring Influenza Virus Evolutionary Dynamics
Elodie Ghedin, Ph.D. - New York University
The characterization of virus populations by deep sequencing is transforming our understanding of viral evolutionary dynamics by enabling the dissection of the mutational spectrum at an extraordinary level of precision. Using the same tools we can also query the host environment in which the virus evolves—such as host microbial ecology and local response to infection—to determine its effect on virus evolution. I will illustrate how immune status, the respiratory microbiome, mixed infections and transmission can shape influenza virus diversity. I will also discuss the epidemiological value of high-resolution mapping and haplotype reconstruction in modeling influenza transmission networks.
Transcriptional Analysis of Malaria Challenge and Vaccine Response Studies in Colombia
Greg Gibson, Ph.D. - Georgia Tech
Plasmodium vivax is the most prevalent malaria parasite in Latin America. It causes endemic exposure to disease in large parts of Colombia, where we have initiated studies of the efficacy of an irradiated sporozoite vaccine. In order to investigate the molecular nature of the immunological response to exposure and vaccination, we have carried out two gene expression profiling studies, one contrasting the malaria response in naïve and semi-immune volunteers, and the other evaluating whether vaccination produces a recognizable immune profile.
In the first study, a total of 16 Colombian malaria naïve (n=7 from Cali) and semi-immune (n=9 from Buenaventura) volunteers were subjected to an experimental P. vivax sporozoite infectious challenge by direct bites from infected Anopheles mosquitoes. We used a Fluidigm nanofluidic qRT-PCR array to profile the expression of 92 genes in whole blood of the 16 individuals across 6 time-points following infection, and followed up with RNASeq analysis of 6 individuals from each location at baseline and at first signs of malaria. The results show that there is very little modification of gene expression during pre-patent infection, but strong up-regulation of an interferon-response axis at the peak of parasitemia, and a surprising down-regulation of the inflammatory response at the same time. Approximately 200 genes were differentially expressed between the locations, mostly indicating an accentuated response in the naïve volunteers that correlates with worse malaria symptoms.
In the second study, we performed RNASeq on 22 whole blood samples, taken after immunization but before sporozoite challenge, and after sporozoite challenge at first signs of malaria symptoms in some individuals. We contrasted the responses of 3 control individuals who were not immunized, 3 Duffy-negative individuals who are protected against malaria, and 5 vaccinated individuals, one of whom was fully protected and four of whom showed mild symptoms. Approximately 1000 genes were differentially expressed between the two time-points, resulting in profiles that correspond to some extent with symptomology. Further analyses are expected to shed light on the molecular mechanisms of immune effectiveness.
Using Meta-Transcriptomics and Ancient DNA to Reveal Microbial Emergence and Evolution
Edward Holmes, Ph.D. - University of Sydney
I will show how modern genomic techniques, notably meta-transcriptomics and ancient DNA, can provide important new information on microbial biodiversity, origins and evolution. I will first demonstrate how the meta-transcriptomic analysis of invertebrate species is transforming our understanding of viral evolution, revealing the ancestry of many vertebrate viruses, challenging traditional classification systems, and highlighting that most RNA viruses are unlikely to be associated with disease in their hosts. I will then show how the analysis of ‘ancient DNA’ from archival human remains (including those present in medical collections) can inform on past infectious disease epidemics, focusing on two infamous bacterial diseases – plague and cholera – with recovery of genomes >1000 years old now possible.
Genomic Investigations of Anthrax – Sequence Data from Complex Specimens
Paul Keim, Ph.D. - Northern Arizona University
We have taken a genomic reference-based approach to understanding particular pathogens of great interest. This involves the development of robust population genetic models from global strain collections with curated SNP databases. In later investigations, high quality genomic material is not always available and the resulting genomic interrogations are of limited coverage and quality. We will report on a traditional molecular epidemiological investigation of heroin contaminated with B. anthracis, as well as the analysis of an anthrax victim’s pathology specimen remnant from the Soviet biological weapons era.
GHOSTing Molecular Surveillance of Viral Hepatitis
Yury Khudyakov, Ph.D. - Centers for Disease Control and Prevention
Viral hepatitis, a major health problem worldwide, is caused by infection with 5 viruses, all belonging to different viral families. Effective molecular surveillance focused on measuring disease and its dissemination among human populations is essential for the development of successful public health interventions to interrupt transmission and reduce hepatitis-related morbidity and mortality. To be effective and relevant to public health, molecular surveillance must be massive and in real-time. Next-generation sequencing (NGS) technology generates large quantities of viral genetic data suitable for application in surveillance. However, NGS application alone is not sufficient for molecular surveillance to be most effective. Considering that an estimated 170 million people are currently infected with hepatitis C virus, there is not a single laboratory that has the capacity to collect serum specimens, and sequence and analyze viral variants from just 1% of infected persons. However, by employing a specially organized crowd-sourcing system for massive data gathering and analysis, effective molecular surveillance can be achieved. I'll describe Global Health, Outbreak and Surveillance Technology (GHOST), which is a technological environment that integrates - in a particular way - molecular, computational and information technologies for molecular surveillance of infectious diseases and is currently being applied to viral hepatitis for detection of transmission networks.
Toward microbial disease diagnosis using metagenomics: A case of the runs
Kostas Konstantinidis, Ph.D. - Georgia Tech
Culture-independent analysis (aka metagenomics) has recently revealed tremendous diversity in the microbial communities inhabiting the human body and revolutionized our understanding of these communities, primarily because the majority of microbial species cannot be cultured in the laboratory and thus remain poorly understood. The ability to characterize in detail the microbial constituents in human clinical specimens without culture may prove invaluable in public health, as all surveillance and outbreak detection methods to date rely on culture, frequently failing to identify the key causative agent of a disease. In this talk, I will summarize our recently developed bioinformatics algorithms and approaches to deal with several challenges associated with metagenome-based analysis of clinical samples, such as how to detect and quantify target species (e.g., pathogens) and genes (e.g., toxins) in complex metagenomes, and how to identify and genotype organisms with no previously sequenced representatives. Application of these tools to samples from diarrheal outbreaks shows that in many cases, but not all, the disease and healthy states of the gut microbial community can be distinguished from each other, opening new possibilities for diagnostics. The tools are available for online analysis through http://enve-omics.gatech.edu/
Discovery of Novel Mobile Elements and Host Defense systems in Genomic and Metagenomic Databases
Eugene Koonin, Ph.D. - National Center for Biotechnology Information (NCBI)/National Institutes of Health (NIH)
The rapidly growing genomic and metagenomic sequence databases provide researchers with a rapidly expanding and diversifying resource for discovery of novel genetic elements. Unlike the more traditional databases of complete genomes, this new resource is not restricted by the ability of microbes to grow in culture and therefore represents unbiased sampling of biological diversity. We developed computational pipelines for identification of novel mobile elements as well as cellular defence systems in genomic and metagenomic systems. Application of these approaches resulted in the discovery of several families of new viruses and transposable elements as well as new types of CRISPR-Cas adaptive immunity systems. Comparative analysis of the discovered novel defence systems prompted the scenario of independent evolution of different types of CRISPR-Cas systems from distinct mobile elements. Experimental validation of the predictions produced by our searches led to the identification of novel mechanisms of CRISPR-mediated defence that could be exploited in new tools for genome engineering.
The Extraordinary Evolution of the Great Ape Microbiome
Howard Ochman, Ph.D. - University of Texas
Despite the current large body of work concerning the human microbiome and its role in human health, there is relatively little information about how the microbiome evolves or the factors causing differentiation among species. Analysis of the gut microbiomes of great ape species, including humans, revealed that the phylogeny based on microbiome compositions was congruent with the known relationships of the hosts. Our investigations of the microbiomes of great apes have informed several other features of the human microbiome. For example, the gut microbial communities of humans assort into enterotypes, i.e., groups having discrete species compositions, and there is on-going debate about the cause, function, and even the existence of enterotypes. We found that wild chimpanzees and gorillas also possess gut enterotypes, and, interestingly, they are compositionally similar to those in humans. Thus, stratification of microbial communities into enterotypes preceded the divergence of great ape species and did not originate in humans as a result of modern diets, as has been speculated. Furthermore, by comparing the gut microbiomes of great ape species in a phylogenetic context, we reconstructed how the human microbiome evolved during great ape diversification. We found that human gut microbiomes have been diverging at a greatly accelerated rate since our split from other great apes due to the loss of microbial diversity at every taxonomic level.
Preparing for the $1 Genome: Fast Genome and Metagenome Distance Estimation using MinHash
Adam Phillippy, Ph.D. - University of Maryland
The rapid growth of genomic data has begun to outpace traditional methods for sequence clustering and search. As the cost of sequencing approaches zero, new methods are needed to process the coming torrent of data. To address this, I will discuss applications of MinHash locality-sensitive hashing for dimensionality reduction and rapid approximation of genomic and metagenomic distances. This technique can be applied to any problem that requires a fast distance approximation, e.g. to triage and cluster sequence data, assign species labels to unknown genomes, quickly identify mis-tracked samples, and search massive genomic databases. I will discuss specifically the applications of rapid pathogen triage and metagenomic clustering.
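As a rough illustration of how MinHash reduces a sequence to a small sketch, the following minimal example builds a bottom-s sketch of k-mer hashes and estimates Jaccard similarity from two sketches. The k-mer length, sketch size, hash function (BLAKE2b truncated to 64 bits) and all function names are assumptions chosen for illustration, not the speaker's implementation.

```python
import hashlib

def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def hash64(kmer):
    """Deterministic 64-bit hash of a k-mer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def sketch(seq, k=8, size=64):
    """Bottom-`size` MinHash sketch: the `size` smallest distinct k-mer hashes."""
    return sorted(hash64(km) for km in kmers(seq, k))[:size]

def jaccard_estimate(a, b, size=64):
    """Estimate Jaccard similarity of the underlying k-mer sets.

    Takes the `size` smallest hashes in the union of both sketches and
    returns the fraction of them that are present in both sketches."""
    merged = sorted(set(a) | set(b))[:size]
    if not merged:
        return 0.0
    sa, sb = set(a), set(b)
    shared = sum(1 for h in merged if h in sa and h in sb)
    return shared / len(merged)
```

Comparing two genomes then costs time proportional to the sketch size rather than the genome length, which is what makes this kind of approach attractive for triage and clustering at scale. When both k-mer sets are smaller than the sketch size, the estimate equals the exact Jaccard similarity.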
Novel Tools for the Annotation of Bacterial Secreted Proteins and Secretion Systems
Thomas Rattei, Ph.D. - University of Vienna
Interactions between bacteria and eukaryotes are widespread in all ecosystems on earth and often lead to symbiotic relationships. The most prominent themes in current research are different types of human-microbe interactions, such as the interplay of human microbiomes with their host or human infections by bacterial pathogens. Understanding of bacterial interactions with other hosts, such as livestock animals and crop plants, is becoming crucial for sustaining nutrition and gaining renewable energy. Protein secretion systems play a key role in the interaction of bacteria and hosts. So far, sequence similarity searches and models of signal peptides have been the main tools for the computational prediction of secreted proteins and secretion systems.
In my talk I will introduce recent improvements towards better annotation of bacterial secreted proteins and Type III, IV, VI secretion systems. We have bundled various tools to recognize Type III secretion signals, conserved binding sites of Type III chaperones, eukaryotic-like domains and subcellular targeting signals in the host. We could demonstrate that the combination of these approaches allows a more precise modeling of bacterial secretomes. The rapidly increasing number of available microbial genomes allowed us to develop a novel tool that predicts not only core genes of bacterial secretion systems in bacterial genomes but also their completeness and potential functionality. The approach is based on machine-learning techniques and can be easily applied to thousands of genomes. This method not only predicts secretion systems in newly sequenced genomes and metagenomes, but also suggests that previously unknown proteins are important for the function of protein secretion systems.
Intersection of Bacterial Comparative Genomics and Metagenomics
Timothy Read, Ph.D. - Emory University School of Medicine
Comparative bacterial genomics and shotgun metagenomics are fields of study that have emerged in prominence over the past 15 years. Comparative bacterial genomics attempts to infer population genetic parameters and evolutionary history from collections of bacterial genomes from strains (usually) isolated in pure culture. Shotgun metagenomics, exemplified by the Human Microbiome Project, attempts to characterize the DNA repertoire of whole environments. Surprisingly, there have been few studies attempting to explore the overlap between the two datasets for a single species. Here I will outline the results of investigations of the comparative genomics of pathogens such as Bacillus anthracis, Chlamydia trachomatis and Staphylococcus aureus. I will discuss how results of these studies are being used to query public metagenome datasets and also highlight some of the deficiencies in public data sets. I will also consider how comparative genomic data from host species may be integrated in the future.
Trade-offs Shaping Temperate Phages as Deadly, Accommodated, and Serviceable Parasites
Eduardo Rocha, Ph.D. - Institut Pasteur
Bacterial viruses (phages) are ubiquitous and have a strong impact on microbial population dynamics. Additionally, temperate phages, by integrating into the genome, contribute to the evolution of bacterial gene repertoires. We have been studying the determinants of the lysis-lysogeny decision and what this tells us about the opportunity costs of lysogeny and about bacterial evolvability. Phages integrate into genomes at genetic hotspots that are distributed in ways that protect genome organisation. Importantly, our analysis of the evolutionary patterns of prophages in enterobacteria shows that a sizeable fraction of prophages undergoes purifying selection, a clear pattern of prophage domestication by the bacterial host. We were able to infer the putative function of many of these prophages, and the results point to a significant role of prophage-derived sequences in the establishment of ecological antagonistic interactions with both eukaryotes and prokaryotes.
Designing & Mining (Pathogen) Omics Database Resources
David S. Roos, Ph.D. - University of Pennsylvania
Biomedical research is increasingly driven by large-scale datasets: genome sequences, RNA and protein expression results (generated on diverse experimental platforms), population-level data on genetic polymorphisms and epidemiology, information on protein structure, interactions and subcellular localization, metabolic pathways and signaling networks, phenotypic descriptions of laboratory mutants/treatments and field/clinical samples, etc. Infectious disease studies are further complicated by the interplay between pathogen, host, and vector species. The challenge is: how can we effectively collect, store, maintain, integrate, and mine this information, so as to advance biological understanding, and define targets for further investigation in the lab, field and clinic?
The Eukaryotic Pathogen Genome Database (EuPathDB.org) provides researchers working on diverse eukaryotic microbes (now including fungi, in addition to parasitic protozoa) with convenient access to genomic-scale datasets, in a phylogenetic framework expediting discovery research. In addition to offering gene- and genome-centric views, workspaces for the analysis of user-supplied data, and a mechanism for capturing expert annotation from the scientific community, graphical user interfaces simplify the formulation and optimization of complex queries. For example, investigators seeking to identify factors likely to modulate host responses to pathogen infection might wish to search for genes that are conserved in pathogenic but not non-pathogenic species, expressed in relevant strains (and during appropriate life cycle stages), secreted by the infectious agent, harboring domains suggestive of interaction with host factors, and displaying signatures of evolutionary selection. Similar strategies might be employed to identify diagnostic markers of infection, therapeutic targets, etc. Such queries can be shared with colleagues or stored for future use, refinement, or modification, enabling systems-level analysis of biologically and clinically relevant problems. Beyond direct application to pathogen Omics datasets, this platform has also proved useful for integrating and interrogating clinical (meta)data from longitudinal field studies.
Comparative genomics of ape malaria parasites and the emergence of human malaria
Paul Sharp, Ph.D. - University of Edinburgh
Amplification of DNA sequences from non-invasive (faecal) samples has revealed that chimpanzees (Pan troglodytes) and gorillas (Gorilla gorilla) are each infected with three host-specific Plasmodium species from the subgenus Laverania, which also includes P. falciparum, the cause of malignant malaria in humans. Phylogenetic analyses show that P. falciparum arose from cross-species transmission of one of the gorilla parasites, but that this jump has occurred successfully on only one occasion.
We are now studying the genome sequences of these ape parasite species in order to understand the evolution of the Laverania as a whole, and to shed light on the cross-species transmission from gorillas to humans. We have used selective whole-genome amplification to obtain near-complete genome sequences from one strain of P. reichenowi (a close relative of P. falciparum) and two strains of P. gaboni (a more divergent member of the Laverania), directly from samples from naturally infected wild chimpanzees. We have compared these sequences to a recently published genome from the only laboratory isolate of P. reichenowi, and to data from multiple strains of P. falciparum. The two chimpanzee parasite species show levels of within-species diversity about 10 times higher than seen among P. falciparum strains from across Africa and Asia, consistent with the human parasite having undergone a very recent population bottleneck during transmission from gorillas. Comparative analyses also reveal that a segment of chromosome 4 has undergone horizontal gene transfer (HGT) from a close relative of P. gaboni into the gorilla parasite that was the ancestor of P. falciparum. The transferred genes include that encoding RH5, which appears essential for erythrocyte invasion by P. falciparum, and we speculate that sequence change resulting from HGT may have been important in the process by which the precursor of P. falciparum acquired the ability to infect humans.
Microbial evolution in human guts driven by oxidative stress and contribution to colon cancer initiation
Ying Xu, Ph.D. - University of Georgia
Composition changes in microbial communities in human guts are known to be associated with colorectal cancer development. However, the causal relationship between such changes and cancer development remains elusive. We have recently conducted extensive transcriptomic data analyses of cancer-prone chronic inflammations, namely Crohn’s disease and ulcerative colitis, coupled with the matching microbial communities, as well as of early colon cancer tissues and associated microbial communities, and made a number of exciting discoveries. First, we observed that the population sizes of certain microbial species increase with the level of oxidative stress in the inflammatory microenvironment, and these species seem to be directly involved in electron transfer from the diseased human tissues as electron receptors, suggesting that these microbes benefit from the influx of electrons. Interestingly, some of these microbes secrete metabolites that may contribute to the altered immune responses, which are known to be cancer related. On the other hand, the most significantly increased subpopulations of microbes in human guts observed and reported previously are known polysaccharide degraders. Knowing that cancer cells tend to have substantially increased polysaccharides on their cell surfaces, this observation strongly suggests that these increased microbial populations are the result of cancer development rather than its cause. Detailed analysis results will be presented and discussed in the context of co-evolution of diseased human cells and microbial community.
Algorithms for Analysis and Applications of High-throughput Sequencing of Intra-host Viral Populations
Alexander Zelikovsky, Ph.D. - Georgia State University
As a result of a high rate of mutations and recombination events, an RNA-virus exists as a heterogeneous "swarm." The ability of next-generation sequencing to produce massive quantities of genomic data inexpensively has allowed virologists to study the structure of viral populations from an infected host at an unprecedented resolution. However, the high similarity and low frequency of the viral variants, as well as the high sequencing error rate, pose a huge challenge to sequencing data analysis. We present a novel method based on linkage between single nucleotide variations to efficiently distinguish them from read errors. This method is able to tolerate the high error-rate of the single-molecule protocol and reconstruct very mutant variants. It is anticipated to facilitate not only viral quasispecies reconstruction, but also other biological questions that require detection of rare haplotypes, such as genetic diversity in cancer cell populations and monitoring of B-cell and T-cell receptor repertoires. We then show how accurate reconstruction of intra-host viral populations can be applied to the identification of transmission clusters and sources, as well as to inferring transmission directions of highly heterogeneous viruses such as HIV and HCV. The proposed novel algorithms are based on cluster analysis, random walks in networks and model simulations. Validation on real and simulated data shows advantages of the proposed algorithms over consensus-based methods.