In partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Bioinformatics
in the School of Biological Sciences
Jianshu Zhao
Defends his thesis:
Bioinformatic Software Development for Large-Scale Microbial Genome Analysis with Applications in Disturbance Ecology and SAR11 Species Evolution
Tuesday, July 9, 2024
3:00pm Eastern
Room: 3229 ES&T building
Zoom link: https://gatech.zoom.us/j/92904499229
Thesis Advisor:
Dr. Konstantinos T. Konstantinidis, School of Civil and Environmental Engineering and School of Biological Sciences (by courtesy), Georgia Institute of Technology, USA
Committee Members:
Dr. Luis M. Rodriguez-R, Department of Microbiology & Digital Science Center (DiSC), University of Innsbruck, Austria
Dr. I. King Jordan, School of Biological Sciences, Georgia Institute of Technology, USA
Dr. Joel E. Kostka, School of Biological Sciences, Georgia Institute of Technology, USA
Dr. Frank J. Stewart, Department of Microbiology & Cell Biology, Montana State University, USA
Abstract:
Microbial genome analysis has become increasingly challenging due to the rapidly growing volume of genomic data (e.g., metagenomes and single-cell genomes). In this thesis, several bioinformatic tools were developed to efficiently perform large-scale microbial genome comparison, genome search, genome classification, and dimensionality reduction (visualization). To achieve computational efficiency, these tools employ state-of-the-art probabilistic data structures and small-world graph-based algorithms, together with concurrent and parallel programming. The tools were then applied to genomic and metagenomic datasets to test prevailing theories about how microbial species evolve and respond to environmental disturbance. Analysis of time-series data from a coastal microbial community disturbance ecology experiment, together with the development of a quantitative method to define the rare biosphere, showed that natural bacterial populations are remarkably resilient and that the rare biosphere contributes to this resilience. Additionally, analysis of single-cell genomes from oceanic oxygen minimum zones showed that SAR11, the most abundant heterotrophic marine group, is an exception to the widely recognized species boundary at 95% average nucleotide identity (ANI), owing to highly promiscuous homologous recombination that is unbiased across the genome and acts as a cohesive force for the species. Consistent with this interpretation, recombination has driven gene (as opposed to genome) sweeps of metabolic functions under strong positive in situ selection, such as the respiratory nitrate reductase (NarG), within the SAR11 genomospecies. In summary, the bioinformatic tools developed in this thesis advance the means to study natural microbial communities, and the eco-evolutionary insights obtained advance our understanding of them.
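To make the idea of a probabilistic data structure for genome comparison concrete, here is a minimal sketch of one widely used example, bottom-s MinHash sketching. It estimates the k-mer Jaccard index between two sequences and converts that estimate into an approximate ANI using the Mash distance formula, D = -(1/k) ln(2J / (1 + J)), ANI ≈ 1 - D. This is an illustration under stated assumptions, not the implementation used in the thesis tools. The k-mer size, the sketch size, the BLAKE2b hash, and every function name below are hypothetical choices.

```python
import hashlib
import math
import random

# Illustrative sketch only: k, sketch size s, and the hash are assumed values,
# not those of any tool described in the thesis.

def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers of a sequence (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def hash64(kmer: str) -> int:
    """Map a k-mer to a 64-bit integer with BLAKE2b."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")

def bottom_sketch(seq: str, k: int, s: int) -> list[int]:
    """Keep only the s smallest hash values: a fixed-size summary of the k-mer set."""
    return sorted({hash64(x) for x in kmers(seq, k)})[:s]

def jaccard_estimate(a: list[int], b: list[int], s: int) -> float:
    """Estimate Jaccard from two sketches: among the s smallest hashes of the
    union, count the fraction present in both sketches."""
    union_bottom = sorted(set(a) | set(b))[:s]
    sa, sb = set(a), set(b)
    shared = sum(1 for h in union_bottom if h in sa and h in sb)
    return shared / len(union_bottom)

def mash_ani(j: float, k: int) -> float:
    """Convert a Jaccard estimate to an approximate ANI via the Mash distance."""
    if j == 0:
        return 0.0
    return 1.0 - (-1.0 / k) * math.log(2 * j / (1 + j))

if __name__ == "__main__":
    k, s = 21, 1000
    rng = random.Random(0)
    genome_a = "".join(rng.choice("ACGT") for _ in range(5000))
    # Introduce a substitution every 50 bp (about 98% true identity).
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    b = list(genome_a)
    for i in range(0, len(b), 50):
        b[i] = swap[b[i]]
    genome_b = "".join(b)
    j = jaccard_estimate(bottom_sketch(genome_a, k, s), bottom_sketch(genome_b, k, s), s)
    print(f"Jaccard ~ {j:.3f}, ANI ~ {mash_ani(j, k):.3f}")
```

The appeal of this design is that each genome is reduced once to a fixed-size sketch, so comparing a new genome against a large collection costs time proportional to the sketch size rather than the genome length. Graph-based nearest-neighbor indexes can then avoid comparing against every stored sketch.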