In partial fulfillment of the requirements for the degree of Doctor of Philosophy in Bioinformatics in the School of Biological Sciences

Aroon T. Chande
Defends his thesis:
Bioinformatic platforms and methods for worldwide polygenic risk scores

Thursday, July 30th, 2020
11:00 AM Eastern Time
BlueJeans: https://bluejeans.com/683430155

Thesis Advisor:
Dr. I. King Jordan
School of Biological Sciences
Georgia Institute of Technology

Committee Members:
Dr. Soojin Yi
School of Biological Sciences
Georgia Institute of Technology

Dr. Gregory Gibson
School of Biological Sciences
Georgia Institute of Technology

Dr. Joseph Lachance
School of Biological Sciences
Georgia Institute of Technology

Dr. Augusto Valderrama-Aguirre
Faculty of Health
Universidad Santiago de Cali

Abstract

Genetic diversity underpins much of the observed human phenotypic diversity and plays an important role in human health and disease. This dissertation explores the genetic architecture of phenotypic diversity among global populations and studies common complex disease in genetically diverse but geographically close communities. This work is motivated by prevalent health disparities that disproportionately affect disadvantaged populations across the world, particularly in the Americas. I use thousands of genomes from diverse populations worldwide, along with hundreds of genome-wide association studies (GWAS) on thousands of human traits, to address three overarching questions: (1) which phenotypes vary among populations, and what explains that variance; (2) is it possible to predict and stratify risk for common complex diseases across diverse populations; and (3) can already discovered genetic associations be applied to risk prediction in new and ancestrally distinct populations? Polygenic risk scores (PGS) are increasingly used to quantify individuals' genetic predisposition for disease.
I developed the first web platform of its kind for PGS computation and visualization, GADGET, the Global Distribution of Genetic Traits webserver (https://gadget.biosci.gatech.edu/). GADGET enables biomedical researchers to easily test hypotheses and generate publication-ready visualizations of PGS for thousands of individuals in 27 global populations. I also developed a specialized country- and population-specific PGS server, the Colombian Phenotype-Genotype Browser (CPGB; https://map.chocogen.com/), to support precision public health efforts in Colombia. Next, I leveraged the PGS curation from GADGET to explore the differentiation of single loci and polygenic traits between neighboring populations of Afro-Colombians in Chocó and Euro-Colombians in Antioquia. I developed PGS for seven high-cost, high-burden common complex diseases in Colombia and found that they largely reflect the observed health disparities. Interestingly, PGS for type 2 diabetes (T2D) significantly over-predicted risk in Afro-Colombians. Further analysis of T2D in Colombia revealed the importance of environmental and lifestyle effects on the disease: in contrast to much of the developed world, low socioeconomic status in Colombia was correlated with decreased prevalence of T2D. My final study brings the focus back to the US and develops a correction method for applying already ascertained SNP-trait associations, again for T2D, to diverse populations. I predicted T2D risk in Mexican-Americans and European-Americans and validated these predictions at the population level using epidemiological data. I then used a simulation-based correction method, built on the derived allele frequency spectrum of trait-associated variants, to correct PGS bias between ancestrally divergent populations. Together, these studies underscore how genetic diversity contributes to global phenotypic variance.
Differences in population PGS distributions are generally an accurate indicator of relative disparities between populations within a country, although differences in ancestry affect the accuracy of individuals' PGS. Where predictions do not match observed disparities, significant socioeconomic and environmental effects mediate the genetic component of disease risk. Finally, simulation-based controls showed promise for helping to account for and correct bias in PGS when transferring associations between populations with distinct ancestry.
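The population-level comparisons described above rest on PGS computed per individual. A minimal sketch of the common additive model follows: each individual's score is the sum, over trait-associated variants, of the effect-allele dosage (0, 1, or 2) multiplied by the GWAS effect size, and populations are then compared by their mean scores. This is an illustrative assumption about the standard approach, not necessarily the method implemented in GADGET or CPGB. The function names, population labels, and effect sizes are hypothetical, and the sketch applies no ancestry or simulation-based correction.

```python
from statistics import mean
from typing import Dict, List, Sequence


def polygenic_score(dosages: Sequence[float], betas: Sequence[float]) -> float:
    """Additive PGS: sum of effect-allele dosage (0-2) times effect size per variant."""
    if len(dosages) != len(betas):
        raise ValueError("dosages and betas must have the same length")
    return sum(d * b for d, b in zip(dosages, betas))


def population_mean_scores(
    genotypes: Dict[str, List[Sequence[float]]], betas: Sequence[float]
) -> Dict[str, float]:
    """Mean PGS per population; each population maps to a list of per-individual dosages.

    Raises statistics.StatisticsError if a population has no individuals.
    """
    return {
        pop: mean(polygenic_score(ind, betas) for ind in individuals)
        for pop, individuals in genotypes.items()
    }


# Hypothetical effect sizes for three variants and two toy populations.
betas = [0.1, -0.2, 0.05]
genotypes = {
    "pop_A": [[0, 1, 2], [2, 2, 0]],
    "pop_B": [[1, 0, 0]],
}
print(population_mean_scores(genotypes, betas))
```

Comparing population means in this way is the simplest summary of a PGS distribution. As the abstract notes, such differences can be distorted when effect sizes estimated in one ancestry are transferred to another.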