Reproducible Bioinformatics Through the Lens of Ancestry Inference

Background 
Over the last couple of decades, a reproducibility crisis has emerged in science. Many recent studies are nearly impossible to reproduce, which makes their results less credible, and bioinformatics has not been spared. Ancestry inference offers a useful lens on this problem. Genetic ancestry refers to the population groups from which a person’s genes are derived; it is objective in nature and is inferred from the person’s sequenced genome. In this study, I will perform large-scale ancestry inference on a sample dataset and write a comprehensive guide for anyone learning the process, so that the methods can be reproduced.

The ultimate goal is to apply this method to several datasets that are part of CODIGO (https://codigo.biosci.gatech.edu/), the Consortium for Genomic Diversity, Ancestry and Health in Colombia. CODIGO is a collaborative project with the National Institutes of Health (NIH) that aims to increase the amount and diversity of genetic information available for admixed genomes and to support bioinformatics research in Colombia. The Colombian population is the result of admixture among European, Native American, and African populations, and an individual’s ancestry depends heavily on the region of Colombia they come from. This creates a remarkable degree of genetic diversity within the country that is not yet fully understood [1-4].

Student Name
Menuey, Jay Landon
Faculty Mentor
King Jordan