Revealing Evolutionary Features of Genetic Variants that Replicate Well Across Ancestries with a Machine Learning Model

Background 
Since genome-wide association studies (GWAS) became available, researchers have identified numerous associations between genetic variants and human diseases. GWAS allow us to understand how human genetic traits are related to complex diseases and to identify novel genes related to specific diseases [1]. However, human genetics studies lack diversity: many of them, especially GWAS, focus on European populations. Some traits share similar genetic effects on phenotypes regardless of people’s ancestral heritage. For example, skin color is highly heritable and has several single nucleotide polymorphisms (SNPs) that replicate across populations: fifty-nine pigmentation-associated SNPs were identified in both Africans and Europeans [3]. Other traits, by contrast, may have genetic effects that differ among ancestry groups. For these traits, applying European-biased results to non-European populations will produce less accurate predictions [2]. To address this limitation, we would like to determine whether there are SNPs that replicate across populations and to characterize their functional and evolutionary features. We would also like to build a database of these variants with annotations. A machine learning model will then be built to predict the likelihood of a phenotype given this information. This study can bring more insight into how genetic patterns underlying phenotypes vary by ancestry group, allowing further studies to make more accurate predictions.

Student Name
Choi, Jiyeong
Faculty Mentor
Joe Lachance