Interpretable Genomic Clustering to Find Phenotypic and Lifestyle Cohorts Among Ancestry-Specific Populations vs. the General Population

Background 
Autoimmune conditions occur when the immune system begins to recognize the body’s own healthy cells as foreign and attacks them [14]. Their exact cause is unknown, but they are thought to arise from a mix of environmental and genetic factors. The genetic factors underlying most autoimmune conditions are poorly understood because most of these conditions are thought to be polygenic: genetic susceptibility is theorized to arise from different combinations of hundreds or thousands of alleles [2].

The Major Histocompatibility Complex (MHC) is a large region on chromosome 6 of the human genome that spans over 8 thousand single nucleotide polymorphisms (SNPs). It encodes cell surface proteins essential for cell recognition and the adaptive immune response, a role the MHC plays across jawed vertebrates, and it is thought to host many candidate genes for factors causing autoimmune disease [3]. Because autoimmune conditions are polygenic, many complex allelic interactions contribute to their development [3]. Machine learning techniques can detect nonlinear relationships among these many interacting variants in the genome [4], and they can be applied to find distinct patterns within different minority cohorts. Machine learning is therefore well suited to exploring the genetic underpinnings of autoimmune conditions and to finding stronger associations between these conditions and both genetic and environmental factors.
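To make the point about nonlinear allelic interactions concrete, the toy sketch below uses a hypothetical cohort of two SNPs, coded as carrier indicators. It also assumes a deliberately simple XOR-style interaction rule: a person is "affected" only when exactly one of the two SNPs is carried. Neither the SNPs nor the rule come from this study. They only illustrate why single-SNP views can miss a signal that a method conditioning on combinations of alleles would capture.

```python
from collections import defaultdict
from itertools import product
from statistics import mean

# Hypothetical toy cohort: each person is (SNP A carrier, SNP B carrier),
# where 1 = carries the minor allele. Each of the 4 combinations appears 25 times.
cohort = [combo for combo in product((0, 1), repeat=2) for _ in range(25)]


def phenotype(a: int, b: int) -> int:
    """Assumed toy interaction rule (not a real disease model):
    affected (1) only when exactly one of the two SNPs is carried."""
    return a ^ b


labels = [phenotype(a, b) for a, b in cohort]


def rate_by(key):
    """Fraction of affected people within each group defined by key(person)."""
    groups = defaultdict(list)
    for person, y in zip(cohort, labels):
        groups[key(person)].append(y)
    return {k: mean(v) for k, v in sorted(groups.items())}


marginal_a = rate_by(lambda p: p[0])  # single-SNP view of SNP A
marginal_b = rate_by(lambda p: p[1])  # single-SNP view of SNP B
joint = rate_by(lambda p: p)          # view that conditions on both SNPs together

print("SNP A alone:", marginal_a)
print("SNP B alone:", marginal_b)
print("A and B jointly:", joint)
```

In this toy setup, each SNP on its own shows a 50% affected rate whether or not it is carried, so a per-SNP association test finds nothing. The joint table, however, separates affected from unaffected people perfectly. Real MHC interactions are far less clean than this, but the same blind spot is what motivates nonlinear, combination-aware models.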

Student Name
Jain, Neha
Faculty Mentor
May Wang