In partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Bioinformatics
in the School of Biological Sciences

Yuanbo (Cody) Wang

Defended his thesis:
Building a Systematic Analytic Pipeline - Big Data Innovation in Healthcare

Monday, August 5th, 2019
1:00 PM Eastern Time
BME/Whitaker Building, Room 1103

Thesis Advisor:
Dr. Eva Lee
School of Industrial and Systems Engineering
Georgia Institute of Technology

Committee Members:
Dr. King Jordan
School of Biological Sciences
Georgia Institute of Technology

Dr. Fredrik Vannberg
School of Biological Sciences
Georgia Institute of Technology

Dr. Yajun Mei
School of Industrial and Systems Engineering
Georgia Institute of Technology

Dr. Alfred Merrill
School of Biological Sciences
Georgia Institute of Technology

Dr. Shatavia Morrison
Division of Bacterial Diseases
Centers for Disease Control and Prevention

Abstract:
Data-driven healthcare utilizing big data in Electronic Health Records (EHR) has the potential to revolutionize care delivery while reducing costs.
However, for researchers, policymakers, and practitioners to take full advantage of the benefits that EHR can provide, several challenges must be addressed: 1) Extraction and coding methods for EHR data must be strategically designed to address issues of data quantity, quality, and patient confidentiality; 2) Standardization of clinical terminologies is essential for interoperability among EHR systems and enables multi-site comparative effectiveness studies; 3) Effective methods for mining the longitudinal health data common in EHR are critical for revealing disease progression, treatment patterns, and patient similarities, each of which plays an important role in clinical decision support and treatment improvement; 4) Advanced machine learning techniques are necessary for early detection and prognosis of disease and for identifying critical factors that affect patient outcomes; and 5) Practical intervention strategies must be developed to address healthcare disparity in rural and remote areas that lack resources and access. My thesis addresses these five issues by developing a systematic analytic pipeline for big data in healthcare. Specifically, innovative strategies are developed for information extraction, clinical terminology mapping, time-series mining and clustering, feature selection, and discriminatory modeling. Finally, practical implementation methods for telehealth services are designed to reduce healthcare disparity in underserved rural Appalachian counties in Georgia.