In partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Bioinformatics
in the Department of Biomedical Engineering
Zijun Wu
(Advisor: Dr. Saurabh Sinha)
will defend a doctoral thesis entitled,
Computational methods for gene regulatory network inference
On
Thursday, July 17 at 12:00 p.m.
https://gatech.zoom.us/j/94730139585
Committee
- Dr. Saurabh Sinha - Department of Biomedical Engineering, Georgia Institute of Technology
- Dr. Hanjoong Jo - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University
- Dr. Greg Gibson - School of Biological Sciences, Georgia Institute of Technology
- Dr. Peng Qiu - Department of Biomedical Engineering, Georgia Institute of Technology
- Dr. Ahmet Coskun - Department of Biomedical Engineering, Georgia Institute of Technology
Abstract
Gene regulatory networks (GRNs) are commonly used to describe the complex regulatory relationships between transcription factors (TFs) and their target genes. These networks are essential blueprints for a vast array of biological processes, including cellular development, response to stimuli, and disease progression. Despite their importance, accurate inference of GRNs from high-throughput data remains a fundamental challenge in bioinformatics, owing to the dynamic and context-specific nature of cellular programs and the inherent noise in multimodal, high-dimensional biological data.
This thesis presents a set of computational approaches to GRN inference that address different aspects of these challenges. In the first project, we develop SPREd, a novel simulator-supervised neural network framework for reconstructing GRNs from transcriptomic data. SPREd generates synthetic data with a biophysics-based simulation model and uses these data to train deep neural networks that directly predict the relationships between TFs and target genes. This approach offers an alternative to the established paradigm of training multivariable models to predict a gene's expression from the levels of TFs. Our simulator-supervised learning strategy addresses a common limitation of GRN inference, the scarcity of ground-truth data, by creating large-scale synthetic datasets that capture realistic regulatory dynamics. We test SPREd on diverse synthetic and real datasets and show that it identifies both direct and indirect regulatory relationships more accurately than other GRN inference methods.
While SPREd learns the underlying regulatory logic from bulk gene expression profiles alone, the second project in this thesis focuses on reconstructing GRNs from single-cell multi-omics data in disease contexts. Specifically, we aim to identify key regulators driving atherogenesis through integrative analysis of scRNA-seq and scATAC-seq data collected across multiple time points and experimental conditions. We combine cis-regulatory analysis with coexpression-based GRN inference to identify TFs that regulate atherogenesis-linked transcriptomic shifts in endothelial cells. We also develop a new method that incorporates cis-regulatory evidence as prior information in a neural network model of gene expression and then extracts GRN edges from the trained model using explainable machine learning techniques. Finally, we present a novel strategy for visualizing the regulatory profiles of genes: supervised learning maps each gene to a low-dimensional embedding, giving a global view of all genes that captures both their differential expression patterns and their regulatory evidence.
Our systematic analysis provides insights into the regulatory programs that control flow-induced reprogramming of endothelial cells during the development of atherosclerosis. We predict that the TFs Creb3l2, Rela, and Mef2c coordinate the transition from proatherogenic endothelial cells to pathological states, and we test these predictions in vitro through an experimental collaboration.
Together, this thesis demonstrates that combining principled computational innovation with biological applications can overcome fundamental limitations in GRN inference, establishing a framework for more accurate reconstruction of regulatory networks and a deeper mechanistic understanding of disease processes.
