Parisa Yousefi Zowj Thesis Defense
In partial fulfillment of the requirements for the degree of Doctor of Philosophy in Bioinformatics in the School of Industrial and Systems Engineering

Parisa Yousefi Zowj

defends her thesis:

Topics on Multiresolution Signal Processing and Bayesian Modeling with Applications in Bioinformatics

Friday, December 11, 2020
12:30 - 2:00 pm ET
Meeting URL (for BlueJeans): https://bluejeans.com/704134011

Advisors:
Dr. Brani Vidakovic, School of Industrial and Systems Engineering, Georgia Tech
Dr. David Goldsman, School of Industrial and Systems Engineering, Georgia Tech

Committee Members:
Dr. Mirjana Milosevic Brockett, School of Biological Sciences, Georgia Tech
Dr. King Jordan, School of Biological Sciences, Georgia Tech
Dr. Jianjun Shi, School of Industrial and Systems Engineering, Georgia Tech
Dr. Yajun Mei, School of Industrial and Systems Engineering, Georgia Tech

Abstract:
Analysis of multi-resolution signals and time-series data has wide applications in biology, medicine, engineering, and other fields. In many cases, the large-scale (low-frequency) features of a signal, including basic descriptive statistics, trends, and smoothed functional estimates, do not carry useful information about the phenomenon of interest. On the other hand, the small-scale (high-frequency) features that look like noise may be more informative, even though extracting such informative features is not always straightforward. In this dissertation we address some of the issues pertaining to high-frequency feature extraction and the denoising of noisy signals. Another topic studied in this dissertation is the integration of genome data with transatlantic voyage data of enslaved people from Africa to determine the ancestral origins of Afro-Americans.

1. Assessment of Scaling by Auto-Correlation Shells. In this chapter, we utilize the Auto-Correlation (AC) Shell to propose a feature extraction method that can effectively capture small-scale information of a signal.
The AC Shell is a redundant, shift-invariant, and symmetric representation of the signal that is obtained by using the auto-correlation function of compactly supported wavelets. The small-scale features are extracted by computing the energy of the AC Shell coefficients at different levels of decomposition, as well as the slope of the line fitted to these energy values using AC Shell spectra. We discuss the theoretical properties and verify them using extensive simulations. We compare the features extracted from the AC Shell with those from wavelets in terms of bias, variance, and mean square error (MSE). The results indicate that the AC Shell features tend to have smaller variance and are hence more reliable. Moreover, to show its effectiveness, we validate our feature extraction method in the context of classification, identifying patients with ovarian cancer through the analysis of their blood mass spectra. For this study, we use the features extracted by AC Shell spectra along with a support vector machine classifier to distinguish control from cancer cases.

2. Bayesian Binary Regressions in Wavelet-based Function Estimation. Wavelet shrinkage has been widely used in nonparametric statistics and signal processing for a variety of purposes, including denoising noisy signals and images, dimension reduction, and variable/feature selection. Although the traditional wavelet shrinkage methods are effective and popular, they have one major drawback. In these methods the shrinkage process relies only on the information of the coefficient being thresholded, and the information contained in the neighboring coefficients is ignored. Similarly, the standard AC Shell denoising methods shrink the empirical coefficients independently, by comparing their magnitudes with a threshold value. The information in other coefficients has no influence on the behavior of a particular coefficient.
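The independent, coefficient-by-coefficient shrinkage described above can be illustrated with a minimal sketch. It assumes the common universal threshold sigma * sqrt(2 log n) with hard thresholding, and it estimates the noise level from the median absolute coefficient divided by 0.6745. The function names and the toy coefficients are hypothetical and are not taken from the dissertation.

```python
import math
from statistics import median


def universal_threshold(coeffs):
    """Universal threshold sigma * sqrt(2 log n).

    Assumption: sigma is estimated as median(|d|) / 0.6745, a common
    robust noise estimate for detail coefficients.
    """
    n = len(coeffs)
    sigma = median(abs(c) for c in coeffs) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n))


def hard_threshold(coeffs, lam):
    """Keep a coefficient only if its own magnitude exceeds lam.

    Each coefficient is judged in isolation; its neighbors play no role.
    """
    return [c if abs(c) > lam else 0.0 for c in coeffs]


# Hypothetical detail coefficients: small noise plus one large value.
coeffs = [0.1, -0.2, 0.05, 5.0, -0.15, 0.1, -0.05, 0.2]
lam = universal_threshold(coeffs)
denoised = hard_threshold(coeffs, lam)
print(round(lam, 4), denoised)
```

In this toy input only the single large coefficient survives, and the decision for each coefficient depends on nothing but its own magnitude.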
However, due to the redundant representation of signals by AC Shells, the dependency among neighboring coefficients and the amount of information they share both increase. Therefore, it is important to propose a new thresholding approach for AC Shell coefficients that considers the information in neighboring coefficients. In this chapter, we develop a new Bayesian denoising approach for AC Shell coefficients that integrates logistic regression, universal thresholding, and Bayesian inference. We validate the proposed method using extensive simulations with various types of smooth and non-smooth signals. The results indicate that, for all signal types, including the neighboring coefficients improves the denoising process, resulting in lower MSEs. Moreover, we apply our proposed methodology to a case study of denoising Atomic Force Microscopy (AFM) signals, which measure the adhesion strength between two materials at the nano-newton scale, to correctly identify the cantilever detachment point.

3. Bayesian Method in Combining Genetic and Historical Records of Transatlantic Slave Trade in the Americas. In the era between 1515 and 1865, more than 12 million people were enslaved and forced to move from Africa to North and Latin America. The shipping documents recorded the origin and disembarkation of enslaved people. Traditionally, genealogical studies have been conducted through the exploration of historical records, family trees, and birth certificates. Due to recent advancements in the field of genetics, genealogy has been revolutionized and has become more accurate. Although genetic methods can provide continental differentiation, they have poor spatial resolution, which makes it hard to localize ancestry assignment because the genetic markers are distributed across different sub-continental regions.
To overcome the foregoing drawbacks, in this chapter we propose a hybrid approach that combines the genetic marker results with the historical records of transatlantic voyages of enslaved people. Adding the voyage data within a Bayesian modeling framework can substantially increase the resolution of ancestry assignment. The proposed Bayesian framework uses the voyage data from historical records available in the transatlantic slave trade database as prior probabilities and combines them with the genetic markers of Afro-Americans, treated as the likelihood information, to estimate the posterior (updated) probabilities of their ancestry assignments to geographical regions in Africa. We apply the proposed methodology to 60 Afro-American individuals and show that the prior information increases the assignment probabilities obtained from the posterior distributions for some of the regions.
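The prior-times-likelihood update described above can be sketched in a few lines. This is only an illustration of Bayes' rule over a discrete set of regions, not the dissertation's model. The region labels, voyage counts, and likelihood values below are hypothetical placeholders and do not come from any database or genetic analysis.

```python
def posterior(prior, likelihood):
    """Posterior over regions: proportional to prior * likelihood, normalized."""
    unnormalized = {r: prior[r] * likelihood[r] for r in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("prior and likelihood have no overlapping support")
    return {r: v / total for r, v in unnormalized.items()}


# Hypothetical voyage counts per region of origin, turned into a prior.
voyage_counts = {"region_A": 600, "region_B": 300, "region_C": 100}
n_voyages = sum(voyage_counts.values())
prior = {r: c / n_voyages for r, c in voyage_counts.items()}

# Hypothetical likelihood of one individual's genetic markers given each region.
likelihood = {"region_A": 0.2, "region_B": 0.5, "region_C": 0.3}

post = posterior(prior, likelihood)
for region, p in post.items():
    print(region, round(p, 3))
```

With these placeholder numbers, the genetic likelihood favors region_B while the voyage prior favors region_A; the posterior weighs both sources and assigns the highest probability to region_B.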