On May 7, 2018, Cai Huang defended his PhD thesis, "k-mer based data structures and heuristics for microbes and cancer."

Committee members:
Dr. Fredrik Vannberg, Advisor, School of Biological Sciences, Georgia Institute of Technology
Dr. John McDonald, School of Biological Sciences, Georgia Institute of Technology
Dr. King Jordan, School of Biological Sciences, Georgia Institute of Technology
Dr. Robert Dickson, School of Chemistry & Biochemistry, Georgia Institute of Technology
Dr. Jung H. Choi, School of Biological Sciences, Georgia Institute of Technology

Abstract: Recent technological advances allow for high-throughput profiling of biological systems in a cost-efficient manner. The low cost of data generation is leading us into the "big data" era. The availability of big data provides unprecedented opportunities but also raises new challenges for data mining and analysis. Machine learning algorithms have shown their power to increase efficiency and accuracy in bioinformatics analysis, but not all of them are open source. This dissertation presents a broad platform of open source tools for performing a variety of genomic analyses. Highlights include unsupervised genomic clustering of microbes and supervised clustering of cancer patient drug response.