I am interested in the design and application of efficient algorithms for the assembly, alignment, and analysis of massive genomic sequencing data. My current research focuses on long-read sequencing and assembly, rapid nucleic acid diagnostics, and microbial forensics.
Abbreviated list of my research interests:
- DNA sequencing
- Whole-genome alignment
- Sequence assembly and validation
- Microbial genomics and forensics
The rapid growth of genomic data has begun to outpace traditional methods for sequence clustering and search. As the cost of sequencing approaches zero, new methods are needed to process the coming torrent of data. To address this, I will discuss applications of MinHash locality-sensitive hashing for dimensionality reduction and rapid approximation of genomic and metagenomic distances. This technique can be applied to any problem that requires a fast distance approximation, e.g., to triage and cluster sequence data, assign species labels to unknown genomes, quickly identify mis-tracked samples, and search massive genomic databases. Specifically, I will discuss its application to rapid pathogen triage and metagenomic clustering.
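To make the MinHash idea concrete, here is a minimal sketch. It reduces each sequence to a small "bottom-s" sketch: the s smallest hash values of its k-mers. It then estimates the Jaccard similarity of the two k-mer sets from the sketches alone. Several simplifications are assumptions for illustration only: forward-strand k-mers, BLAKE2b as the hash, and the hypothetical function names and parameters `k` and `s`. Production tools typically use canonical k-mers and faster hashes. The final distance transform, D = -(1/k) ln(2j / (1 + j)), is the one popularized by the Mash tool. It is included here as one common choice, not necessarily the method discussed above.

```python
import hashlib
import math


def kmer_hashes(seq: str, k: int) -> set[int]:
    """Hash every k-mer of seq (forward strand only) to a 64-bit integer."""
    seq = seq.upper()
    hashes = set()
    for i in range(len(seq) - k + 1):
        digest = hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "little"))
    return hashes


def bottom_sketch(seq: str, k: int, s: int) -> list[int]:
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hashes."""
    return sorted(kmer_hashes(seq, k))[:s]


def jaccard_estimate(sk_a: list[int], sk_b: list[int], s: int) -> float:
    """Estimate |A ∩ B| / |A ∪ B| from two bottom-s sketches.

    The s smallest hashes of the merged sketches are a uniform sample of
    A ∪ B; the fraction of that sample present in both sketches estimates
    the Jaccard similarity.
    """
    set_a, set_b = set(sk_a), set(sk_b)
    union_sample = sorted(set_a | set_b)[:s]
    if not union_sample:
        return 0.0
    shared = sum(1 for h in union_sample if h in set_a and h in set_b)
    return shared / len(union_sample)


def mash_distance(j: float, k: int) -> float:
    """Mash-style distance from a Jaccard estimate (assumes a Poisson mutation model)."""
    if j <= 0.0:
        return 1.0  # no shared k-mers: report the maximum distance
    return math.log((1 + j) / (2 * j)) / k


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(5000))
    # Introduce a substitution every 100 bp to simulate a close relative.
    variant = "".join(
        ("A" if c != "A" else "C") if i % 100 == 0 else c
        for i, c in enumerate(genome)
    )
    k, s = 21, 1000
    j = jaccard_estimate(bottom_sketch(genome, k, s), bottom_sketch(variant, k, s), s)
    print(f"Jaccard ~ {j:.3f}, distance ~ {mash_distance(j, k):.4f}")
```

The key design choice is that each sketch has a fixed size s regardless of genome length. Comparing two genomes therefore costs O(s) after sketching, rather than work proportional to the sequences themselves. This is what makes all-versus-all clustering and database search tractable at scale.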