Keynote Speakers

Catherine Lozupone, Department of Medicine, University of Colorado Denver, Aurora, Colorado, USA

Chirag Jain 

Talk title: Revisiting string graph model for long-read assembly of genomes and metagenomes

Chirag Jain, Indian Institute of Science, Bangalore, India

Read-overlap-based graph data structures play a central role in de novo genome assembly from long reads. Most long-read assembly tools use the string graph model to sparsify overlap graphs. Graph sparsification is crucial for high-quality genome assembly because it simplifies the graph significantly by removing redundant edges. However, a graph model must also be coverage-preserving, i.e., it must ensure that each haplotype can be spelled as a walk in the graph, given sufficient sequencing coverage. This property becomes even more important for polyploid genomes and metagenomes, where there is a risk of losing haplotype-specific information during graph sparsification. In the first part of this work, we prove that the de Bruijn graph and overlap graph models are guaranteed to be coverage-preserving. Using the same framework, however, we show that the commonly used string graph model lacks this guarantee. To address this, the second part of our work introduces a novel sparse read-overlap-based graph model that is well supported by our theoretical results. The practical advantage of this model is demonstrated using CHM13 and HG002 human sequencing data.
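For readers less familiar with the terms above, the following toy sketch illustrates what "sparsifying an overlap graph" and "spelling a haplotype as a walk" mean. It follows the textbook idea behind the string graph (transitive edge reduction of an overlap graph), not the new model presented in the talk. It assumes error-free, single-stranded reads with exact suffix-prefix overlaps and no contained reads, and it uses a simplified removal rule (drop u->w when a two-step path u->v->w exists and v lies between u and w); real assemblers handle sequencing errors, reverse complements and containments. The function names `overlap_graph`, `transitive_reduction` and `spell_walk` are hypothetical.

```python
from itertools import permutations


def longest_overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of a that equals a prefix of b; 0 if shorter than min_len."""
    # Start below full length so identical/contained reads never create an edge.
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def overlap_graph(reads: list[str], min_len: int) -> dict[int, dict[int, int]]:
    """Directed overlap graph: edges[u][v] = overlap length from read u into read v."""
    edges: dict[int, dict[int, int]] = {i: {} for i in range(len(reads))}
    for u, v in permutations(range(len(reads)), 2):
        k = longest_overlap(reads[u], reads[v], min_len)
        if k:
            edges[u][v] = k
    return edges


def transitive_reduction(edges: dict[int, dict[int, int]]) -> dict[int, dict[int, int]]:
    """Simplified string-graph rule: drop u->w if some u->v->w path exists in the
    original graph and v sits between u and w (u overlaps v more than it overlaps w)."""
    reduced = {u: dict(out) for u, out in edges.items()}
    for u, out in edges.items():
        for v in out:
            for w in edges[v]:
                if w in out and out[w] < out[v]:
                    reduced[u].pop(w, None)  # u->w is redundant (implied by u->v->w)
    return reduced


def spell_walk(reads: list[str], edges: dict[int, dict[int, int]], walk: list[int]) -> str:
    """Concatenate reads along a walk, trimming each overlap (KeyError if an edge is missing)."""
    seq = reads[walk[0]]
    for u, v in zip(walk, walk[1:]):
        seq += reads[v][edges[u][v]:]
    return seq


if __name__ == "__main__":
    # Toy 8 bp reads tiled every 2 bp across "ACGTTGCAATCCGA" (no repeated 3-mers).
    reads = ["ACGTTGCA", "GTTGCAAT", "TGCAATCC", "CAATCCGA"]
    graph = transitive_reduction(overlap_graph(reads, min_len=3))
    print(graph)  # {0: {1: 6}, 1: {2: 6}, 2: {3: 6}, 3: {}}
    print(spell_walk(reads, graph, [0, 1, 2, 3]))  # ACGTTGCAATCCGA
```

In this repeat-free example the reduction removes the redundant edges 0->2 and 1->3, and the sparser graph still spells the full toy genome as a walk. The talk's point is that, in general, string graph sparsification carries no such guarantee for every haplotype.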