See the schedule on the official ISMB page here.
Talk title: Methods for integration and hypothesis generation from high dimensional biomedical microbiome datasets
Catherine Lozupone, Department of Medicine, University of Colorado Denver, Aurora, Colorado, USA
Differences in microbiome composition have been described in a variety of disease contexts, but a major challenge is to understand the mechanisms that may drive these relationships. Microbes may influence disease through cellular components or metabolites that they produce and that in turn interact with the host, for example by influencing the immune system. These relationships might further be influenced by environmental or demographic characteristics such as diet or race/ethnicity. Complex datasets that we have been working with to predict relationships between the microbiome and disease have included in-depth information on microbiome composition and activity, diet, immune status (e.g. extensive cytokine panels and data on cell populations using time-of-flight mass cytometry (CyTOF)), metabolome, and demographics. I will discuss tools that we have been developing in this context for feature reduction of microbiome data (SCNIC: Sparse Correlation Network Investigation for Compositional Data), and for finding and visualizing relationships between microbes and other complex data types using linear regression (VOLARE: visual analysis of disease-associated microbiome-immune system interplay). Once correlative relationships have been determined, a further goal is to use existing knowledge to generate hypotheses regarding their underlying basis. I will discuss our approach to exploring microbiome:metabolite relationships using metabolic networks (AMON: Analysis of Metabolite Origins Using Networks) and broader knowledge bases. Taken together, these approaches should enable the integration of complex data to generate hypotheses worthy of further experimental validation.
Talk title: Revisiting string graph model for long-read assembly of genomes and metagenomes
Chirag Jain, Indian Institute of Science, Bangalore, India
Read-overlap-based graph data structures play a central role in computing de novo genome assembly using long reads. Most long-read assembly tools use the string graph model to sparsify overlap graphs. Graph sparsification is crucial for high-quality genome assembly as it simplifies the graph significantly by removing redundant edges. However, a graph model must be coverage-preserving, i.e., it must ensure that each haplotype can be spelled as a walk in the graph, given sufficient sequencing coverage. This property becomes even more important for polyploid genomes and metagenomes, where there is a risk of losing haplotype-specific information during graph sparsification. In the first part, we prove that the de Bruijn graph and overlap graph models are guaranteed to be coverage-preserving. However, using the same framework, we show that the commonly used string graph model lacks this guarantee. To address this, the second part of our work introduces a novel sparse read-overlap-based graph model that is well-supported by our theoretical results. The practical advantage of this model is demonstrated using CHM13 and HG002 human sequencing data.
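As a rough illustration of the ideas in the abstract above, the following toy sketch removes transitive (redundant) edges from a small overlap graph and then checks whether a haplotype, given as an ordered list of reads, can still be spelled as a walk. It is not the model presented in the talk: the read names, the edge list, and the helper names `transitive_reduction` and `spells_walk` are hypothetical, and the sketch ignores read orientation, overlap lengths, and contained reads.

```python
from collections import defaultdict


def transitive_reduction(edges):
    """Drop edge (u, w) whenever some read v gives u -> v and v -> w.

    Simplified assumption: overlaps are plain directed pairs, with no
    orientation or overlap-length information.
    """
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)

    redundant = set()
    for u in list(succ):
        for v in succ[u]:
            for w in succ.get(v, ()):
                if w in succ[u]:
                    redundant.add((u, w))
    return sorted(set(edges) - redundant)


def spells_walk(edges, read_order):
    """Return True if consecutive reads in read_order are joined by edges."""
    edge_set = set(edges)
    return all((a, b) in edge_set for a, b in zip(read_order, read_order[1:]))


# Hypothetical reads r1..r4 tiling one haplotype, with some long overlaps.
overlaps = [
    ("r1", "r2"), ("r2", "r3"), ("r3", "r4"),
    ("r1", "r3"), ("r2", "r4"),
]
reduced = transitive_reduction(overlaps)
haplotype_ok = spells_walk(reduced, ["r1", "r2", "r3", "r4"])
print(reduced, haplotype_ok)
```

In this toy case the two long-range edges are removed and the haplotype walk survives. The coverage-preserving property asks that the same hold for every haplotype in the sample, which is the guarantee the abstract says the string graph model lacks.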