CAMI II Challenge Information
We proudly announce the beginning of the second round of challenges of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI) and the release of the official challenge data sets!
Over the last two years, we received valuable feedback from the community on important challenges in the field and on how to design interesting new data sets and challenges. We incorporated many of your suggestions; thanks again! To help you become familiar with the data set types and formats, additional example data sets with accompanying standards of truth have been made available over the last months. Two multisample “toy” data sets representing microbial communities from different human body sites and from mouse gut are already provided to allow participants to prepare for the challenges (https://data.cami-challenge.org/participate). These practice data sets are generated from known genomes, so reference-based methods (e.g., those using genome databases for their analysis) might perform better here than on real shotgun metagenomic data, where a substantial portion of microbial community members have not been sequenced.
The second CAMI challenge datasets will therefore again include new genomes from taxa (at different evolutionary distances) not found in public databases. Furthermore, a new focus will be on establishing the value of long sequencing reads for microbiome research, with data sets providing both long- and short-read data. Lastly, a clinical pathogen discovery challenge will be offered, mimicking an emergency diagnostic situation in the clinic.
Specifically, the second round of CAMI challenges comprises a metagenome assembly, a genome binning, a taxonomic binning and a taxonomic profiling challenge, across several multi-sample data sets from different environments. These include a marine data set (ended), a high-strain diversity data set (ended) and a clinical pathogen detection challenge (ended). A new round of challenges on a rhizosphere data set has just started in early 2020!
We are looking forward to receiving your submissions!
The CAMI Team
CAMI II offers several challenges: an assembly, a genome binning, a taxonomic binning and a taxonomic profiling challenge, on several multi-sample data sets from different environments, including long and short read data. This includes a marine data set, a high-strain diversity data set, and a rhizosphere data set. A pathogen detection challenge on a clinical sample is also offered.
Assembly challenge: takes as input read samples of a given data set, and returns a cross-sample assembly or single-sample assemblies. Assembly results can be submitted for short read data OR long read data, OR both data types combined. For methods incapable of submitting a cross-sample assembly for the entire data set, the FIRST TEN samples of a data set can be assembled and a ten-sample cross-assembly submitted. Participants can also submit single-sample assemblies for each of the first five samples of a data set. The assembly challenge will close early, namely once a gold standard assembly has been released after 4.5 months! The specification of the CAMI evaluation for strain-aware assemblers can be found at https://www.microbiome-cosi.org/images/Specification_of_CAMI_evaluation_for_strain-aware_assemblers.pdf
Profiling challenge: takes as input multiple read samples of a given data set and returns taxonomic profiles for all individual samples and one for the entire data set. This challenge closes after the second challenge period has ended.
Genome binning challenge: takes as input reads, or gold standard assemblies, or assemblies provided by CAMI after three months, for every sample individually. It returns a genome bin assignment for the analysed reads or contigs in CAMI format for every sample in a data set.
Taxon binning challenge: takes as input reads, or gold standard assemblies, or assemblies provided by CAMI after three months. It returns a taxon bin assignment for the analysed reads or contigs in CAMI format for every sample in a data set.
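As a rough illustration of what a binning result for one sample might look like, the Python sketch below renders sequence-to-bin assignments as a tab-separated table with header lines. The header fields (@Version, @SampleID, @@SEQUENCEID) and the version number are assumptions based on the Bioboxes binning format and are not specified on this page, so please check them against the CAMI standards before submitting. The function name and all identifiers are hypothetical.

```python
from typing import Mapping


def format_binning(sample_id: str, assignments: Mapping[str, str],
                   taxonomic: bool = False) -> str:
    """Render sequence-to-bin assignments for one sample as tab-separated text.

    The header layout is an assumption (Bioboxes-style binning format);
    verify it against the CAMI standards before using it for a submission.
    """
    # Genome binning reports an arbitrary bin ID, taxon binning a taxonomy ID.
    column = "TAXID" if taxonomic else "BINID"
    lines = [
        "@Version:0.9.1",            # assumed format version
        f"@SampleID:{sample_id}",
        f"@@SEQUENCEID\t{column}",
    ]
    for seq_id, bin_id in assignments.items():
        # A tab inside an identifier would silently corrupt the table.
        if "\t" in seq_id or "\t" in str(bin_id):
            raise ValueError(f"tab character in identifier: {seq_id!r}")
        lines.append(f"{seq_id}\t{bin_id}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical contig and bin names, for illustration only.
    print(format_binning("sample_0", {"contig_1": "bin_a", "contig_2": "bin_b"}), end="")
```

One such table would be produced per sample, since both binning challenges ask for assignments for every sample in a data set.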
Clinical pathogen detection challenge
Case description: A 32-year-old woman presented to an Emergency Center on March 22nd, 2018 because of vomiting, abdominal pain and strong nosebleeds. She claimed to have felt well until 5 days prior to admission, when she began to develop fever, joint pain and muscle pain. Four days before admission she presented to her general practitioner and was diagnosed with influenza-like illness. One day before admission, her state rapidly deteriorated with onset of intense abdominal pain, followed by vomiting and nosebleeds, prompting her to present to the Emergency Center. She had never been hospitalized for any medical illness. She denied any recent trauma. Four days prior to onset of symptoms, she had returned from a one-month hiking trip between Fethiye and Antalya in Turkey. She denied any unusual contact with wildlife or eating raw meats during her trip. The hospital has sent you a nasal swab for sequencing in order to identify the causative agent. You have generated a paired-end MiSeq sequence sample from this for further analysis. The results of classical molecular tests are still pending.
Expected submissions: A list of NCBI taxonomy IDs of the pathogens found in the sample (a plain text file called taxa.txt, consisting of a single line with a tab-separated list of taxonomy IDs) and the taxonomy ID of the pathogen responsible for the symptoms (a plain text file called pathogen.txt, containing only that single taxonomy ID). Use the “Taxonomy database CAMI 2 Toy” available from http://cami-challenge.org/participate for selecting the appropriate taxonomy IDs.
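The two submission files can be produced with a few lines of Python. The following is a minimal sketch, assuming that each file ends with a newline and that duplicate IDs should be dropped; neither detail is stated above. The function name and the taxonomy IDs are placeholders for illustration, not challenge answers.

```python
from pathlib import Path
from typing import Iterable


def write_submission(found_taxids: Iterable[int], causal_taxid: int,
                     out_dir: str = ".") -> None:
    """Write taxa.txt and pathogen.txt for the pathogen detection challenge."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Drop duplicates while keeping the original order (an assumption).
    unique_ids = list(dict.fromkeys(int(t) for t in found_taxids))
    if not unique_ids:
        raise ValueError("found_taxids must contain at least one taxonomy ID")

    # taxa.txt: a single line of tab-separated taxonomy IDs.
    (out / "taxa.txt").write_text("\t".join(str(t) for t in unique_ids) + "\n")

    # pathogen.txt: only the taxonomy ID of the pathogen causing the symptoms.
    (out / "pathogen.txt").write_text(f"{int(causal_taxid)}\n")


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    write_submission([111, 222, 333, 222], causal_taxid=222, out_dir="submission")
```

Converting every entry with int() means a non-numeric value fails early instead of ending up in the submitted file.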
The first challenges - metagenome assembly, taxonomic profiling, taxonomic or genome binning of raw read data - start on January 16th, 2019. For taxonomic profiling, taxonomic or genome binning methods using assembled data, assemblies will be provided on May 18th, 2019. The assembly challenge will close on May 17th, 2019. All other challenges close on October 25th, 2019.
The clinical pathogen detection challenge opens on June 3rd and closes on October 25th, 2019.
Timeline CAMI IIb challenges
The short and long read challenges on the rhizosphere data set - metagenome assembly, taxonomic profiling, taxonomic or genome binning of raw read data - start on February 14th, 2020. For taxonomic profiling, taxonomic or genome binning methods using assembled data, assemblies will be provided on September 30th, 2020. The assembly challenge will close on September 29th, 2020. All CAMI IIb challenges close on January 31st, 2021.
To get an alert when contests start, you can follow @CAMI_challenge on Twitter.
How to submit
Detailed submission instructions can be found here.
We will announce when result submission to the CAMI platform opens. We are looking forward to receiving your reproducible results! CAMI will present all submitted results in anonymous form and report their performance using a range of metrics in comparison to other tools. To make submitted results reproducible, please provide the software together with the exact database versions and parameter settings used.
Tools can be submitted in one of the following ways:
- Docker container containing the complete tool/workflow
- Bioconda script
- Software repository including detailed installation instructions
The output format must conform to the CAMI standards (FAQ) to allow automatic benchmarking of results. For software using large custom databases, please contact the CAMI team before providing it.
The next steps
We wish to include all participants as co-authors on the joint CAMI publication, provided that they consent and that their results are reproducible. We also cordially invite you to the CAMI evaluation meeting in Brunswick, Germany (March 12/13th) and to the CAMI session of the Microbiome COSI at ISMB in Montreal, Canada, in 2020. Mark the date and follow COSI updates if you wish to submit your related work for a short talk or poster presentation, or contact us if you would like to work with us on the final evaluation metrics and become part of the CAMI team.
Please fill in and submit the following form to get access to the CAMI 2 data sets. We will send you an email with download instructions.