Metataxonomics, metagenomics, metatranscriptomics

We begin with a review of terminology and a comparison of marker gene sequencing, shotgun metagenome sequencing and metatranscriptome sequencing, all of which are sometimes grouped under the term metagenomics. This review discusses the computational challenges of analyzing metagenomics data, focusing on methods but also covering microbial taxonomy and genome resources, which are rarely discussed in benchmark studies and tool reviews despite their critical importance. As we discuss below, the variable quality of the genomes in these resources can lead to unexpected and erroneous results if the genomes are used without careful vetting. Additional challenges arise from the rapid pace of ‘draft’ genome sequencing, which has produced tens of thousands of new genomes, many of which are highly fragmented and incomplete. The rapid increase in the number and variety of genomes also presents many challenges, arising in part from the effort required to fit traditional taxonomic naming schemes onto a microbial world that we now know is vastly richer and more complex than scientists realized when those schemes were first devised. Ever-larger data sets present increasing challenges for computational methods, which must minimize processing and memory requirements to provide fast turnaround and to avoid overwhelming the computational resources available to most research laboratories. As the variety and complexity of experiments have grown, so have the methods and databases used to analyze these experiments.
Microbiome research has been expanding rapidly as a consequence of dramatic improvements in the efficiency of genome sequencing.