This lesson has passed peer-review! See the publication in JOSE.

Metagenomics Workshop Overview: Data

Introduction to the dataset


This workshop uses data from the environmental experiment "Genomic adaptations in information processing underpin trophic strategy in a whole-ecosystem nutrient enrichment experiment" by Jordan G. Okie et al. (2020). In this research, the authors compared the microbial community in its natural, oligotrophic, phosphorus-deficient environment, a pond in the Cuatro Ciénegas Basin (CCB), with the same microbial community under a fertilization treatment.

All of the data used in this workshop can be downloaded from DOI

Download data

The following commands download the data from Zenodo to your computer. Type them in your command line.


Features of the dataset

The dataset in Zenodo contains two files:

The compressed file contains the following directories and files.

Figure: A tree structure showing the directories and files contained in the compressed file.

The directories are: .backup_dc_workshop, hidden, data, mags, and taxonomy.
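
After extracting the compressed file, it can be useful to confirm that these directories are all present, including the one whose name starts with a dot. The following is a minimal Python sketch, not part of the original lesson. The function name `missing_directories` and the extraction folder `dc_workshop` are assumptions made for illustration; replace the folder with the path where you extracted the data.

```python
from pathlib import Path

# Directory names as listed in this lesson.
EXPECTED_DIRECTORIES = [".backup_dc_workshop", "hidden", "data", "mags", "taxonomy"]


def missing_directories(root, expected=EXPECTED_DIRECTORIES):
    """Return the expected directory names that are not present under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # "dc_workshop" is a hypothetical extraction folder; adjust to your own path.
    missing = missing_directories("dc_workshop")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```

The check uses `Path.is_dir()` rather than a plain directory listing, so the hidden `.backup_dc_workshop` directory is handled the same way as the visible ones.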

Directory .backup_dc_workshop

Contains all the files produced or needed while running the lesson.

Directory hidden

hidden contains a hidden file that will be used in the lesson Introduction to the Command Line for Metagenomics, episode 03, Navigating Files and Directories, where learners will discover how to find hidden files.

Directory data

data contains four fastq files from two samples: JC1A and JP4D. These files are the inputs to the FastQC tool in episodes two and three of the lesson Data Processing and Visualization for Metagenomics: Assessing Read Quality, and Trimming and Filtering. In these episodes, learners will remove low-quality nucleotides and prepare the files for assembly and taxonomic assignment.
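
Before running FastQC, a quick look at the size of each fastq file can serve as a sanity check. The sketch below is an illustration only and is not part of the lesson. It assumes the common four-line fastq record layout (header, sequence, `+` separator, quality string) with no blank lines. The helper names `open_text` and `fastq_summary` are hypothetical.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def fastq_summary(path):
    """Return (number of reads, mean read length) for a fastq file.

    Assumes four lines per record: header, sequence, '+', quality.
    """
    n_reads = 0
    total_bases = 0
    with open_text(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # the sequence line of each record
                n_reads += 1
                total_bases += len(line.strip())
    mean_length = total_bases / n_reads if n_reads else 0.0
    return n_reads, mean_length
```

Calling `fastq_summary` on each of the four files in the data directory gives the read count and mean read length per file. The exact file names depend on the downloaded archive, so they are not spelled out here.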

Directory mags

mags contains the assembly of the JP4D sample.

Directory taxonomy

Since Kraken2 won’t be run in the lesson, this directory contains the taxonomic assignments obtained by running Kraken2 on the trimmed reads.
From these files, users can obtain biom files that will be the input for the R analysis and visualization of abundance.

Finally, it contains a subdirectory with the taxonomic assignment of the first bin from sample JP4D.


Okie, J. G., Poret-Peterson, A. T., Lee, Z. M. P., Richter, A., Alcaraz, L. D., Eguiarte, L. E., Siefert, J. L., Souza, V., Dupont, C. L., & Elser, J. J. (2020). Genomic adaptations in information processing underpin trophic strategy in a whole-ecosystem nutrient enrichment experiment. eLife, 9.