This lesson has passed peer-review! See the publication in JOSE.

Metagenomics Workshop Overview

Data Carpentry aims to teach researchers basic concepts, skills, and tools for working with data so they can get more done quickly and with less pain. This workshop uses Data Carpentry’s approach to teach data management and analysis for metagenomics research, including: best practices for the organization of bioinformatics projects and data, use of command-line utilities, use of command-line tools to analyze sequence quality, use of R studio and use of R libraries to compare diversity between samples, and connecting to and using cloud computing. This workshop is designed to be taught over two full days of instruction.

Would you be interested in teaching these materials? We have an Slack channel were we will be happy to help you!

Frequently Asked Questions

Read our FAQ to learn more about Data Carpentry’s Metagenomics workshop as an Instructor or a workshop host.

Getting Started

This lesson assumes that learners have no prior experience with the tools covered in the workshop.

However, learners are expected to have some familiarity with biological concepts, including the concept of DNA sequencing, nucleotide abbreviations, genome, microbiome, and taxonomy. Participants should bring their own laptops and plan to participate actively.

To get started, follow the directions in the Setup tab to get access to the required software and data for this workshop.


This workshop uses data from the environmental experiment: Genomic adaptations in information processing underpins trophic strategy in a whole-ecosystem nutrient enrichment experiment, by Jordan G Okie et al. In this research, authors compared the differences between the microbial community in its natural, oligotrophic, phosphorus-deficient environment, a pond from the Cuatro Ciénegas Basin (CCB), and the same microbial community under a fertilization treatment.

All of the data used in this workshop can be downloaded from DOI More information about this data is available on the Data page.

Workshop Overview

Lesson Overview Estimated time
Project Organization and Management Learn how to structure your metadata, organize and document your metagenomics data and bioinformatics workflow, and access data on the NCBI sequence read archive (SRA) database. 1:30 hr
Introduction to the Command Line Learn to navigate your file system, create, copy, move, and remove files and directories, and automate repetitive tasks using scripts and wildcards. 4:00 hr
Introduction to R Use R studio to manage several data types and structures. 1:00 hr
Data Processing and Visualization for Metagenomics Use command-line tools to perform quality control, metagenomic assembly, metagenomic binning, taxonomic assignment, and diversity exploration. 6:30 hr

Lessons Reference

The content of this page and three of the lessons presented in this workshop are adapted from lessons on the Data Carpentry Genomics Workshop.

Teaching Platform

This workshop is designed to be run on pre-imaged Amazon Web Services (AWS) instances. All the software and data used in the workshop are hosted on an Amazon Machine Image (AMI). If you want to run your instance of the server used for this workshop, follow the directions in the Setup tab.


Please cite as: DOI

Claudia Zirión Martínez; Diego Garfias Gallegos; Tania Vanessa Arellano Fernández; Aarón Espinosa Jaime; Edder D Bustos Díaz; José Abel Lovaco Flores; Luis Gerardo Tejero Gómez; J Abraham Avelar Rivas; Nelly Sélem (March , 2023) A Data Carpentry- Style Metagenomics Workshop