Key Points
- Starting a Metagenomics Project
- Assessing Read Quality
- Trimming and Filtering
- Metagenome Assembly
- Metagenome Binning
- Taxonomic Assignment
- Exploring Taxonomy with R
- Diversity Tackled With R
- Taxonomic Analysis with R
- Other Resources
Glossary - Data processing and visualization for metagenomics
- adapters
- Artificial sequences of small length that are attached to both ends of a biological sequence for methodological purposes.
- Alpha diversity (α-diversity)
- The mean species diversity within a site, at a local scale.
- Assembly (Metagenomics)
- The stitching together of individual DNA reads into longer, more complete sequences (contigs, scaffolds), which can lead to the complete representation of a gene or an entire genome.
- Beta diversity (β-diversity)
- The extent of change in community composition, or the degree of community differentiation, in relation to a complex gradient of environment or a pattern of environments.
- bin
- A group of reads, contigs, or scaffolds hypothetically assigned to an individual genome.
- binning
- The process of grouping DNA sequences according to intrinsic characteristics of the sequences.
- contig
- A contiguous fragment of DNA sequence from an incomplete draft genome; the result of assembling reads.
- Environment (conda)
- A directory that contains a specific collection of packages installed by the user.
- fasta (format)
- A text-based format for representing biological sequences.
- fastq
- A text-based file format that stores both a biological sequence (usually a nucleotide sequence) and its corresponding quality scores.
- for loop
- A loop that is executed once for each value in some kind of set, list, or range. See also: while loop.
- GC-content
- The percentage of nitrogenous bases in a DNA or RNA molecule that are either guanine (G) or cytosine (C).
- gene
- A sequence of nucleotides that contains the information to specify a trait.
- genome
- All genetic information of an organism.
- Illumina (sequencing)
- A sequencing-by-synthesis technique used to determine the order of base pairs in DNA.
- Lowest common ancestor (LCA)
- The lowest (deepest) node in a tree that has all the nodes of interest as descendants.
- k-mer
- A contiguous subsequence of length k contained within a biological sequence.
- Mapping
- The process of establishing where a set of nucleotide sequences, such as reads, is located on a reference set of biological sequences.
- Metabarcoding
- The collection of a specific gene region from a set of organisms.
- Metadata
- Information concerning how the samples and data were treated.
- Metagenome-Assembled Genomes (MAG)
- A single-taxon assembly based on one or more binned metagenomes that has been asserted to be a close representation of an actual individual genome.
- Metagenomics (shotgun metagenomics)
- The collection of genomic sequences from the various (micro)organisms that coexist in a given space.
- Next generation sequencing (NGS)
- A massively parallel sequencing technology used to determine the order of nucleotides in entire genomes or in targeted regions of DNA or RNA.
- Operational Taxonomic Unit (OTU)
- A collection of sequences that share a certain percentage of similarity and are thus classified into groups of closely related individuals.
- quality control
- Any process that removes problematic data from a dataset.
- quality (Phred) scores
- An integer value representing the estimated probability that a base call is incorrect.
- read(s)
- DNA sequence from one fragment (a small section of DNA).
- read quality
- The probability of a sequencing error assigned to a given read.
- sequencing (genomics)
- The process of determining the nucleic acid sequence, that is, the order of nucleotides in DNA.
- Species diversity
- The number of different species that are represented in a given community.
- taxonomic assignment
- A method of determining that a specific sequence belongs to a recognized taxon at one of the levels of the classification of living organisms (for example Phylum, Genus, or Species). This is usually done by comparing the sequence of interest against a set of reference sequences.
- thread
- A thread is the unit of execution within a process. A process (the execution of a program) can have anywhere from just one thread to many threads.
- Oligotrophic (environment)
- A space that offers low levels of nutrients.
- PCR (polymerase chain reaction)
- A method used to rapidly make millions of copies of a specific DNA sequence.
- rRNA (Ribosomal ribonucleic acid)
- A type of non-coding RNA that is the primary component of ribosomes.
- scaffold
- A portion of the genome sequence reconstructed from sequence fragments. Scaffolds are composed of contigs and gaps.
- Sequencing depth (coverage)
- The number of unique reads that include a given nucleotide in the reconstructed sequence.
- Species abundance
- The number of individuals of each species inside the environment.
- Species richness
- The number of different species in an environment.
- while loop
- A loop that keeps executing as long as some condition is true. See also: for loop.
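
Several of the glossary entries above (GC-content, k-mer, quality (Phred) scores, species richness, for loop) describe quantities that are easy to compute directly. The following Python sketch illustrates them; it is not taken from the lesson. All function names are hypothetical. The conversion P = 10^(-Q/10) is the standard Phred definition. The default ASCII offset of 33 (Phred+33) is an assumption about how the fastq file was encoded.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the percentage of bases in `seq` that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0  # assumption: report 0 for an empty sequence
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def count_kmers(seq: str, k: int) -> Counter:
    """Count every contiguous substring of length k (k >= 1) in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def phred_to_error_probability(q: int) -> float:
    """Convert a Phred score Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_quality_string(quality: str, offset: int = 33) -> list[int]:
    """Decode a fastq quality string into Phred scores.

    Assumes Phred+33 encoding unless another offset is given.
    """
    return [ord(char) - offset for char in quality]


def species_richness(abundance: dict[str, int]) -> int:
    """Number of species with at least one individual observed."""
    return sum(1 for count in abundance.values() if count > 0)


if __name__ == "__main__":
    read = "ATGCGATA"      # made-up example read
    quality = "IIII5555"   # made-up quality string of the same length

    print("GC-content (%):", gc_content(read))

    # A for loop runs once per item, here once per distinct 3-mer.
    for kmer, n in count_kmers(read, 3).items():
        print(kmer, n)

    # Pair each base with the error probability implied by its score.
    for base, q in zip(read, decode_quality_string(quality)):
        print(base, q, phred_to_error_probability(q))

    # Species abundance is the per-species count; richness counts the species present.
    print(species_richness({"E. coli": 12, "B. subtilis": 0, "S. aureus": 3}))
```

The functions use only the standard library and require Python 3.9 or later for the built-in generic type hints. For real data, a dedicated parser is usually a safer choice than decoding quality strings by hand.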