Key Points
Data Tidiness
Planning for NGS Projects
Examining Data on the NCBI SRA Database
Glossary
- accession: a unique identifier assigned to each sequence or set of sequences
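- BLAST: the Basic Local Alignment Search Tool, an NCBI tool that finds regions of similarity between biological sequences (DNA, RNA, or protein), for example by comparing an unknown query sequence against a database of known sequences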
- categorical variable: a variable whose values come from a fixed set of names or labels (also called a qualitative variable), as opposed to a quantitative (numerical) variable
- cleaned data: data that has been manipulated after collection to remove errors or inaccuracies, introduce desired formatting changes, or otherwise prepare the data for analysis
- conditional formatting: formatting that is applied to a specific cell or range of cells depending on a set of criteria
- CSV (comma-separated values) format: a plain text file format in which values are separated by commas
- factor: a variable that takes on a limited number of possible values (i.e. categorical data)
- Gb: gigabyte, a unit of file storage or file size (strictly, the standard symbol for gigabyte is GB, while Gb denotes gigabit)
- Gbase: gigabase, one billion nucleic acid bases (Gbp may indicate one billion base pairs of nucleic acid)
- headers: names at the tops of columns that describe the column contents (sometimes optional)
- metadata: data that describes other data
- NGS: common acronym for “Next Generation Sequencing”, a term increasingly being replaced by “High Throughput Sequencing”
- null value: a value used to record observations missing from a dataset
- observation: a single measurement or record of the object being recorded (e.g. the weight of a particular mouse)
- plain text: unformatted text
- quality assurance: any process that checks data for validity during entry
- quality control: any process that removes problematic data from a dataset
- raw data: data that has not been manipulated and represents actual recorded values
- rich text: formatted text (e.g. text that appears bold, colored, or italicized)
- string: a sequence of characters (e.g. “thisisastring”)
- TSV (tab-separated values) format: a plain text file format in which values are separated by tabs
- variable: a category of data being collected on the object being recorded (e.g. a mouse’s weight)
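Several of the glossary terms (CSV and TSV format, headers, observations, null values, and categorical variables) can be seen together in a small example. The Python sketch below uses the standard-library `csv` module to read the same table once as CSV and once as TSV. The mouse table, its column names, and the convention of using an empty field as the null value are hypothetical choices made for illustration only.

```python
import csv
import io

# Hypothetical table: a header row, then one observation (mouse) per row.
# The empty weight field for M2 stands in for a null value.
csv_text = "mouse_id,strain,weight_g\nM1,C57BL/6,21.4\nM2,BALB/c,\n"
# The same table in TSV format (safe here because no value contains a comma).
tsv_text = csv_text.replace(",", "\t")


def read_table(text, delimiter):
    """Parse delimited plain text into dicts keyed by the header names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for row in reader:
        # Turn empty strings into None so missing values are explicit.
        rows.append({key: (value if value != "" else None) for key, value in row.items()})
    return rows


csv_rows = read_table(csv_text, ",")
tsv_rows = read_table(tsv_text, "\t")

# "strain" is a categorical variable: list its distinct labels.
strains = sorted({row["strain"] for row in csv_rows})
print(strains)
```

Both parses yield identical records, which shows that CSV and TSV differ only in the delimiter. Note that every value is read as a string; converting `weight_g` to a number is a separate cleaning step.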
Reference
This page is adapted from the corresponding page of the Project Organization and Management for Genomics lesson.