Short for ‘environmental DNA’. Refers to DNA deposited in the environment through excretion, shedding, mucous secretions, saliva etc. This can be collected in environmental samples (e.g. water, sediment) and used to identify the organisms that it originated from. eDNA in water is broken down by environmental processes over a period of days to weeks. It can travel some distance from the point at which it was released from the organism, particularly in running water. eDNA in soil can bind to organic particles and persist for a very long time (sometimes hundreds or thousands of years). eDNA is sampled in low concentrations and can be degraded (i.e. broken into short fragments), which limits the analysis options.
Refers to DNA sampled directly from the organism through whole organism collection (e.g. invertebrates), swabbing, blood sampling, clipping etc. Usually high concentration and non-degraded. The location of the organism at the time of sampling is definitively known. Overall there are fewer uncertainties than for eDNA.
Refers to DNA extracted from a mixture of different organisms. Could be eDNA (environmental samples almost always contain DNA from a mixture of species) or organismal DNA (e.g. homogenised insect trap samples).
Polymerase chain reaction. A process by which millions of copies of a particular DNA segment are produced through a series of heating and cooling steps. Known as an ‘amplification’ process. One of the most common processes in molecular biology and a precursor to most sequencing-based analyses.
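As a rough illustration of why this is called amplification, the sketch below assumes an idealised reaction in which every cycle multiplies the number of copies by the same factor. The function name and the `efficiency` parameter are illustrative assumptions, not part of any standard tool, and real reactions plateau in later cycles.

```python
def copies_after_pcr(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealised copy number after a number of PCR cycles.

    efficiency = 1.0 means perfect doubling in every cycle (an assumption;
    real reactions are less efficient and eventually plateau).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Under perfect doubling, a single starting molecule passes one million
# copies after 20 cycles.
print(copies_after_pcr(1, 20))
```

Under this simplified model the copy number grows exponentially with the number of cycles, which is how a trace amount of target DNA can become millions of copies.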
Short sections of synthesised DNA that bind to either end of the DNA segment to be amplified by PCR. Can be designed to be totally specific to a particular species (so that only that species’ DNA will be amplified from a community DNA sample), or to be very general so that a wide range of species’ DNA will be amplified. Good design of primers is one of the critical factors in DNA-based monitoring.
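The way a primer pair brackets the segment to be amplified can be sketched as a simple 'in silico PCR'. This is a minimal illustration that assumes exact matching only (real primers tolerate some mismatches), and the function names and toy sequences are invented for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str):
    """Return the segment bracketed by the two primers, or None.

    The forward primer must match the template exactly; the reverse primer
    binds the opposite strand, so its reverse complement is searched for
    downstream of the forward primer site.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Toy sequences, invented for illustration only.
target = "TTTTACGTACGGGGCCCCTTAGCAAAAA"
non_target = "TTTTACGAACGGGGCCCCTTAGCAAAAA"  # one mismatch in the forward site
print(in_silico_pcr(target, "ACGTAC", "TGCTAA"))
print(in_silico_pcr(non_target, "ACGTAC", "TGCTAA"))
```

In this toy case the single mismatch in the non-target template prevents amplification, which mirrors the idea of a species-specific primer. A general primer would instead be placed in a region that is conserved across many species.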
Stands for ‘quantitative PCR’, sometimes also known as ‘real-time PCR’. A PCR incorporating a dye that fluoresces during amplification, allowing a machine to track the progress of the reaction. Often used with species-specific primers, where detection of amplification is used to infer presence of the target species’ DNA in the sample. If the target species’ DNA is not present in the sample, no fluorescence will be detected. The high specificity of the qPCR method makes it ideal for situations where a single target species is of interest. In eDNA monitoring, a widely used application of qPCR is the detection of Great Crested Newts from water samples.
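The detection logic can be sketched as follows: a sample is scored as positive at the first cycle where its fluorescence crosses a chosen threshold, and as negative if it never does. The threshold value and the readings below are invented for illustration, and real instruments apply baseline correction and other processing that this sketch leaves out.

```python
def threshold_cycle(fluorescence, threshold):
    """Return the first cycle (1-based) at which the signal reaches the
    threshold, or None if no amplification is detected."""
    for cycle, signal in enumerate(fluorescence, start=1):
        if signal >= threshold:
            return cycle
    return None


# Hypothetical fluorescence readings, one value per cycle.
positive_sample = [0.01, 0.01, 0.02, 0.05, 0.2, 0.8, 1.5]
negative_sample = [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]

print(threshold_cycle(positive_sample, threshold=0.1))
print(threshold_cycle(negative_sample, threshold=0.1))
```

An earlier crossing cycle generally indicates more target DNA at the start of the reaction, which is what makes the method quantitative.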
Traditional DNA sequencing. Each reaction produces a single sequence so it only works on amplified DNA of a single species. A sequence is a series of nucleotide bases represented by the letters A, T, C & G. Here is the sequence of part of the 12S gene for a minnow (Phoxinus phoxinus): CACCGCGGTTAAACGAGAGGCCCTAGTTAATAATTGACGGCGTAAAGGGTGGTTAGGGGGTGTAATGTAATAAAGCCGAATGGCCCTTTGGCTGTCATACGCTTCTAGGTGTCCGAAGCCCAACATACGAAAGTAGCTTTAAGAAAGTCCACCTGACGCCACGAAAACTGAGAAA
Technology developed in the 2000s that produces millions of sequences in parallel. Enables DNA from thousands of different organisms in a mixture of species to be sequenced at once, which makes it possible to sequence community DNA. Several technologies exist to do this, but the most commonly used platform is Illumina’s MiSeq. Also known as Next-Generation Sequencing (NGS) or parallel sequencing.
Refers to genes that can be used for species identifications. Different regions of DNA mutate at different speeds. Fast-changing regions are useful for population studies and paternity testing, while the most stable regions can be used for assessing deep evolutionary relationships between groups of organisms. Certain regions change at just the right rate to be stable within a species but different between species. These are known as barcode genes. The official barcode gene for animals is Cytochrome Oxidase 1 (COI or cox-1). Other genes used as animal barcodes include 12S, 16S, 18S and Cytochrome-b (cytb). For plants, the most commonly used genes are matK, rbcL, trnL and ITS.
Refers to identification of species assemblages from community DNA using barcode genes. PCR is carried out with non-specific primers, followed by high-throughput sequencing and bioinformatics processing. Can identify hundreds of species in each sample, and 100+ different samples can be processed in parallel to reduce sequencing cost.
Refers to libraries of DNA sequences (usually from barcode genes) that have been generated from species of known identity. Sequences from unidentified organisms – obtained either by Sanger sequencing or high-throughput sequencing – are compared against a reference database to make species identifications. Databases can be curated (e.g. the Barcode of Life Database – BOLD – www.boldsystems.org) or uncurated (e.g. GenBank – www.ncbi.nlm.nih.gov). In curated databases, identifications are scrutinised and verified; in uncurated databases they are not. GenBank is far more extensive than BOLD, but because its identifications are not verified it contains many errors.
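Comparing an unknown sequence against a reference library can be sketched as a search for the most similar reference sequence. The example below is a deliberately simplified assumption: it compares sequences of equal length position by position, whereas real tools align sequences first. The species labels, toy sequences and the 97% cut-off are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_match(query: str, reference: dict, min_identity: float = 97.0):
    """Return (name, identity) for the closest reference sequence.

    The name is None when even the best hit falls below min_identity,
    i.e. the query cannot be assigned to any species in the library.
    """
    best_name, best_score = None, 0.0
    for name, sequence in reference.items():
        score = percent_identity(query, sequence)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_identity:
        return None, best_score
    return best_name, best_score


# Hypothetical reference library with invented sequences.
library = {"species_A": "ACGTACGTAC", "species_B": "ACGTTCGTAA"}
print(best_match("ACGTACGTAC", library))
print(best_match("TTTTTTTTTT", library))
```

A query with no close match stays unassigned, which is the expected outcome when a species is missing from the reference database. An erroneous reference entry, by contrast, can produce a confident but wrong identification.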
Refers to a data processing pipeline that takes the raw sequence data from high-throughput sequencing (often 20 million sequences or more) and transforms it into usable ecological data. Key steps for metabarcoding pipelines include quality filtering, trimming, merging paired ends, removal of artefacts such as chimeras, clustering of similar sequences into molecular taxonomic units (each of which approximately represents a species), and matching one sequence from each cluster against a reference database. The output is a species-by-sample table showing how many sequences from each sample were identified as each species.
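The final step, building the species-by-sample table, can be sketched as a simple count. The example assumes that every sequence has already been filtered, clustered and identified, so the input is a list of (sample, species) pairs. The sample names are hypothetical and the function name is an illustrative choice.

```python
from collections import Counter, defaultdict


def species_by_sample(assignments):
    """Count identified sequences per species and per sample.

    assignments: iterable of (sample, species) pairs, one per sequence.
    Returns a dict mapping species -> {sample: number of sequences}.
    """
    table = defaultdict(Counter)
    for sample, species in assignments:
        table[species][sample] += 1
    return {species: dict(counts) for species, counts in table.items()}


# Hypothetical assignments for two water samples.
reads = [
    ("pond_1", "Phoxinus phoxinus"),
    ("pond_1", "Phoxinus phoxinus"),
    ("pond_2", "Phoxinus phoxinus"),
    ("pond_2", "Triturus cristatus"),
]
for species, counts in species_by_sample(reads).items():
    print(species, counts)
```

Each row of the resulting table is a species and each column a sample, with the cell holding the number of sequences assigned to that species in that sample.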