Denoising (Illumina only)
Amplicon sequences are usually clustered into Operational Taxonomic Units (OTUs) at a 97% similarity threshold, the common working definition of a bacterial species.
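The 97% threshold can be illustrated with a toy greedy clustering. This is a minimal sketch under strong simplifying assumptions: sequences are already aligned and of equal length, so identity is simply the fraction of matching positions, and more abundant sequences become centroids first. Real OTU-picking tools use proper pairwise alignment, and the helper names (`identity`, `greedy_cluster`, `mutate`) are hypothetical.

```python
# Toy greedy clustering at a 97% identity threshold.
# Assumption: sequences are pre-aligned and equal-length (no gaps, no real aligner).

def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(abundances: dict, threshold: float = 0.97) -> dict:
    """Map each centroid to its member sequences.

    Sequences are visited by decreasing abundance; each joins the first existing
    centroid with identity >= threshold, otherwise it becomes a new centroid.
    """
    clusters = {}
    for seq in sorted(abundances, key=abundances.get, reverse=True):
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            clusters[seq] = [seq]  # no centroid close enough: start a new OTU
    return clusters


def mutate(seq: str, n: int) -> str:
    """Hypothetical helper: replace each of the first n bases with a different base."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return "".join(swap[b] for b in seq[:n]) + seq[n:]


ref = "ACGT" * 25       # 100 bp toy sequence
near = mutate(ref, 2)   # 98% identical to ref
far = mutate(ref, 5)    # 95% identical to ref
clusters = greedy_cluster({ref: 50, near: 10, far: 8})
print(len(clusters))    # 2: `near` joins `ref`, `far` starts its own OTU
```

Two substitutions in 100 bp (98% identity) fall within the threshold, while five (95%) do not, so `near` is absorbed into the OTU of `ref`.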
An alternative approach is to identify Sequence Variants (SVs, see OTU picking and Denoising for details). This approach avoids clustering sequences at a predefined similarity threshold and usually relies on a denoising algorithm to distinguish true SVs from sequencing errors.
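Working with exact sequences starts from dereplication: identical reads are collapsed into unique sequences with their abundances, so no information is lost to a similarity threshold. A minimal Python sketch (the function name `dereplicate` and the `min_size` cut-off are illustrative):

```python
from collections import Counter


def dereplicate(reads, min_size=1):
    """Collapse identical reads into (sequence, abundance) pairs.

    Pairs are sorted by decreasing abundance (ties keep first-seen order);
    unique sequences seen fewer than `min_size` times are dropped.
    """
    counts = Counter(reads)
    return [(seq, n) for seq, n in counts.most_common() if n >= min_size]


reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGTACGT", "ACGTACGA", "TCGTACGT"]
uniques = dereplicate(reads)
print(uniques)  # [('ACGTACGT', 3), ('ACGTACGA', 2), ('TCGTACGT', 1)]
```

The two most abundant sequences differ at a single position and remain separate units; deciding whether such a variant is real or a sequencing error is the job of the denoising step.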
In this tutorial we show how to denoise overlapping Illumina paired-end reads in order to detect SVs. Although this tutorial applies the pipeline to 16S paired-end Illumina reads, it can be adapted to Illumina single-end sequencing or to other marker genes and spacers, e.g. the Internal Transcribed Spacer (ITS), 18S or 28S.
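With overlapping paired-end reads, each forward read and the reverse complement of its mate share a stretch of sequence, and that overlap is used to rebuild the full amplicon before denoising. The sketch below shows the idea under simplifying assumptions (no quality scores, the forward base is kept at mismatches, and the amplicon is longer than each read); it is not micca's implementation. The parameter names `min_overlap` and `max_diffs` are our own; we assume they play roughly the role of the `-l` and `-d` options of the `micca mergepairs` command in this tutorial, but check `micca mergepairs -h` to confirm.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd: str, rev: str, min_overlap: int = 10, max_diffs: int = 1) -> Optional[str]:
    """Merge a read pair, or return None if no acceptable overlap is found.

    The reverse read is reverse-complemented, then overlaps between the 3' end
    of `fwd` and the 5' end of the reverse complement are tried from longest to
    shortest; the first one with at most `max_diffs` mismatches is accepted.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be positive")
    rc = revcomp(rev)
    for ovl in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        diffs = sum(a != b for a, b in zip(fwd[-ovl:], rc[:ovl]))
        if diffs <= max_diffs:
            # Keep the forward bases in the overlap, then append the rest of the mate.
            return fwd + rc[ovl:]
    return None


amplicon = "ACTTACATCA" + "GGCATTCGAC" + "TTGACCAGTA"  # 30 bp toy amplicon
fwd = amplicon[:20]                # forward read: first 20 bp
rev = revcomp(amplicon[10:])       # reverse read: last 20 bp, as sequenced
merged = merge_pair(fwd, rev, min_overlap=5, max_diffs=0)
print(merged == amplicon)          # True: the 10 bp overlap rebuilds the amplicon
```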
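Download and extract the example dataset, then merge the overlapping paired-end reads, trim the forward and reverse primers and filter the merged reads by quality:
wget ftp://ftp.fmach.it/metagenomics/micca/examples/garda.tar.gz
tar -zxvf garda.tar.gz
cd garda
micca mergepairs -i fastq/*_R1*.fastq -o merged.fastq -l 100 -d 30
micca trim -i merged.fastq -o trimmed.fastq -w CCTACGGGNGGCWGCAG -r GACTACNVGGGTWTCTAATCC -W -R -c
micca filter -i trimmed.fastq -o filtered.fasta -e 0.75 -m 400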
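Two ideas behind the `micca trim` and `micca filter` steps can be sketched in a few lines: matching a degenerate primer written with IUPAC codes (such as the N and W in the forward primer CCTACGGGNGGCWGCAG), and scoring a read by its expected number of errors, the sum of the per-base error probabilities 10^(-Q/10) implied by its Phred scores. This is an illustration, not micca's code. We assume, without checking micca's option reference, that `-e` is an expected-error threshold and `-m` a minimum length; the "errors per 100 bases" convention and all function names below are hypothetical.

```python
# IUPAC nucleotide codes -> bases each code stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def starts_with_primer(read: str, primer: str) -> bool:
    """True if `read` begins with the degenerate `primer` (no mismatches allowed)."""
    if len(read) < len(primer):
        return False
    return all(base in IUPAC[code] for code, base in zip(primer, read))


def expected_errors(qual: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities from a Phred+33 quality string."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qual)


def passes_quality(qual: str, max_ee_rate_pct: float, min_len: int) -> bool:
    """Hypothetical filter: minimum length plus a cap on expected errors per 100 bases."""
    if not qual or len(qual) < min_len:
        return False
    return 100 * expected_errors(qual) / len(qual) <= max_ee_rate_pct


primer = "CCTACGGGNGGCWGCAG"
print(starts_with_primer("CCTACGGGAGGCAGCAGTGGGG", primer))  # True: A matches N and W
print(starts_with_primer("CCTACGGGAGGCCGCAGTGGGG", primer))  # False: C does not match W (A/T)
print(passes_quality("?" * 450, 0.75, 400))  # True: Q30 gives 0.1 errors per 100 bases
print(passes_quality("5" * 450, 0.75, 400))  # False: Q20 gives 1 error per 100 bases
```

Real primer trimmers typically also tolerate a small number of mismatches, search for the reverse primer at the other end of the read, and cut the matched primers away.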
The otu command implements the UNOISE3 protocol (method denovo_unoise), which includes dereplication, denoising and chimera filtering:
micca otu -m denovo_unoise -i filtered.fasta -o denovo_unoise_otus -t 4 -c
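The core of the denoising step is the UNOISE abundance-skew rule: unique sequences are processed by decreasing abundance, and a sequence is treated as an error of a more abundant centroid when its abundance ratio (skew) to that centroid is at most β(d) = 1 / 2^(αd + 1), where d is the number of differences between them. The toy sketch below assumes α = 2 and a minimum abundance of 8, the defaults we are aware of in UNOISE3 implementations, uses Hamming distance on equal-length sequences and omits chimera filtering; it is not the code run by `micca otu`.

```python
def beta(d: int, alpha: float = 2.0) -> float:
    """Largest skew at which a sequence d differences away is considered an error."""
    return 1.0 / 2 ** (alpha * d + 1)


def hamming(a: str, b: str) -> int:
    """Number of differing positions (assumes equal-length sequences)."""
    return sum(x != y for x, y in zip(a, b))


def unoise_sketch(uniques, min_size=8, alpha=2.0):
    """Return the sequences kept as denoised variants, most abundant first.

    `uniques` is a list of (sequence, abundance) pairs from dereplication.
    """
    centroids = []
    for seq, size in sorted(uniques, key=lambda pair: pair[1], reverse=True):
        if size < min_size:
            continue  # too rare to be a centroid
        is_error = any(
            size / c_size <= beta(hamming(seq, c_seq), alpha)
            for c_seq, c_size in centroids
        )
        if not is_error:
            centroids.append((seq, size))
    return [seq for seq, _ in centroids]


uniques = [
    ("ACGTACGTAC", 1000),  # abundant true variant
    ("TCGTACGTAC", 400),   # 1 difference, skew 0.4 > beta(1) = 0.125: kept
    ("ACGTACGTAA", 50),    # 1 difference, skew 0.05 <= 0.125: treated as an error
    ("ACGCTAGTAC", 20),    # 3 differences, skew 0.02 > beta(3) ~ 0.0078: kept
    ("GGGGGGGGGG", 3),     # below min_size: discarded
]
print(unoise_sketch(uniques))  # ['ACGTACGTAC', 'TCGTACGTAC', 'ACGCTAGTAC']
```

With d = 1 a variant must be more than 1/8 as abundant as its neighbour to survive, and each extra difference divides the allowed skew by a further factor of 2^α = 4.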
The otu command writes several files to the output directory, including the SV table (otutable.txt) and a FASTA file containing the representative sequences.
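Downstream analyses usually start from the SV table. The sketch below assumes, without checking the exact file layout, that otutable.txt is tab-separated with a header row of sample names and one row of integer counts per SV; the SV and sample IDs in the example are made up. It converts counts to per-sample relative abundances.

```python
import csv
import io


def read_sv_table(text):
    """Parse a tab-separated SV table into (sample names, {sv_id: [counts]})."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    table = {row[0]: [int(x) for x in row[1:]] for row in reader if row}
    return header[1:], table


def relative_abundance(samples, table):
    """Return {sample: {sv_id: fraction of that sample's reads}}."""
    totals = [sum(col) for col in zip(*table.values())]
    return {
        sample: {sv: (counts[j] / totals[j] if totals[j] else 0.0)
                 for sv, counts in table.items()}
        for j, sample in enumerate(samples)
    }


# Hypothetical table with two SVs and two samples.
example = "OTU\tS1\tS2\nSV1\t30\t5\nSV2\t10\t15\n"
samples, table = read_sv_table(example)
rel = relative_abundance(samples, table)
print(rel["S1"])  # {'SV1': 0.75, 'SV2': 0.25}
```

To try it on the tutorial output, pass the contents of denovo_unoise_otus/otutable.txt to `read_sv_table`, after checking that the file matches the assumed layout.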
See OTU picking and Denoising for how to apply the de novo swarm, closed-reference and open-reference OTU picking strategies to these data.