Reference Database Files
RefSSU, the primary reference database of near-full-length reference sequences, is derived from the SILVA rRNA database project (version 119). Low-quality sequences (pintail score <40, sequence quality <50, or alignment quality <50) are flagged as deleted. Taxonomic sources include Entrez genomes, the RDP, SILVA, EMBL, and hand-curation by collaborators. Each sequence is assigned a RefSSU_ID formed from its accession number and the start and stop locations of the sequence on the rRNA gene.
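The quality thresholds above can be read as a simple filter. The following is a minimal sketch, not the project's actual pipeline: the `SilvaRecord` class, its field names, and the `ACC1`-style accessions are hypothetical placeholders, and only the three cutoffs come from the text.

```python
from dataclasses import dataclass

# Cutoffs taken from the description above.
MIN_PINTAIL = 40
MIN_SEQ_QUALITY = 50
MIN_ALIGN_QUALITY = 50


@dataclass
class SilvaRecord:
    # Hypothetical record layout; the field names are illustrative only.
    accession: str
    pintail: float
    seq_quality: float
    align_quality: float


def is_low_quality(rec: SilvaRecord) -> bool:
    """Return True if any of the three scores falls below its cutoff."""
    return (
        rec.pintail < MIN_PINTAIL
        or rec.seq_quality < MIN_SEQ_QUALITY
        or rec.align_quality < MIN_ALIGN_QUALITY
    )


# Made-up example records (not real database entries).
records = [
    SilvaRecord("ACC1", pintail=95, seq_quality=88, align_quality=91),
    SilvaRecord("ACC2", pintail=35, seq_quality=90, align_quality=90),
    SilvaRecord("ACC3", pintail=40, seq_quality=50, align_quality=50),
]
kept = [r.accession for r in records if not is_low_quality(r)]
print(kept)  # ['ACC1', 'ACC3']
```

Note that a score exactly at a cutoff (as in `ACC3`) is kept, because the criteria are strict "less than" comparisons.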
Individual reference files for specific hypervariable regions (RefHVR_v3, RefHVR_v6, RefHVR_v9) are created by excising in silico the appropriate section of the full-length sequences. Only sequences that cover the entire hypervariable region are included. A RefHVR_ID is assigned to each unique hypervariable region sequence. Each RefHVR_ID is prefixed with the hypervariable region it covers (e.g., v6_AF153). The suffix of the ID is simply a unique alphanumeric string that carries no additional information. If multiple RefSSU entries share the same hypervariable region sequence, they share the same RefHVR_ID for that region. The database files contain the information needed to determine the RefSSU source(s) for each RefHVR_ID.
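The relationship between RefSSU entries and RefHVR_IDs amounts to grouping by identical region sequence. The sketch below illustrates that idea only. The function name `assign_refhvr_ids`, the numeric suffix format, and the `ssu_A`-style IDs are assumptions made for the example; the real suffixes are arbitrary alphanumerics, and the real assignment procedure is not described here.

```python
from collections import defaultdict


def assign_refhvr_ids(region, hvr_by_refssu):
    """Give each unique region sequence one ID and record its RefSSU sources.

    region        -- region prefix, e.g. "v6"
    hvr_by_refssu -- dict mapping a RefSSU_ID to its excised region sequence
    """
    # Group source IDs by sequence, ignoring letter case.
    sources_by_seq = defaultdict(list)
    for refssu_id, seq in hvr_by_refssu.items():
        sources_by_seq[seq.upper()].append(refssu_id)

    id_by_seq = {}
    sources_by_id = {}
    for n, (seq, sources) in enumerate(sorted(sources_by_seq.items()), start=1):
        refhvr_id = f"{region}_{n:05d}"  # hypothetical suffix format
        id_by_seq[seq] = refhvr_id
        sources_by_id[refhvr_id] = sorted(sources)
    return id_by_seq, sources_by_id


# Made-up sequences: two RefSSU entries share one region sequence.
example = {
    "ssu_A": "ACGTACGT",
    "ssu_B": "acgtacgt",
    "ssu_C": "TTGGCCAA",
}
ids, sources = assign_refhvr_ids("v6", example)
print(ids)      # {'ACGTACGT': 'v6_00001', 'TTGGCCAA': 'v6_00002'}
print(sources)  # {'v6_00001': ['ssu_A', 'ssu_B'], 'v6_00002': ['ssu_C']}
```

The second mapping plays the role of the lookup the database files provide: given a RefHVR_ID, it returns every RefSSU entry that contains that exact region sequence.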
Fasta files include the reference ID, the assigned taxonomy, and the source of the taxonomy. The RefHVR fasta files include both the RefHVR_ID and the source RefSSU_ID. Only high-quality sequences are included in the fasta files.
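For readers who want to inspect these files directly, a gzip-compressed fasta file can be read with the Python standard library alone. This is a minimal sketch: the helper `read_fasta` is hypothetical, and because the exact layout of the header line is not specified here, the header is returned as a raw string rather than split into ID, taxonomy, and source fields.

```python
import gzip


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain or gzip-compressed fasta file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts; emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)
```

Assuming a downloaded copy of one of the files listed below, such as `refhvr_v6.fa.gz`, the records could be iterated with `for header, seq in read_fasta("refhvr_v6.fa.gz"): ...`.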
RefSSU file:
RefHVR: SILVA sequences cut between the specified primers:
SSU rRNA Region | Primers | Unaligned Fasta | GAST Format | Note |
---|---|---|---|---|
v3 | 338F - 533R | refhvr_v3.fa.gz | refv3.tgz | |
v3v5 | 341F - 785Fa** | refhvr_v3v5.fa.gz | refv3v5.tgz | |
v4v6 | 565Fa** - 1064R | refhvr_v4v6.fa.gz | refv4v6.tgz | Assumes 3' - 5' sequencing. |
v4v6a | 685Fa** - 1048R | refhvr_v4v6a.fa.gz | | Assumes 3' - 5' sequencing. Primer locations optimized for Archaea. |
v6 | 967F - 1064R | refhvr_v6.fa.gz | refv6.tgz | |
v6a | 958F - 1048R | refhvr_v6a.fa.gz | refv6a.tgz | Primer locations optimized for Archaea. |
v9 | 1380F - 1510R | refhvr_v9.fa.gz | refv9.tgz | Primer locations optimized for Eukarya. |
refssu | | | refssu.tgz | Full-length database. |
Template V6 Sequences: Clone43, E. coli, and S. epidermidis.