VAMPS FAQ
Frequently Asked Questions
Raw sequence data is not available from the VAMPS website.
To request original raw data files from projects that are currently on VAMPS, please note the following options:
In October 2018, we switched from the Invitrogen Platinum Taq DNA Polymerase High Fidelity (Cat. No. 11304102) to the Invitrogen Platinum SuperFi DNA Polymerase (Cat. No. 12351-050). The High Fidelity polymerase offers 6X the fidelity of Taq, whereas the SuperFi polymerase offers 100X the fidelity of Taq. The modified PCR master mix recipe is as follows:
Illumina Amplicon Generation
In 2013 we switched to using Illumina platforms for 16S sequencing. Similar to the strategy for 454, we use fusion primers composed of the Illumina adaptors, multiplexing identifiers, and domain-specific primers. Thermocycling and reaction mixtures for Illumina amplicon PCR are different from those used for 454 sequencing.
For Archaeal V6:
For Bacterial and Archaeal V4V5:
Bacterial V6 uses first the domain-specific primers for 25 cycles, then the products are cleaned and used in a second 5-cycle PCR with fusion primers.
For Bacterial V6:
After we produce amplicons, we can clean and/or size-select for the target products using Agencourt AMPure XP beads. Then we quantitate products with an Invitrogen Picogreen assay, pool at desired concentrations (e.g. equimolar), and quantitate the final pool with qPCR.
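The arithmetic behind equimolar pooling can be sketched as follows. This is an illustration only, not the lab's protocol: the sample names, concentrations, amplicon length, and target amount are made-up values, and the conversion assumes the common approximation of 660 g/mol per base pair of double-stranded DNA.

```python
# Hypothetical sketch of equimolar pooling arithmetic (not the lab's protocol).
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def ng_per_ul_to_nM(conc_ng_per_ul, amplicon_bp):
    """Convert a mass concentration (ng/uL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * amplicon_bp)


def equimolar_volumes(concs_ng_per_ul, amplicon_bp, fmol_per_sample):
    """Return the volume (uL) of each sample that contributes fmol_per_sample."""
    volumes = {}
    for name, conc in concs_ng_per_ul.items():
        nM = ng_per_ul_to_nM(conc, amplicon_bp)  # 1 nM equals 1 fmol/uL
        volumes[name] = fmol_per_sample / nM
    return volumes


# Made-up fluorometric readings (ng/uL) for three samples
readings = {"sample_A": 13.2, "sample_B": 6.6, "sample_C": 26.4}
vols = equimolar_volumes(readings, amplicon_bp=400, fmol_per_sample=100)
for name, vol in vols.items():
    print(f"{name}: {vol:.2f} uL")
```

More dilute samples contribute proportionally larger volumes, so each sample ends up with the same number of molecules in the pool.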
Conserved sequences that flank the hypervariable V6-V4 region of rRNAs serve as primer sites to generate PCR amplicons. Each PCR reaction produces products that can be informatically identified using a unique "key" incorporated between the 454 Life Sciences primer A or B and the 5' flanking rRNA primer. The use of a 5-bp key allows for the synthesis of as many as 81 oligonucleotides that differ by at least two sites. Our multiplexing strategy allows the concurrent collection of 10,000-50,000 tags from each of 8-40 samples in a single nine-hour sequencing run without use of partitioning gaskets that reduce the number of sequencing wells on the PicoTiterPlate™. Amplicons can be pooled before the emPCR step and each pool is run on a large region of the plate.
454 Amplicon PCR (Christina Holmes/Ekaterina Andreishcheva) for four reactions:
*If template stock is dilute or otherwise resistant to amplification, more template can be added in place of water.
The 5 reactions are the three replicates of the environmental template, positive control, and negative control. Template (plasmid pool for positive control; water for negative control) is added as final step.
Program:
Supplies:
Our sequencing pipeline is public and the details are stored on GitHub. The links are here:
The steps, in order, are: demultiplexing, merging, uniqueing, chimera checking, and taxonomy assignment.
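As an illustration of the uniqueing (dereplication) step, the sketch below collapses identical reads and counts how often each occurs. It is a minimal example with made-up reads, not the pipeline's actual code; the function name is hypothetical, and treating reads case-insensitively is an assumption.

```python
from collections import Counter


def unique_sequences(reads):
    """Collapse identical reads; return (sequence, count) pairs, most abundant first."""
    # Assumption: reads differing only in letter case are the same sequence.
    counts = Counter(seq.upper() for seq in reads)
    # Sort by descending count, then by sequence for a stable order
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up reads for illustration
reads = ["ACGTAC", "acgtac", "TTGACA", "ACGTAC", "TTGACA", "GGGCCC"]
for seq, count in unique_sequences(reads):
    print(seq, count)
```

Working with unique sequences and their counts means later steps such as chimera checking and taxonomy assignment only need to process each distinct sequence once.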
We use Meren's scripts to perform some of these steps. You can find them here on his GitHub site: https://github.com/merenlab/illumina-utils/tree/master/scripts
"P" value ................................................. 0.300000
Maximum number of mismatches in the overlapped region ...... None
Minimum overlap size ....................................... 15
Minimum Q-score for mismatches ............................. 15
Q30 enforced? .............................................. True
GAST stands for Global Alignment for Sequence Taxonomy.
It uses a reference database of SSU sequences to determine the taxonomy of hypervariable region tags. The specifics are described in the citation below.
Huse SM, Dethlefsen L, Huber JA, Welch DM, Relman DA, Sogin ML (2008) Exploring Microbial Diversity and Taxonomy Using SSU rRNA Hypervariable Tag Sequencing. PLoS Genet 4(11): e1000255. https://doi.org/10.1371/journal.pgen.1000255
There is no raw sequence data available on VAMPS.
The primary way to export data from VAMPS is to use the 'Download Data' page, which is accessed from the main 'Sample Selection' page. After you've selected the datasets you want to include in your download, select the blue 'Download Data' button.
You can download data from VAMPS in various formats. Listed on the download page are the formats (with descriptions) available to you.
Description of Import Options
To upload data to VAMPS, go to the upload page. FASTA files and count matrix files can be uploaded to VAMPS as long as they conform to the correct format, which is described on the upload page for each type.
Multiple-Dataset fasta file
Metadata file (for projects already in VAMPS)
This is nominally a CSV (comma-separated values) file, but the data are actually separated by tabs to make it more human-readable. These files must conform to the QIIME mapping file format. Since these are processed (trimmed) data that are being uploaded, the 'BarcodeSequence' and 'LinkerPrimerSequence' fields can be left empty, but the header names have to be present. The #SampleID field must be present and there can be no duplicate sample names. The Description field must also be present and must be the last field. There is a handy 'validate_mapping_file.py' script available in QIIME to assist you with this file.
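The rules above can be checked quickly before upload with a sketch like the following. It covers only the constraints stated here (required headers, Description last, no duplicate sample names); it is not a substitute for QIIME's 'validate_mapping_file.py', and the function name and example rows are hypothetical.

```python
import csv
import io

REQUIRED = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence", "Description"]


def check_mapping_text(text):
    """Return a list of problems found in tab-separated mapping file text."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = [f"missing header: {name}" for name in REQUIRED if name not in header]
    if "Description" in header and header[-1] != "Description":
        problems.append("Description must be the last field")
    if "#SampleID" in header:
        col = header.index("#SampleID")
        seen = set()
        for row in data:
            if col >= len(row):
                continue  # skip blank or short lines
            if row[col] in seen:
                problems.append(f"duplicate sample name: {row[col]}")
            seen.add(row[col])
    return problems


# Made-up examples: barcode and primer columns are present but left empty
good = ("#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tDescription\n"
        "S1\t\t\tfirst\nS2\t\t\tsecond\n")
bad = ("#SampleID\tBarcodeSequence\tDescription\tdepth\n"
       "S1\t\tfirst\t10\nS1\t\tagain\t20\n")
print(check_mapping_text(good))
print(check_mapping_text(bad))
```

An empty list means none of the checked rules were violated.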
Huse, S.M., D. Mark Welch, H.G. Morrison, and M.L. Sogin. (2010) Ironing out the wrinkles in the rare biosphere. Environmental Microbiology early view.
What is Phyloseq?
Phyloseq is an R library for microbiome data: (https://joey711.github.io/phyloseq/).
VAMPS uses R (https://www.r-project.org/) and Python (https://www.python.org/) scripts to produce some of the visualizations.
To use the Phyloseq library with your VAMPS data download the three Phyloseq files from the 'Display Choices' Page:
Import them directly into R (or R-script) as shown below to create a Phyloseq Object:
See the Phyloseq website for more help and examples.
What is ggplot2?
ggplot2 is an R library that provides quality graphic displays using various big data formats such as VAMPS downloads.
How are the reference databases created?
We create Ref16S, a reference database of aligned full-length sequences based on all available sequences in SILVA exported using the ARB software. New updates to both SILVA and RDP are incorporated as they become available.
Required metadata fields:
collection_date | Example: 2003-03-25 |
geo_loc_name | Name of country or Longhurst zone |
dna_region | |
domain | |
env_biome; env_feature; env_material | Biome: description of the site; Feature: description of the feature in the biome where the sample was obtained; Material: description of the material (see the Envo Ontology Browser) |
env_package | |
target_gene | Enter '16s' or '18s' for the section of the Small Subunit rRNA |
latitude | Geographical origin of sample in decimal degrees (WGS84 system), not DMS |
longitude | Geographical origin of sample in decimal degrees (WGS84 system), not DMS |
sequencing_platform | '454', 'illumina', 'ion-torrent', 'sanger' or 'unknown' |
adapter_sequence | Illumina specific |
illumina_index | Illumina specific |
primer_suite | MBL specific |
run | MBL specific |
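Because latitude and longitude must be given in decimal degrees rather than DMS (degrees, minutes, seconds), coordinates recorded in DMS need converting first. The sketch below shows the standard conversion; the function name and the example coordinates are made up for illustration.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal degrees
    return -value if hemisphere in ("S", "W") else value


# Made-up coordinates for illustration
latitude = dms_to_decimal(41, 31, 30, "N")
longitude = dms_to_decimal(70, 40, 12, "W")
print(round(latitude, 4), round(longitude, 4))
```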
Project Name
No spaces allowed!
Dataset Name
No spaces allowed!
FASTA File
Metadata File
There is one metadata file format allowed for import (except taxbyseq; see below):
TaxBySeq File and Metadata from Old (legacy) VAMPS
JSON Configuration File
SAMPLE:
{
  "source": "VAMPS",
  "post_items": {
    "normalization": "maximum",
    "selected_distance": "morisita_horn",
    "tax_depth": "phylum",
    "domains": ["Archaea", "Bacteria", "Eukarya", "Organelle", "Unknown"],
    "include_nas": "yes",
    "min_range": 0,
    "max_range": 100
  },
  "id_name_hash": {
    "ids": ["49", "50", "51", "52"]
  }
}
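Before uploading a configuration file, it can help to confirm that it parses as valid JSON. The sketch below loads the sample configuration above and runs a few basic checks. The checks are assumptions made for illustration (the full set of rules VAMPS applies is not described here), and the function name is hypothetical.

```python
import json

# The sample configuration from the text above, reproduced as a string.
SAMPLE_CONFIG = """
{
  "source": "VAMPS",
  "post_items": {
    "normalization": "maximum",
    "selected_distance": "morisita_horn",
    "tax_depth": "phylum",
    "domains": ["Archaea", "Bacteria", "Eukarya", "Organelle", "Unknown"],
    "include_nas": "yes",
    "min_range": 0,
    "max_range": 100
  },
  "id_name_hash": {
    "ids": ["49", "50", "51", "52"]
  }
}
"""


def load_config(text):
    """Parse the configuration and run a few basic sanity checks."""
    config = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    for key in ("source", "post_items", "id_name_hash"):
        if key not in config:
            raise ValueError(f"missing top-level key: {key}")
    post = config["post_items"]
    # Assumption: min_range should not exceed max_range.
    if post["min_range"] > post["max_range"]:
        raise ValueError("min_range is greater than max_range")
    return config


config = load_config(SAMPLE_CONFIG)
print(config["post_items"]["tax_depth"], len(config["id_name_hash"]["ids"]))
```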
Project
The project name refers to the overall study or research project to which the data belong. The project ties multiple samples and sequencing runs together.
Dataset
The dataset name refers to a set of sequences within the project that are from one sampling location or individual at a particular date and time. The dataset combines sequences sampled or amplified together. Sequence and taxonomic data are uploaded on a dataset by dataset basis. Multiple datasets may be combined together or compared separately when using the Community Visualization tools.
FASTA Files
When you upload a file it will be filtered for valid file format and data. If valid, the file will be uploaded into a temporary table of VAMPS data that will be available immediately for viewing.
FASTA definition line (or defline)
The FASTA file defline follows NCBI FASTA format. Each read starts with a ‘>’, and the read ID lies between the ‘>’ and the first ‘|’ (a ‘pipe’ symbol). The read ID cannot contain any special characters other than dash ‘-’ or underscore ‘_’ and must be shorter than 32 characters. Any other information on the definition line must come after the first ‘|’. The definition line is separated from the sequence data by a return or linefeed.
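The defline rules above can be checked before upload with a short sketch like the one below. It assumes that letters and digits are the permitted non-special characters and that "less than 32 characters" means 1 to 31; the function names and example deflines are hypothetical, and VAMPS's own filter may apply further checks.

```python
import re

# Assumption: letters, digits, dash and underscore are allowed; 1 to 31 characters.
ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,31}")


def read_id_from_defline(defline):
    """Return the text between '>' and the first '|' (or the end of the line)."""
    if not defline.startswith(">"):
        raise ValueError("defline must start with '>'")
    return defline[1:].rstrip("\r\n").split("|", 1)[0]


def is_valid_defline(defline):
    """Check a defline against the read ID rules described above."""
    try:
        read_id = read_id_from_defline(defline)
    except ValueError:
        return False
    return ID_PATTERN.fullmatch(read_id) is not None


# Made-up deflines for illustration
examples = [">read_001|sample=A", ">read 001|x", ">" + "A" * 32, "read_001"]
for line in examples:
    print(line, is_valid_defline(line))
```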