primer_specificity_ecopcr
Bastien Macé, 2020
Introduction
This project presents an efficient way to check the specificity of an already designed primer pair, that is, to ensure that the selected primers will, in theory, amplify only sequences from the studied species. To do so, the ecoPCR program performs an in silico PCR with the selected primer pair against a chosen database.
Installation
- First, you need to install the ecoPCR tool by following the instructions at this link.
- Here, we use the entire EMBL nucleotide database to check primer specificity. To do so, download the standard (std) data class from the EMBL database:
mkdir EMBL
cd EMBL
# quote the URL so that wget, not the local shell, expands the * wildcard on the FTP server
wget "ftp://ftp.ebi.ac.uk/pub/databases/ena/sequence/release/std/*"
gzip -d *.gz
cd ..
- You also need the NCBI taxonomy, which is used to link each sequence amplified in silico to its taxon:
mkdir TAXO
cd TAXO
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
tar -zxvf taxdump.tar.gz
cd ..
After cloning this project, you can run the two downloads above with:
bash downloads.sh
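Because the conversion step takes hours, it can be worth checking that both downloads finished before launching it. The helper below is a hypothetical sketch, not part of this project. It assumes the EMBL/ and TAXO/ folders sit in one base directory (the ${BDR_PATH} used in the conversion command), and the list of .dmp files it checks is an assumption about what the taxonomy loader reads; all four files ship in taxdump.tar.gz.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this project): check that the EMBL and
# NCBI taxonomy downloads look complete before the long obiconvert step.
# Usage: check_inputs BASE_DIR   (BASE_DIR holds the EMBL/ and TAXO/ folders)
# Returns 0 if everything looks fine, 1 otherwise (details on stderr).
check_inputs() {
  local base="$1" status=0 f
  # At least one decompressed flat file matching the rel_std_*.dat pattern
  # that obiconvert is given in the conversion step
  if ! compgen -G "${base}/EMBL/rel_std_*.dat" > /dev/null; then
    echo "no rel_std_*.dat file in ${base}/EMBL" >&2
    status=1
  fi
  # Leftover .gz archives mean that decompression did not finish
  if compgen -G "${base}/EMBL/*.gz" > /dev/null; then
    echo "compressed files are still present in ${base}/EMBL" >&2
    status=1
  fi
  # Taxdump tables assumed to be read by the taxonomy loader
  for f in nodes.dmp names.dmp merged.dmp delnodes.dmp; do
    if [[ ! -s "${base}/TAXO/${f}" ]]; then
      echo "missing or empty file: ${base}/TAXO/${f}" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_inputs "${BDR_PATH}" && echo "inputs look complete"` prints nothing but the error list when something is missing.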
- The OBITools are needed to convert the database and to filter and annotate the amplified sequences. Installation instructions for this toolkit are available here. In this project, the OBITools run inside a Singularity container, launched with the following command:
bash singularity.sh
In silico PCR
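Once the EMBL database is downloaded and decompressed, convert it into ecoPCR format. The conversion takes several hours, so run it in the background with nohup, which keeps it running after you log out. In the commands below, the ${...} variables stand for your own paths and settings: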
nohup obiconvert --skip-on-error --embl -t ${BDR_PATH}/TAXO --ecopcrdb-output="${RD_prefix}" ${BDR_PATH}/EMBL/rel_std_*.dat &
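Then run the in silico PCR with ecoPCR. The -e option sets the maximum number of mismatches allowed per primer, and -l and -L set the minimum and maximum length of the amplified fragment: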
ecoPCR -d "${RD_prefix}" -e "${ecoPCR_e}" -l "${ecoPCR_l}" -L "${ecoPCR_L}" "${primerF}" "${primerR}" > "${rd_prefix}".ecopcr
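The conversion and ecoPCR commands rely on shell variables that are not defined in this section. The file below is a hypothetical config.sh showing one way to set them; every path, length and primer sequence is a placeholder to replace with your own values, not a recommendation. Note that RD_prefix (the converted database) and rd_prefix (the output files) are two different variables, since shell variable names are case-sensitive.

```shell
#!/usr/bin/env bash
# Hypothetical config.sh: every value below is a placeholder to adapt.

# Directory holding the EMBL/ and TAXO/ folders created during installation
BDR_PATH="${HOME}/ref_db"
# Prefix of the converted ecoPCR database (obiconvert --ecopcrdb-output)
RD_prefix="${BDR_PATH}/EMBL_ecopcr/embl_std"
# Prefix of the output files for one primer pair (lower-case "rd")
rd_prefix="results/my_marker"
# Maximum number of mismatches allowed per primer (ecoPCR -e)
ecoPCR_e=3
# Minimum and maximum length of the amplified fragment (ecoPCR -l / -L)
ecoPCR_l=20
ecoPCR_L=200
# Forward and reverse primers, written 5'->3' (placeholder sequences)
primerF="ACGTACGTACGTACGTAC"
primerR="TTGCATGCATGCATGCAT"
```

Load it with `source config.sh` before running the commands, and create the output directory first (`mkdir -p "$(dirname "${rd_prefix}")"`), since the `>` redirection does not create missing directories.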
Then use obigrep from the OBITools to keep only the amplified sequences whose taxonomy is resolved at the species, genus and family levels:
obigrep -d "${RD_prefix}" --require-rank=species --require-rank=genus --require-rank=family "${rd_prefix}".ecopcr > "${rd_prefix}"_clean.fasta
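To see how much the rank filter removed, you can compare the number of ecoPCR hits with the number of records kept. The two functions below are a minimal sketch; they assume that in the .ecopcr file the header lines start with '#' and every other non-empty line describes one amplicon, which you should confirm by looking at the top of your own file.

```shell
#!/usr/bin/env bash
# Sketch: count amplicons before and after the obigrep rank filter.

# Assumed .ecopcr layout: comment/header lines start with '#',
# every other non-empty line is one amplicon.
count_ecopcr_hits() {
  # grep -c prints 0 but exits 1 when nothing matches; keep the 0
  grep -v -c -e '^#' -e '^$' "$1" || true
}

# Number of records in a FASTA file (one '>' header line per record)
count_fasta() {
  grep -c '^>' "$1" || true
}
```

For example: `echo "$(count_ecopcr_hits "${rd_prefix}".ecopcr) hits, $(count_fasta "${rd_prefix}"_clean.fasta) kept"`.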
obiuniq then dereplicates the sequences, merging identical sequences into a single record:
obiuniq -d "${RD_prefix}" "${rd_prefix}"_clean.fasta > "${rd_prefix}"_clean_uniq.fasta
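Dereplication can be checked independently of the OBITools: after obiuniq, no sequence should occur in more than one record. The function below is a hypothetical check written with standard Unix tools; it joins wrapped sequence lines and ignores case, which are assumptions about how you want sequences compared.

```shell
#!/usr/bin/env bash
# Sketch: count how many distinct sequences occur in more than one record
# of a FASTA file. After obiuniq this should print 0.
count_duplicate_sequences() {
  # Join each record's (possibly wrapped) sequence into one lower-case line
  awk '/^>/ { if (seq != "") print tolower(seq); seq = ""; next }
       { seq = seq $0 }
       END { if (seq != "") print tolower(seq) }' "$1" \
    | sort | uniq -d | awk 'END { print NR }'
}
```

For example, `count_duplicate_sequences "${rd_prefix}"_clean_uniq.fasta` should print 0.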
Because obiuniq can assign a sequence shared by several taxa to their common ancestor, use obigrep again to keep only the dereplicated sequences whose taxid is still resolved at the family level:
obigrep -d "${RD_prefix}" --require-rank=family "${rd_prefix}"_clean_uniq.fasta > "${rd_prefix}"_clean_uniq_clean.fasta
Finally, give each sequence a unique identifier:
obiannotate --uniq-id "${rd_prefix}"_clean_uniq_clean.fasta > "${rd_prefix}".fasta
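Identifiers can repeat before this step, for instance when one EMBL entry yields more than one amplicon. A quick way to confirm that obiannotate fixed this is to list the identifiers (the first word after '>') that occur more than once; the function below is a small hypothetical sketch of that check.

```shell
#!/usr/bin/env bash
# Sketch: print record identifiers (first word after '>') that occur more
# than once in a FASTA file. After obiannotate --uniq-id this prints nothing.
duplicate_ids() {
  grep '^>' "$1" | awk '{ print substr($1, 2) }' | sort | uniq -d
}
```

For example, `duplicate_ids "${rd_prefix}".fasta` should produce no output.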
You now have a .fasta file containing the sequences amplified by your primer pair from the EMBL database, annotated with their taxonomy. If the primers are specific, these sequences should all belong to the studied species.
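To judge specificity, it helps to tally which taxa are present in that file. The function below is a sketch that assumes the headers carry OBITools-style `key=value;` attributes such as `family_name=Salmonidae;`; the attribute key is a parameter (default family_name, an assumption), so check one header of your file and pass another key, such as species_name, if needed.

```shell
#!/usr/bin/env bash
# Sketch: tally the taxa present in a FASTA file, most frequent first.
# Assumption: headers contain attributes written as "key=value;".
# Usage: taxa_summary FILE [KEY]   (KEY defaults to family_name)
taxa_summary() {
  local file="$1" key="${2:-family_name}"
  # Keep header lines, extract the value of KEY, then count each value
  grep '^>' "$file" \
    | sed -n "s/.*[[:space:];]${key}=\([^;]*\);.*/\1/p" \
    | sort | uniq -c | sort -rn
}
```

For example, `taxa_summary "${rd_prefix}".fasta species_name`: any taxon other than the studied species in this list points to a potential lack of specificity.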
After cloning this project and editing the config.sh files for your primer pairs, the following command runs these in silico PCR steps for all of them, converting the EMBL database only once:
bash main_script.sh
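The key idea of that script, reusing one converted database for many primer pairs, can be sketched as a loop. The version below is a hypothetical alternative, not this project's main_script.sh: instead of config.sh files it reads an assumed tab-separated file (name, forward primer, reverse primer per line) and only runs the ecoPCR step, writing one NAME.ecopcr file per pair.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not this project's main_script.sh): run ecoPCR for
# several primer pairs against one already converted database.
# PAIRS_FILE (assumed format): name <TAB> forward <TAB> reverse
# Empty lines and lines starting with '#' are skipped.
# Usage: run_all_pairs PAIRS_FILE DB_PREFIX MAX_ERRORS MIN_LEN MAX_LEN
run_all_pairs() {
  local pairs_file="$1" db="$2" e="$3" l="$4" L="$5"
  local name fwd rev
  # "|| [[ -n $name ]]" also processes a last line without a newline
  while IFS=$'\t' read -r name fwd rev || [[ -n "$name" ]]; do
    [[ -z "$name" || "$name" == \#* ]] && continue
    # Same ecoPCR call as above, one output file per pair; stdin comes from
    # /dev/null so the command cannot consume the rest of the pairs file
    ecoPCR -d "$db" -e "$e" -l "$l" -L "$L" "$fwd" "$rev" \
      < /dev/null > "${name}.ecopcr"
  done < "$pairs_file"
}
```

For example, `run_all_pairs primers.tsv "${RD_prefix}" "${ecoPCR_e}" "${ecoPCR_l}" "${ecoPCR_L}"`; the obigrep, obiuniq and obiannotate steps can then be applied to each NAME.ecopcr file in the same way.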