
primer_specificity_ecopcr

Bastien Macé, 2020



Introduction

This project presents an efficient way to check the specificity of a previously designed pair of primers. In other words, it checks that the selected pair of primers will, in theory, amplify only sequences from the studied species. To do that, the ecoPCR program is used to run an in silico PCR against a chosen database with the selected pair of primers.

Installation

  • First, you need to install the ecoPCR tool by following the instructions on this link.

  • Here, we will use the entire EMBL nucleotide database to check primer specificity. To do that, download the standard (std) data class from the EMBL database:

mkdir EMBL
cd EMBL
wget "ftp://ftp.ebi.ac.uk/pub/databases/ena/sequence/release/std/*"
gzip -d *
cd ..

  • Downloading the NCBI taxonomy is also recommended, so that the sequences amplified in silico can be assigned to their corresponding taxon:

mkdir TAXO
cd TAXO
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
tar -zxvf taxdump.tar.gz
cd ..

After cloning this project, the following line performs both of these downloads:

bash downloads.sh

  • The OBITools may be needed to filter or annotate the amplified sequences. Documentation on how to download this toolkit is available here. Here, we use a Singularity container holding the OBITools. We launch it with the following command:

bash singularity.sh
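
Before converting the database, it can help to confirm that both downloads completed. The following is a minimal sketch, not part of the project's scripts: the function name check_inputs is hypothetical, the rel_std_*.dat pattern is taken from the obiconvert command used in the next section, and nodes.dmp and names.dmp are the node and name tables shipped in the NCBI taxdump archive.

```shell
# Hypothetical helper: check that the EMBL/ and TAXO/ directories created
# above contain the files expected by the conversion step.
check_inputs() {
    local base_dir="${1:-.}"   # directory that holds EMBL/ and TAXO/
    local status=0
    local f

    # At least one decompressed EMBL flat file (pattern assumed from the
    # obiconvert command, which reads EMBL/rel_std_*.dat).
    if ! ls "${base_dir}"/EMBL/rel_std_*.dat >/dev/null 2>&1; then
        echo "No rel_std_*.dat file found in ${base_dir}/EMBL" >&2
        status=1
    fi

    # Core tables of the NCBI taxonomy dump.
    for f in nodes.dmp names.dmp; do
        if [ ! -s "${base_dir}/TAXO/${f}" ]; then
            echo "Missing or empty ${base_dir}/TAXO/${f}" >&2
            status=1
        fi
    done

    return "${status}"
}
```

Called as check_inputs "${BDR_PATH}", it returns 0 when everything is in place and 1 otherwise, so it can guard the long conversion step.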

In silico PCR

After downloading and decompressing the EMBL database, you need to convert it into the ecoPCR format. Using nohup is recommended, as the conversion takes several hours.

nohup obiconvert --skip-on-error --embl -t ${BDR_PATH}/TAXO --ecopcrdb-output="${RD_prefix}" ${BDR_PATH}/EMBL/rel_std_*.dat &

Then, you can use the ecoPCR command to run the in silico PCR:

ecoPCR -d "${RD_prefix}" -e "${ecoPCR_e}" -l "${ecoPCR_l}" -L "${ecoPCR_L}" "${primerF}" "${primerR}" > "${rd_prefix}".ecopcr
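
A quick sanity check on the ecoPCR output is to count how many amplicons were found before any filtering. This is a minimal sketch assuming the usual ecoPCR output layout, in which header lines start with # and each remaining line describes one amplified sequence. count_amplicons is a hypothetical helper name, not part of ecoPCR or the OBITools.

```shell
# Hypothetical helper: count the amplicon records in an ecoPCR output file.
# Assumption: lines starting with '#' are header lines, and every other
# non-empty line is one in silico amplicon.
count_amplicons() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}
```

For example, count_amplicons "${rd_prefix}".ecopcr prints a single integer. A count of 0 suggests the primers, the mismatch threshold or the length bounds are worth revisiting before going further.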

The obigrep command from the OBITools can then be used to keep only the amplified sequences that have a taxonomic description at the species, genus and family levels:

obigrep -d "${RD_prefix}" --require-rank=species --require-rank=genus --require-rank=family "${rd_prefix}".ecopcr > "${rd_prefix}"_clean.fasta

The obiuniq command removes redundant sequences:

obiuniq -d "${RD_prefix}" "${rd_prefix}"_clean.fasta > "${rd_prefix}"_clean_uniq.fasta

You can then use obigrep again to ensure that the dereplicated sequences have a taxid at the family level:

obigrep -d "${RD_prefix}" --require-rank=family "${rd_prefix}"_clean_uniq.fasta > "${rd_prefix}"_clean_uniq_clean.fasta

Then ensure that each sequence has a unique identifier:

obiannotate --uniq-id "${rd_prefix}"_clean_uniq_clean.fasta > "${rd_prefix}".fasta

Now you have a .fasta file containing the sequences amplified by your pair of primers in the EMBL database, and their corresponding taxonomic ranks.
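
To judge specificity from this file, one option is to tally the amplified sequences per species and compare the list with the species you intend to target. The sketch below assumes that the fasta headers carry a species_name=...; attribute in the key=value; style used by the OBITools, which may not hold for every OBITools version or pipeline variant. tally_species is a hypothetical helper name.

```shell
# Hypothetical helper: count amplified sequences per species in the final
# fasta file. Assumption: headers contain "species_name=<name>;".
# Headers without that attribute are ignored.
tally_species() {
    grep '^>' "$1" \
        | grep -o 'species_name=[^;]*' \
        | sed 's/^species_name=//' \
        | sort \
        | uniq -c \
        | sort -k1,1nr
}
```

Running tally_species "${rd_prefix}".fasta prints one line per species, most frequent first. Any species outside the studied group indicates that the primer pair is not fully specific in silico.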

After cloning this project and editing the config.sh files to match your pairs of primers, the following line runs these in silico PCR steps for all your pairs of primers, without converting the EMBL database each time:

bash main_script.sh
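
The config.sh files themselves are not shown here. As an illustration only, the sketch below sets the variables that appear in the commands above. The variable names come from those commands and their roles are inferred from how they are used, but every value is a placeholder, including the primer sequences, and the actual layout of the project's config.sh files may differ.

```shell
# Hypothetical config.sh: every value below is a placeholder to adapt.
BDR_PATH="/path/to/reference_data"   # directory containing EMBL/ and TAXO/
RD_prefix="${BDR_PATH}/embl_std"     # prefix of the ecoPCR-formatted database
rd_prefix="my_primer_pair"           # prefix of the result files for this pair

ecoPCR_e=3      # passed to -e: maximum number of mismatches per primer
ecoPCR_l=50     # passed to -l: minimum amplicon length, in bp
ecoPCR_L=150    # passed to -L: maximum amplicon length, in bp

primerF="ACGTACGTACGTACGTAC"   # placeholder forward primer, 5' to 3'
primerR="TGCATGCATGCATGCATG"   # placeholder reverse primer, 5' to 3'
```

Keeping the database prefix (RD_prefix) separate from the result prefix (rd_prefix) is what allows a single converted database to be reused for several pairs of primers.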