Source code for the paper: "Genomic resources for Mediterranean fishes"

Katharina Fietz, Pierre-Edouard Guerin, Elena Trofimenko, Véronique Arnal, Montserrat Torres-Oliva, Stéphane Lobréaux, Angel Pérez-Ruzafa, Stephanie Manel, Oscar Puebla

2017-2019

Submitted to Genomics, 2020


Table of contents

  1. Nuclear Genomes assembly
  2. RAD-seq data processing
  3. SNPs statistics
  4. Mitochondrial genomes assembly

1. Nuclear Genomes assembly

Nuclear genomes were assembled using the Platanus assembler. Platanus was selected due to its excellent performance with highly heterozygous genomes. The paired-end libraries were used to assemble reads into contigs, and both the paired-end and mate-pair libraries were used for scaffolding and gap closing.

Source code

All source code is available at https://gitlab.mbb.univ-montp2.fr/reservebenefit/genome_assemblies_collection

Clone repository

git clone https://gitlab.mbb.univ-montp2.fr/reservebenefit/genome_assemblies_collection.git

2. RAD-seq data processing

RAD-seq sequences were demultiplexed and filtered using the process_radtags pipeline in STACKS v2.2. Sequences were trimmed to a final length of 139 bp because read quality dropped towards the end of the read. Taking advantage of paired-end information, clone_filter was used to remove pairs of paired-end reads that match exactly, as the vast majority of these are expected to be PCR clones. Paired-end read sequences were subsequently aligned with BWA to the reference genomes of M. surmuletus, D. sargus, and S. cabrilla, thereby improving the reliability of stacks building. Aligned reads were sorted using SAMTOOLS 1.9, and loci were built with gstacks, which also provides genotype calls.
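
The trimming and clone-filtering steps described above can be illustrated with a minimal Python sketch. It is not the STACKS implementation: the names trim and filter_clones are hypothetical, reads are plain strings rather than FASTQ records, and the sketch assumes that one representative of each set of identical pairs is kept.

```python
from typing import Iterable, Iterator, Tuple

ReadPair = Tuple[str, str]  # (forward sequence, reverse sequence)

TRIM_LENGTH = 139  # final read length stated in the text


def trim(seq: str, length: int = TRIM_LENGTH) -> str:
    """Truncate a read to at most `length` bases (hypothetical helper)."""
    return seq[:length]


def filter_clones(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Yield each distinct (forward, reverse) pair once, in first-seen order.

    Assumption: two pairs are clones only if both mates match exactly.
    """
    seen = set()
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            yield pair


if __name__ == "__main__":
    pairs = [("ACGT", "TTGA"), ("ACGT", "TTGA"), ("ACGT", "TTGC")]
    for forward, reverse in filter_clones(pairs):
        print(trim(forward), trim(reverse))
```

Requiring both mates to match is what makes paired-end data useful here: two fragments that share a forward read but differ in the reverse read are kept as distinct molecules.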

Source code

All source code is available at https://gitlab.mbb.univ-montp2.fr/reservebenefit/snakemake_stacks2

Clone repository

git clone https://gitlab.mbb.univ-montp2.fr/reservebenefit/snakemake_stacks2.git

3. SNPs statistics

To retain only high-quality biallelic SNPs for population genetics, called genotypes were further filtered with the populations pipeline and vcftools v0.1.16. Only one randomly selected SNP was retained per locus, and a locus was retained only if it was present in at least 85% of individuals and had a minor allele frequency (MAF) of at least 1%. To reduce linkage among markers, only one locus was retained from each pair of loci that were closer than 5000 bp or that had an r2 value > 0.8. Finally, individuals with > 30% missing data were also filtered out.
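
The thresholds above can be made concrete with a short Python sketch. It is an illustration only, not the populations or vcftools code: all function names are hypothetical, genotypes are assumed to be pairs of 0/1 alleles with None for missing calls, distance thinning is done greedily along one scaffold (an assumption about how ties are resolved), and the r2 filter is left out.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Genotype = Optional[Tuple[int, int]]  # two alleles coded 0/1, or None if missing


def presence(genotypes: Sequence[Genotype]) -> float:
    """Fraction of individuals with a called genotype at this SNP."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """MAF over called genotypes only; 0.0 if nothing is called."""
    alleles = [a for g in genotypes if g is not None for a in g]
    if not alleles:
        return 0.0
    alt = sum(alleles) / len(alleles)
    return min(alt, 1.0 - alt)


def keep_snp(genotypes: Sequence[Genotype],
             min_presence: float = 0.85, min_maf: float = 0.01) -> bool:
    """Apply the 85% presence and 1% MAF thresholds from the text."""
    return (presence(genotypes) >= min_presence
            and minor_allele_frequency(genotypes) >= min_maf)


def thin_by_distance(positions: Sequence[int], min_distance: int = 5000) -> List[int]:
    """Greedy thinning on one scaffold: keep a position only if it lies
    at least `min_distance` bp after the last kept position."""
    kept: List[int] = []
    for pos in sorted(positions):
        if not kept or pos - kept[-1] >= min_distance:
            kept.append(pos)
    return kept


def individuals_to_keep(matrix: Dict[str, Sequence[Genotype]],
                        max_missing: float = 0.30) -> List[str]:
    """`matrix` maps individual -> genotypes across SNPs; individuals with
    more than `max_missing` missing calls are dropped."""
    return [ind for ind, gts in matrix.items()
            if sum(g is None for g in gts) / len(gts) <= max_missing]
```

For example, thin_by_distance([100, 2000, 5100]) keeps 100 and 5100, because 2000 lies within 5000 bp of the first retained position.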

For each species, we calculated the number of SNPs, the distance between consecutive loci (in bp), and the number of SNPs located in a coding region.
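
As a rough sketch of how these statistics can be computed, the Python functions below take SNPs as (scaffold, position) tuples. The function names and the input layout are assumptions made for illustration, coding regions are assumed to be given as inclusive (start, end) intervals per scaffold, and distances are only measured between loci on the same scaffold.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Snp = Tuple[str, int]  # (scaffold name, position in bp)


def consecutive_distances(snps: Iterable[Snp]) -> Dict[str, List[int]]:
    """Distances (bp) between neighbouring positions, per scaffold."""
    by_scaffold: Dict[str, List[int]] = defaultdict(list)
    for scaffold, pos in snps:
        by_scaffold[scaffold].append(pos)
    distances: Dict[str, List[int]] = {}
    for scaffold, positions in by_scaffold.items():
        ordered = sorted(positions)
        distances[scaffold] = [b - a for a, b in zip(ordered, ordered[1:])]
    return distances


def count_coding_snps(snps: Iterable[Snp],
                      coding_regions: Dict[str, List[Tuple[int, int]]]) -> int:
    """Number of SNPs falling inside any coding interval of their scaffold."""
    return sum(
        any(start <= pos <= end for start, end in coding_regions.get(scaffold, []))
        for scaffold, pos in snps
    )


if __name__ == "__main__":
    snps = [("s1", 100), ("s1", 700), ("s2", 50), ("s1", 300)]
    print(len(snps))
    print(consecutive_distances(snps))
    print(count_coding_snps(snps, {"s1": [(250, 350)]}))
```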

Source code

All source code is available at https://gitlab.mbb.univ-montp2.fr/reservebenefit/snps_statistics

Clone repository

git clone https://gitlab.mbb.univ-montp2.fr/reservebenefit/snps_statistics.git

4. Mitochondrial genomes assembly

Mitochondrial genomes were assembled and annotated using MitoZ. Five million sequences were randomly selected as a subset of the full paired-end sequence set. Mitochondrial sequences were then identified from this subset using a ranking method based on a Hidden Markov Model profile of known mitochondrial sequences from 2413 chordate species. Mitochondrial sequences were then used to assemble the mitochondrial genome. Finally, mitochondrial assemblies were annotated using BLAST family alignments on known protein coding genes, transfer RNA genes and rRNA genes.
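
The random subsampling step can be sketched with reservoir sampling (Algorithm R), which draws a fixed-size uniform sample in one pass without loading all reads into memory. This is a generic technique shown for illustration; the text does not say how the subset was actually drawn, and the function name and seed parameter are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int,
                     seed: Optional[int] = None) -> List[T]:
    """Return up to `k` items drawn uniformly at random from `items`."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Toy stand-in for read pairs; the text uses a subset of five million.
    read_pairs = ((f"read{i}/1", f"read{i}/2") for i in range(1000))
    subset = reservoir_sample(read_pairs, k=10, seed=1)
    print(len(subset))
```

Sampling whole pairs rather than individual reads keeps mates together, which matters when the subset is passed on to a paired-end assembler.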

Source code

All source code is available at https://gitlab.mbb.univ-montp2.fr/intrapop/assemble_mitogenome

Clone repository

git clone https://gitlab.mbb.univ-montp2.fr/intrapop/assemble_mitogenome.git