Source code for the paper: "Genomic resources for Mediterranean fishes"
Katharina Fietz, Pierre-Edouard Guerin, Elena Trofimenko, Véronique Arnal, Montserrat Torres-Oliva, Stéphane Lobréaux, Angel Pérez-Ruzafa, Stephanie Manel, Oscar Puebla
2017-2019
Submitted to Genomics, 2020
1. Nuclear genome assembly
Nuclear genomes were assembled with the Platanus assembler, which was chosen for its strong performance on highly heterozygous genomes. The paired-end libraries were used to assemble reads into contigs, and both the paired-end and mate-pair libraries were used for scaffolding and gap closing.
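Platanus is a de Bruijn graph assembler. As a rough illustration of the contig step only, the toy Python sketch below builds a de Bruijn graph from short reads and spells out its maximal non-branching paths as contigs. It is not Platanus's algorithm, and the function names (`build_debruijn`, `assemble_contigs`) are hypothetical. It also hints at why heterozygosity matters: reads from two alleles create a branch in the graph that splits contigs, which heterozygosity-aware assemblers such as Platanus are designed to handle.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_debruijn(reads: List[str], k: int) -> Dict[str, Set[str]]:
    """Nodes are (k-1)-mers; each k-mer seen in a read adds an edge prefix -> suffix."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def assemble_contigs(reads: List[str], k: int) -> List[str]:
    """Spell out every maximal non-branching path (unitig) of the graph as a contig."""
    graph = build_debruijn(reads, k)
    indegree: Dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for succ in successors:
            indegree[succ] += 1
    nodes = set(graph) | set(indegree)

    def one_in_one_out(node: str) -> bool:
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in nodes:
        if one_in_one_out(node):
            continue  # interior node: it is walked through from the start of some path
        for succ in graph.get(node, ()):
            contig, current = node + succ[-1], succ
            # Extend the contig while the path is unambiguous.
            while one_in_one_out(current):
                current = next(iter(graph[current]))
                contig += current[-1]
            contigs.append(contig)
    return sorted(contigs)
```

With the reads ATGGCGT and GGCGTGCA and k = 4, the sketch merges them into the single contig ATGGCGTGCA; with the reads ACGTT and ACGAA and k = 3, the branch after ACG yields three contigs (ACG, CGAA, CGTT). Isolated cycles are ignored for brevity.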
Source code
All source code is available at https://gitlab.mbb.univ-montp2.fr/reservebenefit/genome_assemblies_collection
Clone repository
git clone https://gitlab.mbb.univ-montp2.fr/reservebenefit/genome_assemblies_collection.git
2. RAD-seq data processing
RAD-seq reads were demultiplexed and filtered with the process_radtags program of STACKS v2.2. Reads were trimmed to a final length of 139 bp because read quality dropped towards the end of the reads. Taking advantage of the paired-end information, clone_filter was used to remove read pairs that matched exactly, as the vast majority of these are expected to be PCR clones. The paired-end reads were then aligned with BWA to the reference genomes of M. surmuletus, D. sargus and S. cabrilla, which improves the reliability of stack (locus) building. Aligned reads were sorted with SAMTOOLS 1.9, and loci were built and genotypes called with gstacks.
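The trimming and exact-duplicate removal steps can be sketched in plain Python as follows. This is a hypothetical illustration (`trim_pair` and `remove_exact_clones` are not part of STACKS), assuming read pairs have already been parsed into (read 1, read 2) sequence tuples. It follows the order given in the text (trim, then remove exact duplicates) and does not reproduce how process_radtags or clone_filter are implemented.

```python
from typing import Iterable, Iterator, Set, Tuple

ReadPair = Tuple[str, str]  # (read 1 sequence, read 2 sequence)

TRIM_LENGTH = 139  # final read length used in the text


def trim_pair(pair: ReadPair, length: int = TRIM_LENGTH) -> ReadPair:
    """Truncate both mates to a fixed length, discarding the low-quality read ends."""
    read1, read2 = pair
    return read1[:length], read2[:length]


def remove_exact_clones(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Yield the first copy of each read pair; later pairs identical on BOTH mates
    are treated as PCR clones and skipped."""
    seen: Set[ReadPair] = set()
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            yield pair
```

Only pairs that match on both mates are collapsed: two pairs that share read 1 but differ in read 2 are both kept, which is where the paired-end information comes in.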
Source code
All source code is available at https://gitlab.mbb.univ-montp2.fr/reservebenefit/snakemake_stacks2
Clone repository
git clone https://gitlab.mbb.univ-montp2.fr/reservebenefit/snakemake_stacks2.git
3. SNP statistics
To retain only high-quality biallelic SNPs for population genetics, the called genotypes were further filtered with the STACKS populations program and vcftools v0.1.16. Only one randomly selected SNP was retained per locus, and a locus was retained only if it was present in at least 85% of individuals and had a minor allele frequency (MAF) of at least 1%. To reduce linkage among markers, only one locus was retained from each pair of loci that were closer than 5,000 bp or had an r² value > 0.8. Finally, individuals with more than 30% missing data were filtered out.
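A minimal sketch of these filters in plain Python is shown below, assuming genotypes coded as alternative-allele counts (0, 1, 2, or None when missing) in an individuals × SNPs matrix, and SNP metadata given as hypothetical (locus, scaffold, position) tuples. It is not the populations or vcftools implementation: per-locus presence is approximated by the SNP call rate, linkage is pruned greedily in genome order, and r² is the squared correlation of genotype dosages, a common approximation when haplotypes are unknown. All function names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Genotype = Optional[int]        # alternative-allele count (0, 1, 2); None = missing
SnpInfo = Tuple[str, str, int]  # hypothetical layout: (locus id, scaffold, position)


def call_rate(col: Sequence[Genotype]) -> float:
    return sum(g is not None for g in col) / len(col)


def minor_allele_freq(col: Sequence[Genotype]) -> float:
    called = [g for g in col if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def r_squared(a: Sequence[Genotype], b: Sequence[Genotype]) -> float:
    """Squared correlation of genotype dosages over individuals called at both SNPs."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def filter_snps(snps: List[SnpInfo], genotypes: List[List[Genotype]], seed: int = 0,
                min_call: float = 0.85, min_maf: float = 0.01, min_dist: int = 5000,
                max_r2: float = 0.8, max_ind_missing: float = 0.30):
    """genotypes[i][j] is individual i at SNP j; returns (kept SNPs, kept individuals)."""
    rng = random.Random(seed)

    def column(j: int) -> List[Genotype]:
        return [row[j] for row in genotypes]

    # 1. One randomly chosen SNP per locus.
    by_locus: Dict[str, List[int]] = {}
    for j, (locus, _, _) in enumerate(snps):
        by_locus.setdefault(locus, []).append(j)
    candidates = [rng.choice(js) for js in by_locus.values()]

    # 2. Presence (approximated by call rate) and MAF thresholds.
    candidates = [j for j in candidates
                  if call_rate(column(j)) >= min_call
                  and minor_allele_freq(column(j)) >= min_maf]

    # 3. Greedy linkage pruning in genome order: drop a SNP if it is closer than
    #    min_dist to, or has r2 > max_r2 with, any SNP already kept.
    kept: List[int] = []
    for j in sorted(candidates, key=lambda idx: (snps[idx][1], snps[idx][2])):
        linked = any(
            (snps[k][1] == snps[j][1] and abs(snps[k][2] - snps[j][2]) < min_dist)
            or r_squared(column(k), column(j)) > max_r2
            for k in kept)
        if not linked:
            kept.append(j)

    # 4. Drop individuals with more than max_ind_missing missing data at kept SNPs.
    def missing_fraction(row: List[Genotype]) -> float:
        return sum(row[j] is None for j in kept) / len(kept) if kept else 0.0

    individuals = [i for i, row in enumerate(genotypes)
                   if missing_fraction(row) <= max_ind_missing]
    return kept, individuals
```

Greedy pruning keeps the first SNP encountered along each scaffold; other rules (for example, keeping the SNP with the higher call rate) would give slightly different but equally valid marker sets.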
For each species, we calculated the number of SNPs, the distance (in bp) between consecutive loci and the number of SNPs located in coding regions.
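The three statistics can be computed as in the following sketch, assuming SNP positions given as (scaffold, position) pairs and coding regions given as 1-based, inclusive (start, end) intervals per scaffold, as in a GFF file. The function name and input layout are hypothetical. Distances are only computed between consecutive SNPs on the same scaffold; because one SNP was kept per locus, they correspond to distances between consecutive loci.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def snp_statistics(snps: List[Tuple[str, int]],
                   cds: Dict[str, List[Tuple[int, int]]]) -> dict:
    """snps: (scaffold, position) pairs; cds: scaffold -> 1-based inclusive (start, end)."""
    positions_by_scaffold: Dict[str, List[int]] = defaultdict(list)
    for scaffold, pos in snps:
        positions_by_scaffold[scaffold].append(pos)

    # Distances between consecutive SNPs, computed within each scaffold only.
    distances: List[int] = []
    for positions in positions_by_scaffold.values():
        positions.sort()
        distances.extend(b - a for a, b in zip(positions, positions[1:]))

    # A SNP counts as coding if it falls inside any CDS interval on its scaffold.
    n_coding = sum(
        any(start <= pos <= end for start, end in cds.get(scaffold, []))
        for scaffold, pos in snps)

    return {"n_snps": len(snps), "distances": distances, "n_coding": n_coding}
```

The linear scan over intervals keeps the sketch short; for genome-wide annotations, sorting the intervals and searching them with `bisect` would be faster.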
Source code
All source code is available at https://gitlab.mbb.univ-montp2.fr/reservebenefit/snps_statistics
Clone repository
git clone https://gitlab.mbb.univ-montp2.fr/reservebenefit/snps_statistics.git
4. Mitochondrial genome assembly
Mitochondrial genomes were assembled and annotated using MitoZ. First, a random subset of five million sequences was drawn from the full paired-end sequence set. Mitochondrial sequences were then identified in this subset with a ranking method based on a Hidden Markov Model (HMM) profile of known mitochondrial sequences from 2,413 chordate species, and these sequences were used to assemble the mitochondrial genome. Finally, the mitochondrial assemblies were annotated with BLAST-family alignments against known protein-coding, transfer RNA (tRNA) and ribosomal RNA (rRNA) genes.
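Drawing a uniform random subset of fixed size from a read stream of unknown length can be done in a single pass with reservoir sampling (Algorithm R). The sketch below illustrates that standard technique only; it is not claimed to be how the five million sequences were actually selected, and `reservoir_sample` is a hypothetical helper. Sampling (read 1, read 2) tuples rather than individual reads keeps mates together.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return k items chosen uniformly at random from a stream (all items if fewer than k)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)   # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)    # inclusive bounds: item i kept with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir
```

For example, `reservoir_sample(zip(read1_records, read2_records), 5_000_000)` would keep five million read pairs, where `read1_records` and `read2_records` are hypothetical iterators over the two FASTQ files. Memory use grows with the sample size, not with the size of the input.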
Source code
All source code is available at https://gitlab.mbb.univ-montp2.fr/intrapop/assemble_mitogenome
Clone repository
git clone https://gitlab.mbb.univ-montp2.fr/intrapop/assemble_mitogenome.git