diff --git a/README.md b/README.md
index f229a8cc29d6634bc8d4d78fcef04ee05cb53bbc..ad358385e657cb6621f5f78d91a7a077f8b8a4db 100644
--- a/README.md
+++ b/README.md
@@ -9,15 +9,15 @@ This was designed to process RADseq data from [RESERVEBENEFIT](https://www.biodi
 
 1. [Introduction](#1-introduction)
 2. [Installation](#2-installation)
-  1. [Prerequisite](#21-prerequisite)
-  2. [Data Files](#22-data-files)
-  3. [Set up](#23-set-up)
+    1. [Prerequisite](#21-prerequisite)
+    2. [Data Files](#22-data-files)
+    3. [Set up](#23-set-up)
 3. [Reporting bugs](#3-reporting-bugs)
 4. [Running the pipeline](#5-running-the-pipeline)
-  1. [Initialisation](#41-initialisation)
-  2. [Configuration](#42-configuration)
-  3. [Run the pipeline into a single command](#43-run-the-pipeline-into-a-single-command)
-  4. [Run the pipeline step by step](#44-run-the-pipeline-step-by-step)
+    1. [Initialisation](#41-initialisation)
+    2. [Configuration](#42-configuration)
+    3. [Run the pipeline into a single command](#43-run-the-pipeline-into-a-single-command)
+    4. [Run the pipeline step by step](#44-run-the-pipeline-step-by-step)
 
 # 1. Introduction
 
@@ -40,7 +40,7 @@ You must install the following softwares and packages :
   5.3.0
   ```
 
-- [STACKS 2.0b](http://catchenlab.life.illinois.edu/stacks/)
+- [STACKS 2.2](http://catchenlab.life.illinois.edu/stacks/)
 
   * Check version and if programs are correctly installed by typing :
   ```
@@ -49,7 +49,7 @@ You must install the following softwares and packages :
   gstacks --version
   populations --version
   ## should give you the output
-  2.0b
+  2.2
   ```
 
 - [BWA 0.7.17](https://icb.med.cornell.edu/wiki/index.php/Elementolab/BWA_tutorial)
@@ -99,11 +99,15 @@ You must install the following softwares and packages :
 ## 2.2 Data Files
 
 The included data files are :
+Let's define some wildcards `*` :
+- `{run}` : any run
+- `{pool}` : any pool within a run
+- `{species}` : any species
 
 * [config.yaml](01-info_files/config.yaml) :
-* [barcodes.txt](01-info_files/barcodes.txt) :
-* [infos.csv](01-info_files) :
-* [populations_map.txt](01-info_files) :
+* [barcodes.txt](01-info_files/barcodes.txt) : file containing the barcodes used for {pool} in {run}
+* [{species}_infos.csv](01-info_files) : information `.csv` table related to {species}. Each row is a sample and there are 4 columns: run,pool,barcode,ID
+* [{species}_populations_map.txt](01-info_files) : information `.tsv` table related to {species}. Each row is a sample and there are 2 columns: ID,population. This file can be generated by the pipeline (see [Configuration](#42-configuration) section). However, we strongly recommend that you create it manually.
 
 ## 2.3 Set Up
 
@@ -114,7 +118,109 @@ cd snakemake_stacks2
 ```
 
 You will see the following folders :
-
+* [00-scripts](00-scripts): contains all the required scripts to run the whole pipeline
+* [01-info_files](01-info_files) : contains all the required data files (see [Data Files](#22-data-files) section above)
+* [02-raw](02-raw) : must contain your data from paired-end Illumina sequencing runs. The data must be stored this way :
+  ```
+  02-raw/
+    runA/
+      poolA1/
+        {poolA1}_R1_001.fastq.gz
+        {poolA1}_R2_001.fastq.gz
+      poolA2/
+        {poolA2}_R1_001.fastq.gz
+        {poolA2}_R2_001.fastq.gz
+      ...
+    runB/
+      poolB1/
+        {poolB1}_R1_001.fastq.gz
+        {poolB1}_R2_001.fastq.gz
+      ...
+    ...
+  ```
+* [03-samples](03-samples): will store the results generated by demultiplexing with [process_radtags](http://catchenlab.life.illinois.edu/stacks/comp/process_radtags.php) and clone filtering with [clone_filter](http://catchenlab.life.illinois.edu/stacks/comp/clone_filter.php). The data must be stored this way :
+  ```
+  03-samples/
+    runA/
+      poolA1/
+        sample_{barcode1}.1.fq.gz
+        sample_{barcode1}.2.fq.gz
+        sample_{barcode2}.1.fq.gz
+        sample_{barcode2}.2.fq.gz
+        sample_{barcode3}.1.fq.gz
+        sample_{barcode3}.2.fq.gz
+        ...
+      poolA1_clone_filtered/
+        sample_{barcode1}.1.1.fq.gz
+        sample_{barcode1}.2.2.fq.gz
+        sample_{barcode2}.1.1.fq.gz
+        sample_{barcode2}.2.2.fq.gz
+        sample_{barcode3}.1.1.fq.gz
+        sample_{barcode3}.2.2.fq.gz
+        ...
+      poolA2/
+        sample_{barcode1}.1.fq.gz
+        sample_{barcode1}.2.fq.gz
+        ...
+      poolA2_clone_filtered/
+        sample_{barcode1}.1.1.fq.gz
+        sample_{barcode1}.2.2.fq.gz
+        ...
+      ...
+    runB/
+      poolB1/
+        sample_{barcode1}.1.fq.gz
+        sample_{barcode1}.2.fq.gz
+        ...
+      poolB1_clone_filtered/
+        sample_{barcode1}.1.1.fq.gz
+        sample_{barcode1}.2.2.fq.gz
+        ...
+      ...
+    ...
+  ```
+* [04-all_samples](04-all_samples): paired-end `fastq.gz` files are renamed according to the information in [{species}_infos.csv](01-info_files). Reads are then aligned to the reference genome sequences stored in [08-genomes](08-genomes). This folder contains the renamed fastq files and the corresponding `.bam` alignment files. `.sorted.bam` files are sorted alignments and `.sorted.bam.bai` files are their indexes. The data must be stored this way :
+  ```
+  04-all_samples/
+    speciesA/
+      {sampleA1}.1.fq.gz
+      {sampleA1}.2.fq.gz
+      {sampleA1}.bam
+      {sampleA1}.sorted.bam
+      {sampleA1}.sorted.bam.bai
+      {sampleA2}.1.fq.gz
+      {sampleA2}.2.fq.gz
+      {sampleA2}.bam
+      {sampleA2}.sorted.bam
+      {sampleA2}.sorted.bam.bai
+      ...
+    speciesB/
+      {sampleB1}.1.fq.gz
+      {sampleB1}.2.fq.gz
+      {sampleB1}.bam
+      {sampleB1}.sorted.bam
+      {sampleB1}.sorted.bam.bai
+      ...
+    ...
+  ```
+* [05-stacks](05-stacks) : outputs from [gstacks](http://catchenlab.life.illinois.edu/stacks/comp/gstacks.php)
+* [06-populations](06-populations) : outputs from [populations](http://catchenlab.life.illinois.edu/stacks/comp/populations.php)
+* [08-genomes](08-genomes) : reference genome of each species {species} used for the analysis. The `.fasta` file is mandatory and stores all the scaffold sequences of the {species} genome assembly. `.amb`, `.ann`, `.bwt`, `.pac`, `.sa` are index files required by [BWA 0.7.17](https://icb.med.cornell.edu/wiki/index.php/Elementolab/BWA_tutorial). They will be generated automatically if absent. The data must be stored this way :
+  ```
+  08-genomes/
+    {species}_genome.amb
+    {species}_genome.ann
+    {species}_genome.bwt
+    {species}_genome.fasta
+    {species}_genome.pac
+    {species}_genome.sa
+  ```
+* [10-logs](10-logs) : log files generated by every command
+  - process_radtags
+  - clone_filter
+  - genome_alignment
+  - gstacks
+  - populations
 
 # 3. Reporting bugs
 
@@ -122,7 +228,7 @@ If you're sure you've found a bug — e.g. if one of my programs crashes
 with an obscur error message, or if the resulting file is missing part
 of the original data, then by all means submit a bug report.
 
-I use [GitLab's issue system](https://gitlab.com/reservebenefit/snakemake_stacks2/issues)
+I use [GitLab's issue system](http://gitlab.mbb.univ-montp2.fr/reservebenefit/snakemake_stacks2/issues)
 as my bug database. You can submit your bug reports there. Please be as
 verbose as possible — e.g. include the command line, etc