@@ -99,11 +99,15 @@ You must install the following softwares and packages :
## 2.2 Data Files
The included data files are :
Let's define some wildcards :
-`{run}` : any run
-`{pool}` : any pool within a run
-`{species}` : any species
*[config.yaml](01-info_files/config.yaml) :
*[barcodes.txt](01-info_files/barcodes.txt) :
*[infos.csv](01-info_files) :
*[populations_map.txt](01-info_files) :
*[barcodes.txt](01-info_files/barcodes.txt) : file containing the barcodes used for `{pool}` in `{run}`
*[{species}_infos.csv](01-info_files) : `.csv` information table for `{species}`. Each row is a sample, and there are 4 columns: run, pool, barcode, ID.
*[{species}_populations_map.txt](01-info_files) : `.tsv` information table for `{species}`. Each row is a sample, and there are 2 columns: ID, population. This file can be generated by the pipeline (see the [Configuration](#42-configuration) section). However, we strongly recommend that you create it manually.
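
As an illustration of the `{species}_infos.csv` layout described above, here is a minimal Python sketch that reads such a table and checks that every row has the 4 expected columns. It is not part of the pipeline. The function name `read_infos`, the absence of a header row, the comma delimiter and the requirement that IDs be unique are assumptions made for this example.

```python
import csv
import io

EXPECTED_COLUMNS = 4  # run, pool, barcode, ID


def read_infos(handle):
    """Return a list of (run, pool, barcode, sample_id) tuples.

    Assumes a comma-separated table without a header row.
    Raises ValueError on a malformed row or a duplicated sample ID.
    """
    samples = []
    seen_ids = set()
    for row_number, row in enumerate(csv.reader(handle), start=1):
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        if len(row) != EXPECTED_COLUMNS:
            raise ValueError(
                f"row {row_number}: expected {EXPECTED_COLUMNS} columns, got {len(row)}"
            )
        run, pool, barcode, sample_id = (field.strip() for field in row)
        if sample_id in seen_ids:  # uniqueness is an assumption of this sketch
            raise ValueError(f"row {row_number}: duplicated ID {sample_id!r}")
        seen_ids.add(sample_id)
        samples.append((run, pool, barcode, sample_id))
    return samples


# Hypothetical content, for illustration only.
example = io.StringIO("runA,poolA1,ACGTAC,ind01\nrunA,poolA1,TTGCAA,ind02\n")
samples = read_infos(example)
```

With a real file, the same function could be called on `open("01-info_files/speciesA_infos.csv", newline="")`, where `speciesA` is a placeholder species name.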
## 2.3 Set Up
...
...
@@ -114,7 +118,109 @@ cd snakemake_stacks2
```
You will see the following folders :
*[00-scripts](00-scripts): contains all the required scripts to run the whole pipeline
*[01-info_files](01-info_files) : contains all the required data files (see [Data Files](#22-data-files) section above)
*[02-raw](02-raw) : must contain your data from paired-end Illumina sequencing runs. The data must be stored this way :
```
02-raw/
    runA/
        poolA1/
            {poolA1}_R1_001.fastq.gz
            {poolA1}_R2_001.fastq.gz
        poolA2/
            {poolA2}_R1_001.fastq.gz
            {poolA2}_R2_001.fastq.gz
        ...
    runB/
        poolB1/
            {poolB1}_R1_001.fastq.gz
            {poolB1}_R2_001.fastq.gz
        ...
    ...
```
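
The layout above can be checked before launching the pipeline. The following Python sketch is only an illustration, not a script shipped in [00-scripts](00-scripts). It assumes that every sub-folder of `02-raw/` is a run, that every sub-folder of a run is a pool, and that each pool holds one R1 and one R2 file named after the pool folder. The function name `missing_raw_files` is hypothetical.

```python
import tempfile
from pathlib import Path


def missing_raw_files(raw_dir):
    """List expected R1/R2 files that are absent under raw_dir/{run}/{pool}/."""
    missing = []
    for run_dir in sorted(p for p in Path(raw_dir).iterdir() if p.is_dir()):
        for pool_dir in sorted(p for p in run_dir.iterdir() if p.is_dir()):
            for read in ("R1", "R2"):
                # File name pattern taken from the tree above.
                expected = pool_dir / f"{pool_dir.name}_{read}_001.fastq.gz"
                if not expected.is_file():
                    missing.append(expected)
    return missing


# Small demonstration on a temporary tree where the R2 file is absent.
with tempfile.TemporaryDirectory() as tmp:
    pool = Path(tmp) / "runA" / "poolA1"
    pool.mkdir(parents=True)
    (pool / "poolA1_R1_001.fastq.gz").touch()
    missing_names = [path.name for path in missing_raw_files(tmp)]
```

An empty list returned by `missing_raw_files("02-raw")` would mean that every pool folder contains both read files.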
*[03-samples](03-samples): stores the results of demultiplexing with [process_radtags](http://catchenlab.life.illinois.edu/stacks/comp/process_radtags.php) and of clone filtering with [clone_filter](http://catchenlab.life.illinois.edu/stacks/comp/clone_filter.php). The data are stored this way :
```
03-samples/
    runA/
        poolA1/
            sample_{barcode1}.1.fq.gz
            sample_{barcode1}.2.fq.gz
            sample_{barcode2}.1.fq.gz
            sample_{barcode2}.2.fq.gz
            sample_{barcode3}.1.fq.gz
            sample_{barcode3}.2.fq.gz
            ...
        poolA1_clone_filtered/
            sample_{barcode1}.1.1.fq.gz
            sample_{barcode1}.2.2.fq.gz
            sample_{barcode2}.1.1.fq.gz
            sample_{barcode2}.2.2.fq.gz
            sample_{barcode3}.1.1.fq.gz
            sample_{barcode3}.2.2.fq.gz
            ...
        poolA2/
            sample_{barcode1}.1.fq.gz
            sample_{barcode1}.2.fq.gz
            ...
        poolA2_clone_filtered/
            sample_{barcode1}.1.1.fq.gz
            sample_{barcode1}.2.2.fq.gz
            ...
        ...
    runB/
        poolB1/
            sample_{barcode1}.1.fq.gz
            sample_{barcode1}.2.fq.gz
            ...
        poolB1_clone_filtered/
            sample_{barcode1}.1.1.fq.gz
            sample_{barcode1}.2.2.fq.gz
            ...
        ...
    ...
```
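
The relation between the demultiplexed file names and the clone-filtered file names shown above can be written as a small Python function. This is a sketch derived only from the naming pattern in the tree. The helper name `clone_filtered_name` is hypothetical and the pipeline itself may build these names differently.

```python
import re

# Matches names such as sample_ACGTAC.1.fq.gz (read 1) or sample_ACGTAC.2.fq.gz (read 2).
DEMULTIPLEXED = re.compile(r"^(?P<stem>.+)\.(?P<read>[12])\.fq\.gz$")


def clone_filtered_name(filename):
    """Map a demultiplexed file name to the clone-filtered name seen in the tree."""
    match = DEMULTIPLEXED.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    read = match.group("read")
    # In the tree above the read number is repeated: .1.fq.gz becomes .1.1.fq.gz
    return f"{match.group('stem')}.{read}.{read}.fq.gz"


# Hypothetical barcode, for illustration only.
forward = clone_filtered_name("sample_ACGTAC.1.fq.gz")
reverse = clone_filtered_name("sample_ACGTAC.2.fq.gz")
```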
*[04-all_samples](04-all_samples): paired-end `fastq.gz` files are renamed according to the information in [{species}_infos.csv](01-info_files). Reads are then aligned to the reference genome sequences stored in [08-genomes](08-genomes). This folder contains the renamed fastq files and the corresponding `.bam` alignment files. `.sorted.bam` files are sorted alignments and `.sorted.bam.bai` files are their indexes. The data are stored this way :
```
04-all_samples/
    speciesA/
        {sampleA1}.1.fq.gz
        {sampleA1}.2.fq.gz
        {sampleA1}.bam
        {sampleA1}.sorted.bam
        {sampleA1}.sorted.bam.bai
        {sampleA2}.1.fq.gz
        {sampleA2}.2.fq.gz
        {sampleA2}.bam
        {sampleA2}.sorted.bam
        {sampleA2}.sorted.bam.bai
        ...
    speciesB/
        {sampleB1}.1.fq.gz
        {sampleB1}.2.fq.gz
        {sampleB1}.bam
        {sampleB1}.sorted.bam
        {sampleB1}.sorted.bam.bai
        ...
    ...
```
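
To make the renaming step more concrete, the Python sketch below builds the list of (source, destination) paths for the samples of one species. It is an illustration under assumptions, not the rule used by the pipeline: it assumes that the renamed files come from the `{pool}_clone_filtered` folders of [03-samples](03-samples), and the function name `renaming_plan` and its parameters are hypothetical.

```python
from pathlib import PurePosixPath


def renaming_plan(samples, species, samples_dir="03-samples", out_dir="04-all_samples"):
    """Return (source, destination) path pairs for each read file of each sample.

    `samples` is an iterable of (run, pool, barcode, sample_id) tuples,
    i.e. the 4 columns of {species}_infos.csv.
    """
    plan = []
    target_dir = PurePosixPath(out_dir) / species
    for run, pool, barcode, sample_id in samples:
        # Assumption: the clone-filtered reads are the ones being renamed.
        source_dir = PurePosixPath(samples_dir) / run / f"{pool}_clone_filtered"
        for read in ("1", "2"):
            source = source_dir / f"sample_{barcode}.{read}.{read}.fq.gz"
            target = target_dir / f"{sample_id}.{read}.fq.gz"
            plan.append((str(source), str(target)))
    return plan


# Hypothetical sample, for illustration only.
plan = renaming_plan([("runA", "poolA1", "ACGTAC", "ind01")], "speciesA")
```

The function only computes paths. Copying or linking the files is left out on purpose so that the plan can be inspected first.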
*[05-stacks](05-stacks) : outputs from [gstacks](http://catchenlab.life.illinois.edu/stacks/comp/gstacks.php)
*[06-populations](06-populations) : outputs from [populations](http://catchenlab.life.illinois.edu/stacks/comp/populations.php)
*[08-genomes](08-genomes) : reference genome of each species `{species}` used for the analysis. The `.fasta` file is mandatory and stores all the scaffold sequences of the `{species}` genome assembly. `.amb`, `.ann`, `.bwt`, `.pac`, `.sa` are index files required by [BWA 0.7.17](https://icb.med.cornell.edu/wiki/index.php/Elementolab/BWA_tutorial). They will be generated automatically if absent. The data must be stored this way :
```
08-genomes/
{species}_genome.amb
{species}_genome.ann
{species}_genome.bwt
{species}_genome.fasta
{species}_genome.pac
{species}_genome.sa
```
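
The "mandatory `.fasta`, optional index files" rule above can be sketched in Python as follows. This is only an illustration of the check, under the assumption that files are named `{species}_genome` plus a suffix as in the tree. The function name `missing_bwa_index` is hypothetical, and the sketch does not run BWA itself.

```python
import tempfile
from pathlib import Path

BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index(genomes_dir, species):
    """Return the BWA index suffixes that are absent for `species`.

    Raises FileNotFoundError if the mandatory .fasta file is missing.
    """
    genomes_dir = Path(genomes_dir)
    fasta = genomes_dir / f"{species}_genome.fasta"
    if not fasta.is_file():
        raise FileNotFoundError(f"mandatory reference file not found: {fasta}")
    return [
        suffix
        for suffix in BWA_INDEX_SUFFIXES
        if not (genomes_dir / f"{species}_genome{suffix}").is_file()
    ]


# Demonstration on a temporary folder holding the fasta and part of the index.
with tempfile.TemporaryDirectory() as tmp:
    for suffix in (".fasta", ".amb", ".ann", ".bwt"):
        (Path(tmp) / f"speciesA_genome{suffix}").touch()
    missing = missing_bwa_index(tmp, "speciesA")
```

A non-empty result is the situation in which the index files would need to be generated.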
*[10-logs](10-logs) : log files generated by every command
- process_radtags
- clone_filter
- genome_alignment
- gstacks
- populations
# 3. Reporting bugs
...
...
@@ -122,7 +228,7 @@ If you're sure you've found a bug — e.g. if one of my programs crashes
with an obscure error message, or if the resulting file is missing part
of the original data, then by all means submit a bug report.
I use [GitLab's issue system](https://gitlab.com/reservebenefit/snakemake_stacks2/issues)
I use [GitLab's issue system](http://gitlab.mbb.univ-montp2.fr/reservebenefit/snakemake_stacks2/issues)
as my bug database. You can submit your bug reports there. Please be as
verbose as possible, e.g. include the command line, etc.