Commit bd1ba812 authored by peguerin

README installation update

parent 4021254b
@@ -9,15 +9,15 @@ This was designed to process RADseq data from [RESERVEBENEFIT](https://www.biodi
1. [Introduction](#1-introduction)
2. [Installation](#2-installation)
    1. [Prerequisite](#21-prerequisite)
    2. [Data Files](#22-data-files)
    3. [Set up](#23-set-up)
3. [Reporting bugs](#3-reporting-bugs)
4. [Running the pipeline](#4-running-the-pipeline)
    1. [Initialisation](#41-initialisation)
    2. [Configuration](#42-configuration)
    3. [Run the pipeline into a single command](#43-run-the-pipeline-into-a-single-command)
    4. [Run the pipeline step by step](#44-run-the-pipeline-step-by-step)
# 1. Introduction
@@ -40,7 +40,7 @@ You must install the following softwares and packages :
5.3.0
```
- [STACKS 2.2](http://catchenlab.life.illinois.edu/stacks/)
* Check that the programs are correctly installed, and check their versions, by typing :
```
@@ -49,7 +49,7 @@ You must install the following softwares and packages :
gstacks --version
populations --version
## should give you the output
2.2
```
- [BWA 0.7.17](https://icb.med.cornell.edu/wiki/index.php/Elementolab/BWA_tutorial)
@@ -99,11 +99,15 @@ You must install the following softwares and packages :
## 2.2 Data Files
Let's first define some wildcards :
- `{run}` : any sequencing run
- `{pool}` : any pool within a run
- `{species}` : any species

The included data files are :
* [config.yaml](01-info_files/config.yaml) : configuration file of the pipeline (see [Configuration](#42-configuration) section)
* [barcodes.txt](01-info_files/barcodes.txt) : file containing the barcodes used for each `{pool}` of each `{run}`
* [{species}_infos.csv](01-info_files) : information table (`.csv`) for `{species}`. Each row is a sample, with 4 columns (`run`, `pool`, `barcode`, `ID`).
* [{species}_populations_map.txt](01-info_files) : tab-separated information table (`.tsv`) for `{species}`. Each row is a sample, with 2 columns (`ID`, `population`). This file can be generated by the pipeline (see [Configuration](#42-configuration) section); however, we strongly recommend writing it manually.
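
Since `{species}_infos.csv` and `{species}_populations_map.txt` describe the same samples, it can be worth cross-checking them before a run, especially when the map is written by hand. The Python sketch below is a hypothetical helper, not part of the pipeline; it assumes the `.csv` file starts with a `run,pool,barcode,ID` header row, and that the population map has no header and holds one tab-separated `ID`/`population` pair per line.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_infos(handle: TextIO) -> List[Dict[str, str]]:
    """Read a {species}_infos.csv table (assumed header: run,pool,barcode,ID)."""
    reader = csv.DictReader(handle)
    expected = ["run", "pool", "barcode", "ID"]
    if reader.fieldnames != expected:
        raise ValueError(f"expected columns {expected}, got {reader.fieldnames}")
    return [dict(row) for row in reader]


def read_population_map(handle: TextIO) -> Dict[str, str]:
    """Read a {species}_populations_map.txt file: one `ID<TAB>population` per line, no header assumed."""
    popmap: Dict[str, str] = {}
    for line in handle:
        line = line.strip()
        if not line:  # ignore blank lines
            continue
        sample_id, population = line.split("\t")
        popmap[sample_id] = population
    return popmap


def missing_from_popmap(infos: List[Dict[str, str]], popmap: Dict[str, str]) -> List[str]:
    """Return, sorted, the sample IDs of the infos table that are absent from the population map."""
    return sorted({row["ID"] for row in infos} - set(popmap))


if __name__ == "__main__":
    # Hypothetical in-memory files standing in for 01-info_files/{species}_*.
    infos = read_infos(io.StringIO("run,pool,barcode,ID\nrunA,poolA1,ACGTAC,ind001\nrunA,poolA1,TGCATG,ind002\n"))
    popmap = read_population_map(io.StringIO("ind001\tpopX\n"))
    print(missing_from_popmap(infos, popmap))  # ['ind002']
```

An empty list means every sequenced sample has been assigned to a population.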
## 2.3 Set Up
@@ -114,7 +118,109 @@ cd snakemake_stacks2
```
You will see the following folders :
* [00-scripts](00-scripts): contains all the required scripts to run the whole pipeline
* [01-info_files](01-info_files) : contains all the required data files (see [Data Files](#22-data-files) section above)
* [02-raw](02-raw) : must contain your raw data from paired-end Illumina sequencing runs. The data must be stored this way :
```
02-raw/
    runA/
        poolA1/
            {poolA1}_R1_001.fastq.gz
            {poolA1}_R2_001.fastq.gz
        poolA2/
            {poolA2}_R1_001.fastq.gz
            {poolA2}_R2_001.fastq.gz
        ...
    runB/
        poolB1/
            {poolB1}_R1_001.fastq.gz
            {poolB1}_R2_001.fastq.gz
        ...
    ...
```
* [03-samples](03-samples): will store the results of demultiplexing with [process_radtags](http://catchenlab.life.illinois.edu/stacks/comp/process_radtags.php) and of clone filtering with [clone_filter](http://catchenlab.life.illinois.edu/stacks/comp/clone_filter.php). The data will be stored this way :
```
03-samples/
    runA/
        poolA1/
            sample_{barcode1}.1.fq.gz
            sample_{barcode1}.2.fq.gz
            sample_{barcode2}.1.fq.gz
            sample_{barcode2}.2.fq.gz
            sample_{barcode3}.1.fq.gz
            sample_{barcode3}.2.fq.gz
            ...
        poolA1_clone_filtered/
            sample_{barcode1}.1.1.fq.gz
            sample_{barcode1}.2.2.fq.gz
            sample_{barcode2}.1.1.fq.gz
            sample_{barcode2}.2.2.fq.gz
            sample_{barcode3}.1.1.fq.gz
            sample_{barcode3}.2.2.fq.gz
            ...
        poolA2/
            sample_{barcode1}.1.fq.gz
            sample_{barcode1}.2.fq.gz
            ...
        poolA2_clone_filtered/
            sample_{barcode1}.1.1.fq.gz
            sample_{barcode1}.2.2.fq.gz
            ...
        ...
    runB/
        poolB1/
            sample_{barcode1}.1.fq.gz
            sample_{barcode1}.2.fq.gz
            ...
        poolB1_clone_filtered/
            sample_{barcode1}.1.1.fq.gz
            sample_{barcode1}.2.2.fq.gz
            ...
        ...
    ...
```
* [04-all_samples](04-all_samples): paired-end `.fq.gz` files are named according to the information in [{species}_infos.csv](01-info_files). Reads are then aligned onto the reference genome sequences stored in [08-genomes](08-genomes). This folder contains the "named" fastq files and the corresponding alignment files: `.bam` files, sorted alignments (`.sorted.bam`) and their indexes (`.sorted.bam.bai`). The data will be stored this way :
```
04-all_samples/
    speciesA/
        {sampleA1}.1.fq.gz
        {sampleA1}.2.fq.gz
        {sampleA1}.bam
        {sampleA1}.sorted.bam
        {sampleA1}.sorted.bam.bai
        {sampleA2}.1.fq.gz
        {sampleA2}.2.fq.gz
        {sampleA2}.bam
        {sampleA2}.sorted.bam
        {sampleA2}.sorted.bam.bai
        ...
    speciesB/
        {sampleB1}.1.fq.gz
        {sampleB1}.2.fq.gz
        {sampleB1}.bam
        {sampleB1}.sorted.bam
        {sampleB1}.sorted.bam.bai
        ...
    ...
```
* [05-stacks](05-stacks) : outputs from [gstacks](http://catchenlab.life.illinois.edu/stacks/comp/gstacks.php)
* [06-populations](06-populations) : outputs from [populations](http://catchenlab.life.illinois.edu/stacks/comp/populations.php)
* [08-genomes](08-genomes) : reference genome of each `{species}` used for the analysis. The `.fasta` file is mandatory and stores all the scaffold sequences of the `{species}` genome assembly. The `.amb`, `.ann`, `.bwt`, `.pac` and `.sa` files are index files required by [BWA 0.7.17](https://icb.med.cornell.edu/wiki/index.php/Elementolab/BWA_tutorial); they will be generated automatically if absent. The data must be stored this way :
```
08-genomes/
    {species}_genome.amb
    {species}_genome.ann
    {species}_genome.bwt
    {species}_genome.fasta
    {species}_genome.pac
    {species}_genome.sa
```
* [10-logs](10-logs) : log files generated by every command
    - process_radtags
    - clone_filter
    - genome_alignment
    - gstacks
    - populations
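
The folder layout above boils down to a handful of path templates. The Python sketch below is illustrative only and is not taken from the pipeline's scripts: it assumes that the raw file prefix is the pool name, and that the clone-filtered reads of each barcode are the files copied into `04-all_samples` under the `ID` column of `{species}_infos.csv`. All function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

# Hypothetical helpers mirroring the directory trees described in this README.
MATES = (1, 2)


def raw_reads(run: str, pool: str) -> List[PurePosixPath]:
    """Paired-end input files in 02-raw/{run}/{pool}/ (assumed prefix: the pool name)."""
    base = PurePosixPath("02-raw", run, pool)
    return [base / f"{pool}_R{mate}_001.fastq.gz" for mate in MATES]


def demultiplexed_reads(run: str, pool: str, barcode: str) -> List[PurePosixPath]:
    """process_radtags output: 03-samples/{run}/{pool}/sample_{barcode}.{1,2}.fq.gz."""
    base = PurePosixPath("03-samples", run, pool)
    return [base / f"sample_{barcode}.{mate}.fq.gz" for mate in MATES]


def clone_filtered_reads(run: str, pool: str, barcode: str) -> List[PurePosixPath]:
    """clone_filter output: 03-samples/{run}/{pool}_clone_filtered/sample_{barcode}.1.1 / .2.2.fq.gz."""
    base = PurePosixPath("03-samples", run, f"{pool}_clone_filtered")
    return [base / f"sample_{barcode}.{mate}.{mate}.fq.gz" for mate in MATES]


def named_sample_files(species: str, sample_id: str) -> List[PurePosixPath]:
    """Files of one sample in 04-all_samples/{species}/: reads, alignment, sorted alignment, index."""
    base = PurePosixPath("04-all_samples", species)
    suffixes = (".1.fq.gz", ".2.fq.gz", ".bam", ".sorted.bam", ".sorted.bam.bai")
    return [base / f"{sample_id}{suffix}" for suffix in suffixes]


def renaming_plan(species: str, row: Dict[str, str]) -> List[Tuple[PurePosixPath, PurePosixPath]]:
    """Pair each clone-filtered mate with its named copy, from one infos.csv row (run, pool, barcode, ID)."""
    sources = clone_filtered_reads(row["run"], row["pool"], row["barcode"])
    targets = named_sample_files(species, row["ID"])[:2]  # only the two read files
    return list(zip(sources, targets))


if __name__ == "__main__":
    # Hypothetical infos.csv row; real barcodes and IDs come from 01-info_files.
    row = {"run": "runA", "pool": "poolA1", "barcode": "ACGTAC", "ID": "ind001"}
    for source, target in renaming_plan("speciesA", row):
        print(source, "->", target)
```

Keeping the templates in one place means a naming change (for instance a different raw file suffix) only has to be made once.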
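
Since the BWA index files of `08-genomes` are only generated when absent, it can be handy to check which of them are present before a run. The sketch below is a hypothetical helper, not the pipeline's own rule. It relies on the `-p <prefix>` option of `bwa index`, which is one way (assumed here) to obtain files named `{species}_genome.amb` and so on; without `-p`, BWA names them after the FASTA file instead (e.g. `{species}_genome.fasta.amb`). The helper only builds the command and does not run it.

```python
import tempfile
from pathlib import Path
from typing import List

# Index extensions listed in this README for BWA 0.7.17.
BWA_INDEX_EXTENSIONS = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(genomes_dir: Path, species: str) -> List[Path]:
    """Return the {species}_genome.<ext> index files that are not in genomes_dir."""
    candidates = [genomes_dir / f"{species}_genome{ext}" for ext in BWA_INDEX_EXTENSIONS]
    return [path for path in candidates if not path.exists()]


def bwa_index_command(genomes_dir: Path, species: str) -> List[str]:
    """Build (without running) a `bwa index` call writing {species}_genome.* next to the FASTA."""
    fasta = genomes_dir / f"{species}_genome.fasta"
    if not fasta.exists():
        # The .fasta file is the only mandatory input of 08-genomes.
        raise FileNotFoundError(f"missing reference genome: {fasta}")
    prefix = genomes_dir / f"{species}_genome"
    # -p sets the prefix of the output index files (BWA otherwise uses the FASTA file name).
    return ["bwa", "index", "-p", str(prefix), str(fasta)]


if __name__ == "__main__":
    # Throw-away directory standing in for 08-genomes, with a hypothetical species name.
    with tempfile.TemporaryDirectory() as tmp:
        genomes = Path(tmp)
        (genomes / "speciesA_genome.fasta").write_text(">scaffold_1\nACGTACGT\n")
        # No index exists yet, so all five files are reported missing.
        print([p.name for p in missing_bwa_index_files(genomes, "speciesA")])
        print(" ".join(bwa_index_command(genomes, "speciesA")))
```

In practice the returned command would only be run (e.g. with `subprocess.run(cmd, check=True)`) when `missing_bwa_index_files` returns a non-empty list.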
# 3. Reporting bugs
@@ -122,7 +228,7 @@ If you're sure you've found a bug — e.g. if one of my programs crashes
with an obscure error message, or if the resulting file is missing part
of the original data, then by all means submit a bug report.
I use [GitLab's issue system](http://gitlab.mbb.univ-montp2.fr/reservebenefit/snakemake_stacks2/issues)
as my bug database. You can submit your bug reports there. Please be as
verbose as possible (e.g. include the command line, etc.).