Changes
Page history
Update 18072023_NextSeq_Results
authored
Dec 07, 2023
by
mbruno
Show whitespace changes
Inline
Side-by-side
18072023_NextSeq_Results.md
0 → 100644
View page @
b94b0215
**by: Morgane Bruno**
## eDNA (PoolA) - SEQ20230627-1
**18 Samples (3 barcodes)**
| Sample_Name | Inline barcode | FASTQ |
| --- | --- | --- |
| SPY
\_
201
\_
291 | CAACCG | ILLPCR2_10 |
| SPY
\_
201
\_
292 | CAACCG | ILLPCR2_11 |
| SPY
\_
201
\_
293 | CAACCG | ILLPCR2_12 |
| SPY
\_
201
\_
294 | CAACCG | ILLPCR2_13 |
| SPY
\_
201
\_
297 | CAACCG | ILLPCR2_17 |
| SPY
\_
201
\_
298 | CAACCG | ILLPCR2_18 |
| SPY
\_
211
\_
196 | CAACCG | ILLPCR2_19 |
| SPY
\_
211
\_
206 | CAACCG | ILLPCR2_27 |
| SPY
\_
211
\_
211 | CAACCG | ILLPCR2_28 |
| SPY
\_
211
\_
212 | GTATGA | ILLPCR2_10 |
| SPY
\_
201
\_
664 | GTATGA | ILLPCR2_11 |
| SPY
\_
211
\_
201 | GTATGA | ILLPCR2_12 |
| SPY
\_
211
\_
203 | GTATGA | ILLPCR2_13 |
| SPY
\_
211
\_
195 | GTATGA | ILLPCR2_15 |
| SPY
\_
211
\_
205 | GTATGA | ILLPCR2_18 |
| SPY
\_
211
\_
210 | GTATGA | ILLPCR2_19 |
| SPY
\_
211
\_
197 | GTATGA | ILLPCR2_27 |
| SPY
\_
211
\_
213 | TGGATT | ILLPCR2_10 |
### 0\. Raw `seqkit stats`
```
bash
seqkit stats
*
.gz
-T
| csvtk csv2md
-t
```
| file | format | type | num_seqs | sum_len | min_len | avg_len | max_len |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ILLPCR2-10-A
\_
S1
\_
L001
\_
R1
\_
001.fastq.gz | FASTQ | DNA | 531152 | 80057086 | 87 | 150.7 | 151 |
| ILLPCR2-10-A
\_
S1
\_
L001
\_
R2
\_
001.fastq.gz | FASTQ | DNA | 531152 | 80069162 | 35 | 150.7 | 151 |
| ILLPCR2-11-A
\_
S3
\_
L001
\_
R1
\_
001.fastq.gz | FASTQ | DNA | 165123 | 24882193 | 144 | 150.7 | 151 |
| ILLPCR2-11-A
\_
S3
\_
L001
\_
R2
\_
001.fastq.gz | FASTQ | DNA | 165123 | 24898505 | 35 | 150.8 | 151 |
| ILLPCR2-12-A
\_
S5
\_
L001
\_
R1
\_
001.fastq.gz | FASTQ | DNA | 138610 | 20890479 | 69 | 150.7 | 151 |
| ILLPCR2-12-A
\_
S5
\_
L001
\_
R2
\_
001.fastq.gz | FASTQ | DNA | 138610 | 20897858 | 35 | 150.8 | 151 |
| ILLPCR2-13-A
\_
S6
\_
L001
\_
R1
\_
001.fastq.gz | FASTQ | DNA | 172617 | 26005621 | 74 | 150.7 | 151 |
| ILLPCR2-13-A
\_
S6
\_
L001
\_
R2
\_
001.fastq.gz | FASTQ | DNA | 172617 | 26017147 | 35 | 150.7 | 151 |
| ILLPCR2-15-A
\_
S7
\_
L001
\_
R1
\_
001.fastq.gz | FASTQ | DNA | 80075 | 12066608 | 144 | 150.7 | 151 |
| ILLPCR2-15-A
\_
S7
\_
L001
\_
R2
\_
001.fastq.gz | FASTQ | DNA | 80075 | 12070474 | 35 | 150.7 | 151 |
| ILLPCR2-17-A
\_
S8
\_
L001
\_
R1
\_
001.fastq.gz | FASTQ | DNA | 141852 | 21381701 | 146 | 150.7 | 151 |
| ILLPCR2-17-A
\_
S8
\_
L001
\_
R2
\_
001.fastq.gz | FASTQ | DNA | 141852 | 21391363 | 35 | 150.8 | 151 |
| ILLPCR2-18-A
\_
S9
\_
L001
\_
R1
\_
001.fastq.gz | FASTQ | DNA | 161048 | 24270981 | 144 | 150.7 | 151 |
| ILLPCR2-18-A
\_
S9
\_
L001
\_
R2
\_
001.fastq.gz | FASTQ | DNA | 161048 | 24285640 | 35 | 150.8 | 151 |
| ILLPCR2-19-A
\_
S10
\_
L001
\_
R1
\_
001.fastq.gz | FASTQ | DNA | 178263 | 26867667 | 145 | 150.7 | 151 |
| ILLPCR2-19-A
\_
S10
\_
L001
\_
R2
\_
001.fastq.gz | FASTQ | DNA | 178263 | 26872984 | 35 | 150.7 | 151 |
| ILLPCR2-27-A
\_
S2
\_
L001
\_
R1
\_
001.fastq.gz | FASTQ | DNA | 244168 | 36794600 | 140 | 150.7 | 151 |
| ILLPCR2-27-A
\_
S2
\_
L001
\_
R2
\_
001.fastq.gz | FASTQ | DNA | 244168 | 36808976 | 35 | 150.8 | 151 |
| ILLPCR2-28-A
\_
S4
\_
L001
\_
R1
\_
001.fastq.gz | FASTQ | DNA | 75323 | 11352757 | 70 | 150.7 | 151 |
| ILLPCR2-28-A
\_
S4
\_
L001
\_
R2
\_
001.fastq.gz | FASTQ | DNA | 75323 | 11352635 | 35 | 150.7 | 151 |
==
**Total number of reads: 3776462**
==
### 1\. Quality `fastqc` & `multiqc`
```
bash
fastqc
-o
01_quality/
-f
fastq ~/Documents/grenouille/nextseq/raw/PoolA/
*
multiqc 01_quality/
*
```

:x: Poor quality at the positions of the inline barcode :x:
### 2\. Demultiplexing `cutadapt`
Parameters:
-
`--error-rate 0.34`
-
`--minimum-length 30`
-
`--nextseq-trim 10`
-
`--no-indels`
```
bash
#!/bin/bash
barcode_file
=
$1
demultiplex_dir
=
$2
raw_dir
=
$3
all_lib
=
$(
awk
'{print $3}'
$1
)
libs_uniq
=
$(
printf
'%s\n'
"
${
all_lib
[@]
}
"
|
sort
-u
)
for
lib
in
$libs_uniq
do
barcode_fa
=
$demultiplex_dir
'/settings/'
$lib
'.fa'
raw_pref
=
$raw_dir
'/'
$lib
'-*_L001'
grep
$lib
$barcode_file
|
awk
-v
path
=
$demultiplex_dir
'{print ">"path"/"$1"\n^"$2}'
>
$barcode_fa
echo
'info :: Running cutadapt - Lib: '
$lib
cutadapt
--error-rate
0.34
--minimum-length
30
--nextseq-trim
10
--no-indels
\
-g
file:
$barcode_fa
--output
{
name
}
_R1.fastq.gz
--paired-output
{
name
}
_R2.fastq.gz
\
$raw_pref
'_R1_001.fastq.gz'
$raw_pref
'_R2_001.fastq.gz'
&>
$demultiplex_dir
'/logs/'
$lib
.log
done
```
|file |num_seqs|
|:----------------------------------------|:-------|
|02_demultiplexing/SPY_201_291_R1.fastq.gz|119349 |
|02_demultiplexing/SPY_201_292_R1.fastq.gz|49042 |
|02_demultiplexing/SPY_201_293_R1.fastq.gz|17042 |
|02_demultiplexing/SPY_201_294_R1.fastq.gz|63480 |
|02_demultiplexing/SPY_201_297_R1.fastq.gz|115623 |
|02_demultiplexing/SPY_201_298_R1.fastq.gz|59271 |
|02_demultiplexing/SPY_201_664_R1.fastq.gz|77942 |
|02_demultiplexing/SPY_211_195_R1.fastq.gz|65202 |
|02_demultiplexing/SPY_211_196_R1.fastq.gz|37140 |
|02_demultiplexing/SPY_211_197_R1.fastq.gz|69931 |
|02_demultiplexing/SPY_211_201_R1.fastq.gz|92351 |
|02_demultiplexing/SPY_211_203_R1.fastq.gz|71415 |
|02_demultiplexing/SPY_211_205_R1.fastq.gz|71158 |
|02_demultiplexing/SPY_211_206_R1.fastq.gz|120442 |
|02_demultiplexing/SPY_211_210_R1.fastq.gz|110879 |
|02_demultiplexing/SPY_211_211_R1.fastq.gz|57631 |
|02_demultiplexing/SPY_211_212_R1.fastq.gz|97993 |
|02_demultiplexing/SPY_211_213_R1.fastq.gz|186349 |
|||
|
**Total (R1+R2)**
|
**2964480**
|
|
**Mean**
|
**82346.67**
|
|
**Sd**
|
**38196.66**
|
==
**78.49 % of the reads have been associated to a sample**
==
### 3\. Remove Adapters & PolyG `cutadapt`
Parameters:
-
`--quality-cutoff 15`
-
`--minimum-length 30`
-
`--adapter AGATCGGAAGAGC`
-
`--cut 5`
(R2, Remove 5' Poly G / Poly C)
```
bash
#!/bin/bash
demultiplex_dir
=
$1
clean_dir
=
$2
for
R1
in
$demultiplex_dir
/
*
R1.fastq.gz
do
echo
$R1
output_name_1
=
$(
basename
$R1
|
sed
's/.gz$//'
)
cutadapt
--cores
2
--cut
-5
--quality-cutoff
15
--minimum-length
40
--adapter
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
--output
$clean_dir
/
$output_name_1
$R1
done
for
R2
in
$demultiplex_dir
/
*
R2.fastq.gz
do
echo
$R2
output_name_2
=
$(
basename
$R2
|
sed
's/.gz$//'
)
cutadapt
--cores
2
--cut
5
--quality-cutoff
15
--minimum-length
40
--adapter
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
--output
$clean_dir
/
$output_name_2
$R2
done
```
### 4\. Match up paired end fastq `fastq_pair`
Rewrite paired end fastq files to make sure that all reads have a mate and to separate out singletons.
```
bash
#!/bin/bash
clean_dir
=
$1
samples
=
$(
find
$clean_dir
-name
'*.fastq'
|
sed
's/_R[1,2].*//'
|
sort
-u
)
for
s
in
$samples
do
fastq_pair
$s
'_R1.fastq'
$s
'_R2.fastq'
done
```
|file |num_seqs|
|:-----------------------------------------|:-------|
|03_cleaning/SPY_201_291_R1.fastq.paired.fq|119123 |
|03_cleaning/SPY_201_292_R1.fastq.paired.fq|49034 |
|03_cleaning/SPY_201_293_R1.fastq.paired.fq|17037 |
|03_cleaning/SPY_201_294_R1.fastq.paired.fq|63475 |
|03_cleaning/SPY_201_297_R1.fastq.paired.fq|115614 |
|03_cleaning/SPY_201_298_R1.fastq.paired.fq|59254 |
|03_cleaning/SPY_201_664_R1.fastq.paired.fq|77577 |
|03_cleaning/SPY_211_195_R1.fastq.paired.fq|65130 |
|03_cleaning/SPY_211_196_R1.fastq.paired.fq|37129 |
|03_cleaning/SPY_211_197_R1.fastq.paired.fq|69911 |
|03_cleaning/SPY_211_201_R1.fastq.paired.fq|92325 |
|03_cleaning/SPY_211_203_R1.fastq.paired.fq|71394 |
|03_cleaning/SPY_211_205_R1.fastq.paired.fq|71155 |
|03_cleaning/SPY_211_206_R1.fastq.paired.fq|120424 |
|03_cleaning/SPY_211_210_R1.fastq.paired.fq|110812 |
|03_cleaning/SPY_211_211_R1.fastq.paired.fq|57622 |
|03_cleaning/SPY_211_212_R1.fastq.paired.fq|97979 |
|03_cleaning/SPY_211_213_R1.fastq.paired.fq|186244 |
### 5\. Mapping
#### 5\.1\. Reference genome `ncbi-acc-download` & `bwa index`
[
**NC_042226.1**
](
https://www.ncbi.nlm.nih.gov/nuccore/NC_042226.1/
)
```
bash
#!/bin/bash
ncbi-acc-download
--format
fasta NC_042226.1
--out
04_mapping/reference/NC_042226.1_Rana_temporia_mt.fasta
bwa index 04_mapping/reference/NC_042226.1_Rana_temporia_mt.fasta
```
#### 5\.2\. Mapping against reference `bwa mem` & `BBMAP pileup.sh`
```
bash
#!/bin/bash
clean_dir
=
$1
mapping_dir
=
$2
reference
=
$3
bwa_tool
=
$4
samples
=
$(
find
$clean_dir
-name
'*.fastq.paired.fq'
|
sed
's/_R[1,2].*//'
|
sort
-u
)
for
s
in
$samples
do
# BWA
pref_sample
=
$(
basename
$s
)
if
[
$bwa_tool
==
'mem'
]
then
bwa mem
-t
4
$reference
$s
'_R1.fastq.paired.fq'
$s
'_R2.fastq.paired.fq'
2>
$mapping_dir
'/logs/'
$pref_sample
'_on_ref.log'
>
$mapping_dir
'/'
$pref_sample
'_on_ref.sam'
elif
[
$bwa_tool
==
'aln'
]
then
bwa aln
-t
4
$reference
$s
'_R1.fastq.paired.fq'
-f
$mapping_dir
'/'
$pref_sample
'_aln_R1.sai'
2>
$mapping_dir
'/logs/'
$pref_sample
'_aln_R1.log'
bwa aln
-t
4
$reference
$s
'_R2.fastq.paired.fq'
-f
$mapping_dir
'/'
$pref_sample
'_aln_R2.sai'
2>
$mapping_dir
'/logs/'
$pref_sample
'_aln_R2.log'
bwa sampe
$reference
$mapping_dir
'/'
$pref_sample
'_aln_R1.sai'
$mapping_dir
'/'
$pref_sample
'_aln_R2.sai'
$s
'_R1.fastq.paired.fq'
$s
'_R2.fastq.paired.fq'
-f
$mapping_dir
'/'
$pref_sample
'_on_ref.sam'
2>
$mapping_dir
'/logs/'
$pref_sample
'_on_ref.log'
fi
grep
-v
"^@"
$mapping_dir
'/'
$pref_sample
'_on_ref.sam'
|
awk
'{print $3}'
|
sort
|
uniq
-c
>
$mapping_dir
'/'
$pref_sample
'_on_ref.sam_nb_reads'
# BBMAP PILEUP
pileup.sh
in
=
$mapping_dir
'/'
$pref_sample
'_on_ref.sam'
out
=
$mapping_dir
'/'
$pref_sample
'_on_ref.sam_covstats'
&>
$mapping_dir
'/'
$pref_sample
'_on_ref.sam_stats'
done
```
|file|n_reads|n_mapped_reads|percent_mapped|
|:----|:----|:----|:----|
|./04_mapping/SPY_201_298_on_ref.sam|121698|78337|64.370|
|./04_mapping/SPY_211_205_on_ref.sam|146134|85183|58.291|
|./04_mapping/SPY_211_212_on_ref.sam|201466|131669|65.355|
|./04_mapping/SPY_201_292_on_ref.sam|100392|59370|59.138|
|./04_mapping/SPY_211_211_on_ref.sam|121702|84072|69.080|
|./04_mapping/SPY_201_294_on_ref.sam|128789|64966|50.444|
|./04_mapping/SPY_211_213_on_ref.sam|380862|300641|78.937|
|./04_mapping/SPY_211_197_on_ref.sam|142214|56740|39.898|
|./04_mapping/SPY_201_293_on_ref.sam|34345|4231|12.319|
|./04_mapping/SPY_211_203_on_ref.sam|145503|59196|40.684|
|./04_mapping/SPY_211_201_on_ref.sam|187551|151559|80.809|
|./04_mapping/SPY_201_291_on_ref.sam|239395|52401|21.889|
|./04_mapping/SPY_211_195_on_ref.sam|132178|38577|29.186|
|./04_mapping/SPY_201_664_on_ref.sam|155269|3818|2.459|
|./04_mapping/SPY_211_196_on_ref.sam|75811|31430|41.458|
|./04_mapping/SPY_201_297_on_ref.sam|238112|171753|72.131|
|./04_mapping/SPY_211_206_on_ref.sam|247922|171782|69.289|
|./04_mapping/SPY_211_210_on_ref.sam|222713|23290|10.457|
### 6\. Metagenomics
### 6\.1\. Extract mapped and unmapped reads `samtools view` & `BBMAP reformat.sh`
```
bash
#!/bin/bash
mapping_dir
=
$1
for
s
in
$mapping_dir
/
*
_on_ref.sam
do
pref_sam
=
$(
basename
$s
)
samtools view
-f
4
$s
>
$mapping_dir
/
$pref_sam
'.unmapped.sam'
samtools view
-F
4
$s
>
$mapping_dir
/
$pref_sam
'.mapped.sam'
reformat.sh
in
=
$mapping_dir
/
$pref_sam
'.unmapped.sam'
out
=
$mapping_dir
/
$pref_sam
'.unmapped.fasta'
reformat.sh
in
=
$mapping_dir
/
$pref_sam
'.mapped.sam'
out
=
$mapping_dir
/
$pref_sam
'.mapped.fasta'
done
```
### 6\.2\. `blastn` & `krona`
```
bash
#!/bin/bash
mapping_dir
=
$1
blast_dir
=
$2
blast_db
=
$3
for
s
in
$mapping_dir
/
*
.fasta
do
pref
=
$(
basename
$s
)
blastn
-db
$blast_db
-query
$s
-out
$blast_dir
/
$pref
'.out'
-outfmt
6
ktImportBLAST
$blast_dir
/
$pref
'.out'
-o
$blast_dir
/
$pref
'.html'
done
```
## 7\. Variant Calling