I have the pipeline with the following trimming and mapping parameter:
Raw paired-end mapping (without adapter trimming) gives this result
But after running it with Trim_Galore I get this:
So the mapping rates drop radically from 57% to 0.18%.
I find it strange because the trim galore only use trim away very little sequence.
Why was it and what's the right parameter should I use for both Bowtie and Cutadapt/trim_galore?
Code:
trim_galore --stringency 5 --dont_gzip --trim1 --length 30 -q 0 -a CTGTCTCTTATACACATCT -o $TRIMGALORE_DIR $READ1 $READ2 bowtie $Bowtie_Index_File -1 $TRIMGALORE_R1 -2 $TRIMGALORE_R2 --threads 5 -m 1 -v 2 -S -I 0 -X 2000
Code:
********************************************** Stats for BAM file(s): ********************************************** Total reads: 169555726 Mapped reads: 97870984 (57.722%) Forward strand: 120620234 (71.139%) Reverse strand: 48935492 (28.861%) Failed QC: 0 (0%) Duplicates: 0 (0%) Paired-end reads: 169555726 (100%) 'Proper-pairs': 97870984 (57.722%) Both pairs mapped: 97870984 (57.722%) Read 1: 84777863 Read 2: 84777863 Singletons: 0 (0%) Average insert size (absolute value): 125.862 Median insert size (absolute value): 99
Code:
********************************************** Stats for BAM file(s): ********************************************** Total reads: 168972000 Mapped reads: 314228 (0.185965%) Forward strand: 168814886 (99.907%) Reverse strand: 157114 (0.0929823%) Failed QC: 0 (0%) Duplicates: 0 (0%) Paired-end reads: 168972000 (100%) 'Proper-pairs': 314228 (0.185965%) Both pairs mapped: 314228 (0.185965%) Read 1: 84486000 Read 2: 84486000 Singletons: 0 (0%) Average insert size (absolute value): 571.141 Median insert size (absolute value): 297
I find it strange because the trim galore only use trim away very little sequence.
Why was it and what's the right parameter should I use for both Bowtie and Cutadapt/trim_galore?
Code:
SUMMARISING RUN PARAMETERS ========================== Input filename: /home/ubuntu/R1_001.fastq Trimming mode: single-end Trim Galore version: 0.4.1 Cutadapt version: 1.12 Quality Phred score cutoff: 0 Quality encoding type selected: ASCII+33 Adapter sequence: 'CTGTCTCTTATACACATCT' () Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 5 bp Minimum required sequence length before a sequence gets removed: 30 bp All sequences will be trimmed by 1 bp on their 3' end to avoid problems with invalid paired-end alignments with Bowtie 1 Writing final adapter and quality trimmed output to BB_89_S1_R1_001_trimmed.fq >>> Now performing quality (cutoff 0) and adapter trimming in a single pass for the adapter sequence: 'CTGTCTCTTATACACATCT' from file /home/ubuntu/R1_001.fastq <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed 40000000 sequences processed 50000000 sequences processed 60000000 sequences processed 70000000 sequences processed 80000000 sequences processed This is cutadapt 1.12 with Python 2.7.13 Command line parameters: -f fastq -e 0.1 -q 0 -O 5 -a CTGTCTCTTATACACATCT /home/ubuntu/R1_001.fastq Trimming 1 adapter with at most 10.0% errors in single-end mode ... Finished in 804.75 s (9 us/read; 6.32 M reads/minute). === Summary === Total reads processed: 84,777,863 Reads with adapters: 424,374 (0.5%) Reads written (passing filters): 84,777,863 (100.0%) Total basepairs processed: 3,052,003,068 bp Quality-trimmed: 0 bp (0.0%) Total written (filtered): 3,048,935,811 bp (99.9%) === Adapter 1 === Sequence: CTGTCTCTTATACACATCT; Type: regular 3'; Length: 19; Trimmed: 424374 times. No. of allowed errors: 0-9 bp: 0; 10-19 bp: 1 Bases preceding removed adapters: A: 13.7% C: 36.7% G: 25.3% T: 24.2% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 5 132511 82790.9 0 132511 6 72231 20697.7 0 72231 7 50748 5174.4 0 50748 8 43106 1293.6 0 43106 9 53661 323.4 0 50630 3031 10 50660 80.9 1 44674 5986 11 8616 20.2 1 6320 2296 12 2894 5.1 1 1601 1293 13 2928 1.3 1 2706 222 14 995 0.3 1 906 89 15 2681 0.1 1 2570 111 16 620 0.0 1 598 22 17 1622 0.0 1 1578 44 18 263 0.0 1 255 8 19 460 0.0 1 446 14 20 86 0.0 1 84 2 21 107 0.0 1 103 4 22 49 0.0 1 42 7 23 28 0.0 1 23 5 24 11 0.0 1 10 1 25 8 0.0 1 7 1 26 7 0.0 1 6 1 27 23 0.0 1 22 1 28 8 0.0 1 8 29 1 0.0 1 1 33 2 0.0 1 1 1 35 1 0.0 1 0 1 36 47 0.0 1 45 2 RUN STATISTICS FOR INPUT FILE: /home/ubuntu/R1_001.fastq ============================================= 84777863 sequences processed in total Sequences removed because they became shorter than the length cutoff of 30 bp: 291863 (0.3%) Writing report to 'trim_galore_dir/R2_001.fastq_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /home/ubuntu/R2_001.fastq Trimming mode: single-end Trim Galore version: 0.4.1 Cutadapt version: 1.12 Quality Phred score cutoff: 0 Quality encoding type selected: ASCII+33 Adapter sequence: 'CTGTCTCTTATACACATCT' () Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 5 bp Minimum required sequence length before a sequence gets removed: 30 bp All sequences will be trimmed by 1 bp on their 3' end to avoid problems with invalid paired-end alignments with Bowtie 1 Writing final adapter and quality trimmed output to BB_89_S1_R2_001_trimmed.fq >>> Now performing quality (cutoff 0) and adapter trimming in a single pass for the adapter sequence: 'CTGTCTCTTATACACATCT' from file /home/ubuntu/R2_001.fastq <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed 40000000 sequences processed 50000000 sequences processed 60000000 sequences processed 70000000 sequences processed 80000000 sequences processed This is cutadapt 1.12 with Python 2.7.13 Command line parameters: -f fastq -e 0.1 -q 0 -O 5 -a CTGTCTCTTATACACATCT /home/ubuntu/R2_001.fastq Trimming 1 adapter with at most 10.0% errors in single-end mode ... Finished in 814.43 s (10 us/read; 6.25 M reads/minute). === Summary === Total reads processed: 84,777,863 Reads with adapters: 423,690 (0.5%) Reads written (passing filters): 84,777,863 (100.0%) Total basepairs processed: 3,052,003,068 bp Quality-trimmed: 0 bp (0.0%) Total written (filtered): 3,048,945,359 bp (99.9%) === Adapter 1 === Sequence: CTGTCTCTTATACACATCT; Type: regular 3'; Length: 19; Trimmed: 423690 times. No. of allowed errors: 0-9 bp: 0; 10-19 bp: 1 Bases preceding removed adapters: A: 13.7% C: 36.1% G: 25.4% T: 24.8% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 5 132783 82790.9 0 132783 6 72585 20697.7 0 72585 7 50354 5174.4 0 50354 8 43191 1293.6 0 43191 9 53096 323.4 0 49979 3117 10 50491 80.9 1 44467 6024 11 8598 20.2 1 6158 2440 12 2915 5.1 1 1566 1349 13 2880 1.3 1 2625 255 14 959 0.3 1 865 94 15 2597 0.1 1 2493 104 16 601 0.0 1 570 31 17 1573 0.0 1 1520 53 18 256 0.0 1 240 16 19 447 0.0 1 430 17 20 83 0.0 1 77 6 21 101 0.0 1 87 14 22 46 0.0 1 40 6 23 28 0.0 1 26 2 24 12 0.0 1 10 2 25 8 0.0 1 7 1 26 7 0.0 1 6 1 27 23 0.0 1 20 3 28 8 0.0 1 7 1 29 1 0.0 1 1 33 2 0.0 1 1 1 36 45 0.0 1 44 1
Comment