Apologies for cross-posting.
I am getting really low alignment rates while trying to do SNP analysis.
I sequenced the genome of a bacterial strain. Then conducted an experiment and then sequenced the genome of the strain at the end of the experiment to see what mutations (SNPs) took place during the course of the experiment.
Bowtie2 gives me the following output:
#map the reads
-bash-4.1$ ./bowtie2 -p 1 -x AER -1 S25_R1_001.fastq -2 S25_R2_001.fastq > S25_bowtie2.sam
#output
4240966 reads; of these:
4240966 (100.00%) were paired; of these:
4240777 (100.00%) aligned concordantly 0 times
161 (0.00%) aligned concordantly exactly 1 time
28 (0.00%) aligned concordantly >1 times
----
4240777 pairs aligned concordantly 0 times; of these:
10902 (0.26%) aligned discordantly 1 time
----
4229875 pairs aligned 0 times concordantly or discordantly; of these:
8459750 mates make up the pairs; of these:
8168790 (96.56%) aligned 0 times
49047 (0.58%) aligned exactly 1 time
241913 (2.86%) aligned >1 times
3.69% overall alignment rate
and bwa gives me the following output (I ran samtools flagstat command to see % overall alignment rate, which is 47% if I understand the output correctly)
#map the reads
-bash-4.1$ bwa mem -t 4 AER.fasta S25_R1_001.fastq S25_R2_001.fastq > S25_bwa.sam
#convert to bam
-bash-4.1$ ./samtools view -bS S25_bwa.sam > S25_bwa.bam
#get flagstats
-bash-4.1$ ./samtools flagstat S25_bwa.bam
#output
8816220 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
334288 + 0 supplementary
0 + 0 duplicates
4153225 + 0 mapped (47.11%:-nan%)
8481932 + 0 paired in sequencing
4240966 + 0 read1
4240966 + 0 read2
2605074 + 0 properly paired (30.71%:-nan%)
3513674 + 0 with itself and mate mapped
305263 + 0 singletons (3.60%:-nan%)
901042 + 0 with mate mapped to a different chr
41706 + 0 with mate mapped to a different chr (mapQ>=5)
I have read that bwa me is generally a very aggressive aligner and that probably explains the 47% rate.
I am looking into how to extract unmapped reads so that I can blast them to see any contamination issues.
The fastqc reports look fine (no red crosses), especially after I trim the first ~10 and last ~3 bases. Is there any other quality control I should be doing before mapping? What are all of the reasons for low alignment rates?
I am getting really low alignment rates while trying to do SNP analysis.
I sequenced the genome of a bacterial strain. Then conducted an experiment and then sequenced the genome of the strain at the end of the experiment to see what mutations (SNPs) took place during the course of the experiment.
Bowtie2 gives me the following output:
#map the reads
-bash-4.1$ ./bowtie2 -p 1 -x AER -1 S25_R1_001.fastq -2 S25_R2_001.fastq > S25_bowtie2.sam
#output
4240966 reads; of these:
4240966 (100.00%) were paired; of these:
4240777 (100.00%) aligned concordantly 0 times
161 (0.00%) aligned concordantly exactly 1 time
28 (0.00%) aligned concordantly >1 times
----
4240777 pairs aligned concordantly 0 times; of these:
10902 (0.26%) aligned discordantly 1 time
----
4229875 pairs aligned 0 times concordantly or discordantly; of these:
8459750 mates make up the pairs; of these:
8168790 (96.56%) aligned 0 times
49047 (0.58%) aligned exactly 1 time
241913 (2.86%) aligned >1 times
3.69% overall alignment rate
and bwa gives me the following output (I ran samtools flagstat command to see % overall alignment rate, which is 47% if I understand the output correctly)
#map the reads
-bash-4.1$ bwa mem -t 4 AER.fasta S25_R1_001.fastq S25_R2_001.fastq > S25_bwa.sam
#convert to bam
-bash-4.1$ ./samtools view -bS S25_bwa.sam > S25_bwa.bam
#get flagstats
-bash-4.1$ ./samtools flagstat S25_bwa.bam
#output
8816220 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
334288 + 0 supplementary
0 + 0 duplicates
4153225 + 0 mapped (47.11%:-nan%)
8481932 + 0 paired in sequencing
4240966 + 0 read1
4240966 + 0 read2
2605074 + 0 properly paired (30.71%:-nan%)
3513674 + 0 with itself and mate mapped
305263 + 0 singletons (3.60%:-nan%)
901042 + 0 with mate mapped to a different chr
41706 + 0 with mate mapped to a different chr (mapQ>=5)
I have read that bwa me is generally a very aggressive aligner and that probably explains the 47% rate.
I am looking into how to extract unmapped reads so that I can blast them to see any contamination issues.
The fastqc reports look fine (no red crosses), especially after I trim the first ~10 and last ~3 bases. Is there any other quality control I should be doing before mapping? What are all of the reasons for low alignment rates?
Comment