Hi,
I have a paired-end (PE) dataset with 300 bp inserts, sequenced on an Illumina MiSeq. I aligned the raw reads with BWA-MEM. The mapping statistics from samtools flagstat are below.
5541008 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
76008 + 0 supplementary
0 + 0 duplicates
5413610 + 0 mapped (97.70% : N/A)
5465000 + 0 paired in sequencing
2732500 + 0 read1
2732500 + 0 read2
5266140 + 0 properly paired (96.36% : N/A)
5319406 + 0 with itself and mate mapped
18196 + 0 singletons (0.33% : N/A)
32368 + 0 with mate mapped to a different chr
8821 + 0 with mate mapped to a different chr (mapQ>=5)
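As a sanity check on how the percentages in this report are derived, here is a minimal Python sketch that parses the flagstat text above and recomputes them. The helper names (`parse_flagstat`, `pct`) are made up for illustration, and the choice of denominators (total records for "mapped", "paired in sequencing" for "properly paired" and "singletons") is an assumption inferred from the numbers themselves, not taken from the samtools source.

```python
import re

# flagstat output for the raw-read alignment, copied from the post
FLAGSTAT = """\
5541008 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
76008 + 0 supplementary
0 + 0 duplicates
5413610 + 0 mapped (97.70% : N/A)
5465000 + 0 paired in sequencing
2732500 + 0 read1
2732500 + 0 read2
5266140 + 0 properly paired (96.36% : N/A)
5319406 + 0 with itself and mate mapped
18196 + 0 singletons (0.33% : N/A)
32368 + 0 with mate mapped to a different chr
8821 + 0 with mate mapped to a different chr (mapQ>=5)
"""

# "<passed> + <failed> <label>" with an optional trailing "(xx.xx% : N/A)"
LINE = re.compile(r"^(\d+) \+ (\d+) (.+?)(?: \((?:[\d.]+%|N/A) : .*\))?$")


def parse_flagstat(text):
    """Return {label: QC-passed count} for each flagstat line (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            counts[m.group(3)] = int(m.group(1))
    return counts


def pct(part, whole):
    """Percentage rounded to two decimals, as flagstat prints it."""
    return round(100.0 * part / whole, 2)


counts = parse_flagstat(FLAGSTAT)
total = counts["in total (QC-passed reads + QC-failed reads)"]
paired = counts["paired in sequencing"]

mapped_pct = pct(counts["mapped"], total)             # denominator: all records
proper_pct = pct(counts["properly paired"], paired)   # denominator: paired reads
singleton_pct = pct(counts["singletons"], paired)     # denominator: paired reads

print(mapped_pct, proper_pct, singleton_pct)
```

Note that "in total" equals "paired in sequencing" plus "supplementary" here, so the total counts alignment records rather than reads.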
I also ran Trimmomatic on the same dataset: ILLUMINACLIP to remove adapter sequences, a sliding-window trim (4:10), removal of leading and trailing bases with quality below 3, and removal of reads shorter than 39 bp. I aligned the trimmed set with BWA-MEM and got the results below.
5529752 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
65642 + 0 supplementary
0 + 0 duplicates
5396698 + 0 mapped (97.59% : N/A)
5464110 + 0 paired in sequencing
2732055 + 0 read1
2732055 + 0 read2
5263982 + 0 properly paired (96.34% : N/A)
5308488 + 0 with itself and mate mapped
22568 + 0 singletons (0.41% : N/A)
23856 + 0 with mate mapped to a different chr
4865 + 0 with mate mapped to a different chr (mapQ>=5)
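Because the "mapped" line in both reports appears to include supplementary records (mapped minus supplementary equals "with itself and mate mapped" plus singletons in each report), the two headline percentages are not quite like-for-like. The sketch below recomputes a mapping rate on primary reads only, using the counts from the two reports. The function name `primary_mapped_pct` is hypothetical, and treating "total minus secondary minus supplementary" as the number of reads is an assumption.

```python
# Counts copied from the two flagstat reports in the post
raw = {"total": 5541008, "secondary": 0, "supplementary": 76008, "mapped": 5413610}
trimmed = {"total": 5529752, "secondary": 0, "supplementary": 65642, "mapped": 5396698}


def primary_mapped_pct(stats):
    """Mapped percentage after removing secondary and supplementary records.

    Assumes the flagstat 'mapped' and 'total' lines both include those
    records, which is consistent with the numbers in these reports.
    """
    extra = stats["secondary"] + stats["supplementary"]
    primary_mapped = stats["mapped"] - extra
    primary_total = stats["total"] - extra
    return 100.0 * primary_mapped / primary_total


raw_pct = primary_mapped_pct(raw)
trimmed_pct = primary_mapped_pct(trimmed)

# Reads discarded by trimming before alignment
reads_dropped = (raw["total"] - raw["supplementary"]) - (
    trimmed["total"] - trimmed["supplementary"]
)

print(f"raw:     {raw_pct:.2f}% of primary reads mapped")
print(f"trimmed: {trimmed_pct:.2f}% of primary reads mapped")
print(f"reads removed by trimming: {reads_dropped}")
```

This only puts the two runs on the same footing; it does not by itself say which BAM is the better one.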
1) Can I use this information to pick the better alignment based on the mapped percentage? The raw data gave 97.70% mapped, which is higher than the trimmed data (97.59%). Can I therefore take the BAM from the raw data as the better one?
2) I used "samtools view -c -f 3 data.bam" to count the properly paired reads, but the value I got differs from the one flagstat reports for that category, for both datasets. I checked some other categories, such as "with itself and mate mapped", and they also gave different results. What could be the reason?
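One possible source of the mismatch in question 2 is supplementary records. The Python sketch below mimics the two counting rules on a handful of made-up FLAG values. The flag bit values follow the SAM specification, but the flagstat rule shown (skip secondary and supplementary records, then require paired, proper pair, and not unmapped) is an assumption about how recent samtools versions count and should be checked against the installed version.

```python
# SAM FLAG bits (values from the SAM specification)
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_view_f3(flags):
    """What `samtools view -c -f 3` counts: every record with bits 0x1 and 0x2 set."""
    return sum(1 for f in flags if (f & 0x3) == 0x3)


def count_flagstat_like(flags):
    """Assumed flagstat rule for 'properly paired': primary records only."""
    n = 0
    for f in flags:
        if f & (SECONDARY | SUPPLEMENTARY):
            continue  # tallied on the secondary/supplementary lines instead
        if (f & PAIRED) and (f & PROPER_PAIR) and not (f & UNMAPPED):
            n += 1
    return n


# Hypothetical FLAG values, for illustration only (not taken from the BAM):
# 99, 147, 83, 163 = properly paired primary records
# 2147 = 2048 + 99, a supplementary record that still carries the proper-pair bit
# 73 = mate unmapped, 133 = read itself unmapped
flags = [99, 147, 83, 163, 2147, 73, 133]

view_count = count_view_f3(flags)
flagstat_count = count_flagstat_like(flags)
print(view_count, flagstat_count)
```

If this is the cause, adding `-F 0x900` to the `samtools view -c -f 3` command (to exclude secondary and supplementary records) should bring the count closer to the flagstat value.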
Appreciate your answers.
Thanks in advance.
Regards,
Rangika