Hello Everyone,
I used BWA to align SOLiD mate-pair reads (60,60) with the parameters -n 8 (total mismatches), -l 25 (seed length) and -k 2 (mismatches in the seed). I am getting a good mapping rate of around 65%. I will be using BFAST and LifeScope in the future.
BWA outputs all reads regardless of whether they are mapped, unmapped, or mapped in pairs, and regardless of their other bitwise flags. To deal with this, I converted my SAM file to a BAM file. As I am not interested in inversions or other unusual variants right now, I filtered the file so that it can be used for high-confidence SNP and indel calling. I used:
samtools view -b -f 67 -f 31 -f 179 -f 115 old.bam > new.bam
67 and 31 (paired, mapped and properly paired); 179 and 115 (paired, mapped, properly paired, and both reads mapped as reverse complement on the same strand).
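For reference, the four flag values above can be decoded bit by bit. The following is a minimal sketch in plain shell arithmetic, not part of samtools; the function name decode_flag and the bit labels are made up for illustration, while the bit values themselves are the ones defined in the SAM format specification.

```shell
#!/bin/sh
# Decode a SAM FLAG value into the names of the bits it sets.
# decode_flag and the labels below are illustrative names only;
# the numeric bit values follow the SAM format specification.
decode_flag() {
    flag=$1
    out=""
    for entry in \
        1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
        16:reverse 32:mate_reverse 64:first_in_pair 128:second_in_pair \
        256:secondary 512:qc_fail 1024:duplicate 2048:supplementary
    do
        bit=${entry%%:*}     # number before the colon
        name=${entry#*:}     # label after the colon
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    printf '%s\n' "$out"
}

# Decode the four values used in the filter command.
for f in 67 31 179 115; do
    printf '%s\t%s\n' "$f" "$(decode_flag "$f")"
done
```

Decoded this way, 31 includes the "unmapped" and "mate unmapped" bits, so it may not be the value that was intended; 131 (paired, proper pair, second in pair) would be the natural counterpart of 67.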
Once I had new.bam, I sorted it, removed the duplicates using samtools, and then used mpileup to call SNPs and indels.
Below are my Yes or No questions:
1) This is my first time doing an NGS analysis. Am I doing things correctly? Is the order of the steps I am performing correct?
2) As I only want to use high-confidence reads, I have filtered out all unmapped and not properly paired reads. Do you think the flag bits I have used are correct?
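To make question 2 concrete: as documented for samtools view, -f keeps a read only if all of the given bits are set, and -F drops a read if any of the given bits is set. The sketch below reproduces that test in plain shell so that candidate values can be checked by hand. The function name keep_read is hypothetical, and the choice of 3 (paired + proper pair) as required bits and 12 (unmapped + mate unmapped) as excluded bits is an assumption about what the filter is meant to do, not something taken from the post.

```shell
#!/bin/sh
# keep_read FLAG REQUIRED EXCLUDED
# Succeeds when every REQUIRED bit is set in FLAG and no EXCLUDED bit is set.
# This mirrors the documented meaning of -f (required) and -F (excluded);
# keep_read itself is an illustrative helper, not a samtools command.
keep_read() {
    [ $(( $1 & $2 )) -eq "$2" ] && [ $(( $1 & $3 )) -eq 0 ]
}

# Assumed intent: require paired (1) + proper pair (2),
# exclude unmapped (4) and mate unmapped (8).
for f in 67 131 115 179 31 73; do
    if keep_read "$f" 3 12; then
        echo "$f kept"
    else
        echo "$f dropped"
    fi
done
```

Under these assumptions, the same selection could be written as a single pair of options, samtools view -b -f 3 -F 12 old.bam > new.bam. Repeating -f several times is unlikely to mean "any of these values"; how repeated options are handled may depend on the samtools version, so it is worth checking the result with samtools flagstat.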
Although I tried to remove the duplicates from my mate-pair BAM data using samtools, I can still see a lot of mate-pair reads mapped to the same position as other mate-pair reads. Some people have suggested using Picard.
I think the problem could be that I used the "trim 3' end" option in BWA. Reads that were duplicates before trimming may no longer be recognized as duplicates afterwards, because trimming changed the length of some reads. Can anyone tell me how to resolve this issue?
PS: In the future I am planning to try GATK, but for now I think samtools variant calling can give me some idea of the quality of the mapping and of my SOLiD data.
Thanks.