  • How to fix bwa and ssaha2 misalignments?

    At the 98th and 99th codons of the HLA-DQA1 gene, NA12878 should be ATCATG (matching the reference) on one chromosome and AGTCTG on the other. However, when I looked at the BAMs (downloaded from 1000g) in IGV, I noticed that both bwa and ssaha2 had added an insertion and a deletion to the alignment:

    A-TCATG
    AGTC-TG

    How should I fix this? Increase the gap penalty and re-run the alignment? Is there a way to manually edit the alignment inside the BAM?

  • #2
    Fixed this by manually editing the SAM file. Does anyone know if this can be fixed by adjusting the gap penalty?
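For anyone who would rather script this kind of SAM edit than do it by hand, here is a minimal sketch in Python. It assumes the misalignment appears in the CIGAR as an insertion and a deletion of equal length separated by a short run of matches (for example `1I2M1D`), and rewrites that triple as a single match run. The function names, the `max_between` cutoff, and the example CIGAR `40M1I2M1D57M` are hypothetical and not taken from the actual reads. The replacement consumes the same number of read and reference bases, so POS and SEQ stay valid, but the sketch does not touch optional tags such as NM and MD, which would need recomputing afterwards (for instance with samtools calmd).

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, op) tuples."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def collapse_indel_pairs(cigar, max_between=5):
    """Rewrite kI nM kD (or kD nM kI) as a single (k+n)M run.

    max_between (hypothetical cutoff) limits how many matched bases may
    sit between the two gaps, so distant, unrelated indels are left alone.
    """
    ops = parse_cigar(cigar)
    out = []
    i = 0
    while i < len(ops):
        if i + 2 < len(ops):
            (n1, o1), (n2, o2), (n3, o3) = ops[i], ops[i + 1], ops[i + 2]
            is_pair = {o1, o3} == {"I", "D"} and n1 == n3
            if is_pair and o2 == "M" and n2 <= max_between:
                # Same read and reference span as the original triple.
                out.append((n1 + n2, "M"))
                i += 3
                continue
        out.append(ops[i])
        i += 1

    # Merge neighbouring runs of the same operation (e.g. 40M 3M 57M).
    merged = []
    for n, op in out:
        if merged and merged[-1][1] == op:
            merged[-1] = (merged[-1][0] + n, op)
        else:
            merged.append((n, op))
    return "".join(f"{n}{op}" for n, op in merged)


print(collapse_indel_pairs("40M1I2M1D57M"))
```

Whether three mismatches really are the better description of a given read is still a judgment call; the sketch only shows that the two representations are interchangeable at the CIGAR level.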



    • #3
      I tried GSNAP with SNP-tolerant alignment turned on. However, many reads had this part trimmed for unknown reasons, so the coverage of the 98th and 99th codons on the alternate allele is very low.



      • #4
        Ran bwa again with the gap opening penalty increased from 11 to 15, but I still get those insertion-deletion alignments. What should I try next?



        • #5
          You might try running a local realignment program such as SRMA.



          • #6
            What's more likely, three SNPs in a row, or two 1bp indels and two matches?

            The inequality you want to satisfy is 3*-MM < 2*M + (-O + -E).
            M = 1
            MM = -3
            O = -5
            E = -2

            I use these as the default parameters in TMAP.



            • #7
              Originally posted by nilshomer
              What's more likely, three SNPs in a row, or two 1bp indels and two matches?

              The inequality you want to satisfy is 3*-MM < 2*M + (-O + -E).
              M = 1
              MM = -3
              O = -5
              E = -2

              I use these as the default parameters in TMAP.
              By default, bwa aln has M=1, MM=-3, O=-11, E=-4. It seems to me the inequality should favor three mismatches, but in practice bwa does not choose them. What's going on here?
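One way to check the comparison discussed in the two posts above is to score both candidate alignments directly over the affected columns. The sketch below is only an illustration: it takes the parameter values as quoted in posts #6 and #7, treats them as signed scores, and assumes a simple affine gap model. Whether a 1 bp gap is charged O + E or only O differs between aligners, so that choice is exposed as a hypothetical flag (`open_includes_extension`) rather than asserted for any particular tool.

```python
def three_mismatches(mm):
    """Score of the gap-free alignment: three mismatched columns."""
    return 3 * mm


def two_gaps_two_matches(m, o, e, open_includes_extension=True):
    """Score of the gapped alignment: two matches plus two 1 bp gaps.

    open_includes_extension is an assumption about the gap model:
    True  -> each 1 bp gap costs o + e
    False -> each 1 bp gap costs o only
    """
    gap = o + e if open_includes_extension else o
    return 2 * m + 2 * gap


def compare(m, mm, o, e, open_includes_extension=True):
    """Return (mismatch score, gapped score, which alignment wins)."""
    snp = three_mismatches(mm)
    indel = two_gaps_two_matches(m, o, e, open_includes_extension)
    if snp > indel:
        winner = "mismatches"
    elif snp < indel:
        winner = "indels"
    else:
        winner = "tie"
    return snp, indel, winner


# Parameter values as quoted in the thread, written as signed scores.
PARAMS = {
    "TMAP (post #6)": dict(m=1, mm=-3, o=-5, e=-2),
    "bwa aln (post #7)": dict(m=1, mm=-3, o=-11, e=-4),
}

for name, p in PARAMS.items():
    for flag in (True, False):
        print(name, "open_includes_extension =", flag, compare(**p, open_includes_extension=flag))
```

Under these assumptions the bwa aln values favor three mismatches with either gap convention (-9 against -28 or -20), so the scoring parameters alone would not explain the gapped alignment. With the TMAP values the outcome depends on the convention: -9 against -12 if a 1 bp gap costs O + E, but -9 against -8 if it costs only O.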



              • #8
                Originally posted by gaffa
                You might try running a local realignment program such as SRMA.
                Is this an abandoned project? The binary is dated 2010-10-22



                • #9
                  Originally posted by ymc
                  Is this an abandoned project? The binary is dated 2010-10-22

                  Source on github seems to be newer, but still 1 year old:

                  https://github.com/nh13/SRMA



                  • #10
                    Originally posted by gaffa
                    You might try running a local realignment program such as SRMA.
                    Oh well, I am getting hundreds of these ArrayList errors after 145 minutes of running SRMA:

                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)

                    real 145m13.649s
                    user 177m44.719s
                    sys 3m20.097s



                    • #11
                      Originally posted by darked89
                      Source on github seems to be newer, but still 1 year old:

                      https://github.com/nh13/SRMA
                      Thanks. I am trying this slightly newer 0.1.16 version now



                      • #12
                        SRMA 0.1.16 also crashes

                        I tried "bwa aln -M 2" as well but the problem remains.



                        • #13
                          Originally posted by ymc
                          SRMA 0.1.16 also crashes

                          I tried "bwa aln -M 2" as well but the problem remains.
                          Hard to tell why. A few ideas:

                          1) Just in case: are you using a sorted and indexed BAM as SRMA input?

                          2) Try to get a small test BAM file that is known to run OK with SRMA, then check whether your local SRMA works on it.

                          3) If 1 and 2 are OK, something else may be wrong with your BAM. You could re-sort it with the newest Picard and validate the BAM.

                          4) If all else fails, switch to GATK.



                          • #14
                            I had the same problem. Calling the SNPs without BAQ in samtools mpileup (the -B option) fixed it. I have had the best results using the extended BAQ, which brings back the false negatives while still reducing false positives, as standard BAQ is meant to.



                            • #15
                              Tried the latest GATK's indel realigner, but it doesn't seem to do anything for my problem:

                              time java -jar GenomeAnalysisTK-2.0-31-gf57127e/GenomeAnalysisTK.jar -nt 6 -T RealignerTargetCreator -R ../exome/human_g1k_v37.fasta -o SRR098401_bwa.intervals -I ../NA12878/SRR098401_bwa.bam -known ../exome/Mills_and_1000G_gold_standard.indels.b37.vcf
                              time java -jar GenomeAnalysisTK-2.0-31-gf57127e/GenomeAnalysisTK.jar -T IndelRealigner -R ../exome/human_g1k_v37.fasta -I ../NA12878/SRR098401_bwa.bam -targetIntervals SRR098401_bwa.intervals -known ../exome/Mills_and_1000G_gold_standard.indels.b37.vcf -o SRR098401_realigned.bam
