  • How to fix bwa and ssaha2 misalignments?

    At the 98th and 99th codons of the HLA-DQA1 gene in the reference genome, NA12878 should be ATCATG (reference) and AGTCTG, respectively, on the two chromosomes. However, when I looked at the BAMs (downloaded from 1000g) in IGV, I noticed that both bwa and ssaha2 added an insertion and a deletion to the alignment:

    A-TCATG
    AGTC-TG

    How should I fix this? Should I increase the gap penalty and re-run the alignment? Is there a way to manually edit the alignment inside the BAM?

  • #2
    Fixed this by manually editing the SAM file. Does anyone know whether this can be fixed by adjusting the gap penalty?
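
A manual edit like this can also be scripted. Below is a minimal sketch, assuming the bad records look like the one in this thread: an insertion and a deletion of equal length separated by a few matched bases (for example `1I2M1D`). Because the two gaps cancel out, the read covers the same reference span either way, so POS stays the same and only the CIGAR changes. The function names and the `max_between` cutoff are hypothetical and not part of any tool.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def collapse_balanced_indels(cigar, max_between=3):
    """Rewrite nI kM nD (or nD kM nI) as (n+k)M when k <= max_between."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if not ops:  # e.g. "*" for an unmapped read
        return cigar
    out = []
    i = 0
    while i < len(ops):
        window = ops[i:i + 3]
        if (len(window) == 3
                and {window[0][1], window[2][1]} == {"I", "D"}
                and window[1][1] == "M"
                and window[0][0] == window[2][0]
                and window[1][0] <= max_between):
            # The gaps cancel: treat the whole stretch as aligned bases.
            out.append((window[0][0] + window[1][0], "M"))
            i += 3
        else:
            out.append(ops[i])
            i += 1
    # Merge neighbouring M operations created by the collapse.
    merged = []
    for n, op in out:
        if merged and merged[-1][1] == op == "M":
            merged[-1] = (merged[-1][0] + n, "M")
        else:
            merged.append((n, op))
    return "".join(f"{n}{op}" for n, op in merged)

def fix_sam_line(line):
    """Apply the CIGAR rewrite to one SAM text line; header lines pass through."""
    if line.startswith("@"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    fields[5] = collapse_balanced_indels(fields[5])  # column 6 is CIGAR
    return "\t".join(fields) + newline

print(collapse_balanced_indels("40M1I2M1D33M"))  # 76M
```

Any NM or MD tags on a rewritten record would be stale after this, so they would need to be recomputed (for example with `samtools calmd`) before variant calling.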


    • #3
      I tried GSNAP with SNP-tolerant alignment turned on. However, many reads had this region trimmed for unknown reasons, so the coverage of the 98th and 99th codons on the alternate allele is very low.


      • #4
        Ran bwa again with the gap opening penalty increased from 11 to 15, but I still get those insertion-deletion alignments. What should I try next?


        • #5
          You might try running a local realignment program such as SRMA.


          • #6
            What's more likely, three SNPs in a row, or two 1bp indels and two matches?

            The inequality you want to satisfy is 3*-MM < 2*M + (-O + -E).
            M = 1
            MM = -3
            O = -5
            E = -2

            I use these as the default parameters in TMAP.
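
To make the comparison concrete, here is a small sketch that scores both candidate alignments of the six bases from the first post with the parameters above. It counts both 1 bp indels and scores the whole window rather than only the differing columns. How the first base of a gap is charged is an assumption: the default here is O + k*E for a gap of length k, and the `charge_first_extension` switch (a hypothetical name) shows the other common convention, O + (k-1)*E. Nothing here claims which convention a given aligner uses.

```python
def alignment_score(matches, mismatches, gap_lengths, M, MM, O, E,
                    charge_first_extension=True):
    """Score an alignment from its column counts under affine gap penalties."""
    score = matches * M + mismatches * MM
    for k in gap_lengths:
        extensions = k if charge_first_extension else k - 1
        score += O + extensions * E
    return score

TMAP = {"M": 1, "MM": -3, "O": -5, "E": -2}

# ATCATG vs AGTCTG with no gaps: 3 matches, 3 mismatches.
ungapped = alignment_score(3, 3, [], **TMAP)
# A-TCATG vs AGTC-TG: 5 matches, one 1 bp insertion, one 1 bp deletion.
gapped = alignment_score(5, 0, [1, 1], **TMAP)
# Same gapped alignment if the first gap base costs only O.
gapped_alt = alignment_score(5, 0, [1, 1], charge_first_extension=False, **TMAP)

print(ungapped, gapped, gapped_alt)  # -6 -9 -5
```

With these numbers the three-mismatch alignment wins under the first convention (-6 vs -9) and loses under the second (-6 vs -5), so the gap-cost convention alone can flip the outcome.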


            • #7
              Originally posted by nilshomer
              What's more likely, three SNPs in a row, or two 1bp indels and two matches?

              The inequality you want to satisfy is 3*-MM < 2*M + (-O + -E).
              M = 1
              MM = -3
              O = -5
              E = -2

              I use these as the default parameters in TMAP.
              By default, bwa aln uses M=1, MM=-3, O=-11, E=-4. It seems to me that with these values the inequality should favor three mismatches, but in practice bwa still reports the indel alignment. What's going on here?
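
One way to check the arithmetic is to compute the best possible global alignment score of the two six-base alleles with an affine-gap dynamic program (Gotoh-style). The sketch below is only a sanity check of the scoring; it assumes a gap of length k costs O + k*E and does not model how bwa actually searches for or reports alignments.

```python
NEG = float("-inf")

def best_global_score(ref, read, M, MM, O, E):
    """Best global alignment score with affine gaps (gap of length k: O + k*E)."""
    n, m = len(ref), len(read)
    # H: last column is a match/mismatch; D: deletion (ref base vs gap);
    # I: insertion (gap vs read base).
    H = [[NEG] * (m + 1) for _ in range(n + 1)]
    D = [[NEG] * (m + 1) for _ in range(n + 1)]
    I = [[NEG] * (m + 1) for _ in range(n + 1)]
    H[0][0] = 0
    for i in range(1, n + 1):
        D[i][0] = O + i * E
    for j in range(1, m + 1):
        I[0][j] = O + j * E
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = M if ref[i - 1] == read[j - 1] else MM
            H[i][j] = max(H[i-1][j-1], D[i-1][j-1], I[i-1][j-1]) + s
            D[i][j] = max(H[i-1][j] + O + E, D[i-1][j] + E, I[i-1][j] + O + E)
            I[i][j] = max(H[i][j-1] + O + E, I[i][j-1] + E, D[i][j-1] + O + E)
    return max(H[n][m], D[n][m], I[n][m])

ref, read = "ATCATG", "AGTCTG"
bwa = {"M": 1, "MM": -3, "O": -11, "E": -4}

ungapped = sum(bwa["M"] if a == b else bwa["MM"] for a, b in zip(ref, read))
print(ungapped, best_global_score(ref, read, **bwa))  # -6 -6

# With very cheap gaps the insertion-deletion alignment becomes optimal.
print(best_global_score(ref, read, M=1, MM=-3, O=-1, E=-1))  # 1
```

Under these assumptions the ungapped alignment is already optimal with the bwa aln default penalties, which agrees with the reading that the scores alone favor three mismatches.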


              • #8
                Originally posted by gaffa
                You might try running a local realignment program such as SRMA.
                Is this an abandoned project? The binary is dated 2010-10-22


                • #9
                  Originally posted by ymc
                  Is this an abandoned project? The binary is dated 2010-10-22

                  Source on github seems to be newer, but still 1 year old:

                  https://github.com/nh13/SRMA


                  • #10
                    Originally posted by gaffa
                    You might try running a local realignment program such as SRMA.
                    Oh well, I am getting hundreds of these ArrayList errors after a 145-minute SRMA run...

                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)

                    real 145m13.649s
                    user 177m44.719s
                    sys 3m20.097s


                    • #11
                      Originally posted by darked89
                      Source on github seems to be newer, but still 1 year old:

                      https://github.com/nh13/SRMA
                      Thanks. I am trying this slightly newer 0.1.16 version now


                      • #12
                        SRMA 0.1.16 also crashes

                        I tried "bwa aln -M 2" as well but the problem remains.


                        • #13
                          Originally posted by ymc
                          SRMA 0.1.16 also crashes

                          I tried "bwa aln -M 2" as well but the problem remains.
                          Hard to tell why. A few ideas:

                          1) Just in case: are you using a sorted and indexed BAM as SRMA input?

                          2) Try to get a small test BAM file that is known to run OK with SRMA, then check whether your local SRMA works on it.

                          3) If 1 and 2 are OK, something else may be wrong with your BAM. You could re-sort it with the newest Picard and validate the BAM.

                          4) If all else fails, switch to GATK.
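
For the first check, a quick sanity test can be run on the text form of the file (for example the output of `samtools view -h`). The sketch below is a rough check under stated assumptions, not a replacement for a proper validator: it takes the reference order from the `@SQ` header lines, skips unmapped records, and reports whether mapped records appear in non-decreasing coordinate order. The function name is hypothetical.

```python
def is_coordinate_sorted(sam_lines):
    """Return True if mapped SAM records are in (reference, position) order."""
    ref_order = {}  # reference name -> rank, taken from @SQ header lines
    last_key = None
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        ref_order.setdefault(field[3:], len(ref_order))
            continue
        fields = line.split("\t")
        rname, pos = fields[2], int(fields[3])
        if rname == "*":
            continue  # unmapped records are not checked here
        # References missing from the header are ranked by first appearance.
        key = (ref_order.setdefault(rname, len(ref_order)), pos)
        if last_key is not None and key < last_key:
            return False
        last_key = key
    return True

header = ["@HD\tVN:1.0\tSO:coordinate", "@SQ\tSN:1\tLN:1000", "@SQ\tSN:6\tLN:1000"]
rec = "r{0}\t0\t{1}\t{2}\t60\t4M\t*\t0\t0\tACGT\tIIII"
sorted_sam = header + [rec.format(1, "1", 500), rec.format(2, "6", 90), rec.format(3, "6", 100)]
unsorted_sam = header + [rec.format(1, "6", 100), rec.format(2, "6", 90)]

print(is_coordinate_sorted(sorted_sam), is_coordinate_sorted(unsorted_sam))  # True False
```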


                          • #14
                            I had the same problem. Calling the SNPs without BAQ in samtools mpileup (the -B option) fixed it. I have had the best results using extended BAQ, which brings back the false negatives while still reducing false positives as standard BAQ should.


                            • #15
                              Tried the latest GATK indel realigner, but it does not seem to do anything for my problem:

                              time java -jar GenomeAnalysisTK-2.0-31-gf57127e/GenomeAnalysisTK.jar -nt 6 -T RealignerTargetCreator -R ../exome/human_g1k_v37.fasta -o SRR098401_bwa.intervals -I ../NA12878/SRR098401_bwa.bam -known ../exome/Mills_and_1000G_gold_standard.indels.b37.vcf
                              time java -jar GenomeAnalysisTK-2.0-31-gf57127e/GenomeAnalysisTK.jar -T IndelRealigner -R ../exome/human_g1k_v37.fasta -I ../NA12878/SRR098401_bwa.bam -targetIntervals SRR098401_bwa.intervals -known ../exome/Mills_and_1000G_gold_standard.indels.b37.vcf -o SRR098401_realigned.bam
