  • How to fix bwa and ssaha2 misalignments?

    At codons 98 and 99 of the HLA-DQA1 gene in the reference genome, NA12878 should carry ATCATG (the reference) on one chromosome and AGTCTG on the other. However, when I looked at the BAMs (downloaded from the 1000 Genomes Project) in IGV, I noticed that both bwa and ssaha2 introduced an insertion and a deletion into the alignment:

    A-TCATG
    AGTC-TG

    How should I fix this? Increase the gap penalty and re-run the alignment? Is there a way to manually edit the alignment inside a BAM file?

  • #2
    Fixed this by manually editing the SAM file. Does anyone know if this can be fixed by adjusting the gap penalty?
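One way to make the same edit programmatically is to rewrite the CIGAR string. The sketch below is hypothetical (the helper names `parse_cigar`, `format_cigar`, `collapse_indel_pair` and the `max_gap` cutoff are made up for illustration): it replaces an insertion followed, a few aligned bases later, by a deletion of the same length (or the reverse) with a single M run, which is what turns A-TCATG / AGTC-TG into an ungapped three-mismatch alignment. It does not touch SEQ, QUAL or POS, and it leaves the NM and MD tags stale, so those would need recomputing afterwards (for example with `samtools calmd`).

```python
import re

# One CIGAR operation: a length followed by an op code (SAM spec op set).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def format_cigar(ops):
    """Join (length, op) tuples into a CIGAR string, merging adjacent equal ops."""
    merged = []
    for n, op in ops:
        if merged and merged[-1][1] == op:
            merged[-1] = (merged[-1][0] + n, op)
        else:
            merged.append((n, op))
    return "".join(f"{n}{op}" for n, op in merged)


def collapse_indel_pair(cigar, max_gap=5):
    """Turn I,M,D (or D,M,I) with equal-length indels and at most max_gap
    aligned bases between them into one M run. The query and reference
    lengths covered stay the same, so the bases that were inserted/deleted
    simply become aligned (match or mismatch) positions."""
    ops = parse_cigar(cigar)
    out = []
    i = 0
    while i < len(ops):
        n, op = ops[i]
        partner = {"I": "D", "D": "I"}.get(op)
        if (partner is not None
                and i + 2 < len(ops)
                and ops[i + 1][1] == "M"
                and ops[i + 1][0] <= max_gap
                and ops[i + 2] == (n, partner)):
            # I(n) + M(m) + D(n) consumes n+m query and m+n reference bases.
            out.append((n + ops[i + 1][0], "M"))
            i += 3
        else:
            out.append((n, op))
            i += 1
    return format_cigar(out)


if __name__ == "__main__":
    print(collapse_indel_pair("40M1I2M1D57M"))  # 100M
```

For example, `collapse_indel_pair("40M1I2M1D57M")` returns `"100M"`, while a pair separated by more than `max_gap` aligned bases is left alone. In a pysam loop one could assign the result back to `read.cigarstring` for reads overlapping the codons in question; deciding which reads should be collapsed is left to the user.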



    • #3
      I tried GSNAP with SNP-tolerant alignment turned on. However, many reads were trimmed at this position for unknown reasons, so coverage of the alternate allele at codons 98 and 99 is very low.



      • #4
        Ran bwa again with the gap open penalty increased from 11 to 15, but I still get those insertion-deletion alignments. What should I try next?



        • #5
          You might try running a local realignment program such as SRMA.



          • #6
            What's more likely, three SNPs in a row, or two 1bp indels and two matches?

            The inequality you want to satisfy: 3*-MM < 2*M + (-O + -E).
            M = 1
            MM = -3
            O = -5
            E = -2

            I use these as the default parameters in TMAP.
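To put numbers on the comparison, the sketch below scores the two candidate alignments from the first post with the TMAP parameters above. It assumes an affine model in which a gap of length L costs O + L*E, so a 1-bp indel costs O + E, matching the (-O + -E) term; `score_alignment` is a hypothetical helper, not TMAP code.

```python
def score_alignment(ref, read, match=1, mismatch=-3, gap_open=-5, gap_ext=-2):
    """Score two equal-length aligned rows where '-' marks a gap.

    Assumes affine gaps: a gap of length L scores gap_open + L * gap_ext,
    so a 1-bp indel scores gap_open + gap_ext. Defaults are the TMAP
    values from post #6.
    """
    if len(ref) != len(read):
        raise ValueError("aligned rows must have equal length")
    score = 0
    open_gap = None  # which row holds the gap we are currently extending
    for r, q in zip(ref, read):
        if r == "-" or q == "-":
            row = "ref" if r == "-" else "read"
            if row != open_gap:  # a new gap starts in this column
                score += gap_open
            score += gap_ext
            open_gap = row
        else:
            score += match if r == q else mismatch
            open_gap = None
    return score


if __name__ == "__main__":
    # Rows from the first post: reference on top, read underneath.
    ungapped = score_alignment("ATCATG", "AGTCTG")
    gapped = score_alignment("A-TCATG", "AGTC-TG")
    print(f"ungapped: {ungapped}, gapped: {gapped}")  # ungapped: -6, gapped: -9
```

With these values the ungapped version (3 matches, 3 mismatches) scores -6 and the gapped one (5 matches, two 1-bp gaps) scores -9, so under this scoring the three mismatches win.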



            • #7
              Originally posted by nilshomer
              What's more likely, three SNPs in a row, or two 1bp indels and two matches?

              The inequality you want to satisfy: 3*-MM < 2*M + (-O + -E).
              M = 1
              MM = -3
              O = -5
              E = -2

              I use these as the default parameters in TMAP.
              By default, bwa aln has M=1, MM=-3, O=-11, E=-4. It seems to me the inequality should favor three mismatches, but in practice it does not. What's going on here?
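Solving the same comparison for the gap open penalty gives a break-even point below which the gapped alignment would score higher. This sketch takes the parameter values quoted in posts #4, #6 and #7 at face value (including M=1 for bwa aln), uses positive penalty magnitudes, and assumes a 1-bp gap costs open + extension. It is a toy score model and an assumption on my part that bwa behaves like it: bwa aln finds hits with a bounded backtracking search over its BWT index rather than by maximizing such a score, so this cannot by itself explain what bwa reports. The helper names are hypothetical.

```python
def break_even_gap_open(match, mismatch, gap_ext):
    """Gap open penalty at which the two alignments from post #1 tie.

    All arguments are positive magnitudes. Over the columns that differ,
    the ungapped alignment scores -3*mismatch, while the gapped one scores
    2*match - 2*(gap_open + gap_ext): two extra matches, two 1-bp gaps.
    """
    return (3 * mismatch + 2 * match) / 2 - gap_ext


def ungapped_margin(match, mismatch, gap_open, gap_ext):
    """Ungapped score minus gapped score (> 0 means the mismatches win)."""
    return -3 * mismatch - (2 * match - 2 * (gap_open + gap_ext))


# (match, mismatch, gap_open, gap_ext) as quoted in the thread.
SETTINGS = {
    "TMAP defaults (post #6)": (1, 3, 5, 2),
    "bwa aln defaults as quoted (post #7)": (1, 3, 11, 4),
    "bwa aln with -O 15 (post #4)": (1, 3, 15, 4),
}

if __name__ == "__main__":
    for name, (m, mm, o, e) in SETTINGS.items():
        print(f"{name}: break-even O = {break_even_gap_open(m, mm, e)}, "
              f"configured O = {o}, margin = {ungapped_margin(m, mm, o, e)}")
```

Every setting tried in the thread sits well above its break-even O (3.5 for the TMAP values, 1.5 for the bwa values), so raising -O from 11 to 15 only widens a margin that this toy model already puts on the side of the mismatches. That suggests the gapped alignments come from something other than the gap penalty alone.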



              • #8
                Originally posted by gaffa
                You might try running a local realignment program such as SRMA.
                Is this an abandoned project? The binary is dated 2010-10-22



                • #9
                  Originally posted by ymc
                  Is this an abandoned project? The binary is dated 2010-10-22

                  The source on GitHub seems to be newer, but it is still a year old:

                  https://github.com/nh13/SRMA



                  • #10
                    Originally posted by gaffa
                    You might try running a local realignment program such as SRMA.
                    Oh well, I am getting hundreds of these ArrayList errors after a 145-minute SRMA run....

                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)
                    at java.util.ArrayList$SubList.add(ArrayList.java:965)

                    real 145m13.649s
                    user 177m44.719s
                    sys 3m20.097s



                    • #11
                      Originally posted by darked89
                      Source on github seems to be newer, but still 1 year old:

                      https://github.com/nh13/SRMA
                      Thanks. I am trying this slightly newer 0.1.16 version now.



                      • #12
                        SRMA 0.1.16 also crashes.

                        I tried "bwa aln -M 2" as well, but the problem remains.



                        • #13
                          Originally posted by ymc
                          SRMA 0.1.16 also crashes

                          I tried "bwa aln -M 2" as well but the problem remains.
                          Hard to tell why. A few ideas:

                          1) Just in case: are you using a sorted and indexed BAM as SRMA input?

                          2) Get a small test BAM file that is known to run OK with SRMA and check whether your local SRMA works on it.

                          3) If 1 and 2 are OK, something else may be wrong with your BAM. You could re-sort it with the newest Picard and/or validate it.

                          4) If all else fails, switch to GATK.
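Idea 1 can be checked without extra tools. The sketch below reads the @HD line of a BAM header (BAM is BGZF-compressed, and BGZF files are valid multi-member gzip streams, so Python's `gzip` module can read them) and looks for an index next to the file. The helper names and the list of index file names it tries are assumptions; it only reports what the header declares, so a record-level check still needs a validator such as Picard ValidateSamFile.

```python
import gzip
import os
import struct


def bam_sort_order(path):
    """Return the SO: value from the @HD header line, or None if absent."""
    with gzip.open(path, "rb") as fh:
        if fh.read(4) != b"BAM\x01":  # BAM magic string
            raise ValueError(f"{path} is not a BAM file")
        (l_text,) = struct.unpack("<i", fh.read(4))  # header text length
        text = fh.read(l_text).decode("ascii", errors="replace").rstrip("\x00")
    for line in text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


def has_index(path):
    """True if an index sits next to the BAM under a common naming scheme."""
    stem = path[:-4] if path.endswith(".bam") else path
    candidates = [path + ".bai", stem + ".bai", path + ".csi"]
    return any(os.path.exists(c) for c in candidates)


def check_bam(path):
    """List problems that would make the BAM unsuitable as sorted, indexed input."""
    problems = []
    so = bam_sort_order(path)
    if so != "coordinate":
        problems.append(f"header sort order is {so!r}, not 'coordinate'")
    if not has_index(path):
        problems.append("no .bai/.csi index found")
    return problems
```

For example, `check_bam("../NA12878/SRR098401_bwa.bam")` would return an empty list when the header declares coordinate order and an index is present.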



                          • #14
                            I had the same problem. Calling SNPs without BAQ in samtools mpileup (the -B option) fixed it. I have had the best results with Extended BAQ, which brings back the false negatives while still reducing false positives, as standard BAQ should.



                            • #15
                              I tried the latest GATK IndelRealigner, but it doesn't seem to do anything for my problem:

                              time java -jar GenomeAnalysisTK-2.0-31-gf57127e/GenomeAnalysisTK.jar -nt 6 -T RealignerTargetCreator -R ../exome/human_g1k_v37.fasta -o SRR098401_bwa.intervals -I ../NA12878/SRR098401_bwa.bam -known ../exome/Mills_and_1000G_gold_standard.indels.b37.vcf
                              time java -jar GenomeAnalysisTK-2.0-31-gf57127e/GenomeAnalysisTK.jar -T IndelRealigner -R ../exome/human_g1k_v37.fasta -I ../NA12878/SRR098401_bwa.bam -targetIntervals SRR098401_bwa.intervals -known ../exome/Mills_and_1000G_gold_standard.indels.b37.vcf -o SRR098401_realigned.bam
