  • ulz_peter
    Senior Member
    • Feb 2010
    • 219

    BWA and Solid data issue

    Hi all,

    We recently got Solid whole exome data and I tried to align it with bwa and do the snp calling with GATK.
    After resolving the CS/CQ-tag issue by a python script to adapt the bam file for analysis with GATK I now ran into another problem:
    Alignment worked well and SNP calling did not show any error messages, but there seems to be some trouble with the Phred quality of the bases. GATK showed me some SNPs with AF=1.00 (so homozygous) which were obviously heterozygous. IGV shows the Phred score for each position, and it seems that every base which matches the reference sequence has Phred score 0, whereas the SNPs have a relatively high Phred score (>25).

    Is it possible that bwa changes the base quality scores when converting the csfasta and qual files to the bwa double encoded fastq files? Or is there a special option in the GATK for Solid base-call qualities? Am I doing something completely wrong?

    Any help is appreciated...
  • nilshomer
    Nils Homer
    • Nov 2008
    • 1283

    #2
    It uses the MAQ formula, which is also used by BFAST. See the section called "ABI SOLiD Base Qualities" in the following page for an explanation: http://sourceforge.net/apps/mediawik...apping_Quality. You can confirm by checking the function "cs2nt_nt_qual" in "cs2nt.c" in the BWA source code.


    • ulz_peter
      Senior Member
      • Feb 2010
      • 219

      #3
      Hi Nolshomer,

      Thank you for the answer. I checked the resulting SAM file and it appears that the quality values are distributed nicely between 0 and 60 (after subtracting 33), with most of them at 60. So it seems that bwa does a good job...
      Does anyone know of a switch for the GATK for Solid data? And why does IGV show so many zero quality bases where there shouldn't be too many?
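
The check described above (reading the QUAL column and subtracting 33) can be sketched in a few lines of Python. This is only a minimal illustration, assuming a plain-text SAM file with Phred+33 qualities in field 11; the function name `qual_histogram` and the example record are made up for this sketch and are not part of bwa or the GATK.

```python
from collections import Counter

def qual_histogram(sam_lines, offset=33):
    """Count Phred scores found in the QUAL column (field 11) of SAM records."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or fields[10] == "*":  # no qualities stored
            continue
        # ASCII code minus the offset gives the Phred score
        counts.update(ord(ch) - offset for ch in fields[10])
    return counts

# Tiny made-up record, for illustration only
example = [
    "@HD\tVN:1.0\n",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\t!!I]\n",
]
hist = qual_histogram(example)
for score in sorted(hist):
    print(score, hist[score])
```

For a real file, the same function could be fed an open file handle, e.g. `with open("aln.sam") as fh: qual_histogram(fh)` (the file name is a placeholder).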


      • ulz_peter
        Senior Member
        • Feb 2010
        • 219

        #4
        Sorry for the misspelling of your name Nils :-)


        • ulz_peter
          Senior Member
          • Feb 2010
          • 219

          #5
          Just a short update:
          When looking at the details in IGV, the OQ (original quality) tag shows more or less only '!' (ASCII 33) characters. The CQ tag seems to be OK. After converting the duplicate-marked, realigned and recalibrated BAM file back to SAM format, it seems that most of the qualities were changed to '!'. I checked the intermediate BAM files (after marking duplicates and realignment) and they showed nice quality values again. So I guess that's an issue with the GATK recalibrator. The weird thing is that the recalibrator obviously does not write the original qualities to the OQ tag...

          Has anyone seen this before?
          Last edited by ulz_peter; 03-24-2011, 11:40 PM.
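
One way to quantify the observation in this post is to compare, per read, how many positions are '!' (Phred 0) in the QUAL column versus the OQ:Z tag. The sketch below assumes plain-text SAM records with Phred+33 encoding; `zero_quality_fraction`, `oq_report` and the example record are hypothetical names and data for illustration, not GATK functionality.

```python
def zero_quality_fraction(qual, offset=33):
    """Fraction of positions with Phred score 0, or None if no string is stored."""
    if not qual or qual == "*":
        return None
    zeros = sum(1 for ch in qual if ord(ch) - offset == 0)
    return zeros / len(qual)

def oq_report(sam_line):
    """Compare the QUAL column with the optional OQ:Z tag of one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    qual = fields[10]
    oq = None
    for tag in fields[11:]:  # optional fields start at column 12
        if tag.startswith("OQ:Z:"):
            oq = tag[len("OQ:Z:"):]
    return {
        "qual_zero": zero_quality_fraction(qual),
        "oq_zero": zero_quality_fraction(oq),
    }

# Made-up record: recalibrated QUAL mostly '!', original qualities kept in OQ
record = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\t!!!I\tOQ:Z:IIII\n"
report = oq_report(record)
print(report)
```

If both fractions are close to 1 only in the recalibrated file, while the intermediate files show low fractions, that would point at the recalibration step, as suspected above.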


          • Brugger
            Member
            • Mar 2010
            • 21

            #6
            We are using the GATK for post-mapping analysis, but do not recalibrate.

            One problem we have seen is that the SOLiD reads will leak back to the reference if there is one colour error, and this will skew the split of bases at a SNP. It can be so bad that a homozygous SNP shows more than 30% reference bases. I have a solution that scrubs away the potentially wrongly corrected bases at SNPs, and we are getting nice results with this approach.
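
The "leak back to the reference" effect can be illustrated with a toy colour-space model. A SNP changes two adjacent colours, so a read that carries the SNP but has a sequencing error in one of those two colours looks like a read with a single isolated colour mismatch. The sketch below uses the standard SOLiD two-base encoding (colour = XOR of the 2-bit base codes); `naive_correct` is a deliberately simplistic stand-in written for this example and is not the correction algorithm used by bwa, Bioscope or my patch.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def encode(seq):
    """Colour for each adjacent base pair: XOR of the two 2-bit base codes."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]

def decode(first_base, colours):
    """Rebuild the base sequence from a known first base and the colours."""
    bases = [first_base]
    for c in colours:
        bases.append(BASE[CODE[bases[-1]] ^ c])
    return "".join(bases)

def naive_correct(read_cs, ref_cs):
    """Toy rule: an isolated single-colour mismatch is treated as an error."""
    fixed = list(read_cs)
    for i, (r, f) in enumerate(zip(read_cs, ref_cs)):
        if r == f:
            continue
        left = i > 0 and read_cs[i - 1] != ref_cs[i - 1]
        right = i + 1 < len(read_cs) and read_cs[i + 1] != ref_cs[i + 1]
        if not (left or right):
            fixed[i] = f  # snap the colour back to the reference
    return fixed

ref = "ACGTACGT"
snp = "ACGAACGT"                 # T>A at the fourth base
ref_cs, snp_cs = encode(ref), encode(snp)

clean = decode("A", naive_correct(snp_cs, ref_cs))   # SNP survives
damaged_cs = list(snp_cs)
damaged_cs[3] = ref_cs[3]        # colour error in one of the two SNP colours
leaked = decode("A", naive_correct(damaged_cs, ref_cs))  # reads as reference

print(clean, leaked)
```

In this toy case the clean SNP read keeps its variant base, while the read with one colour error is "corrected" to the reference sequence and therefore counts as reference support at the SNP position.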


            • zlu
              Member
              • Nov 2008
              • 34

              #7
              Originally posted by Brugger
              We are using the GATK for post-mapping analysis, but do not recalibrate.

              One problem we have seen is that the SOLiD reads will leak back to the reference if there is one colour error, and this will skew the split of bases at a SNP. It can be so bad that a homozygous SNP shows more than 30% reference bases. I have a solution that scrubs away the potentially wrongly corrected bases at SNPs, and we are getting nice results with this approach.
              Hi Brugger, do you mind sharing your solution? Does this still involve recalibration using GATK? Thanks.


              • aldo
                Junior Member
                • Apr 2008
                • 2

                #8
                Hi Brugger,

                I am also very interested in the approach you have taken. Could you share it with us? Thanks


                • Brugger
                  Member
                  • Mar 2010
                  • 21

                  #9
                  My approach looks at the original colours and makes some corrections based on the observations. For this to work you need to use a mapper that retains the original colour information, like Bioscope. Furthermore, I have a patch for bwa that does just this as well.

                  I can see if I can dig out some stats/graphs showing the improvement after correction.

                  I have not looked at this for a while, but the plan was to make it publicly available when I have some spare time.

                  If you want to test this please send me a PM with your email address and we can sort something out.
