  • SNP quality score in Samtools pileup

    Hi,

    I was examining the pileup by Samtools at a particular base of interest:

    X 131016403 G G 103 0 60 53 T$T,,.t.....T.TT,,,..,.t,,...,t,t,tTT,.TT..T..Tt,,T,,t BFGGAGCEFEEE<B-GGGAGGFGGGGGG?GGFGGGGGBEGFGGGEGEFDGGEF

    It looks like a clear heterozygous position with good coverage and decent base qualities. However, it got a SNP quality score of 0 and a homozygous genotype call. Is there any possible explanation for this?

    The data are from 75x2 PE reads and alignment was done using ELANDv2. Any help on this will be highly appreciated. Thanks!
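To put numbers on the allele balance in a line like the one above, the read-bases column can be tallied directly. This is only a sketch: it assumes the 10-column consensus layout (chromosome, position, reference base, consensus base, consensus quality, SNP quality, mapping quality, depth, read bases, base qualities), so the read bases are taken from the ninth column, and `tally_pileup_bases` is a hypothetical helper name, not part of SAMtools.

```python
import re
from collections import Counter


def tally_pileup_bases(bases: str) -> Counter:
    """Count the per-read symbols in the read-bases column of one pileup line."""
    counts = Counter()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Read-start marker; the next character encodes mapping quality.
            i += 2
        elif ch == "$":
            # Read-end marker; carries no base of its own.
            i += 1
        elif ch in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            m = re.match(r"\d+", bases[i + 1:])
            if m is None:
                i += 1
                continue
            i += 1 + len(m.group()) + int(m.group())
        else:
            # '.' / ',' match the reference (forward / reverse strand);
            # upper / lower case letters are mismatches on those strands.
            counts[ch] += 1
            i += 1
    return counts


line = ("X 131016403 G G 103 0 60 53 "
        "T$T,,.t.....T.TT,,,..,.t,,...,t,t,tTT,.TT..T..Tt,,T,,t "
        "BFGGAGCEFEEE<B-GGGAGGFGGGGGG?GGFGGGGGBEGFGGGEGEFDGGEF")
fields = line.split()
counts = tally_pileup_bases(fields[8])  # ninth column assumed to hold read bases
ref_total = counts["."] + counts[","]
alt_total = sum(n for b, n in counts.items() if b not in ".,")
print(dict(counts))
print(f"reference-matching: {ref_total}, mismatching: {alt_total}, depth column: {fields[7]}")
```

The tally says nothing about mapping quality or where in each read the mismatch falls, so it cannot explain a low SNP quality by itself.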

  • #2
    Originally posted by wangzkai
    Hi,

    I was examining the pileup by Samtools at a particular base of interest:

    X 131016403 G G 103 0 60 53 T$T,,.t.....T.TT,,,..,.t,,...,t,t,tTT,.TT..T..Tt,,T,,t BFGGAGCEFEEE<B-GGGAGGFGGGGGG?GGFGGGGGBEGFGGGEGEFDGGEF

    It looks like a clear heterozygous position with good coverage and decent base qualities. However, it got a SNP quality score of 0 and a homozygous genotype call. Is there any possible explanation for this?

    The data are from 75x2 PE reads and alignment was done using ELANDv2. Any help on this will be highly appreciated. Thanks!
    Maybe the mapping qualities for the variant reads are low?

    • #3
      Originally posted by nilshomer
      Maybe the mapping qualities for the variant reads are low?
      This is exactly what I found when I came across the same puzzling situation. Converting the data to BAM format and then visualizing it in IGV showed me that the apparently heterozygous SNP was getting all of its heterozygous bases from the low-quality ends of the reads - the SNP never once showed up in the beginning or middle of a read, only when it was within 7 nt of the end.

      Odd? Yes. But if it were a true SNP, you'd expect to find it in half of the reads regardless of position.
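A rough way to quantify that expectation, as a sketch under stated assumptions: if reads start at random, the site's offset within a covering read is roughly uniform, so the chance that every variant-supporting read has the site within the last few bases is a simple power. The read length of 75 is borrowed from the first post, the count of 10 variant reads is hypothetical, and `prob_all_near_end` is an illustrative name.

```python
def prob_all_near_end(n_variant_reads: int, read_length: int, window: int) -> float:
    """Probability that all n reads carry the site within `window` bases of
    one read end, assuming the offset in each read is independent and uniform."""
    if not 0 < window <= read_length:
        raise ValueError("window must be between 1 and read_length")
    return (window / read_length) ** n_variant_reads


# Hypothetical numbers: 10 variant reads, 75 nt reads, variant always in the last 7 nt.
print(prob_all_near_end(10, 75, 7))
```

A tiny value here argues against a real variant and for a position-dependent artifact, though the uniformity assumption ignores trimming and alignment edge effects.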

      • #4
        Yes, that is the problem with SAMtools. The majority of variants fall in the second half of the read, hence the many false positives.

        • #5
          Does anyone have code that prints the position within each read at which a given SNP occurs?
          Last edited by christophpale; 07-21-2010, 03:11 AM.
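One possible approach, offered as a sketch and not as a tested tool: walk each read's CIGAR string from its SAM record to find which offset in the read lines up with the reference position of interest. The function names and the example reads are hypothetical; the input is assumed to be plain SAM text lines (for example from `samtools view`), and hard-clipped bases are not accounted for when estimating the sequencing cycle.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def offset_in_read(read_start: int, cigar: str, ref_pos: int):
    """Return the 0-based offset in SEQ aligned to ref_pos, or None.

    read_start and ref_pos are 1-based reference coordinates (SAM POS style).
    """
    ref, query = read_start, 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":                      # consumes reference and read
            if ref <= ref_pos < ref + n:
                return query + (ref_pos - ref)
            ref += n
            query += n
        elif op in "IS":                     # consumes read only
            query += n
        elif op in "DN":                     # consumes reference only
            if ref <= ref_pos < ref + n:
                return None                  # site is deleted/skipped in this read
            ref += n
        # H and P consume neither reference nor SEQ
    return None


def report_snp_offsets(sam_lines, chrom: str, ref_pos: int):
    """Yield (read name, base at site, offset in SEQ, approximate cycle)."""
    for line in sam_lines:
        if line.startswith("@"):
            continue                         # header line
        f = line.rstrip("\n").split("\t")
        name, flag, rname, pos, cigar, seq = f[0], int(f[1]), f[2], int(f[3]), f[5], f[9]
        if rname != chrom or flag & 0x4 or cigar == "*" or seq == "*":
            continue
        off = offset_in_read(pos, cigar, ref_pos)
        if off is None:
            continue
        # SEQ is stored in reference orientation, so for reverse-strand
        # reads (flag 0x10) the sequencing cycle counts from the other end.
        cycle = len(seq) - 1 - off if flag & 0x10 else off
        yield name, seq[off], off, cycle


# Hypothetical reads covering chrX:105
example = [
    "r1\t0\tchrX\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t16\tchrX\t98\t60\t2S3M1I5M\t*\t0\t0\tTTACGGACGTA\tIIIIIIIIIII",
]
for row in report_snp_offsets(example, "chrX", 105):
    print(*row, sep="\t")
```

Reporting the cycle as well as the offset makes it easier to see whether the variant bases cluster at the low-quality end of the reads, which is the pattern described earlier in the thread.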

          • #6
            Is there any further explanation?
            I have found the following contrasting examples:
            Code:
            scaffold2410 23912 G S 6 6 37 123 c,,,,,,,,,,,,,,,,,,,,ccc,,cc,,,,,,,,,,,,,ccc,,cc,cccc,,cc,,,,,,,cc,,c,cc,c,,c,,,,c,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, HHHHHHHHHHHHHHHHHFEHJCCHJEIHHHHHBHHHHHHFHHHHGHHHHHHHHHHCHHHHHHHJFFGJJJJHJHHHHHHHHHHHHHHHHHHHHH<HHHHHGHHHHGHHGHHHGHHGH<GJJIA
            and:
            Code:
            scaffold12030   25942   A       R       37      44      25      44      gggggg,gg,g,,,,,,,ggggggggggggggggggggggggg,    HHHHHHGHHGHEHGHHHHHFHHHHHHHHHHGHBFHHHHHHHHHJ
            Both examples have similar read qualities, and in both the variant reads map to the reverse strand of the reference, yet they received different "SNP quality" scores. How were these results produced?

            Any suggestions would be highly appreciated.

            We estimated heterozygosity from the results filtered by VarFilter, so we have evidently underestimated the heterozygosity level.
            Last edited by pengchy; 09-21-2011, 06:37 AM.
