Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Pileup shows variants - Not called as SNPs

    Hello,

    I notice that with samtools pileup/mpileup, there are several positions which have variants that can be classified as SNPs. However, they fail to show up when using the subsequent SNP program in its most basic form (i.e. without any filters or flags). What can be causing these 'calls' to drop off?

    Thanks,
    BertieW

  • #2
    Maybe you could post some examples?

    Comment


    • #3
      First guess, the quality of the alternate letters is too low. Post the line of the mpileup that you think is a putative SNP, and see if that's the case.

      Comment


      • #4
        I have a similar situation with several SNPs. I filtered out reads that align to this SNP position to investigate the mapping quality and base qualities. All reads aligning to the position where I observe this SNP on IGV have a mapping quality of 37 which is pretty good. All of these reads match perfectly without mismatch to the reference and a majority of these reads (>40) differ at this position compared to reference. But Samtools pileup does not call it a SNP. Is it just the limitation of samtools or is there something I could try to overcome this? The SNP I observe is at position 33388713
        Code:
        328MB44S:8:42:10743:11542#0 0 1 33388674 37 70M * 0 0 CCTTAGGTCACACTGCAGAACAGAGTGATATTGCAAAAAACCAAATTATCAGTCTACGTGCAACCACCAT gcghhhffdhhghfhhhhhhhhhfhehfhhhghghhgcfhghghhhhghhhhgdfhhhhhhffheahhhf XT:A:U NM:i:1 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:39G30

        Comment


        • #5
          Looks like your base qualities are in Illumina's phred+64 format (see http://en.wikipedia.org/wiki/FASTQ_format). They should in Sanger format. Since the base qualities are out of range, all bets are off.

          Comment


          • #6
            Thanks Nils. I am redoing the workflow with phred+33 qualities now. However, I am thinking what difference should I expect? It looks like with phred+64, I was overestimating the base quality and hence I was seeing lot many SNPs and indels than expected. With the corrected base qualities, this may go down. I wonder how I can expect this fix to enable the detection of the SNP that samtools had missed earlier. Any thoughts?

            Comment


            • #7
              Post your results when you are done.

              Comment


              • #8
                BWA aln has a -I option if your fastq files are in Illumina format rather than Sanger. If this option is used are quality scores changed to phred+33 in the output?

                Comment


                • #9
                  I have a similar question and I hope someone can help me with this. If you need more info, just let me know.

                  I aligned my 2 x 100 bp reads with BWA - I and applied realignment with GATK in a later stage and looking at the results in IGV, everything looks quite good. There are some problems though.

                  The bam files of two family members show an AC-deletion at chr1:165,378,729 and that one is being called. At position 165,378,733 and 165,378,735 I find a C > T variants that are supported by 62 out of 62 and 58 out of 68 reads in one patient and by 36 out of 42 and 22 out of 42 reads in the second patient. Seems like clear snp's to me and the base call quality is definitely in SANGER format and between 30 and 40.

                  Does samtools allow snp's near indels or could the problem be somewhere else? (or maybe just because it is Monday)

                  Thank you!

                  EDIT: I know that both the deletion and the C>T snp are actual known variants (dbSNP132), although I can imagine it to be a bit complex, because they are in a repetitive region/simple repeats.
                  Last edited by Papillon; 05-09-2011, 11:49 AM.

                  Comment


                  • #10
                    Similar Problem

                    I have a similar problem and don't want to start a separate thread unless necessary.

                    Illumina HiSeq paired-end genomic DNA for Arabidopsis.
                    Reads aligned using BWA (aln and sampe modules).
                    Alignment is sorted and indexed.

                    I used Samtools mpileup:
                    samtools mpileup -uf TAIR10.fa arab1.sorted.bam | bcftools view -c - > abar1.var.raw.vcf

                    When I compare a particular region: Chr5:4,764,955 between the output of mpileup & bcftools with the samtools tview -- I don't see the same results. I will post here if possible:

                    from the vcf file output:
                    Chr5 4763954 . T . 45 . DP=65;AF1=4.92e-07;CI95=1.5,0;DP4=1,4,0,0;MQ=54 PL 0
                    Chr5 4763955 . C . 3.67 . DP=64;AF1=1;CI95=0.5,1;DP4=0,0,0,3;MQ=60 PL 40
                    Chr5 4763956 . C . 42 . DP=64;AF1=4.837e-07;CI95=1.5,0;DP4=1,3,0,0;MQ=60 PL 0

                    The tview "view" is attached. I'm not sure why, with 64 reads, ALL with a T at position 4763955, that it is not called as a SNP. I'm trying to assess why? Is it the gap nearby?

                    Please note that this "gap" inserted into the reference is caused by a single low-quality read (not shown in the png file). How can I get BWA or mpileup to not insert such low quality inserts ??

                    Many Thanks !

                    Jim
                    Attached Files

                    Comment


                    • #11
                      Do'h, do I feel stupid!

                      I am also using SAMTools' varFilter to loose the variants which have too low coverage. But it actually does much more and this is why I am loosing these SNP's.

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Essential Discoveries and Tools in Epitranscriptomics
                        by seqadmin




                        The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                        04-22-2024, 07:01 AM
                      • seqadmin
                        Current Approaches to Protein Sequencing
                        by seqadmin


                        Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                        04-04-2024, 04:25 PM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, 04-25-2024, 11:49 AM
                      0 responses
                      20 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-24-2024, 08:47 AM
                      0 responses
                      20 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-11-2024, 12:08 PM
                      0 responses
                      62 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-10-2024, 10:19 PM
                      0 responses
                      61 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X