
  • SNPs call

    Hi all, I am calling SNPs with samtools mpileup after mapping with BWA.

    I get a list of SNPs. However, when I look at the BAM file in IGV, I can see other SNPs that samtools mpileup did not detect. What could be the reason for this?

  • #2
    Can you post a screenshot of IGV? Also, make sure these reads have a high mapping quality and are not secondary alignments or multi-mapped.
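
A minimal sketch of that check, assuming the reads covering the site have been exported as SAM text (for example with `samtools view`). The function names and the MAPQ cutoff of 20 are illustrative assumptions, not samtools defaults. BWA typically assigns a low MAPQ (often 0) to reads that map equally well to several places, so a MAPQ cutoff also catches most multi-mapped reads.

```python
# FLAG bits defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100       # "not primary alignment"
SUPPLEMENTARY = 0x800   # supplementary (chimeric) alignment


def is_reliable(sam_line, min_mapq=20):
    """Return True if the read is a mapped primary alignment with MAPQ >= min_mapq.

    min_mapq=20 is an arbitrary illustrative cutoff.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: FLAG
    mapq = int(fields[4])   # column 5: MAPQ
    if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
        return False
    return mapq >= min_mapq


def count_reliable(sam_lines, min_mapq=20):
    """Count (kept, dropped) alignments, skipping '@' header lines."""
    kept = dropped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        if is_reliable(line, min_mapq):
            kept += 1
        else:
            dropped += 1
    return kept, dropped


if __name__ == "__main__":
    # Made-up reads for illustration only.
    example = [
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",    # primary, MAPQ 60
        "r2\t256\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",  # secondary alignment
        "r3\t16\tchr1\t100\t0\t4M\t*\t0\t0\tACGT\tIIII",    # MAPQ 0
    ]
    print(count_reliable(example))
```

If most reads supporting the extra IGV variants are dropped by a check like this, that would be consistent with the caller ignoring them.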


    • #3
      I haven't used samtools for a pileup, but many callers will filter out SNPs that are represented on only one strand. Is there a "failed SNP" output you can compare against?
      --Please take everything I say with a grain of salt, because, if grad school has taught me anything, it's that I'm an idiot--
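
One way to look for the one-strand pattern, sketched under the assumption that the calls are in samtools-style VCF, where the INFO column carries a DP4 field (high-quality ref-forward, ref-reverse, alt-forward and alt-reverse base counts). The helper names are hypothetical, and the example INFO string is for illustration only.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; bare flags (e.g. INDEL) map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def alt_strand_counts(info):
    """Return (alt_forward, alt_reverse) read counts from the DP4 field."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = (
        int(x) for x in parse_info(info)["DP4"].split(",")
    )
    return alt_fwd, alt_rev


def alt_on_single_strand(info):
    """True if the alternate allele is seen on exactly one strand."""
    alt_fwd, alt_rev = alt_strand_counts(info)
    return (alt_fwd == 0) != (alt_rev == 0)


if __name__ == "__main__":
    info = "DP=4;VDB=0.0014;AF1=1;AC1=2;DP4=0,0,4,0;MQ=60;FQ=-39"
    print(alt_strand_counts(info), alt_on_single_strand(info))
```

This only flags the pattern; whether a given caller actually rejects such sites depends on its own filters.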


      • #4
        If I have a file containing all the SNPs, how can I get statistics such as SNP differences among taxa, SNPs unique to certain taxa, etc.? Doing this manually is taking forever. Is there a Perl script for this purpose?
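
A rough sketch of those two statistics, written in Python rather than Perl. It assumes the SNP file has already been loaded into a mapping from taxon name to a set of (chromosome, position, alternate base) tuples; that layout, the function names and the example data are all hypothetical.

```python
from itertools import combinations


def unique_snps(snps_by_taxon):
    """For each taxon, return the SNPs found in that taxon and in no other."""
    unique = {}
    for taxon, snps in snps_by_taxon.items():
        others = set()
        for other, other_snps in snps_by_taxon.items():
            if other != taxon:
                others |= other_snps
        unique[taxon] = snps - others
    return unique


def pairwise_differences(snps_by_taxon):
    """For each pair of taxa, count the SNPs present in one but not the other."""
    return {
        (a, b): len(snps_by_taxon[a] ^ snps_by_taxon[b])
        for a, b in combinations(sorted(snps_by_taxon), 2)
    }


if __name__ == "__main__":
    # Made-up data for illustration only.
    data = {
        "taxA": {("chr1", 10, "T"), ("chr1", 20, "G")},
        "taxB": {("chr1", 10, "T"), ("chr1", 30, "A")},
        "taxC": {("chr1", 10, "T")},
    }
    print(unique_snps(data))
    print(pairwise_differences(data))
```

Sets keep both operations short: set difference gives the taxon-specific SNPs and symmetric difference gives the pairwise counts.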


        • #5
          Hello!
          After filtering my-raw.bcf for SNPs, I got the file my.var-final.bcf (27.1 MB). When I open it with gedit, its contents look like this:

          ##fileformat=VCFv4.1
          ##samtoolsVersion=0.1.18 (r982:295)
          ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
          ##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
          ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
          ##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
          ##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
          ##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
          ##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
          ##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
          ##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
          ##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
          ##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
          ##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
          ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
          ##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
          ##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
          ##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
          ##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
          ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
          ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
          ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
          ##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
          ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
          ##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
          ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
          #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT my-sorted.bam
          comp80_c0_seq1 12 . TAAAG T 10.8 . INDEL;DP=26;VDB=0.0000;AF1=0.5;AC1=1;DP4=19,0,4,0;MQ=60;FQ=13.7;PV4=1,1,1,0.49 GT:PL:GQ 0/1:48,0,255:50
          comp904_c0_seq1 30 . G T 73.5 . DP=4;VDB=0.0014;AF1=1;AC1=2;DP4=0,0,4,0;MQ=60;FQ=-39 GT:PL:GQ 1/1:106,12,0:21
          comp904_c0_seq1 37 . C T 52 . DP=4;VDB=0.0014;AF1=1;AC1=2;DP4=0,0,3,0;MQ=60;FQ=-36 GT:PL:GQ 1/1:84,9,0:16
          comp904_c0_seq1 41 . A T 64.3 . DP=6;VDB=0.0020;AF1=1;AC1=2;DP4=0,0,5,0;MQ=60;FQ=-42 GT:PL:GQ

          Is there any way I can distinguish SNPs (and indels, if possible) in this file? If so, how?
          Last edited by kurban910; 08-08-2014, 11:27 PM.


          • #6
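            Try this:

            bcftools view my-raw.bcf | vcfutils.pl varFilter -d 10 > raw.vcf

            Here -d 10 keeps only sites with a read depth of at least 10. Then open raw.vcf in Excel as a tab-delimited file. Indel records carry the INDEL flag in the INFO column; the remaining records are SNPs.

            (You might need to filter the SNPs further.)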
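
The same SNP/indel split can also be done without a spreadsheet. A minimal sketch, assuming a tab-delimited VCF as written by samtools/bcftools, where indels are marked with the INDEL flag in INFO; the function names are hypothetical, and the REF/ALT length comparison is a fallback for files that lack the flag.

```python
from collections import Counter


def classify(vcf_line):
    """Label one VCF data line as 'indel', 'snp' or 'other'."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt, info = fields[3], fields[4], fields[7]
    if "INDEL" in info.split(";"):
        return "indel"                      # samtools marks indels explicitly
    alts = alt.split(",")
    if any(len(a) != len(ref) for a in alts):
        return "indel"                      # fallback: allele lengths differ
    if len(ref) == 1:
        return "snp"
    return "other"                          # e.g. multi-base substitutions


def count_types(vcf_lines):
    """Count variant types, skipping '#' header lines and blank lines."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        counts[classify(line)] += 1
    return counts


if __name__ == "__main__":
    lines = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tmy-sorted.bam",
        "\t".join(["comp80_c0_seq1", "12", ".", "TAAAG", "T", "10.8", ".",
                   "INDEL;DP=26;VDB=0.0000;AF1=0.5;AC1=1;DP4=19,0,4,0;MQ=60",
                   "GT:PL:GQ", "0/1:48,0,255:50"]),
        "\t".join(["comp904_c0_seq1", "30", ".", "G", "T", "73.5", ".",
                   "DP=4;VDB=0.0014;AF1=1;AC1=2;DP4=0,0,4,0;MQ=60;FQ=-39",
                   "GT:PL:GQ", "1/1:106,12,0:21"]),
    ]
    print(count_types(lines))
```

To write two separate files instead of counting, the same `classify` result can decide which output handle each line goes to.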


            • #7
              In reply to mmmm:
              Thank you. I have found the tools I was looking for here:
