  • SNPs call

    Hi all, when I call SNPs with samtools mpileup after mapping with BWA, I get a list of SNPs. However, when I look at the BAM file in IGV, I can see other SNPs that samtools mpileup did not detect. What do you think is the reason for this, please?

  • #2
    Can you post a screenshot of IGV? Also, make sure these reads have a high mapping quality and are not secondary alignments or multi-mapped.
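The check suggested in #2 can be sketched in Python. This is only an illustration, not part of samtools: it assumes you dump the reads covering a site as SAM text (for example with `samtools view` on the sorted, indexed BAM) and that a mapping quality of 20 is a reasonable cutoff. The function names and the threshold are made up for this example; the flag bits come from the SAM specification.

```python
# SAM flag bits as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_QCFAIL = 0x200
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800

# Any of these bits makes a read a poor witness for a SNP.
BAD_FLAGS = (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_QCFAIL
             | FLAG_DUPLICATE | FLAG_SUPPLEMENTARY)


def read_is_trustworthy(sam_line, min_mapq=20):
    """Return True if a SAM record is a primary alignment with MAPQ >= min_mapq.

    min_mapq=20 is an assumed cutoff for illustration, not a samtools default.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: mapping quality
    return (flag & BAD_FLAGS) == 0 and mapq >= min_mapq


def summarize(sam_lines, min_mapq=20):
    """Count (kept, dropped) reads, skipping SAM header lines."""
    kept = dropped = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        if read_is_trustworthy(line, min_mapq):
            kept += 1
        else:
            dropped += 1
    return kept, dropped
```

If most of the reads that show the extra SNP in IGV end up in the "dropped" count, that would be consistent with the caller ignoring them rather than missing the variant.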



    • #3
      I haven't used samtools for a pileup, but many callers will filter out SNPs that are represented on only one strand. Is there a "failed SNP" output you can compare?
      --Please take everything I say with a grain of salt, because, if grad school has taught me anything, it's that I'm an idiot--
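One rough way to look at strand support, assuming the calls are available as VCF text with the `DP4` INFO field that samtools writes (ref-forward, ref-reverse, alt-forward, alt-reverse high-quality bases): the helper below only reports whether the alternate allele was seen on both strands. The function names and the `min_per_strand` parameter are hypothetical, and this is not how any particular caller implements its strand filter.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flags without '=' map to True."""
    parsed = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            parsed[key] = value
        elif item:
            parsed[item] = True
    return parsed


def alt_on_both_strands(info, min_per_strand=1):
    """Return True/False for strand support of the ALT allele, or None if DP4 is absent."""
    dp4 = parse_info(info).get("DP4")
    if dp4 is None:
        return None
    # DP4 order: ref-forward, ref-reverse, alt-forward, alt-reverse
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in dp4.split(","))
    return alt_fwd >= min_per_strand and alt_rev >= min_per_strand
```

For example, an INFO field containing `DP4=0,0,4,0` has all four alternate-allele bases on the forward strand, so the function returns `False` for it.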



      • #4
        If I have a file containing all the SNPs, how can I get statistics such as SNP differences among taxa, SNPs unique to certain taxa, etc.? Doing this manually is taking forever. Is there a Perl script for this purpose, please?
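Not a Perl script, but here is a minimal Python sketch of the kind of summary asked for in #4. It assumes a tab-separated, multi-sample VCF with one sample column per taxon and a `GT` entry in the FORMAT column; the thread does not say what the file actually looks like, so treat the layout and all function names as assumptions. A taxon "carries" a SNP here if its genotype contains any non-reference allele.

```python
from collections import defaultdict


def read_genotypes(vcf_lines):
    """Return (samples, sites); sites maps (chrom, pos) to the set of samples carrying an ALT allele."""
    samples = []
    sites = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]          # sample (taxon) names follow FORMAT
            continue
        ref, alts = fields[3], fields[4].split(",")
        # Keep single-base substitutions only; skip indels and other variants.
        if len(ref) != 1 or any(len(a) != 1 for a in alts):
            continue
        gt_index = fields[8].split(":").index("GT")
        carriers = set()
        for sample, call in zip(samples, fields[9:]):
            genotype = call.split(":")[gt_index]
            alleles = genotype.replace("|", "/").split("/")
            if any(a not in ("0", ".") for a in alleles):
                carriers.add(sample)
        sites[(fields[0], int(fields[1]))] = carriers
    return samples, sites


def unique_snps(samples, sites):
    """SNP positions whose ALT allele is carried by exactly one sample."""
    unique = defaultdict(list)
    for site, carriers in sites.items():
        if len(carriers) == 1:
            unique[next(iter(carriers))].append(site)
    return {s: sorted(unique[s]) for s in samples}


def pairwise_differences(samples, sites):
    """Number of SNP sites where one sample carries the ALT allele and the other does not."""
    diffs = {}
    for i, a in enumerate(samples):
        for b in samples[i + 1:]:
            diffs[(a, b)] = sum((a in c) != (b in c) for c in sites.values())
    return diffs
```

Typical use would be `samples, sites = read_genotypes(open("all_snps.vcf"))` followed by the two summary functions, where `all_snps.vcf` is a placeholder file name. Missing genotypes (`./.`) are counted as non-carriers here, which may not be what you want for low-coverage taxa.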



        • #5
          Hello!
          After I filtered my-raw.bcf for SNPs, I got the file my.var-final.bcf (27.1 MB). When I open it with gedit, its content looks like this:

          ##fileformat=VCFv4.1
          ##samtoolsVersion=0.1.18 (r982:295)
          ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
          ##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
          ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
          ##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
          ##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
          ##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
          ##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
          ##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
          ##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
          ##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
          ##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
          ##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
          ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
          ##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
          ##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
          ##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
          ##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
          ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
          ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
          ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
          ##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
          ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
          ##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
          ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
          #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT my-sorted.bam
          comp80_c0_seq1 12 . TAAAG T 10.8 . INDEL;DP=26;VDB=0.0000;AF1=0.5;AC1=1;DP4=19,0,4,0;MQ=60;FQ=13.7;PV4=1,1,1,0.49 GT:PL:GQ 0/1:48,0,255:50
          comp904_c0_seq1 30 . G T 73.5 . DP=4;VDB=0.0014;AF1=1;AC1=2;DP4=0,0,4,0;MQ=60;FQ=-39 GT:PL:GQ 1/1:106,12,0:21
          comp904_c0_seq1 37 . C T 52 . DP=4;VDB=0.0014;AF1=1;AC1=2;DP4=0,0,3,0;MQ=60;FQ=-36 GT:PL:GQ 1/1:84,9,0:16
          comp904_c0_seq1 41 . A T 64.3 . DP=6;VDB=0.0020;AF1=1;AC1=2;DP4=0,0,5,0;MQ=60;FQ=-42 GT:PL:GQ

          Is there any way I can distinguish SNPs (and indels, if possible) from this file? If so, how?
          Last edited by kurban910; 08-08-2014, 11:27 PM.



          • #6
            Try this:

            bcftools view my-raw.bcf | vcfutils.pl varFilter -d 10 > raw.vcf

            Then open raw.vcf in Excel and you will see each variant as a SNP or an indel.

            (You might need to filter the SNPs.)
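If you would rather not sort through the file by eye, the same distinction can be sketched in Python. This assumes VCF text like the records posted in #5, where samtools marks indels with an `INDEL` flag in the INFO column; as a fallback it also treats any REF/ALT length mismatch as an indel. The function names are made up for this example.

```python
def variant_type(vcf_record):
    """Classify one VCF data line as 'INDEL', 'SNP', 'REF' or 'OTHER'."""
    fields = vcf_record.split()          # works for tab- or space-separated text
    ref, alts, info = fields[3], fields[4].split(","), fields[7]
    if alts == ["."]:
        return "REF"                     # no alternate allele reported
    if "INDEL" in info.split(";") or any(len(a) != len(ref) for a in alts):
        return "INDEL"
    if len(ref) == 1:
        return "SNP"
    return "OTHER"                       # e.g. multi-base substitutions


def count_types(vcf_lines):
    """Tally variant types over all data lines, skipping '#' header lines."""
    counts = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        kind = variant_type(line)
        counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Applied to the records in #5, the first line (REF `TAAAG`, ALT `T`, with the `INDEL` flag) is classified as an indel and the `G` to `T` change at position 30 as a SNP.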



            • #7
              Originally posted by mmmm
              Try this:

              bcftools view my-raw.bcf | vcfutils.pl varFilter -d 10 > raw.vcf

              Then open raw.vcf in Excel and you will see each variant as a SNP or an indel.

              (You might need to filter the SNPs.)
              Thank you. I have found the tools I was looking for here:

