Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Samtools mpileup finds SNPs on each position

    Hello everyone,

    I am trying to use samtools mpileup. I am using the settings:
    Code:
    samtools mpileup -f ../chr12Ref.fa chr12.bam > chr12.vcf
    When I look in the output, I see this result:
    Code:
    SL2_40ch12      1       C       0
    SL2_40ch12      2       T       0
    SL2_40ch12      3       A       2       .^].    ;=
    SL2_40ch12      4       T       2       ..      ;@
    SL2_40ch12      5       A       2       ..      D@
    SL2_40ch12      6       G       2       ..      BD
    SL2_40ch12      7       C       2       ..      AD
    SL2_40ch12      8       A       3       ..^].   B?@
    SL2_40ch12      9       A       3       ...     DBB
    SL2_40ch12      10      A       4       ...^].  ?:C@
    This means each position has SNP, and the read base is an integer...

    When looking at the input bam file, everything looks fine in IGV. Also the source looks exactly as expected:
    Code:
    @HD     VN:1.3  SO:coordinate
    @SQ     SN:SL2_40ch12   LN:65486253
    @PG     ID:bwa  PN:bwa  VN:0.7.4-r385
    FCD116PACXX:8:1212:1169:72997#GTCCAGAA  117     SL2_40ch12      1       0       *       =       1       0       ATGCATAGACAAGGTCTTGACGGACCTCCACAAACAAATTTGACATTTTTGATGTCAGAATCCGGATCACCCAAAAAATGCTTTGCTATAGCACACAAAA    EEDEEEDDDDCDDCDDDDDDDDC?DDEDFFFFHFHHFHGIJIHFJJJIJJJJJJJJJIJHFIGIJIHGCJJJJJJJJIIIJJJJJJJHHHGHEDDFFCCB
    FCD116PACXX:8:1212:1169:72997#GTCCAGAA  153     SL2_40ch12      1       37      100M    =       1       0       CTATAGCAAACCTTTTTTTGGGTGATTCGGATTCCGATGTCAAAAATGCCAAATTTTTTTGTGGACGCCCGTCAAGACCATGTATATGCATCCGGTTGGC    EEEEDDDDC>@DBDBDDBDDDCDDB@@DDC@DDDBDC@C?BDEDDDDCEEEEEDHJJJJIIEIIGIGGIHCIJIGHFJJJJJJJIJJHHGHHFFFFFCC> XT:A:U  NM:i:0  SM:i:37 AM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100        ZQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@E
    The reference sequence looks like this (so you can see the first reference bases are found by samtools mpileup):
    Code:
    >SL2_40ch12
    CTATAGCAAACCTTTTTTTGGGTGATTCGGATTCCGATGTCAAAAATGCCAAATTTTTTTGTGGACGCCCGTCAAGACCA
    TGTATATGCATCCGGTTGGCCCTCACGGCCCGTCCGACCCATTAAGAAGGTCAAACGAGCCCCGAAGCGAGCATACCCCT
    CATTTCGATCATTTTCGTGTGCTATAGCAAACCATTTTTTGGGTGATCCGGATTTCGACGTCAAAAATGCCAAATTTTTT
    When looking at this data, there has something to be wrong with the bam file, does anyone know what this might be?

  • #2
    I think I have found the problem...
    Samtools mpileup doesn't create a vcf file by default but a file in pileup format.
    Now I am using the -ug option and piping the output to bcftools view.

    Comment


    • #3
      Hi Jetse,
      the mpileup output is simply another way to represent the alignment.
      Consider this output line:
      SL2_40ch12 7 C 2 .. AD
      It means that in chr12 at position 7 you have a base C in your reference genome, In your alignment the coordinate have a depth of 2 and there are not any missmatch (..). AD are the quality values of your sequenced C.

      The following row is the same but the depth is 3
      SL2_40ch12 9 A 3 ... DBB

      If you have a SNP you should have something like this:

      SL2_40ch12 9 A 3 GGG DBB

      That means homozygous SNP AtoG

      Comment


      • #4
        That was indeed my problem, but now I converted this to vcf format by using the -ug option and piping this output to bcftools view. Now my output looks like this:
        Code:
        #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  chr12.bam
        SL2_40ch12      1       .       C       X       0       .       DP=7;I16=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;QS=0.000000,0.000000,0.000000,0.000000 PL      0,0,0
        SL2_40ch12      2       .       T       X       0       .       DP=7;I16=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;QS=-nan,0.000000,0.000000,0.000000     PL      0,0,0
        SL2_40ch12      3       .       A       X       0       .       DP=8;I16=2,0,0,0,54,1460,0,0,89,4441,0,0,6,36,0,0;QS=1.000000,0.000000,0.000000,0.000000        PL      0,6,49
        SL2_40ch12      4       .       T       X       0       .       DP=8;I16=2,0,0,0,57,1637,0,0,89,4441,0,0,8,50,0,0;QS=1.000000,0.000000,0.000000,0.000000        PL      0,6,52
        SL2_40ch12      5       .       A       X       0       .       DP=8;I16=2,0,0,0,66,2186,0,0,89,4441,0,0,10,68,0,0;QS=1.000000,0.000000,0.000000,0.000000       PL      0,6,55
        SL2_40ch12      6       .       G       X       0       .       DP=8;I16=2,0,0,0,68,2314,0,0,89,4441,0,0,12,90,0,0;QS=1.000000,0.000000,0.000000,0.000000       PL      0,6,59
        SL2_40ch12      7       .       C       X       0       .       DP=8;I16=2,0,0,0,67,2249,0,0,89,4441,0,0,14,116,0,0;QS=1.000000,0.000000,0.000000,0.000000      PL      0,6,59
        SL2_40ch12      8       .       A       X       0       .       DP=9;I16=3,0,0,0,94,2950,0,0,149,8041,0,0,16,146,0,0;QS=1.000000,0.000000,0.000000,0.000000     PL      0,9,75
        This is the VCF output I expected already from samtools mpileup, but now I see at each position an X as alternative base...

        [EDIT]
        When looking more into the generated vcf file, I see the SNPs, for instance:
        Code:
        SL2_40ch12      35      .       C       T,X     0       .       DP=17;I16=10,0,1,0,378,14442,24,576,538,30482,60,3600,177,3745,25,625;QS=0.938931,0.061069,0.000000,0.000000;RPB=1.106797e+00   PL      0,9,167,30,170,181
        Here the alternative bases are T,X. When looking in IGV, there is only one read at that position that has a T. So there has to be something in this line which indicates this is probably not a SNP...
        Last edited by Jetse; 06-19-2013, 01:14 AM.

        Comment


        • #5
          Found the answer!

          the problem was with bcftools view. Had to use the options -vcg so I retrieved the output as expected!
          This command gave me this file:
          Code:
          #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  chr12.bam
          SL2_40ch12      241     .       T       TG      8.17    .       INDEL;IS=11,0.733333;DP=15;VDB=3.823388e-02;AF1=1;AC1=2;DP4=0,0,15,0;MQ=54;FQ=-79.5     GT:PL:GQ        1/1:48,45,0:49
          SL2_40ch12      840     .       T       A       222     .       DP=37;VDB=2.932610e-03;AF1=1;AC1=2;DP4=0,0,21,16;MQ=60;FQ=-138  GT:PL:GQ        1/1:255,111,0:99
          SL2_40ch12      841     .       A       G       222     .       DP=37;VDB=2.393675e-03;AF1=1;AC1=2;DP4=0,0,21,16;MQ=60;FQ=-138  GT:PL:GQ        1/1:255,111,0:99

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Advanced Methods for the Detection of Infectious Disease
            by seqadmin




            The recent pandemic caused worldwide health, economic, and social disruptions with its reverberations still felt today. A key takeaway from this event is the need for accurate and accessible tools for detecting and tracking infectious diseases. Timely identification is essential for early intervention, managing outbreaks, and preventing their spread. This article reviews several valuable tools employed in the detection and surveillance of infectious diseases.
            ...
            11-27-2023, 01:15 PM
          • seqadmin
            Strategies for Investigating the Microbiome
            by seqadmin




            Microbiome research has led to the discovery of important connections to human and environmental health. Sequencing has become a core investigational tool in microbiome research, a subject that we covered during a recent webinar. Our expert speakers shared a number of advancements including improved experimental workflows, research involving transmission dynamics, and invaluable analysis resources. This article recaps their informative presentations, offering insights...
            11-09-2023, 07:02 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, 12-01-2023, 09:55 AM
          0 responses
          15 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 11-30-2023, 10:48 AM
          0 responses
          19 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 11-29-2023, 08:26 AM
          0 responses
          14 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 11-29-2023, 08:12 AM
          0 responses
          15 views
          0 likes
          Last Post seqadmin  
          Working...
          X