Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Samtools mpileup finds SNPs on each position

    Hello everyone,

    I am trying to use samtools mpileup. I am using the settings:
    Code:
    samtools mpileup -f ../chr12Ref.fa chr12.bam > chr12.vcf
    When I look in the output, I see this result:
    Code:
    SL2_40ch12      1       C       0
    SL2_40ch12      2       T       0
    SL2_40ch12      3       A       2       .^].    ;=
    SL2_40ch12      4       T       2       ..      ;@
    SL2_40ch12      5       A       2       ..      D@
    SL2_40ch12      6       G       2       ..      BD
    SL2_40ch12      7       C       2       ..      AD
    SL2_40ch12      8       A       3       ..^].   B?@
    SL2_40ch12      9       A       3       ...     DBB
    SL2_40ch12      10      A       4       ...^].  ?:C@
    This means each position has SNP, and the read base is an integer...

    When looking at the input bam file, everything looks fine in IGV. Also the source looks exactly as expected:
    Code:
    @HD     VN:1.3  SO:coordinate
    @SQ     SN:SL2_40ch12   LN:65486253
    @PG     ID:bwa  PN:bwa  VN:0.7.4-r385
    FCD116PACXX:8:1212:1169:72997#GTCCAGAA  117     SL2_40ch12      1       0       *       =       1       0       ATGCATAGACAAGGTCTTGACGGACCTCCACAAACAAATTTGACATTTTTGATGTCAGAATCCGGATCACCCAAAAAATGCTTTGCTATAGCACACAAAA    EEDEEEDDDDCDDCDDDDDDDDC?DDEDFFFFHFHHFHGIJIHFJJJIJJJJJJJJJIJHFIGIJIHGCJJJJJJJJIIIJJJJJJJHHHGHEDDFFCCB
    FCD116PACXX:8:1212:1169:72997#GTCCAGAA  153     SL2_40ch12      1       37      100M    =       1       0       CTATAGCAAACCTTTTTTTGGGTGATTCGGATTCCGATGTCAAAAATGCCAAATTTTTTTGTGGACGCCCGTCAAGACCATGTATATGCATCCGGTTGGC    EEEEDDDDC>@DBDBDDBDDDCDDB@@DDC@DDDBDC@C?BDEDDDDCEEEEEDHJJJJIIEIIGIGGIHCIJIGHFJJJJJJJIJJHHGHHFFFFFCC> XT:A:U  NM:i:0  SM:i:37 AM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100        ZQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@E
    The reference sequence looks like this (so you can see the first reference bases are found by samtools mpileup):
    Code:
    >SL2_40ch12
    CTATAGCAAACCTTTTTTTGGGTGATTCGGATTCCGATGTCAAAAATGCCAAATTTTTTTGTGGACGCCCGTCAAGACCA
    TGTATATGCATCCGGTTGGCCCTCACGGCCCGTCCGACCCATTAAGAAGGTCAAACGAGCCCCGAAGCGAGCATACCCCT
    CATTTCGATCATTTTCGTGTGCTATAGCAAACCATTTTTTGGGTGATCCGGATTTCGACGTCAAAAATGCCAAATTTTTT
    When looking at this data, there has something to be wrong with the bam file, does anyone know what this might be?

  • #2
    I think I have found the problem...
    Samtools mpileup doesn't create a vcf file by default but a file in pileup format.
    Now I am using the -ug option and piping the output to bcftools view.

    Comment


    • #3
      Hi Jetse,
      the mpileup output is simply another way to represent the alignment.
      Consider this output line:
      SL2_40ch12 7 C 2 .. AD
      It means that in chr12 at position 7 you have a base C in your reference genome, In your alignment the coordinate have a depth of 2 and there are not any missmatch (..). AD are the quality values of your sequenced C.

      The following row is the same but the depth is 3
      SL2_40ch12 9 A 3 ... DBB

      If you have a SNP you should have something like this:

      SL2_40ch12 9 A 3 GGG DBB

      That means homozygous SNP AtoG

      Comment


      • #4
        That was indeed my problem, but now I converted this to vcf format by using the -ug option and piping this output to bcftools view. Now my output looks like this:
        Code:
        #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  chr12.bam
        SL2_40ch12      1       .       C       X       0       .       DP=7;I16=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;QS=0.000000,0.000000,0.000000,0.000000 PL      0,0,0
        SL2_40ch12      2       .       T       X       0       .       DP=7;I16=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;QS=-nan,0.000000,0.000000,0.000000     PL      0,0,0
        SL2_40ch12      3       .       A       X       0       .       DP=8;I16=2,0,0,0,54,1460,0,0,89,4441,0,0,6,36,0,0;QS=1.000000,0.000000,0.000000,0.000000        PL      0,6,49
        SL2_40ch12      4       .       T       X       0       .       DP=8;I16=2,0,0,0,57,1637,0,0,89,4441,0,0,8,50,0,0;QS=1.000000,0.000000,0.000000,0.000000        PL      0,6,52
        SL2_40ch12      5       .       A       X       0       .       DP=8;I16=2,0,0,0,66,2186,0,0,89,4441,0,0,10,68,0,0;QS=1.000000,0.000000,0.000000,0.000000       PL      0,6,55
        SL2_40ch12      6       .       G       X       0       .       DP=8;I16=2,0,0,0,68,2314,0,0,89,4441,0,0,12,90,0,0;QS=1.000000,0.000000,0.000000,0.000000       PL      0,6,59
        SL2_40ch12      7       .       C       X       0       .       DP=8;I16=2,0,0,0,67,2249,0,0,89,4441,0,0,14,116,0,0;QS=1.000000,0.000000,0.000000,0.000000      PL      0,6,59
        SL2_40ch12      8       .       A       X       0       .       DP=9;I16=3,0,0,0,94,2950,0,0,149,8041,0,0,16,146,0,0;QS=1.000000,0.000000,0.000000,0.000000     PL      0,9,75
        This is the VCF output I expected already from samtools mpileup, but now I see at each position an X as alternative base...

        [EDIT]
        When looking more into the generated vcf file, I see the SNPs, for instance:
        Code:
        SL2_40ch12      35      .       C       T,X     0       .       DP=17;I16=10,0,1,0,378,14442,24,576,538,30482,60,3600,177,3745,25,625;QS=0.938931,0.061069,0.000000,0.000000;RPB=1.106797e+00   PL      0,9,167,30,170,181
        Here the alternative bases are T,X. When looking in IGV, there is only one read at that position that has a T. So there has to be something in this line which indicates this is probably not a SNP...
        Last edited by Jetse; 06-19-2013, 01:14 AM.

        Comment


        • #5
          Found the answer!

          the problem was with bcftools view. Had to use the options -vcg so I retrieved the output as expected!
          This command gave me this file:
          Code:
          #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  chr12.bam
          SL2_40ch12      241     .       T       TG      8.17    .       INDEL;IS=11,0.733333;DP=15;VDB=3.823388e-02;AF1=1;AC1=2;DP4=0,0,15,0;MQ=54;FQ=-79.5     GT:PL:GQ        1/1:48,45,0:49
          SL2_40ch12      840     .       T       A       222     .       DP=37;VDB=2.932610e-03;AF1=1;AC1=2;DP4=0,0,21,16;MQ=60;FQ=-138  GT:PL:GQ        1/1:255,111,0:99
          SL2_40ch12      841     .       A       G       222     .       DP=37;VDB=2.393675e-03;AF1=1;AC1=2;DP4=0,0,21,16;MQ=60;FQ=-138  GT:PL:GQ        1/1:255,111,0:99

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Exploring the Dynamics of the Tumor Microenvironment
            by seqadmin




            The complexity of cancer is clearly demonstrated in the diverse ecosystem of the tumor microenvironment (TME). The TME is made up of numerous cell types and its development begins with the changes that happen during oncogenesis. “Genomic mutations, copy number changes, epigenetic alterations, and alternative gene expression occur to varying degrees within the affected tumor cells,” explained Andrea O’Hara, Ph.D., Strategic Technical Specialist at Azenta. “As...
            07-08-2024, 03:19 PM
          • seqadmin
            Exploring Human Diversity Through Large-Scale Omics
            by seqadmin


            In 2003, researchers from the Human Genome Project (HGP) announced the most comprehensive genome to date1. Although the genome wasn’t fully completed until nearly 20 years later2, numerous large-scale projects, such as the International HapMap Project and 1000 Genomes Project, continued the HGP's work, capturing extensive variation and genomic diversity within humans. Recently, newer initiatives have significantly increased in scale and expanded beyond genomics, offering a more detailed...
            06-25-2024, 06:43 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, 07-10-2024, 07:30 AM
          0 responses
          25 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 07-03-2024, 09:45 AM
          0 responses
          201 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 07-03-2024, 08:54 AM
          0 responses
          211 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 07-02-2024, 03:00 PM
          0 responses
          193 views
          0 likes
          Last Post seqadmin  
          Working...
          X