Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • the pileup result from samtools doesn't mach to read data

    Hi,

    I am trying to obtain base alleles for a specific position (polymorphic location) directly from read data. I also use samtools pileup to obtain bases information. And, the results are not matched.

    Here is an example (the data is from 1000 genome project...low coverage). I look at the sample NA12716 and at the position 154402806.
    These are reads data covering that position. The base at 154402806 is shown in red:

    ERR000573.12285810 pPr1 X 154402756 60 9S42M = 154402571 -226
    GCGAAGAAGTCAATTAGAAAGTCTTTTCAAGTTATCCAAGCAGGAGGTCTC
    27=AA=8A;<?A?@>@>@@@=;>=<>>?A@>@>@:?>AA>?@>??>>=?9?
    X0:i:1 X1:i:0 XC:i:42MD:Z:42 RG:Z:ERR000573 AM:i:37 NM:i:0 SM:i:37 MQ:i:60 XT:A:U
    BQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    ERR000569.10381384 pPr1 X 154402769 60 51M = 154402592 -227
    CTTTTCAAGTTATCCAAGCAGGAGGTCTCAAGTGGCCTGGTCTAGAGTAGT
    >6?=8>9>=;1<<:7A@=>$$=@><=%::@@=>;6>><==;>=@>?8>@>?
    X0:i:1 X1:i:0 MD:Z:37C13 RG:Z:ERR000569 AM:i:37 NM:i:1 SM:i:37 MQ:i:60
    XT:A:U
    BQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    ERR000569.428911 pPR2 X 154402785 60 35M16S = 154402963 214
    AGCAGGAGGTCTCAAGTGGCCTGGTCTAGAGTAGTGACAGTGGACATTTAA
    ;??@?=>=;;=@>>:<%=;=>@>>>:@%:::?.99=8=?9$<34=6%<#79
    X0:i:1 X1:i:0 XC:i:35MD:Z:21C13 RG:Z:ERR000569 AM:i:37 NM:i:1 SM:i:37 MQ:i:60 XT:A:U
    BQ:Z:@@@@@@@@@@@@@@@@@@@@AC@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    ERR000572.854678 pPr2 X 154402794 60 15S36M = 154402599 -230
    TATCCAAGCAGGAGGTCTCAAGTGGCCTGGTCTAGAGTAGTGACAGTGGAC
    ?A75:@A7:=?:A?>;<>?=@:?>===@>=>=>?;@<?@:@=>?@>A?>A<
    X0:i:1 X1:i:1 XC:i:36MD:Z:12C23 RG:Z:ERR000572 AM:i:23 NM:i:1 SM:i:23 MQ:i:60 XT:A:U
    BQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@C@@@@@@@@@@@@@@@@@@@@@@@

    ERR000572.8002077 pPr1 X 154402804 60 15S36M = 154402586 -253
    GGAGGTCTCACGTGGCCTGGTCTAGAGTAGTGACAGTGGACATTGAAAGAA
    =:B<;<;>3?%;=?93=@>=>;>@>@<>@=@==@@=?>>9>@@@=AAA>A?
    X0:i:1 X1:i:0 XC:i:36MD:Z:2C33 RG:Z:ERR000572 AM:i:37 NM:i:1 SM:i:37 MQ:i:60 XT:A:U
    BQ:Z:@@@@@@@@@@@@@@@@@C@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@CD

    ERR000574.8852864 pPR2 X 154402806 60 39M12S = 154402984 222
    TGGTCTAGAGTAGTGACAGTGGACATTGAAAGAAAGAGATTTCAGAGATAC
    =@???@>>>>?>>?>?=A>?>=>=@?@3#@?<8@@<>=:?@A?9<>=>A?<
    X0:i:1 X1:i:0 XC:i:39MD:Z:0C38 RG:Z:ERR000574 AM:i:37 NM:i:1 SM:i:37 MQ:i:60 XT:A:U
    BQ:Z:WJ@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    And the results from pileup is:
    X 154402806 C T 9 67 60 5 tTtt^]T <===&

    So, from the reads data, bases at position 154402806 should be CTTAAT(three reference alleles), but the results from pileup is TTTTT (all are reference alleles). Also, there only 5 reads from pileup results, but it supposes to have 6 reads coverage.

    I checked the default read filter in samtools:
    minimum mapping quality for an alignment to be used [0]
    minmum base quality for a base to be considered [13]

    The base quality of these bases CTTAAT are
    63
    60
    64
    65
    66
    61

    Does anyone have any idea what is going wrong?

    Thank you very much!!

    Anney

  • #2
    Okay, here's your first read, and its cigar string:

    GCGAAGAAGTCAATTAGAAAGTCTTTTCAAGTTATCCAAGCAGGAGGTCTC

    9S42M

    That means the first 9 bases were soft-clipped, then the next 42 match. That means that that last base isn't being counted.

    GGAGGTCTCACGTGGCCTGGTCTAGAGTAGTGACAGTGGACATTGAAAGAA

    15S36M

    The first 15 bases are soft-clipped, so there goes the A. I think you'll find that the alternate bases just aren't being counted, because they aren't deemed as matching well enough.

    Comment


    • #3
      this is your alignment:

      gcgaagaagTCAATTAGAAAGTCTTTTCAAGTTATCCAAGCAGGAGGTCTC-----------------------------------------------------------
      ----------------------CTTTTCAAGTTATCCAAGCAGGAGGTCTCAAGTGGCCTGGTCTAGAGTAGT-------------------------------------
      --------------------------------------AGCAGGAGGTCTCAAGTGGCCTGGTCTAGAGTAGTgacagtggacatttaa---------------------
      --------------------------------tatccaagcaggaggTCTCAAGTGGCCTGGTCTAGAGTAGTGACAGTGGAC---------------------------
      ------------------------------------------ggaggtctcacgtggCCTGGTCTAGAGTAGTGACAGTGGACATTGAAAGAA-----------------
      -----------------------------------------------------------TGGTCTAGAGTAGTGACAGTGGACATTGAAAGAAAGAGAtttcagagatac


      soft clipping actually changes the mapping position.
      but i dont understand the base qualities given in the pileup.

      Comment


      • #4
        Thank you very much for the reply!! This helps me a lot

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Best Practices for Single-Cell Sequencing Analysis
          by seqadmin



          While isolating and preparing single cells for sequencing was historically the bottleneck, recent technological advancements have shifted the challenge to data analysis. This highlights the rapidly evolving nature of single-cell sequencing. The inherent complexity of single-cell analysis has intensified with the surge in data volume and the incorporation of diverse and more complex datasets. This article explores the challenges in analysis, examines common pitfalls, offers...
          06-06-2024, 07:15 AM
        • seqadmin
          Latest Developments in Precision Medicine
          by seqadmin



          Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

          Somatic Genomics
          “We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
          05-24-2024, 01:16 PM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, 06-07-2024, 06:58 AM
        0 responses
        13 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 06-06-2024, 08:18 AM
        0 responses
        20 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 06-06-2024, 08:04 AM
        0 responses
        18 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 06-03-2024, 06:55 AM
        0 responses
        13 views
        0 likes
        Last Post seqadmin  
        Working...
        X