  • Output format of SAMtools pileup

    Hi,

    I am new to SAMtools. Right now I am using the "pileup" function in SAMtools to identify SNPs in my deep sequencing data. The output looks like this:

    chr1 9155441 G T 21 221 60 8 TttttTTT ;KJOLLJJ
    chr1 9162037 A G 9 40 60 9 gGGGGGGGG !B91,0,,!
    chr1 9163797 G R 1 1 60 7 ,.A,,.. eeRedgg
    chr1 9174351 C S 0 11 60 1 g J

    I know the meaning of all the columns except columns 4, 5, 6, and 7. I suspect column 4 describes the mismatch between the reference genome and my reads, but I don't know why it contains letters other than A/T/G/C. Are columns 5, 6, and 7 some kind of scores? I am not sure. Could someone provide any information about that? Thanks a lot!
    Last edited by SF_mallish; 08-26-2011, 11:34 AM.

  • #2
    M, R, W, S, Y, and K are IUPAC ambiguity codes, each meaning "this position is one of two possible nucleotides": R means "either A or G", and S means "either G or C". Pileup reports the consensus that way when it sees two different bases at the same locus, and reference sequences sometimes contain those letters too. I'm not sure why pileup gives you that call when only one read covers the locus. As for the other columns, there is one for mapping quality, one for coverage depth, and one for SNP quality. Pileup is deprecated, so learn to use mpileup instead.
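    To make the two-base letters concrete, here is a minimal Python sketch of the standard IUPAC two-base codes. The table follows the IUPAC nomenclature; the helper name `expand_consensus` is made up for illustration and is not part of SAMtools.

```python
# Standard IUPAC codes for "one of two possible nucleotides".
IUPAC_TWO_BASE = {
    "R": {"A", "G"},
    "Y": {"C", "T"},
    "S": {"C", "G"},
    "W": {"A", "T"},
    "K": {"G", "T"},
    "M": {"A", "C"},
}

PLAIN_BASES = {"A", "C", "G", "T"}


def expand_consensus(code):
    """Return the set of bases that a consensus letter stands for.

    Hypothetical helper: handles only A/C/G/T and the six two-base codes.
    """
    code = code.upper()
    if code in PLAIN_BASES:
        return {code}
    if code in IUPAC_TWO_BASE:
        return set(IUPAC_TWO_BASE[code])
    raise ValueError(f"unsupported consensus code: {code!r}")


if __name__ == "__main__":
    # Consensus calls from the third and fourth example rows in the question.
    for letter in ("R", "S"):
        print(letter, sorted(expand_consensus(letter)))
```

    So the `R` call at chr1:9163797 reads as "A or G", and the `S` call at chr1:9174351 reads as "C or G".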

    • #3
      Thanks for the kind answer, swbarnes2!
      I will try mpileup instead.
      So you are also not sure of the exact meanings of columns 5, 6, and 7?

      • #4
        The columns are:
        1 reference_name
        2 pos (1-based coordinate)
        3 ref_base
        4 cons_base
        5 cons_qual
        6 SNP_qual
        7 max_map_qual
        8 cov_depth
        9 read_bases
        10 base_quals
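        For anyone who wants to read these lines programmatically, here is a rough Python sketch that splits one line into the ten columns listed above. It assumes whitespace-separated lines with exactly ten fields, like the rows in the first post, and does not attempt to handle indel lines. The names `PileupRecord` and `parse_pileup_line` are made up for illustration and are not part of SAMtools.

```python
from typing import NamedTuple


class PileupRecord(NamedTuple):
    """One 10-column consensus pileup line (hypothetical container)."""
    reference_name: str
    pos: int            # 1-based coordinate
    ref_base: str
    cons_base: str
    cons_qual: int
    snp_qual: int
    max_map_qual: int
    cov_depth: int
    read_bases: str
    base_quals: str


def parse_pileup_line(line):
    """Split a pileup line on whitespace and convert the numeric columns."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    return PileupRecord(
        reference_name=fields[0],
        pos=int(fields[1]),
        ref_base=fields[2],
        cons_base=fields[3],
        cons_qual=int(fields[4]),
        snp_qual=int(fields[5]),
        max_map_qual=int(fields[6]),
        cov_depth=int(fields[7]),
        read_bases=fields[8],
        base_quals=fields[9],
    )


if __name__ == "__main__":
    # First example row from the original question.
    rec = parse_pileup_line("chr1 9155441 G T 21 221 60 8 TttttTTT ;KJOLLJJ")
    print(rec.cons_base, rec.cons_qual, rec.snp_qual, rec.max_map_qual, rec.cov_depth)
```

        Read this way, the first example row has consensus base T with consensus quality 21, SNP quality 221, maximum mapping quality 60, and coverage depth 8.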
