output format of SAMtool pileup


  • output format of SAMtool pileup

    Hi,

    I am new to SAMtools. Right now I am using the "pileup" command in SAMtools to identify SNPs in my deep sequencing data. The output looks like this:

    chr1 9155441 G T 21 221 60 8 TttttTTT ;KJOLLJJ
    chr1 9162037 A G 9 40 60 9 gGGGGGGGG !B91,0,,!
    chr1 9163797 G R 1 1 60 7 ,.A,,.. eeRedgg
    chr1 9174351 C S 0 11 60 1 g J

    I know the meaning of every column except columns 4, 5, 6, and 7. I suspect column 4 describes the mismatch between the reference genome and my reads, but I don't know why it contains letters other than A/T/G/C. Are columns 5, 6, and 7 some kind of scores? I am not sure. Could someone provide any information about that? Thanks a lot!
    Last edited by SF_mallish; 08-26-2011, 11:34 AM.

  • #2
    Originally posted by SF_mallish View Post
    I suspect column 4 describes the mismatch between the reference genome and my reads, but I don't know why it contains letters other than A/T/G/C. Are columns 5, 6, and 7 some kind of scores? I am not sure. Could someone provide any information about that? Thanks a lot!
    M, R, W, S, Y, and K are commonly used letters (IUPAC ambiguity codes) that mean "this letter is one of two possible nucleotides". R means "either A or G"; S means "either G or C". Pileup will report the consensus like that if it sees two different bases at the same locus, and sometimes reference sequences will have those letters. Not sure why pileup is giving you that answer when only one read covers the locus. There's a column for mapping quality, one for coverage depth, one for SNP quality... pileup is deprecated, so learn to use mpileup instead.
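
For reference, here is a minimal Python sketch of the six two-base codes mentioned above. The mapping follows the standard IUPAC nucleotide nomenclature; the names `IUPAC_TWO_BASE` and `expand_consensus` are made up for illustration and are not part of SAMtools.

```python
# Standard IUPAC two-base ambiguity codes (general nomenclature, not SAMtools-specific).
IUPAC_TWO_BASE = {
    "M": {"A", "C"},
    "R": {"A", "G"},
    "W": {"A", "T"},
    "S": {"C", "G"},
    "Y": {"C", "T"},
    "K": {"G", "T"},
}


def expand_consensus(base):
    """Return the set of nucleotides that a consensus letter stands for.

    Hypothetical helper: plain A/C/G/T map to themselves, the six two-base
    codes map to their pair, and anything else yields an empty set.
    """
    base = base.upper()
    if base in {"A", "C", "G", "T"}:
        return {base}
    return set(IUPAC_TWO_BASE.get(base, set()))


print(sorted(expand_consensus("R")))
print(sorted(expand_consensus("S")))
```

So a consensus call of R at a position with reference G would suggest a possible G/A heterozygous site.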



    • #3
      Originally posted by swbarnes2 View Post
      M, R, W, S, Y, and K are commonly used letters (IUPAC ambiguity codes) that mean "this letter is one of two possible nucleotides". R means "either A or G"; S means "either G or C". Pileup will report the consensus like that if it sees two different bases at the same locus, and sometimes reference sequences will have those letters. Not sure why pileup is giving you that answer when only one read covers the locus. There's a column for mapping quality, one for coverage depth, one for SNP quality... pileup is deprecated, so learn to use mpileup instead.
      Thanks for the kind answer, swbarnes2!
      I will try mpileup instead.
      So you are also not sure of the exact meaning of columns 5, 6, and 7?



      • #4
        Originally posted by SF_mallish View Post
        Thanks for the kind answer, swbarnes2!
        I will try mpileup instead.
        So you are also not sure of the exact meaning of columns 5, 6, and 7?
        The columns are:
        1 reference_name
        2 pos (1-based coordinate)
        3 ref_base
        4 cons_base
        5 cons_qual
        6 SNP_qual
        7 max_map_qual
        8 cov_depth
        9 read_bases
        10 base_quals
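
As a rough illustration of the column list above, here is a small Python sketch that splits one line of this 10-column output into named fields. The field names are taken from the list; `PileupRow` and `parse_pileup_line` are hypothetical names, and the sketch assumes whitespace-separated fields as shown in the first post.

```python
from collections import namedtuple

# Field names follow the 10-column list above.
PileupRow = namedtuple(
    "PileupRow",
    [
        "reference_name",
        "pos",           # 1-based coordinate
        "ref_base",
        "cons_base",
        "cons_qual",
        "snp_qual",
        "max_map_qual",
        "cov_depth",
        "read_bases",
        "base_quals",
    ],
)


def parse_pileup_line(line):
    """Split one consensus pileup line into a PileupRow (hypothetical helper)."""
    fields = line.split()  # assumption: fields are separated by whitespace
    if len(fields) != 10:
        raise ValueError("expected 10 columns, got %d" % len(fields))
    name, pos, ref, cons, cons_q, snp_q, map_q, depth, bases, quals = fields
    return PileupRow(
        name, int(pos), ref, cons,
        int(cons_q), int(snp_q), int(map_q), int(depth),
        bases, quals,
    )


# First example line from the original question.
row = parse_pileup_line("chr1 9155441 G T 21 221 60 8 TttttTTT ;KJOLLJJ")
print(row.cons_base, row.cons_qual, row.snp_qual, row.max_map_qual, row.cov_depth)
```

For that line, columns 4 to 7 are the consensus base T, consensus quality 21, SNP quality 221, and mapping quality 60, with 8 reads covering the site.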

