

Odd characters in samtools mpileup output




  • Odd characters in samtools mpileup output

    I'm struggling to figure out what some of the characters in my samtools mpileup output are. Here's one of the offending bases (scaffold1:25513). There are many.

    scaffold1       25513   G       20      <<,,,,,a,A,aa,a,AA..    #A.CCFG6G6F67E:F<6GF

    So it tells me that the read depth is 20, and this is confirmed by counting the number of characters in each of the last two columns. But I have absolutely no idea what the "<" character represents in the read_bases column (column #5).

    The only special characters I'm expecting to see are '.' and ',' (matches on the forward and reverse strands), '+' and '-' (indels), '^' (the start of a read, followed by a symbol encoding the read's mapping quality), and '$' (the end of a read).

    So can anyone tell me what '<' means in column 5?

    EDIT: To answer my own question somewhat, '<' and '>' indicate a "reference skip" according to the mpileup documentation. (Although they are not mentioned in the pileup format documentation, which is why I couldn't find them.) However, I have absolutely no idea what "reference skip" means, so I'm still out of luck. If it's referring to a base that is not covered (e.g. due to splicing) then shouldn't the coverage ideally be reported as 18, not 20?
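
For anyone who wants to check the same thing on their own data, here is a minimal Python sketch of the counting described above. It assumes the read_bases syntax as I listed it ('^' plus one mapping-quality character, '$', and '+'/'-' followed by a length and that many bases), and it treats '<' and '>' as reference skips. The function name `tokenize_read_bases` is my own, and the sample line is re-typed with tab separators, which is an assumption about the real file.

```python
import re
from collections import Counter


def tokenize_read_bases(bases):
    """Return one symbol per read from a pileup read_bases string.

    Hypothetical helper, not part of samtools.
    """
    symbols = []
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Read start: skip '^' and the mapping-quality character after it.
            i += 2
        elif ch == "$":
            # Read end marker: not a base call, so skip it.
            i += 1
        elif ch in "+-":
            # Indel: sign, decimal length, then that many bases.
            m = re.match(r"\d+", bases[i + 1:])
            if m is None:
                raise ValueError(f"indel without a length at offset {i}")
            i += 1 + len(m.group()) + int(m.group())
        else:
            # Everything else stands for exactly one read at this position.
            symbols.append(ch)
            i += 1
    return symbols


# The line from this post, assuming tab-separated columns.
line = "scaffold1\t25513\tG\t20\t<<,,,,,a,A,aa,a,AA..\t#A.CCFG6G6F67E:F<6GF"
chrom, pos, ref, depth, bases, quals = line.split("\t")

symbols = tokenize_read_bases(bases)
counts = Counter(symbols)
skips = counts["<"] + counts[">"]

# Prints: 20 2 18
print(len(symbols), skips, len(symbols) - skips)
```

So the reported depth of 20 includes the two '<' symbols, and subtracting them gives the 18 I was expecting.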
    Last edited by Bueller_007; 08-26-2011, 04:52 PM.