  • Graham Etherington
    Member
    • Apr 2010
    • 22

    Samtools Pileup Parser

    Hi,
    I'm trying to parse the Samtools Pileup format in Perl.

    I've created a paired-end assembly using BWA (aln, sampe) and then used Samtools to create a pileup (using import - sort - index - pileup).

    What I want to do is to open up the pileup (which is obviously in the Samtools Pileup format) and iterate over each position of the pileup, examining differences in base-calls.

    Does anyone know of anything available, or have any sample code they could post?
    Many thanks.
  • Pepe
    Member
    • Mar 2009
    • 30

    #2
    You cannot open a file in Pileup format with Bio::DB::Sam.

    You should convert your SAM file to BAM format (samtools view) and sort it (samtools sort). You may need to index it too (samtools index); I'm not sure.

    Then follow the instructions on the CPAN website, which are very, very helpful. Basically:

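    use Bio::DB::Sam;

    # Placeholder callback: replace the body with your own per-position logic.
    my $my_subroutine = sub {
        my ($seqid, $pos, $pileups) = @_;
    };

    my $sam = Bio::DB::Sam->new(-bam   => "my_sorted_bam_file.bam",
                                -fasta => "my_ref.fasta");
    my @targets = $sam->seq_ids;
    foreach my $chr (@targets) {
        $sam->pileup($chr, $my_subroutine);
    }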

    where $my_subroutine is a reference to a subroutine that you'll need to write to do what you want: SNP calling, coverage calculation, and so on.


    • Graham Etherington
      Member
      • Apr 2010
      • 22

      #3
      Hi,
      Thanks for your answer. I'd realised that Bio::DB::Sam couldn't parse the pileup, which is why I am asking whether anyone has, or knows about, anything that could parse it before I spend time writing a Samtools pileup parser myself.
      I have a 'Parser.pm' module that parses various pileup formats (maq, bowtie, novoalign, etc), but I haven't added one for Samtools format yet.
      Cheers,
      Graham
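
As a starting point for such a parser, here is a minimal Perl sketch that splits one pileup line into named fields. It assumes the simple six-column, tab-separated layout (sequence name, 1-based position, reference base, depth, read bases, base qualities); pileups produced with consensus calling carry extra columns that this sketch does not handle. The function name `parse_pileup_line` and the hash keys are hypothetical and are not taken from any existing module.

```perl
use strict;
use warnings;

# Hypothetical helper: split one six-column pileup line into named fields.
# Assumed column order: chrom, pos, ref base, depth, read bases, base qualities.
sub parse_pileup_line {
    my ($line) = @_;
    chomp $line;
    my @f = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    return unless @f >= 6;           # too few columns: not a line we understand
    my %rec;
    @rec{qw(chrom pos ref depth bases quals)} = @f[0 .. 5];
    return \%rec;
}

# Example with a made-up line: three reads, the third one starting here.
my $rec = parse_pileup_line("chr1\t272\tT\t3\t.,^~.\t<<<\n");
printf "%s:%d ref=%s depth=%d bases=%s\n",
    @{$rec}{qw(chrom pos ref depth bases)};
```

To walk a whole file, call the function on each line read from the pileup and skip any line for which it returns nothing.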


      • nekrut
        Member
        • Apr 2009
        • 22

        #4
        Galaxy's (http://usegalaxy.org) pileup parser might do the trick: http://bit.ly/cXtqD9


        • Graham Etherington
          Member
          • Apr 2010
          • 22

          #5
          Originally posted by nekrut
          Galaxy's (http://usegalaxy.org) pileup parser might do the trick:
          http://bit.ly/cXtqD9
          Thanks. That should give me enough to get started with.
          Best wishes,
          Graham


          • cschu
            Junior Member
            • Aug 2012
            • 1

            #6
            I noticed that the referenced Galaxy code does not check for the '>' and '<' characters. Can anybody please explain how to process those? I don't really understand what "reference skip" means, and I would think that such a symbol means that the read does not cover the respective position. Any help would be appreciated, thanks!
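
On the '>' and '<' question: in the pileup read-bases column these symbols mark a reference skip, meaning the read's alignment spans the position through a gap (the N operation in the CIGAR string, as in a spliced alignment), so the read contributes no base there. The Perl sketch below shows one way to walk the read-bases column and tally each kind of symbol, including reference skips. The function name `count_pileup_bases` and the category names are hypothetical, and the symbol set is the one assumed here for Samtools pileup output: `.` and `,` for matches, letters for mismatches, `*` for a deleted base, `^` plus a quality character for a read start, `$` for a read end, and `+n`/`-n` followed by n bases for indels.

```perl
use strict;
use warnings;

# Hypothetical helper: tally the symbols in the read-bases column of one pileup line.
sub count_pileup_bases {
    my ($bases) = @_;
    my %count = map { $_ => 0 }
        qw(match mismatch deleted ref_skip insertion deletion);

    pos($bases) = 0;
    while (pos($bases) < length($bases)) {
        if    ($bases =~ /\G\^./gcs) { }   # read start marker plus mapping-quality character
        elsif ($bases =~ /\G\$/gc)   { }   # read end marker
        elsif ($bases =~ /\G([+-])(\d+)/gc) {
            # Indel announced at this position: sign, length, then that many bases.
            my ($sign, $n) = ($1, $2);
            pos($bases) = pos($bases) + $n;   # skip over the indel sequence
            $count{ $sign eq '+' ? 'insertion' : 'deletion' }++;
        }
        elsif ($bases =~ /\G[.,]/gc)          { $count{match}++ }     # same as reference
        elsif ($bases =~ /\G[ACGTNacgtn]/gc)  { $count{mismatch}++ }  # differs from reference
        elsif ($bases =~ /\G\*/gc)            { $count{deleted}++ }   # base deleted in this read
        elsif ($bases =~ /\G[<>]/gc)          { $count{ref_skip}++ }  # reference skip (CIGAR N)
        else {
            die sprintf("Unexpected character '%s' at offset %d\n",
                        substr($bases, pos($bases), 1), pos($bases));
        }
    }
    return \%count;
}

# Example with a made-up read-bases string.
my $counts = count_pileup_bases('.,^F.<>*A+2ACg-1t$');
print join(", ", map { "$_=$counts->{$_}" } sort keys %$counts), "\n";
```

Whether reads showing '>' or '<' should count toward coverage depends on the analysis; for base-call comparison they can simply be skipped, since they carry no base at that position.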
