
  • Samtools Pileup Parser

    Hi,
    I'm trying to parse the Samtools Pileup format in Perl.

    I've created a paired-end assembly using BWA (aln, sampe) and then used Samtools to create a pileup (using import - sort - index - pileup).

    What I want to do is to open up the pileup (which is obviously in the Samtools Pileup format) and iterate over each position of the pileup, examining differences in base-calls.

    Does anyone know of anything available, or have any sample code they could post?
    Many thanks.

  • #2
    You cannot open a file in Pileup format with Bio::DB::Sam.

    You should convert your SAM file to BAM format (samtools view) and sort it (samtools sort), and maybe index it too (samtools index)? I'm not sure about that last step.

    Then follow the instructions on the CPAN website, which are very, very helpful. Basically:

    use Bio::DB::Sam;
    # Open the sorted BAM file together with its reference sequence
    my $sam = Bio::DB::Sam->new(-bam   => "my_sorted_bam_file.bam",
                                -fasta => "my_ref.fasta");
    my @targets = $sam->seq_ids;
    foreach my $chr (@targets) {
        # $my_subroutine is a code reference, called for each position
        $sam->pileup($chr, $my_subroutine);
    }

    where $my_subroutine is a reference to a subroutine that you'll need to write to do what you want: SNP calling, coverage calculation...
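
    As an illustration, here is a minimal sketch of such a callback for per-position coverage. It assumes the callback arguments described in the Bio::DB::Sam documentation (sequence ID, position, array reference of pileup objects) and the pileup methods is_del and is_refskip; the %depth hash and the $coverage_cb name are made up for this example.

    ```perl
    use strict;
    use warnings;

    # Hypothetical store: "seqid:pos" => number of reads with a base there
    my %depth;

    # Callback in the shape pileup() expects: sequence ID, position,
    # and an array reference of pileup objects covering that position.
    my $coverage_cb = sub {
        my ($seqid, $pos, $pileups) = @_;
        my $n = 0;
        for my $p (@$pileups) {
            # Skip reads that span the position without a base in it
            # (assumes the is_del and is_refskip pileup methods).
            next if $p->is_del || $p->is_refskip;
            $n++;
        }
        $depth{"$seqid:$pos"} = $n;
    };
    ```

    It would then be passed as the second argument, e.g. $sam->pileup($chr, $coverage_cb).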



    • #3
      Originally posted by Pepe
      You cannot open a file in Pileup format with Bio::DB::Sam.

      You should convert your SAM file to BAM format (samtools view) and sort it (samtools sort), and maybe index it too (samtools index)? I'm not sure about that last step.

      Then follow the instructions on the CPAN website, which are very, very helpful. Basically:

      use Bio::DB::Sam;
      my $sam = Bio::DB::Sam->new(-bam   => "my_sorted_bam_file.bam",
                                  -fasta => "my_ref.fasta");
      my @targets = $sam->seq_ids;
      foreach my $chr (@targets) {
          $sam->pileup($chr, $my_subroutine);
      }

      where $my_subroutine is a reference to a subroutine that you'll need to write to do what you want: SNP calling, coverage calculation...

      Hi,
      Thanks for your answer. I'd realised that Bio::DB::Sam couldn't parse the pileup, which is why I am asking whether anyone has, or knows about, anything that could parse it, before I spend time writing a Samtools parser myself.
      I have a 'Parser.pm' module that parses various pileup formats (maq, bowtie, novoalign, etc), but I haven't added one for Samtools format yet.
      Cheers,
      Graham
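
      For anyone starting from the same place, here is a minimal sketch of reading one line of a Samtools pileup in Perl. It assumes the plain six-column, tab-separated layout (sequence name, position, reference base, depth, read bases, base qualities); consensus output from samtools pileup -c carries additional columns that this sketch does not handle. The function name parse_pileup_line is made up for the example.

      ```perl
      use strict;
      use warnings;

      # Split one tab-separated pileup line into named fields.
      # Assumes the six-column layout; returns nothing for other lines.
      sub parse_pileup_line {
          my ($line) = @_;
          chomp $line;
          my @fields = split /\t/, $line, -1;
          return unless @fields == 6;
          my %rec;
          @rec{qw(seqid pos ref depth bases quals)} = @fields;
          return \%rec;
      }

      # Example with a hand-built line; a real script would read the
      # pileup file line by line and call the function on each line.
      my $example = join("\t", 'seq1', '272', 'T', '3', ',.G', '<<+') . "\n";
      my $parsed  = parse_pileup_line($example);
      print "$parsed->{seqid}:$parsed->{pos} ref=$parsed->{ref} depth=$parsed->{depth}\n";
      ```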



      • #4
        Galaxy's (http://usegalaxy.org) pileup parser might do the trick:
        http://bit.ly/cXtqD9



        • #5
          Originally posted by nekrut
          Galaxy's (http://usegalaxy.org) pileup parser might do the trick:
          http://bit.ly/cXtqD9
          Thanks. That should give me enough to get started with.
          Best wishes,
          Graham



          • #6
            I noticed that the referenced Galaxy code does not check for the '>' and '<' characters. Can anybody please help with how to process those? I.e., I don't really understand what "reference skip" means, and I would think that such a symbol means that the read does not cover the respective position. Any help would be appreciated, thanks!
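
            On the question above: in the samtools pileup documentation, '>' and '<' mark a reference skip, meaning the read's alignment spans the position but contributes no base there, typically because of an N operation in the CIGAR string (for example a spliced intron). A read that simply does not reach the position would not appear in the column at all. The sketch below shows one way to walk the read-bases column and tally reference skips separately from real base calls; the function name count_bases and the tally keys are made up for the example, and it is not the Galaxy parser's method.

            ```perl
            use strict;
            use warnings;

            # Tally the symbols in the read-bases column of one pileup line.
            # $ref is the reference base, used to resolve '.' and ',' matches.
            sub count_bases {
                my ($bases, $ref) = @_;
                my %count = (A => 0, C => 0, G => 0, T => 0, N => 0,
                             del => 0, refskip => 0, indel => 0);
                my $i = 0;
                while ($i < length($bases)) {
                    my $c = substr($bases, $i, 1);
                    if ($c eq '^') {
                        # Start of a read: the next character is its mapping quality
                        $i += 2;
                    }
                    elsif ($c eq '+' || $c eq '-') {
                        # Indel: sign, length, then that many inserted/deleted bases
                        if (substr($bases, $i + 1) =~ /^(\d+)/) {
                            $count{indel}++;
                            $i += 1 + length($1) + $1;
                        }
                        else {
                            $i++;
                        }
                    }
                    else {
                        if    ($c eq '.' || $c eq ',') { $count{ uc($ref) }++ }   # match to reference
                        elsif ($c eq '*')              { $count{del}++ }          # deleted base
                        elsif ($c eq '<' || $c eq '>') { $count{refskip}++ }      # reference skip
                        elsif ($c =~ /^[ACGTNacgtn]$/) { $count{ uc($c) }++ }     # mismatch
                        # '$' (end of a read) and anything unrecognised are ignored
                        $i++;
                    }
                }
                return \%count;
            }

            # Example column, hand-built for illustration
            my $tally = count_bases('^F.,$A<>*+2ACg-1tT', 'C');
            print join(' ', map { "$_=$tally->{$_}" } sort keys %$tally), "\n";
            ```

            Keeping reference skips in their own counter lets the caller decide whether they count towards depth; for SNP calling they would normally be left out of the base counts.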
