  • ForeignMan
    Member
    • Jun 2010
    • 20

    Generate subset from BAM format

    Greetings everyone,

    I have a fairly large BAM file from an alignment with bwa. My question is pretty general:
    Is there a tool to extract some reads (by read name or row index) from a BAM file?

    I already did this once on the FASTQ files, but it would be really helpful to work directly on the BAM file so I don't have to repeat all the preceding steps (alignment, SAM-to-BAM conversion, sorting, duplicate removal, etc.).
    I have not yet found a tool that can both read and write BAM files. Does anyone have experience with Rsamtools, a Bioconductor package for reading BAM files into R? I couldn't find a way to export the data from R back into a BAM file. If Rsamtools could write BAM files it would be perfect; I'm looking for something along those lines.
    It would be great if someone knows such a tool or can give advice on how to get a subset of my aligned reads without repeating the whole alignment and post-processing.

    Thanks in advance.
  • ffinkernagel
    Senior Member
    • Oct 2009
    • 110

    #2
    If you want scripting, pysam works well (enough) from Python and reads/writes both SAM and BAM files, and you can filter on whatever you want.

    Otherwise samtools should do for a quick job: samtools view my_file | grep readname

    Rsamtools reads BAM, but on large files (60M aligned reads) it died on me with 'no negative indices allowed' or something like that.
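For what it's worth, here is a minimal sketch of the pysam route for pulling reads out by name and writing them straight back to BAM. It assumes a reasonably recent pysam, where the class is called AlignmentFile and the read name is query_name (older releases used Samfile and qname). The function names and the file paths in the usage note are made up for illustration.

```python
def filter_reads_by_name(reads, wanted_names):
    """Yield only the reads whose query_name is in wanted_names."""
    wanted = set(wanted_names)  # set lookup keeps one pass over a big file cheap
    for read in reads:
        if read.query_name in wanted:
            yield read


def subset_bam_by_name(in_path, out_path, wanted_names):
    """Copy the reads named in wanted_names from in_path into a new BAM file."""
    import pysam  # imported here so the filter above also works without pysam

    with pysam.AlignmentFile(in_path, "rb") as bam_in:
        # template=bam_in reuses the input header for the output file
        with pysam.AlignmentFile(out_path, "wb", template=bam_in) as bam_out:
            # until_eof=True reads front to back, so no index is needed
            reads = bam_in.fetch(until_eof=True)
            for read in filter_reads_by_name(reads, wanted_names):
                bam_out.write(read)
```

Something like subset_bam_by_name("aligned.bam", "subset.bam", names) would then write the subset without touching SAM at all. With paired-end data both mates normally share one read name, so filtering by name should keep pairs together, but that is worth checking on your own file.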


    • ForeignMan
      Member
      • Jun 2010
      • 20

      #3
      Thank you very much for your answer!

      I didn't know pysam, but I might give it a try, although I have no experience with Python.
      Your tip about samtools view with grep only works on SAM output, right? That would be a compromise, but working purely with BAM files would be best, because I wouldn't need to convert SAM to BAM again. I have Illumina paired-end sequencing data, and running bwa sampe also takes quite some time. My problem is that I want to select a subset several times (maybe 10 times or more), so I'm trying to save as much computation as possible. And in general, I'm interested in tools that manipulate BAM files directly.
      Still, your tips sound good, and I guess I will work on SAM files this time if I can't find another way.

      Concerning Rsamtools: I remember getting the same error once, too. If I remember correctly, it only occurred when reading certain (or all) columns. I was able to read a >1 GB BAM file when I set the parameter to read only qname, flag, or position, for example. But it is a serious problem, and that workaround wouldn't help me here, since I want to read my whole BAM file (its size is >2 GB). Thanks for reminding me of that.


      • ffinkernagel
        Senior Member
        • Oct 2009
        • 110

        #4
        samtools: It reads (and writes) BAM as well.

        Rsamtools: Filtering the columns was not enough for me; my BAM is ~2.4 GB.


        • Joker!sAce
          Member
          • Feb 2011
          • 21

          #5
          Hi,

          I'm working on something similar, almost identical. You could try this script to get all the aligned reads. pysam can read and write both SAM and BAM files.
          Here are two links that can help you: Documentation 1, Documentation 2.

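          import pysam

          # Open the BAM file for reading ("rb" = read, binary/BAM)
          samfile = pysam.Samfile("NA06984.454.MOSAIK.SRP000033.2009_11.bam", "rb")

          # fetch() with no region walks over all aligned reads (the BAM must be indexed)
          for alignedread in samfile.fetch():
              print(alignedread)

          samfile.close()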
          This is a very simple script to get all the aligned reads.
          "NA06984.454.MOSAIK.SRP000033.2009_11.bam" was the name of my BAM file.

          Also, could I have a look at your data? One or two records would do. I have a free week and could write a small script that would do the trick.
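Building on the fetch loop above, here is a rough sketch of how selection by row index could look, since that was the other option in the original question. It assumes a recent pysam (AlignmentFile instead of Samfile); the function names are hypothetical, and "row index" is taken to mean the 0-based position of a record in the file.

```python
def select_by_index(reads, wanted_indices):
    """Yield the reads whose 0-based position in the stream is in wanted_indices."""
    wanted = set(wanted_indices)
    if not wanted:
        return
    last = max(wanted)
    for index, read in enumerate(reads):
        if index in wanted:
            yield read
        if index >= last:
            break  # nothing left to pick, so stop reading early


def subset_bam_by_index(in_path, out_path, wanted_indices):
    """Write the records at the given row indices of in_path to a new BAM file."""
    import pysam  # imported here so select_by_index can be tried without pysam

    with pysam.AlignmentFile(in_path, "rb") as bam_in:
        # template=bam_in reuses the input header for the output file
        with pysam.AlignmentFile(out_path, "wb", template=bam_in) as bam_out:
            # until_eof=True reads in file order and does not need an index
            reads = bam_in.fetch(until_eof=True)
            for read in select_by_index(reads, wanted_indices):
                bam_out.write(read)
```

For repeated subsets you could call subset_bam_by_index once per index list, each call being a single pass over the BAM. Note that picking rows on paired-end data can separate mates, so selecting by read name may be the safer choice there.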

          Best,
          Joker!sAce
          Last edited by Joker!sAce; 02-28-2011, 07:43 AM.

