  • Regarding MergeBam tool of Picard

    hi all,
    A seemingly simple task of merging BAMs has flummoxed me. Picard's MergeBamAlignment module has a mandatory UNMAPPED_BAM option.

    I am using bwa aln, and it doesn't write out a separate file of unmapped reads, so I couldn't understand this parameter. Is this module meant to be used only when one has unmapped BAMs too?

    Anyway, another Picard module, MergeSamFiles, says it can merge BAMs too and doesn't require unmapped BAMs.

    I spent so much time understanding the former; thought anyone else looking at these modules should be aware too.

  • #2
    From the Picard MergeBamAlignment documentation:
    UNMAPPED_BAM - Original SAM or BAM file of unmapped reads, which must be in queryname order.

    You must give MergeBamAlignment the original sequences you aligned using bwa aln in BAM or SAM format, sorted by read ID.

    I haven't used this command before, and it's not clear to me what purpose it serves. I'm sure there is a reason for it, I just haven't come across it yet.

    In any case, if you have multiple output BAMs that you'd like to merge post-alignment, perhaps the best choice is MergeSamFiles. If I were you, I'd add a read group to each file separately before merging them. You can do that using Picard's AddOrReplaceReadGroups.
    Last edited by MBekritsky; 08-25-2014, 08:40 AM. Reason: Clarification
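
The workflow suggested in #2 (tag each lane's BAM with a read group, then merge) can be sketched as below. This is a minimal sketch, not a tested pipeline: the file names, the read-group values, and the `picard.jar` path are hypothetical, and the `java -jar picard.jar <Tool>` form assumes a Picard release that ships a single jar (older releases shipped one jar per tool). The functions only build the argument lists; nothing is executed.

```python
from typing import List, Sequence


def add_read_group_cmd(in_bam: str, out_bam: str, rg_id: str, sample: str,
                       library: str = "lib1", platform: str = "ILLUMINA",
                       unit: str = "unit1",
                       picard_jar: str = "picard.jar") -> List[str]:
    """Build an AddOrReplaceReadGroups command line (legacy KEY=VALUE syntax)."""
    return [
        "java", "-jar", picard_jar, "AddOrReplaceReadGroups",
        f"I={in_bam}", f"O={out_bam}",
        f"RGID={rg_id}",      # read group ID, e.g. one per lane
        f"RGLB={library}",    # library name (placeholder default)
        f"RGPL={platform}",   # sequencing platform
        f"RGPU={unit}",       # platform unit (placeholder default)
        f"RGSM={sample}",     # sample name
    ]


def merge_sam_files_cmd(in_bams: Sequence[str], out_bam: str,
                        picard_jar: str = "picard.jar") -> List[str]:
    """Build a MergeSamFiles command line; I= is repeated once per input."""
    if not in_bams:
        raise ValueError("at least one input BAM is required")
    cmd = ["java", "-jar", picard_jar, "MergeSamFiles"]
    cmd += [f"I={bam}" for bam in in_bams]
    cmd.append(f"O={out_bam}")
    return cmd


if __name__ == "__main__":
    # Hypothetical per-lane BAMs; print the commands instead of running them.
    lanes = ["lane1.bam", "lane2.bam"]
    tagged = []
    for i, bam in enumerate(lanes, start=1):
        out = bam.replace(".bam", ".rg.bam")
        print(" ".join(add_read_group_cmd(bam, out, f"lane{i}", "sampleA")))
        tagged.append(out)
    print(" ".join(merge_sam_files_cmd(tagged, "merged.bam")))
```

To actually run a command, the returned list could be passed to `subprocess.run(cmd, check=True)`.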



    • #3
      hi MBekritsky,
      It's not clear, as UNMAPPED_BAM asks for a BAM file and not the original sequence file, which would be a FASTQ. The only aligner I know of that writes out a separate BAM for unmapped sequences is TopHat (not sure about recent versions, though).

      Yes, I did pass the read group info at the bwa sampe stage. There were multiple lanes and I wanted to combine the final BAMs.

      thanks



      • #4
        Hi amitm,

        I regularly deal with unaligned reads in BAM format. It's a little odd, since BAM is meant to store alignment information, but it's not uncommon, since it allows you to store read pairs in the same file and compresses the data. You can get from FASTQ to BAM (or SAM) format using Picard's FastqToSam.
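
A minimal sketch of the FASTQ-to-unaligned-BAM conversion mentioned above, using Picard's FastqToSam. The file names, sample name, and `picard.jar` path are hypothetical, and the single-jar invocation is an assumption about the installed Picard release. The function only assembles the command; it does not run it.

```python
from typing import List, Optional


def fastq_to_sam_cmd(fastq1: str, out_bam: str, sample: str,
                     fastq2: Optional[str] = None,
                     picard_jar: str = "picard.jar") -> List[str]:
    """Build a FastqToSam command line (legacy KEY=VALUE syntax).

    fastq2 is only added for paired-end data, so both mates of a pair
    end up in the same unaligned BAM.
    """
    cmd = ["java", "-jar", picard_jar, "FastqToSam", f"FASTQ={fastq1}"]
    if fastq2 is not None:
        cmd.append(f"FASTQ2={fastq2}")
    cmd += [f"OUTPUT={out_bam}", f"SAMPLE_NAME={sample}"]
    return cmd


if __name__ == "__main__":
    # Hypothetical paired-end input; print the command instead of running it.
    print(" ".join(fastq_to_sam_cmd("reads_1.fastq", "unmapped.bam",
                                    "sampleA", fastq2="reads_2.fastq")))
```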

        The line regarding what UNMAPPED_BAM requires is a direct quote from the Picard website. It seems to be simply asking for the unmapped reads initially passed to BWA for alignment. I don't see the same ambiguity that you do, but I also haven't spent as much time on it as you have, so I will defer to you.

        Lastly, if all you want to do is combine the final BAMs, why not use MergeSamFiles? That works very well and doesn't have the requirement for UNMAPPED_BAM as an input.



        • #5
          Picard's MergeBamAlignment tool is poorly named and I doubt you're the first person to wonder about this. A better name would have been IncorporateUnmapped.



          • #6
            So it's for use with alignment algorithms that don't report unmapped reads?



            • #7
              There are a couple of uses (that I know of, at least). Firstly, it can be convenient to merge in unmapped reads and then just archive the resulting BAM file rather than archiving a BAM file having only mapped reads and a fastq file. Secondly, some downstream tools (particularly RNAseqQC) need to know the total number of initial reads you had in order to produce correct output.

              BTW, the description in the original email mentioning this tool is informative here:

              - MergeBamAlignment - Tool to take a Sam or Bam file of unmapped reads and merge it with a Sam or Bam file that contains alignment information for a subset of those reads, retaining all metadata from the unmapped file.

              So originally some of the developers were using BAM files to hold unmapped reads (this can be convenient) and then wanted to merge back in the unmapped reads. Yeah, it'd make sense to just have the aligner spit the unmapped reads out, but that's not always an option (really, who has the time to muck with every aligner's code).
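
For reference, a minimal sketch of a MergeBamAlignment call that combines the aligner's output with the original unaligned BAM, as described above. The file names and the `picard.jar` path are hypothetical, and the single-jar invocation is an assumption about the installed Picard release. Per the Picard description quoted in #2, the unmapped BAM must be in queryname order. The function only assembles the command; it does not run it.

```python
from typing import List


def merge_bam_alignment_cmd(unmapped_bam: str, aligned_bam: str,
                            reference_fasta: str, out_bam: str,
                            picard_jar: str = "picard.jar") -> List[str]:
    """Build a MergeBamAlignment command line (legacy KEY=VALUE syntax)."""
    return [
        "java", "-jar", picard_jar, "MergeBamAlignment",
        f"UNMAPPED_BAM={unmapped_bam}",          # original reads, queryname-sorted
        f"ALIGNED_BAM={aligned_bam}",            # aligner output for a subset of reads
        f"REFERENCE_SEQUENCE={reference_fasta}", # reference used for alignment
        f"OUTPUT={out_bam}",
    ]


if __name__ == "__main__":
    # Hypothetical inputs; print the command instead of running it.
    print(" ".join(merge_bam_alignment_cmd("unmapped.bam", "aligned.bam",
                                           "ref.fasta", "merged.bam")))
```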
