  • Looking for best strategy to realign BAM files

    Hi all, I was given a set of BAM files (100 Gb each) that I would like to realign with bwa, and I want to do this as fast as possible.
    At the moment I start from chunked fastq files and send a single bwa alignment to each node of my cluster, which gives me good parallelism and speed.
    When starting from BAM files I have two options:
    1- convert BAM to fastq -> split fastq into chunks -> align in the same way
    2- feed bwa with the BAM files directly
    If I go for (2) I cannot really parallelize the whole process unless I can split the BAM files into chunks that each contain both reads of every pair. The only way to do this, I guess, is to sort my BAM files by read name and then split them. I have no idea how much time that would take or how much space the newly sorted files would need.
    If I go for (1) I can use picard SamToFastq, but it takes ~25s per 100k reads to convert... each of my BAM files contains 130M reads, so it would take more than a week just to convert.
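
As a rough check on the conversion cost, the quoted rate and read count can be plugged into a small calculation. This is only a sketch: the function name is made up for illustration, and it assumes the ~25s per 100k reads rate stays constant for the whole file. Taken at face value, those figures give roughly nine hours of serial conversion per file, so the week-long estimate presumably refers to converting many files one after another.

```python
def conversion_hours(total_reads: int,
                     reads_per_batch: int = 100_000,
                     seconds_per_batch: float = 25.0) -> float:
    """Estimate wall-clock hours to convert total_reads at a constant rate.

    The defaults are the figures quoted in the post; nothing else is measured.
    """
    batches = total_reads / reads_per_batch
    return batches * seconds_per_batch / 3600.0


per_file_hours = conversion_hours(130_000_000)
print(f"{per_file_hours:.1f} h per file")  # 9.0 h per file
```
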

    Does anybody have two cents of advice to spare on this?
    Thanks

    d
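
The "sort by read name, then split" idea from option (2) can be sketched as follows. This is an illustration under assumptions, not a tested pipeline: records are modelled as plain `(read name, payload)` tuples rather than real BAM records, the input is assumed to be already name-sorted, and `chunk_by_read_name` is a hypothetical helper name. The point is only that a chunk boundary is allowed between two different read names and never between the two mates of one pair.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

# Stand-in for a SAM/BAM record: (read name, everything else).
Record = Tuple[str, str]


def chunk_by_read_name(records: Iterable[Record],
                       max_names: int) -> Iterator[List[Record]]:
    """Yield chunks of name-sorted records without splitting a read name."""
    if max_names < 1:
        raise ValueError("max_names must be at least 1")
    chunk: List[Record] = []
    names_in_chunk = 0
    # groupby keeps consecutive records with the same name together,
    # which is why the input must be sorted by read name first.
    for _, group in groupby(records, key=lambda rec: rec[0]):
        if names_in_chunk == max_names:
            yield chunk
            chunk, names_in_chunk = [], 0
        chunk.extend(group)
        names_in_chunk += 1
    if chunk:
        yield chunk


records = [("r1", "mate1"), ("r1", "mate2"),
           ("r2", "mate1"), ("r2", "mate2"),
           ("r3", "mate1"), ("r3", "mate2")]
chunks = list(chunk_by_read_name(records, max_names=2))
print([len(c) for c in chunks])  # [4, 2]
```

Each chunk could then be written to its own file and aligned on a separate node, in the same way as the chunked fastq files.
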

  • #2
    Is your data PE or SE?

    In fact you can split a BAM file directly, whether it is compressed or not, since it uses the bgzip (BGZF) format. I believe there are tools for this.

    I would prefer to feed BWA the BAM files and run multiple BWA instances.

    my two pence.

    dong

    • #3
      Yikes, any way you do this will require you to reorder the records by read name for bwa to run efficiently (I was unaware that one could even feed bwa a BAM file directly). Honestly, I suspect you'd be best off using one of the parallel versions of samtools to sort the monster BAM file by read name. I hope that there's only one alignment reported for each read; otherwise you'll have to take that into account if you then make a fastq file (maybe bwa knows how to deal with that in a BAM file).

      If you're familiar with programming, you could split the BAM file (don't forget to put a header on each of the split files!), sort the pieces on different cluster nodes, and merge them at the end. Not being familiar with how your cluster is set up, I can't say whether it would be simpler to just use one of the parallel versions of samtools instead.
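
The reminder about headers can be illustrated on SAM text, where header lines start with `@` and come before the alignment records. The sketch below is an assumption-laden example rather than a recommended tool: `split_sam_with_header` is a hypothetical helper, it works on decoded SAM lines instead of binary BAM, and it splits purely by record count.

```python
from typing import Iterable, Iterator, List


def split_sam_with_header(lines: Iterable[str],
                          records_per_chunk: int) -> Iterator[List[str]]:
    """Split SAM lines into chunks, repeating the header in every chunk."""
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    header: List[str] = []
    body: List[str] = []
    for line in lines:
        if line.startswith("@"):
            # Header lines are collected once and copied into each chunk.
            header.append(line)
            continue
        body.append(line)
        if len(body) == records_per_chunk:
            yield header + body
            body = []
    if body:
        yield header + body


sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t141\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tGGCA\tIIII",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\tCCAT\tIIII",
]
pieces = list(split_sam_with_header(sam, records_per_chunk=2))
print(len(pieces))  # 2
```

For paired-end data the pieces are only usable on their own if mates are adjacent in the input and `records_per_chunk` keeps them together.
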

      • #4
        Originally posted by xied75:
        Is your data PE or SE?

        In fact you can split a BAM file directly, whether it is compressed or not, since it uses the bgzip (BGZF) format. I believe there are tools for this.

        I would prefer to feed BWA the BAM files and run multiple BWA instances.

        my two pence.

        dong

        These are PE data. I've started a bwa realignment directly from the BAM files, using bwa's threads for the alignment step. The sampe step will take ages, though...
        Thanks

        d
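
For reference, realigning paired-end reads straight from a BAM file with `bwa aln` and `bwa sampe` usually takes three commands: one `aln` pass per mate, then `sampe`. The sketch below only builds the argument lists and does not run anything. The helper name, the file names, and the thread count are placeholders, and the `-b`, `-1`, `-2`, `-t` and `-f` options should be checked against the installed bwa version.

```python
from typing import List


def bwa_bam_commands(ref: str, bam: str, prefix: str,
                     threads: int = 4) -> List[List[str]]:
    """Build argv lists for aligning a paired-end BAM with bwa aln/sampe."""
    t = str(threads)
    sai1, sai2 = f"{prefix}.1.sai", f"{prefix}.2.sai"
    return [
        # -b: input is BAM; -1 / -2: use only the first / second mate.
        ["bwa", "aln", "-t", t, "-b", "-1", "-f", sai1, ref, bam],
        ["bwa", "aln", "-t", t, "-b", "-2", "-f", sai2, ref, bam],
        # sampe takes the BAM twice, once per mate; it runs single-threaded.
        ["bwa", "sampe", "-f", f"{prefix}.sam", ref, sai1, sai2, bam, bam],
    ]


for argv in bwa_bam_commands("ref.fa", "reads.bam", "reads"):
    print(" ".join(argv))
```

The lists could be passed to `subprocess.run` or written into a cluster job script, one job per BAM file or per chunk.
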

        • #5
          Originally posted by dawe:
          These are PE data. I've started a bwa realignment directly from the BAM files, using bwa's threads for the alignment step. The sampe step will take ages, though...
          Thanks

          d

          Yes, because sampe is single-threaded no matter how powerful your machine is, and one instance will eat >6GB of memory, so running multiple instances is also difficult unless you have something like 128GB.

          Turning on -P will eat more memory, but it should run faster.
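
To put the memory constraint in numbers, here is a back-of-the-envelope sketch. The 6GB per instance is the lower bound quoted above, the 8GB reserved for the operating system is purely an assumption, and the function name is made up. With -P turned on the per-instance figure would be higher, so the real count would be lower.

```python
def max_sampe_instances(total_mem_gb: float,
                        per_instance_gb: float = 6.0,
                        reserve_gb: float = 8.0) -> int:
    """Upper bound on concurrent sampe processes that fit in memory."""
    if per_instance_gb <= 0:
        raise ValueError("per_instance_gb must be positive")
    usable = max(total_mem_gb - reserve_gb, 0.0)
    return int(usable // per_instance_gb)


print(max_sampe_instances(128))  # 20
print(max_sampe_instances(32))   # 4
```
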

          My Windows bwa can do multithreaded sampe. I suggested that people could use my approach to modify the Linux version, but nobody seems interested in taking on the job. If I find some free time I might do it myself.

          Best,

          dong
