  • bwa 0.6.1-r104 segfault problem

    Hi all,
    I am working with a challenging genome (22 Gb haploid DNA content per nucleus, 18 Gb first-draft assembly in 31 million contigs), and using BWA 0.6.1 because it is the only short-read aligner I have found that can index a genome > 4 Gb. I used an AWS EC2 Cloudbiolinux instance (June 2012 release) with 68 GB RAM to build the index, then saved it to an S3 bucket and terminated the instance. The version of BWA I used is the one available as an Ubuntu package via apt-get install.
    I downloaded the index (as a .tgz archive) to my local Ubuntu 12.04 box (16 GB RAM), unpacked it, and tried to align a fasta file of sample sequences (158,000 sequences), but bwa crashes after the line
        [bwa_aln] 225bp reads: max_diff = 9
    with the error
        Segmentation fault (core dumped)

    My local box is running the same version of BWA on essentially the same OS, but presumably different hardware from the AWS EC2 instance. It does not seem to be a problem with the .tgz archive, because I can download that to another Cloudbiolinux instance, unpack it, and map the same set of sample sequences to the index without problems.

    Any suggestions for how to solve this would be greatly appreciated.

  • #2
    In the past, when I had segfault problems with bwa, I ended up rebuilding the index on the machine where I was running bwa.

    Have you tried that, or is 16 GB on your local box not enough to rebuild the index?

    • #3
      I used 'top' to monitor memory usage on the cloud instance where I built the index, and it showed > 50 GB of memory in use. My impression is that the bwtsw indexing algorithm requires at least as much memory as the size of the genome to be indexed, and I don't have that on my local machine.
      I was hopeful that using the same version of bwa and the same OS would overcome any platform-specific issues. Is the program sensitive to hardware configuration?

      • #4
        Originally posted by rwhet052:
        I used 'top' to monitor memory usage on the cloud instance where I built the index, and it showed > 50 GB of memory in use. My impression is that the bwtsw indexing algorithm requires at least as much memory as the size of the genome to be indexed, and I don't have that on my local machine.
        I was hopeful that using the same version of bwa and the same OS would overcome any platform-specific issues. Is the program sensitive to hardware configuration?

        At least that was my experience in the past.

        I suppose you could try to build indexes by splitting your genome into parts.
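
        For what it's worth, the splitting step could look roughly like the Python sketch below. It is only an illustration, not anything bwa provides: it assumes a plain FASTA input, keeps every contig whole, and starts a new part whenever adding the next contig would push the part over a chosen base limit. The function names, the max_bases parameter, and the ".partN.fa" output naming are all made up for this example.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)

def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def split_by_size(records: Iterable[Record], max_bases: int) -> Iterator[List[Record]]:
    """Group whole contigs into batches holding at most max_bases bases.

    A single contig longer than max_bases still gets a batch of its own.
    """
    batch, total = [], 0
    for header, seq in records:
        if batch and total + len(seq) > max_bases:
            yield batch
            batch, total = [], 0
        batch.append((header, seq))
        total += len(seq)
    if batch:
        yield batch

def write_parts(fasta_path: str, prefix: str, max_bases: int) -> int:
    """Write each batch to '<prefix>.partN.fa' (hypothetical naming); return the part count."""
    count = 0
    with open(fasta_path) as handle:
        for count, batch in enumerate(split_by_size(read_fasta(handle), max_bases), 1):
            with open(f"{prefix}.part{count}.fa", "w") as out:
                for header, seq in batch:
                    out.write(f">{header}\n{seq}\n")
    return count
```

        Each part file would then be indexed and aligned against separately. Keeping contigs whole means the reference names and coordinates in each part stay identical to those in the full assembly.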

        • #5
          Splitting the genome into 6 files, each < 3 Gb, works fine, with the caveat that the "unique" flags for each mapped read apply only within the subset index file, so some additional merging and consolidation of results is required.

          Unfortunately the samtools merge function is not intended for this sort of problem, because it assumes the same reference sequences are used to map different sets of reads, instead of different reference sequences being used to map the same set of reads.
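
          One possible way to do the consolidation is sketched below in Python. It is an assumption-laden illustration rather than a tested pipeline: it assumes single-end reads, one SAM file per sub-index, and that each mapped record carries the NM:i edit-distance tag. For each read it keeps the hit with the lowest edit distance across all parts and flags the read as ambiguous when another hit ties for that distance. That tie flag is only a crude stand-in for recomputing uniqueness over the whole genome, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Hit = Tuple[int, List[str], bool]  # (edit distance, SAM fields, ambiguous?)

def edit_distance(fields: List[str]) -> Optional[int]:
    """Return the NM:i value from a SAM record's optional tags, if present."""
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None

def best_hits(sam_streams: Iterable[Iterable[str]]) -> Dict[str, Hit]:
    """Pick the lowest-edit-distance alignment per read across several SAM inputs."""
    best: Dict[str, Hit] = {}
    for stream in sam_streams:
        for line in stream:
            if line.startswith("@"):          # skip SAM header lines
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 11:              # not a complete SAM record
                continue
            if int(fields[1]) & 4:            # FLAG bit 0x4: read is unmapped
                continue
            nm = edit_distance(fields)
            if nm is None:                    # assumption: skip hits lacking NM
                continue
            name = fields[0]
            current = best.get(name)
            if current is None or nm < current[0]:
                best[name] = (nm, fields, False)
            elif nm == current[0]:
                # An equally good hit exists elsewhere: mark as ambiguous.
                best[name] = (current[0], current[1], True)
    return best
```

          Usage would be along the lines of opening the per-part SAM files and passing the file handles to best_hits as a list, then writing out the reads whose ambiguous flag is False as the uniquely placed set.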
