bwa 0.6.1-r104 segfault problem


  • bwa 0.6.1-r104 segfault problem

    Hi all,
    I am working with a challenging genome (22 Gb haploid DNA content per nucleus, 18 Gb first-draft assembly in 31 million contigs), and using BWA 0.6.1 because it is the only short-read aligner I have found that can index a genome > 4 Gb. I used an AWS EC2 Cloudbiolinux instance (June 2012 release) with 68 GB RAM to build the index, then saved it to an S3 bucket and terminated the instance. The version of BWA I used is the one available as an Ubuntu package via apt-get install.
    I downloaded the index (as a .tgz archive) to my local Ubuntu 12.04 box (16 GB RAM), unpacked it, and tried to align a FASTA file of sample sequences (158,000 sequences), but bwa crashes after the line
        [bwa_aln] 225bp reads: max_diff = 9
    with the error
        Segmentation fault (core dumped)

    My local box is running the same version of BWA on essentially the same OS, but presumably different hardware from the AWS EC2 instance. It does not seem to be a problem with the .tgz archive, because I can download that to another Cloudbiolinux instance, unpack it, and map the same set of sample sequences to the index without problems.

    Any suggestions for how to solve this would be greatly appreciated.

  • #2
    In the past when I had a problem with seg faults with bwa I ended up rebuilding the index on the machine where I was running the bwa.

    Have you tried that or is 16GB on your local box not enough to re-build the indexes?

    • #3
      I used 'top' to monitor memory usage on the cloud instance where I built the index, and it showed > 50 GB of memory in use. My impression is that the bwtsw indexing algorithm requires at least as much memory as the size of the genome to be indexed, and I don't have that on my local machine.
      I was hopeful that using the same version of bwa and the same OS would overcome any platform-specific issues. Is the program sensitive to hardware configuration?

      • #4
        Originally posted by rwhet052:
        I used 'top' to monitor memory usage on the cloud instance where I built the index, and it showed > 50 GB of memory in use. My impression is that the bwtsw indexing algorithm requires at least as much memory as the size of the genome to be indexed, and I don't have that on my local machine.
        I was hopeful that using the same version of bwa and the same OS would overcome any platform-specific issues. Is the program sensitive to hardware configuration?

        At least that was my experience in the past.

        I suppose you could try to build indexes by splitting your genome into parts.
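
If it helps, here is a rough Python sketch of one way to do the split: group contigs into batches so that each batch stays under a chosen number of bases, then write each batch to its own FASTA file and index it separately. The function names (`read_fasta`, `split_by_bases`) and the size limit are my own placeholders, not anything from bwa, and for a genome this size you would want to stream records to disk rather than hold whole batches in memory.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_bases(records: Iterable[Record], max_bases: int) -> Iterator[List[Record]]:
    """Group records into batches whose total length is at most max_bases.

    A single contig longer than max_bases still gets a batch of its own,
    so that one batch can exceed the limit.
    """
    batch, total = [], 0
    for header, seq in records:
        if batch and total + len(seq) > max_bases:
            yield batch
            batch, total = [], 0
        batch.append((header, seq))
        total += len(seq)
    if batch:
        yield batch
```

Each batch would then be written out as its own FASTA file and indexed on its own; contigs are never cut in the middle, so alignment coordinates stay valid for the original assembly.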

        • #5
          Splitting the genome into 6 files, each < 3 Gb, works fine, with the caveat that the "unique" flags for each mapped read apply only within the subset index file, so some additional merging and consolidation of results is required.

          Unfortunately the samtools merge function is not intended for this sort of problem, because it assumes the same reference sequences are used to map different sets of reads, instead of different reference sequences being used to map the same set of reads.
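
For anyone facing the same consolidation step, below is a minimal Python sketch of one possible approach, not a tested pipeline. It assumes single-end reads, one SAM record per read per subset index, and that each mapped record carries the NM (edit distance) and X0 (number of best hits) tags that bwa samse normally writes. For every read it keeps the hit with the lowest edit distance across all subsets and calls the read unique only if exactly one best hit exists overall. The function names and the returned dictionary layout are made up for illustration.

```python
def parse_sam_line(line):
    """Extract the fields needed for consolidation from one SAM record."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for tag in fields[11:]:
        name, _type, value = tag.split(":", 2)
        tags[name] = value
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),
        "nm": int(tags["NM"]) if "NM" in tags else None,
        "x0": int(tags.get("X0", 1)),  # assume one best hit if X0 is absent
    }


def consolidate(sam_line_sets):
    """Pick the best hit per read across SAM outputs from several sub-indexes.

    sam_line_sets: iterable of iterables of SAM lines, one per subset index.
    Returns {read name: {"rname", "pos", "nm", "unique"}} for mapped reads.
    """
    best = {}  # qname -> (lowest NM, count of hits at that NM, representative hit)
    for lines in sam_line_sets:
        for line in lines:
            if line.startswith("@"):  # skip header lines
                continue
            hit = parse_sam_line(line)
            if hit["flag"] & 0x4 or hit["nm"] is None:  # unmapped or no NM tag
                continue
            current = best.get(hit["qname"])
            if current is None or hit["nm"] < current[0]:
                best[hit["qname"]] = (hit["nm"], hit["x0"], hit)
            elif hit["nm"] == current[0]:
                # Equally good hit in another subset: the read is no longer unique.
                best[hit["qname"]] = (current[0], current[1] + hit["x0"], current[2])
    return {
        qname: {"rname": hit["rname"], "pos": hit["pos"], "nm": nm, "unique": count == 1}
        for qname, (nm, count, hit) in best.items()
    }
```

Ranking by edit distance alone is a simplification; mapping quality is not comparable across separately built indexes, which is why it is ignored here.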
