  • rwhet052
    replied
    Splitting the genome into 6 files, each < 3 Gb, works fine, with one caveat: the "unique" flag for each mapped read applies only within the subset index it was mapped against, so the results need some additional merging and consolidation.
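    A minimal sketch of the splitting step, assuming the assembly is a plain FASTA file and that contigs must never be cut across files. The function names read_fasta and split_fasta, the chunk file naming, and the greedy packing strategy are illustrative choices, not what the poster actually used. Depending on contig sizes, greedy packing may produce a different number of files than six.

```python
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header line, sequence) pairs one record at a time."""
    header: Optional[str] = None
    parts: List[str] = []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(parts)
                header, parts = line, []
            elif line:
                parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_fasta(path: str, out_prefix: str, max_bases: int) -> List[str]:
    """Greedily pack whole contigs into files of at most max_bases bases each.

    Contigs are never cut; a single contig longer than max_bases gets a file
    of its own. Returns the list of chunk file names written."""
    outputs: List[str] = []
    out: Optional[TextIO] = None
    used = 0
    for header, seq in read_fasta(path):
        # Open a new chunk if none is open yet, or if this contig would push
        # the current (non-empty) chunk over the limit.
        if out is None or (used > 0 and used + len(seq) > max_bases):
            if out is not None:
                out.close()
            name = f"{out_prefix}_{len(outputs)}.fa"
            outputs.append(name)
            out = open(name, "w")
            used = 0
        out.write(header + "\n")
        for i in range(0, len(seq), 60):  # wrap sequence lines at 60 columns
            out.write(seq[i:i + 60] + "\n")
        used += len(seq)
    if out is not None:
        out.close()
    return outputs
```

    For example, split_fasta("assembly.fa", "chunk", 3_000_000_000) would write chunk_0.fa, chunk_1.fa, and so on, each of which could then be indexed separately with bwa index -a bwtsw.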

    Unfortunately, samtools merge is not intended for this sort of problem: it assumes that the same reference sequences were used to map different sets of reads, whereas here different reference sequences were used to map the same set of reads.
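    One way to picture the consolidation step is sketched below. It assumes each read's best hit in each chunk has already been reduced to a hypothetical ChunkHit record holding an edit distance and a count of equally good hits within that chunk (for example, taken from the NM and X0 tags that bwa aln/samse writes; treat that mapping as an assumption). A read is then called unique only if exactly one best location exists across all chunks.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, Iterable, Tuple


@dataclass(frozen=True)
class ChunkHit:
    """Best alignment of one read against one chunk index (hypothetical record).

    edit_distance and n_best_hits are assumed to come from that chunk's SAM
    output, e.g. the NM and X0 tags that bwa aln/samse writes."""
    read: str
    chunk: str
    edit_distance: int
    n_best_hits: int


def consolidate(hits: Iterable[ChunkHit]) -> Dict[str, Tuple[str, bool]]:
    """Map each read to (chunk holding its first best hit, unique genome-wide?)."""
    best: Dict[str, int] = {}
    count: DefaultDict[str, int] = defaultdict(int)
    where: Dict[str, str] = {}
    for h in hits:
        prev = best.get(h.read)
        if prev is None or h.edit_distance < prev:
            # A strictly better hit replaces earlier ones and resets the tally.
            best[h.read] = h.edit_distance
            count[h.read] = h.n_best_hits
            where[h.read] = h.chunk
        elif h.edit_distance == prev:
            # Equally good hits in another chunk add to the number of best locations.
            count[h.read] += h.n_best_hits
    # A read is unique only if exactly one best location exists across all chunks.
    return {read: (where[read], count[read] == 1) for read in best}
```

    Ranking by edit distance alone is a simplification; a full merge would probably also need to recompute mapping qualities, which this sketch does not attempt.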


  • GenoMax
    replied
    Originally posted by rwhet052
    I used 'top' to monitor memory usage on the cloud instance where I built the index, and it showed > 50 GB of memory in use. My impression is that the bwtsw indexing algorithm requires at least as much memory as the size of the genome to be indexed, and I don't have that on my local machine.
    I was hopeful that using the same version of bwa and the same OS would overcome any platform-specific issues. Is the program sensitive to hardware configuration?
    At least that was my experience in the past.

    I suppose you could try to build indexes by splitting your genome into parts.


  • rwhet052
    replied
    I used 'top' to monitor memory usage on the cloud instance where I built the index, and it showed > 50 GB of memory in use. My impression is that the bwtsw indexing algorithm requires at least as much memory as the size of the genome to be indexed, and I don't have that on my local machine.
    I was hopeful that using the same version of bwa and the same OS would overcome any platform-specific issues. Is the program sensitive to hardware configuration?


  • GenoMax
    replied
    In the past, when I had problems with segfaults in bwa, I ended up rebuilding the index on the machine where I was running bwa.

    Have you tried that, or is 16 GB on your local box not enough to rebuild the index?
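    A rough way to approach the 16 GB question is to compare the on-disk size of the index with the installed RAM. The sketch below assumes a bwa 0.6.x index consists of the five files listed in the hypothetical INDEX_SUFFIXES tuple, and it treats their total size as a crude proxy for what bwa aln needs in memory. That proxy is an assumption, not a documented requirement, and a "may fit" result does not rule out other causes of the segfault.

```python
import os
from typing import Iterable

# File suffixes assumed for a bwa 0.6.x index; adjust if your version writes others.
INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def index_size_bytes(prefix: str, suffixes: Iterable[str] = INDEX_SUFFIXES) -> int:
    """Total on-disk size of the index files that exist for this prefix."""
    total = 0
    for suffix in suffixes:
        path = prefix + suffix
        if os.path.exists(path):
            total += os.path.getsize(path)
    return total


def physical_ram_bytes() -> int:
    """Installed RAM as reported by sysconf (Linux and most Unix systems)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def report(prefix: str) -> str:
    """One-line comparison of index size against physical RAM."""
    need = index_size_bytes(prefix)
    have = physical_ram_bytes()
    verdict = "may fit" if need < have else "likely too large"
    return f"index {need / 2**30:.1f} GiB vs RAM {have / 2**30:.1f} GiB: {verdict}"
```

    For an index built with bwa index on genome.fa, the prefix would be "genome.fa", so report("genome.fa") sums genome.fa.bwt, genome.fa.sa, and the other files.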


  • rwhet052
    started a topic bwa 0.6.1-r104 segfault problem

    bwa 0.6.1-r104 segfault problem

    Hi all,
    I am working with a challenging genome (22 Gb haploid DNA content per nucleus; the first-draft assembly is 18 Gb in 31 million contigs) and am using BWA 0.6.1 because it is the only short-read aligner I have found that can index a genome > 4 Gb. I used an AWS EC2 Cloudbiolinux instance (June 2012 release) with 68 GB of RAM to build the index, then saved it to an S3 bucket and terminated the instance. The version of BWA I used is the one available as an Ubuntu package via apt-get install.
    I downloaded the index (as a .tgz archive) to my local Ubuntu 12.04 box (16 GB RAM), unpacked it, and tried to align a FASTA file of sample sequences (158,000 sequences), but bwa crashes right after printing the line
    [bwa_aln] 225bp reads: max_diff = 9
    with the error
    Segmentation fault (core dumped)

    My local box is running the same version of BWA on essentially the same OS, but presumably different hardware from the AWS EC2 instance. It does not seem to be a problem with the .tgz archive, because I can download that to another Cloudbiolinux instance, unpack it, and map the same set of sample sequences to the index without problems.
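    A more direct way to confirm that the unpacked index is identical on both machines is to hash each index file on each machine and compare the digests. The sketch below again assumes the five bwa 0.6.x index suffixes listed in the hypothetical INDEX_SUFFIXES tuple; the helper names index_checksums and differing are illustrative. Matching digests would point away from the files themselves and toward the local environment.

```python
import hashlib
import os
from typing import Dict, Iterable, List

# File suffixes assumed for a bwa 0.6.x index; adjust if your version writes others.
INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def index_checksums(prefix: str, suffixes: Iterable[str] = INDEX_SUFFIXES,
                    block: int = 1 << 20) -> Dict[str, str]:
    """MD5 hex digest of each index file, or "MISSING" if the file is absent."""
    sums: Dict[str, str] = {}
    for suffix in suffixes:
        path = prefix + suffix
        if not os.path.exists(path):
            sums[suffix] = "MISSING"
            continue
        h = hashlib.md5()
        with open(path, "rb") as fh:
            # Read in 1 MiB blocks so multi-gigabyte files are never fully loaded.
            for chunk in iter(lambda: fh.read(block), b""):
                h.update(chunk)
        sums[suffix] = h.hexdigest()
    return sums


def differing(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Suffixes whose digests differ between two machines' results."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))
```

    Running index_checksums on each machine and passing both results to differing would list any index file that did not unpack identically.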

    Any suggestions for how to solve this would be greatly appreciated.
