
  • bwa samse segmentation fault

    Hi, there,

    I'm trying to use bwa to align SOLiD reads to the human genome. After converting the color reads and quality files to fastq format and indexing the human genome, the alignment step ("bwa aln -c") runs fine. However, bwa samse fails with a segmentation fault:

    [bwa_aln_core] convert to sequence coordinate... 4.27 sec
    [bwa_aln_core] refine gapped alignments... Segmentation fault

    If I use only the first 30 reads for the alignment, the SAM file is generated without error, although no reads are mapped to the genome. The SAM output is:
    NB1001:1279_6_16 4 * 0 0 * * 0
    0 ANGGGCNATGANGGTNNCGGANGTTGNAGCGNTGGGNGGGGNNGGGGNG ,-":#49-"2,0%-"8
    %8-"-"8'5$-":.4(-"+5''-"<(*+-"%)5
    NB1001:1279_6_26 4 * 0 0 * * 0
    0 GNACACNGGAGNTCGNNTTTANATCGNGGGGNAGAGNGGAGNNGAGGNG 7-"7+84-"95/3-")
    )0-"-"/&+&-"(+36-"#555-"&*(#-"+%*
    ..........

    However, the error appears if I use the next 10 reads for the alignment. It seems that the sequence conversion doesn't work for the mapped reads.

    Can anyone help me with this problem? Thanks a lot.

    Xiang

    SAIC-Frederick, Inc.
    National Cancer Institute
    Gaithersburg, MD.

  • #2
    I am also experiencing this issue: bwa samse generates a segmentation fault with a genome the size of the human reference and about 30 million reads.

    Any help would be appreciated. Thanks.



    • #3
      I am having the same error, as early as converting the human genome to a fasta file format with the command fasta2bfa.



      • #4
        Originally posted by luisczul:
        I am having the same error, as early as converting the human genome to a fasta file format with the command fasta2bfa.

        Are you running out of RAM?



        • #5
          The error I got is not related to memory, since I have even tried it on a machine with 512 GB of memory. I suspect that the conversion from SOLiD csfasta/quality format to fastq format may have a problem. Using bwa samse -n 2 ..., I can get a simplified alignment output. There are some weird records such as:

          >-"8$ 2 1865904808
          chr10 -90253347 0
          chr10 -50629021 0

          It seems that part of a quality string is mistaken for a new read record, which is then reported as aligning to the genome millions of times. Most of the other reads look fine, with output like:

          >test:1279_470_1023 1 1
          chr22 +42910109 0
          >test:1279_470_1108 1 1
          chr18 -43820923 0
          >test:1279_470_1122 0 0

          The segmentation fault occurs if I use bwa samse -n -1 to disable outputting multiple hits.

          Any help is greatly appreciated.

          Xiang
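A quick way to test the suspicion above is to scan the converted fastq file for records whose sequence and quality lines differ in length. The following is only a minimal sketch: it assumes a plain four-line-per-record fastq file, and the function name `find_bad_fastq_records` is made up for illustration and is not part of bwa or BFAST.

```python
import io


def find_bad_fastq_records(handle):
    """Yield (record_number, header, seq_len, qual_len) for suspicious records.

    Assumes a simple four-line FASTQ layout (no wrapped sequence lines).
    A record is reported if its header does not start with '@' or if the
    sequence and quality lines have different lengths.
    """
    record = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        qual = handle.readline().rstrip("\n")
        record += 1
        if not header.startswith("@") or len(seq) != len(qual):
            yield record, header.strip(), len(seq), len(qual)


# Tiny in-memory example: the second record has one extra quality character.
example = io.StringIO(
    "@read_ok\nACGT\n+\nIIII\n"
    "@read_bad\nACGT\n+\nI-\"II\n"
)
bad = list(find_bad_fastq_records(example))
print(bad)
```

On a real file, pass an open file handle instead of the `io.StringIO` example. Any reported record is a candidate for the kind of shifted quality string described above.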



          • #6
            Try PMing Heng Li (lh3), who is the author of bwa. If you are in a bind, there are other SOLiD aligners, such as my own BFAST.



            • #7
              Missing value in Phred ASCII representation

              It seems that the solid2fastq.pl script doesn't handle missing quality values. It generates -" for a Phred score of -1. Does anyone know how to convert a score of -1 to ASCII?

              Thanks
              Xiang



              • #8
                Originally posted by xguo:
                It seems that solid2fastq.pl script doesn't handle missing quality value. It generates -" for phred score -1. Does anyone know how to transform score -1 to ASCII?

                thanks
                Xiang

                Are missing quality values listed as blanks for you? I will update the code accordingly. If you have a blank quality score, you could always give it a phred score of 1, stating not to trust the color call, or you could give it the maximum value of 255, stating that you should trust the uncalled color. Tailor it to your situation. Feel free to PM me to get your issues resolved.



                • #9
                  A missing quality is encoded as -1 in the QV file generated by the SOLiD platform. The solid2fastq.pl script treated it as two values, so the resulting fastq had read and quality fields of unequal length. I changed -1 to 0, and everything is fine now.

                  thanks
                  Xiang
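The fix described above can be sketched in a few lines. This is not the code of solid2fastq.pl; it is an illustrative function (the name `qv_line_to_fastq_quality` and its parameters are invented here) that assumes space-separated integer QVs and a Phred+33 ASCII offset. Splitting on whitespace keeps "-1" as a single token, so the quality string keeps one character per call. One plausible reading of the -" pairs seen earlier is that the "-" was passed through and the "1" was encoded as chr(1 + 33), which is the double-quote character.

```python
def qv_line_to_fastq_quality(qv_line, missing_value=0, offset=33):
    """Convert one line of space-separated QVs to a FASTQ quality string.

    Negative values (SOLiD uses -1 for a missing quality) are replaced by
    `missing_value` before encoding. `offset=33` assumes Phred+33 output.
    """
    chars = []
    for token in qv_line.split():  # "-1" stays one token
        q = int(token)
        if q < 0:
            q = missing_value
        chars.append(chr(q + offset))
    return "".join(chars)


quality = qv_line_to_fastq_quality("12 -1 25 2 -1")
print(quality)       # one character per quality value
print(len(quality))  # same count as the number of QVs on the line
```

Passing `missing_value=1` instead gives the variant suggested earlier in the thread, where a missing call gets a Phred score of 1.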



                  • #10
                    Originally posted by xguo:
                    The missing quality is encoded as -1 in QV file generated by SOLID platform. The solid2fastq.pl script treated it as two values, so the resulting fastq has uneven length for the read and quality field. I changed -1 to 0, and everything is fine now.

                    thanks
                    Xiang

                    I have changed this in BFAST's solid2fastq.pl script (which is now implemented in C for efficiency). I will release this script in an upcoming update, but let me know if you want it earlier.



                    • #11
                      I would like to add that I am seeing the same problem: bwa "samse" ends with a segmentation fault. The dataset is human Illumina reads (~500 million). BWA converted about 220 million reads before the seg fault. The machine I am running this on has 32 GB of RAM, and the process was using only about 2.3 GB.

                      -- hk



                      • #12
                        I would like to build a color-space index with bwa. The input FASTA should be in nucleotide space, so I use the following command to index the whole human genome:

                        >bwa index -c human.fasta

                        But a segmentation fault occurs every time, like this:

                        [bwa_index] Pack nucleotide FASTA... 60.48 sec
                        [bwa_index] Convert nucleotide PAC to color PAC... 31.13 sec
                        [bwa_index] Reverse the packed sequence... 16.62 sec
                        [bwa_index] Construct BWT for the packed sequence...
                        Segmentation fault

                        Can anyone tell me why that happens?

                        thanks



                        • #13
                          bwa sampe ../../genome/genome.fa aln_sa1.sai aln_sa2.sai 4_1.fq 4_2.fq > pairs.sam

                          This also ends with: [1]+ Segmentation fault



                          • #14
                            Same problem here.

                            I tried the method mentioned above and changed -1 to 0 in the qual file. Now seq and qual have the same length in the fastq file, but I still get the same segmentation fault, with the same symptoms as above: "bwa samse -n 2" produces output, and I see some strange read names that are actually parts of quality strings.

                            Could anyone help fix that?



                            • #15
                              Probable solution for segfault

                              I had the same problem when converting the alignment files to SAM format. I have a solution that works for me.

                              I used BWA version 0.5.1.

                              I'm not convinced that changing the quality value from -1 to 0 helps, because the quality values are log-scaled and zero is not a log value. So I changed every -1 and 0 in the quality files to 1.
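For reference, the standard Phred definition is Q = -10 * log10(p), where p is the estimated error probability. The short sketch below (the function name is chosen here for illustration) just evaluates the inverse, p = 10 ** (-Q / 10). Under that definition Q = 0 corresponds to p = 1 and Q = 1 to roughly p = 0.79, so both values mean "do not trust this call". Whether bwa 0.5.1 handles a quality of 0 differently from 1 is not established in this thread.

```python
def phred_to_error_probability(q):
    """Return the error probability implied by a Phred score q."""
    return 10 ** (-q / 10)


# Print a few scores next to the error probability they imply.
for q in (0, 1, 10, 20):
    print(q, phred_to_error_probability(q))
```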

                              I have written my own fastq transformation script in C and I tested it, no segmentation faults with 'bwa samse'.
                              However when I used the perl script on the same data I got segmentation faults.

                              The C script can create multiple smaller fastq files, because we align on a large cluster.
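The attached C code is not reproduced here. As a rough illustration of the splitting idea only, the sketch below cuts a four-line-per-record fastq file into numbered chunks. The function name, the chunk naming scheme and the demo file names are all assumptions made for this example.

```python
import os
import tempfile
from itertools import islice


def split_fastq(path, reads_per_chunk, out_prefix):
    """Split a four-line-per-record FASTQ file into numbered chunk files.

    Returns the list of chunk paths in the order they were written.
    """
    out_paths = []
    with open(path) as handle:
        while True:
            # Read up to reads_per_chunk records (4 lines each).
            lines = list(islice(handle, 4 * reads_per_chunk))
            if not lines:
                break
            out_path = f"{out_prefix}.{len(out_paths):04d}.fastq"
            with open(out_path, "w") as out:
                out.writelines(lines)
            out_paths.append(out_path)
    return out_paths


# Small self-contained demonstration in a temporary directory.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "reads.fastq")
with open(source, "w") as fh:
    for i in range(5):
        fh.write(f"@read{i}\nACGT\n+\nIIII\n")
chunks = split_fastq(source, 2, os.path.join(workdir, "chunk"))
print(len(chunks))
```

Each chunk can then be aligned as a separate cluster job and the resulting SAM files merged afterwards.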

                              And the C script is 10 to 20 times faster than the perl script.

                              csfastaToFastq.tar.gz

                              Just run 'make' in the extracted folder.
                              Last edited by fpruzius; 12-20-2009, 09:05 AM.
