bwa samse segmentation fault

  • bwa samse segmentation fault

    Hi there,

    I'm trying to use bwa to align SOLiD reads to the human genome. The alignment step runs fine using "bwa aln -c" after converting the color reads/quality files to fastq format and indexing the human genome. However, bwa samse fails with a segmentation fault:

    [bwa_aln_core] convert to sequence coordinate... 4.27 sec
    [bwa_aln_core] refine gapped alignments... Segmentation fault

    If I use only the first 30 reads for the alignment, the sam file is generated without error, although no reads are mapped to the genome. The sam output is:
    NB1001:1279_6_16 4 * 0 0 * * 0 0 ANGGGCNATGANGGTNNCGGANGTTGNAGCGNTGGGNGGGGNNGGGGNG ,-":#49-"2,0%-"8%8-"-"8'5$-":.4(-"+5''-"<(*+-"%)5
    NB1001:1279_6_26 4 * 0 0 * * 0 0 GNACACNGGAGNTCGNNTTTANATCGNGGGGNAGAGNGGAGNNGAGGNG 7-"7+84-"95/3-"))0-"-"/&+&-"(+36-"#555-"&*(#-"+%*
    ..........

    However, the error appears if I use the next 10 reads for the alignment. It seems that the sequence conversion doesn't work for the mapped reads.

    Can anyone help me with this problem? Thanks a lot.

    Xiang

    SAIC-Frederick, Inc.
    National Cancer Institute
    Gaithersburg, MD.

  • #2
    I am also experiencing this issue: bwa samse generates a segmentation fault for a genome the size of the human reference and about 30 million reads.

    Any help would be appreciated. Thanks.

    • #3
      I am having the same error, as early as converting the human genome to a fasta file format with the command fasta2bfa.

      • #4
        Originally posted by luisczul:
        I am having the same error, as early as converting the human genome to a fasta file format with the command fasta2bfa.

        Are you running out of RAM?

        • #5
          The error I got is not related to memory, since I have even tried it on a machine with 512 GB of memory. I suspect that the conversion from SOLiD csfasta/quality format to fastq format may have a problem. Using bwa samse -n 2 ..., I can get a simplified alignment output. There are some weird records such as:

          >-"8$ 2 1865904808
          chr10 -90253347 0
          chr10 -50629021 0

          It seems that part of the quality string is mistaken for a new read record, which is then reported as aligning to the genome a huge number of times. Most of the other reads look fine, with output like:

          >test:1279_470_1023 1 1
          chr22 +42910109 0
          >test:1279_470_1108 1 1
          chr18 -43820923 0
          >test:1279_470_1122 0 0

          A segmentation fault occurs if I use bwa samse -n -1 to disable outputting multiple hits.

          Any help is greatly appreciated.

          Xiang
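The symptom described above (a fragment of a quality string showing up as a read name) suggests checking the converted fastq before running bwa. The following is a minimal sketch of such a check, not part of bwa or BFAST. It assumes strict 4-line FASTQ records (header, sequence, '+' line, quality), and the function name `find_bad_fastq_records` is hypothetical.

```python
def find_bad_fastq_records(lines):
    """Return (record_number, problem) tuples for malformed FASTQ records.

    Assumption: every record occupies exactly four lines, which is an
    assumption about the converter output, not a general FASTQ rule.
    """
    records = [ln.rstrip("\r\n") for ln in lines]
    problems = []
    for i in range(0, len(records), 4):
        chunk = records[i:i + 4]
        number = i // 4 + 1  # 1-based record number
        if len(chunk) < 4:
            problems.append((number, "truncated record"))
            continue
        header, seq, plus, qual = chunk
        if not header.startswith("@"):
            problems.append((number, "header does not start with '@'"))
        elif not plus.startswith("+"):
            problems.append((number, "third line does not start with '+'"))
        elif len(seq) != len(qual):
            problems.append(
                (number, f"sequence length {len(seq)} != quality length {len(qual)}")
            )
    return problems


if __name__ == "__main__":
    # Second record has one extra quality character, as a "-1" written
    # out as two characters would produce.
    example = [
        "@read1", "ACGT", "+", "IIII",
        "@read2", "ACGT", "+", 'I-"II',
    ]
    for number, problem in find_bad_fastq_records(example):
        print(number, problem)
```

For a real file, the same function can be called on an open file handle, for example `find_bad_fastq_records(open("reads.fastq"))`, where the file name is only a placeholder.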

          • #6
            Try PMing Heng Li (lh3), who is the author of bwa. If you are in a bind, there are other SOLiD aligners, such as my own BFAST.

            • #7
              Missing value in Phred ASCII representation

              It seems that the solid2fastq.pl script doesn't handle missing quality values. It generates -" for a phred score of -1. Does anyone know how to transform a score of -1 to ASCII?

              thanks
              Xiang

              • #8
                Originally posted by xguo:
                It seems that solid2fastq.pl script doesn't handle missing quality value. It generates -" for phred score -1. Does anyone know how to transform score -1 to ASCII?

                thanks
                Xiang

                Are missing quality values listed as blanks for you? I will update the code accordingly. If you have a blank quality score, you could give it a phred score of 1, indicating that the color call should not be trusted, or the maximum value of 255, indicating that the uncalled color should be trusted. Tailor it to your situation. Feel free to PM me to get your issues resolved.

                • #9
                  The missing quality is encoded as -1 in the QV file generated by the SOLiD platform. The solid2fastq.pl script treated it as two values, so the resulting fastq has unequal lengths for the read and quality fields. I changed -1 to 0, and everything is fine now.

                  thanks
                  Xiang
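The fix described above can be sketched in a few lines. This is not the solid2fastq.pl code. It assumes the QV file holds whitespace-separated integers per read and that the target is Phred+33 (Sanger) fastq. The function name and the `floor`, `ceiling`, and `offset` parameters are hypothetical. Parsing each token as a whole integer keeps "-1" as one value, and clamping to `floor` replaces it with a valid score. The stray -" pair seen earlier in the thread would be consistent with the minus sign being passed through literally and the 1 being encoded as chr(1 + 33), though that is an inference from the output, not from the script source.

```python
def qv_line_to_fastq_quality(qv_line, floor=0, ceiling=93, offset=33):
    """Encode one line of integer quality values as a FASTQ quality string.

    Assumptions (not taken from any tool's source): values are separated
    by whitespace, and the output uses the Phred+33 encoding.
    """
    chars = []
    for token in qv_line.split():
        value = int(token)                        # "-1" is one integer, not '-' plus '1'
        value = max(floor, min(value, ceiling))   # clamp missing (-1) and oversized values
        chars.append(chr(value + offset))
    return "".join(chars)


if __name__ == "__main__":
    line = "20 -1 5"
    print(qv_line_to_fastq_quality(line))            # -1 clamped to 0
    print(qv_line_to_fastq_quality(line, floor=1))   # -1 clamped to 1 instead
```

With this approach the quality string always has exactly one character per input value, so the sequence and quality fields stay the same length as long as the csfasta and QV files agree.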

                  • #10
                    Originally posted by xguo:
                    The missing quality is encoded as -1 in QV file generated by SOLID platform. The solid2fastq.pl script treated it as two values, so the resulting fastq has uneven length for the read and quality field. I changed -1 to 0, and everything is fine now.

                    thanks
                    Xiang

                    I have changed this in BFAST's solid2fastq.pl script (which is now implemented in C for efficiency). I will release this script in an upcoming update, but let me know if you want it earlier.

                    • #11
                      I would like to add that I am observing the same problem of bwa "samse" seg faulting. The dataset is human Illumina reads (~500 million). BWA converted about 220 million reads before the seg fault. The machine I am running this on has 32 GB of RAM. The process was using only about 2.3 GB.

                      -- hk

                      • #12
                        I would like to build a color-space index with bwa. The input fasta should be in nucleotide space, so I use the following command to index the whole human genome:

                        >bwa index -c human.fasta

                        But a segmentation fault occurs every time, like this:

                        [bwa_index] Pack nucleotide FASTA... 60.48 sec
                        [bwa_index] Convert nucleotide PAC to color PAC... 31.13 sec
                        [bwa_index] Reverse the packed sequence... 16.62 sec
                        [bwa_index] Construct BWT for the packed sequence...
                        Segmentation fault

                        Can anyone tell me why that happens?

                        thanks

                        • #13
                          bwa sampe ../../genome/genome.fa aln_sa1.sai aln_sa2.sai 4_1.fq 4_2.fq > pairs.sam

                          This also gives: [1]+ Segmentation fault

                          • #14
                            Same problem here.

                            I tried the method mentioned above and changed -1 to 0 in the qual file. Now seq and qual have the same length in the fastq file, but I still get the same segmentation fault, with the same symptoms as above: with "bwa samse -n 2" I can get output, and I see some strange read names that are actually parts of quality strings.

                            Could anyone help fix that?

                            • #15
                              Probable solution for segfault

                              I had the same problem when converting the alignment files to SAM format. I have a solution that works for me.

                              I used BWA version 0.5.1.

                              I'm not convinced that changing the quality value from -1 to 0 helps because the quality values are log values. And zero is not a log value. So I change every -1 and 0 in the quality files to 1.
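For context on the scale being discussed: under the standard Phred definition, Q = -10 * log10(p), where p is the estimated error probability. The small sketch below (the function name is hypothetical) converts a score back to p, which shows what replacing 0 with 1 means numerically. Whether a given aligner or converter handles Q = 0 correctly is a separate question that this sketch does not answer.

```python
def phred_to_error_probability(q):
    """Invert Q = -10 * log10(p) to get the error probability p."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    for q in (0, 1, 10, 20):
        print(q, phred_to_error_probability(q))
```

Under this definition Q = 0 corresponds to p = 1 (the call carries no information) and Q = 1 to p of roughly 0.79, so both values mark a call as essentially untrusted.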

                              I have written my own fastq conversion program in C and tested it: no segmentation faults with 'bwa samse'.
                              However, when I used the perl script on the same data, I got segmentation faults.

                              The C script can create multiple smaller fastq files, because we align on a large cluster.

                              And the C script is 10 to 20 times faster than the perl script.

                              csfastaToFastq.tar.gz

                              Just run 'make' in the extracted folder.
                              Last edited by fpruzius; 12-20-2009, 09:05 AM.
