Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • dpmccul
    Junior Member
    • Nov 2009
    • 2

    Using BWA

    I am having problems with bwa, and would like some direction. I am running on some
    454 reads for Salmonella.

    I have converted the reads to fastq format.
    I index with the command:
    bwa index -p sal_chr sal.fa

    I then align with the command:
    bwa aln -t 4 sal_chr 454Reads.MID1.sff.fq sal.sam
    and get:
    [bwa_aln] 17bp reads: max_diff = 2
    [bwa_aln] 38bp reads: max_diff = 3
    [bwa_aln] 64bp reads: max_diff = 4
    [bwa_aln] 93bp reads: max_diff = 5
    [bwa_aln] 124bp reads: max_diff = 6
    [bwa_aln] 157bp reads: max_diff = 7
    [bwa_aln] 190bp reads: max_diff = 8
    [bwa_aln] 225bp reads: max_diff = 9
    [bwa_aln_core] calculate SA coordinate... Segmentation fault

    Ideas on what I am doing wrong?
    I am requesting 10GB memory, is this insufficient?
    Thank you.
  • lh3
    Senior Member
    • Feb 2008
    • 686

    #2
    Someone has reported that bwa-short segfaults given 454 reads. I have not looked into the details. Nonetheless, if you read the documentation or the FAQ, you will know that bwa-{aln,sampe,samse} is not designed for 454 reads at all. You should use "bwasw" instead. For sequencing reads, bwa typically uses less than 4GB memory.

    Comment

    • mingkunli
      Member
      • Jan 2009
      • 42

      #3
      btw, how did you covert 454 reads to fastq format?

      Comment

      • dpmccul
        Junior Member
        • Nov 2009
        • 2

        #4
        I used sfffile to extract the data of interest to a .sff file, and then sffinfo to go to a *.fa and
        *.qual file. I then wrote a c program to convert these to fastq (I know c better than perl),
        and then verified with validateFastq Version 1 by Tyler Backman.

        The c code for conversion from fasta to fastq is:
        -----------------------------------------------------------------------------------
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <strings.h>
        #include <ctype.h>

        #define MAX_STRING_SIZE 200

        /**********************************************************************
        *
        * Program to convert fasta (with qual) files to fastq format.
        *
        * fasta2fastq takes the path to the fasta fil4e and associated qual
        * file, and writes to the fastq file the converted file format
        * information.
        *
        * Dean P McCullough
        * Nov 8, 2009
        *
        *********************************************************************/

        int main(int argc, char *argv[] )
        {
        FILE *fasta, *qual, *fastq;
        int i,jj;
        char k;
        char *aline, *qline;
        int nreads,imin,imax;


        /******************
        * USAGE ********
        ******************/
        if( argc < 4 )
        {
        fprintf(stderr,"\n\n");
        fprintf(stderr,"usage: %s dna_fasta_file dna_qual_file fastq_file\n", argv[0]);
        fprintf(stderr,"Convert the fasta and qual file to fastq format.\n");
        exit(1);
        }

        fasta = fopen(argv[1],"r");

        fprintf(stderr,"Fasta file = %s \n",argv[1]);

        if(fasta == NULL)
        {
        fprintf(stderr,"fopen failed for fasta file \n");
        exit(1);
        }

        qual = fopen(argv[2],"r");

        fprintf(stderr,"Qual file = %s \n",argv[2]);

        if(qual == NULL)
        {
        fprintf(stderr,"fopen failed for qual file \n");
        exit(1);
        }

        fastq = fopen(argv[3],"w");

        fprintf(stderr,"Fastq output file = %s \n",argv[3]);

        if(fastq == NULL)
        {
        fprintf(stderr,"fopen failed for fastq output file \n");
        exit(1);
        }


        /* line for fgets reads */
        aline=(char *)malloc(MAX_STRING_SIZE*sizeof(char));
        if( aline == NULL )
        {
        fprintf(stderr,"malloc failed for aline\n");
        exit(1);
        }

        qline=(char *)malloc(MAX_STRING_SIZE*sizeof(char));
        if( qline == NULL )
        {
        fprintf(stderr,"malloc failed for qline\n");
        exit(1);
        }

        memset((char *)aline,'\0',MAX_STRING_SIZE*sizeof(char) );
        memset((char *)qline,'\0',MAX_STRING_SIZE*sizeof(char) );

        nreads=0;
        imin=1000;
        imax = -1;


        fgets( (char *)aline,MAX_STRING_SIZE,fasta);
        fgets( (char *)qline,MAX_STRING_SIZE,qual);
        while(strlen(aline) != 0)
        {
        if(strncmp(&aline[0],">",1)==0)
        {
        nreads++;

        fprintf(fastq,"@%s",aline+1);

        if(strcmp(aline,qline)!=0)
        {
        fprintf(stderr,"Read labels do not agree.\n%s\n%s\n",
        aline,qline);
        exit(1);
        }

        for(;
        {
        memset((char *)aline,'\0',MAX_STRING_SIZE*sizeof(char) );
        fgets( (char *)aline,MAX_STRING_SIZE,fasta);
        if(strncmp(&aline[0],">",1)==0) break;
        if(strlen(aline) == 0) break;
        fputs(aline,fastq);
        }
        fprintf(fastq,"+\n");


        for(;
        {
        memset((char *)qline,'\0',MAX_STRING_SIZE*sizeof(char) );
        fgets( (char *)qline,MAX_STRING_SIZE,qual);
        if(strncmp(&qline[0],">",1)==0) break;
        if(strlen(qline) == 0) break;
        i=0;
        while(qline[i]!='\0')
        {
        jj=atoi(&qline[i]);
        k = jj+33;
        fprintf(fastq,"%c",k);
        if(imin > jj) imin=jj;
        if(imax < jj) imax=jj;
        i++;

        while(isdigit(qline[i]))i++;
        while(isspace(qline[i]))i++;
        }
        fprintf(fastq,"\n");
        }
        }
        else
        {
        fprintf(stderr,"Something wrong \n%s\n%s\n",
        aline,qline);
        exit(1);
        }
        }

        fprintf(stderr,"%d reads min = %d max =%d\n",nreads,imin,imax);
        }

        Comment

        • jnfass
          Member
          • Aug 2008
          • 88

          #5
          I don't want to get into a language debate, but here's the same thing in perl:

          Code:
          #!/usr/bin/perl
          
          # AUTHOR: Joseph Fass
          # LAST REVISED: August 2009
          # 
          # The Bioinformatics Core at UC Davis Genome Center
          # http://bioinformatics.ucdavis.edu
          # Copyright (c) 2008 The Regents of University of California, Davis Campus.
          # All rights reserved.
          
          use strict;
          
          my $usage = "\nusage: $0 <reads (fasta format)> <qualities (fasta format)>\n\n".
                      "-- All sequences and qualities must be on a single line each. --.\n".
                      "-- All quality values MUST be followed by a space character. --\n\n".
                      "Merges fasta and qual files into a single Sanger fastq file (STDOUT).";
          
          my $seqfile = shift or die $usage;
          my $qualfile = shift or die $usage;
          
          open F, "<$seqfile" or die $usage;
          open Q, "<$qualfile" or die $usage;
          
          my ($fh, $fs, $qh, $qs);
          my @quals;  my $qv;
          
          while (<F>) {
          	$fh = $_;
          	$fs = <F>;
          	$qh = <Q>;
          	$qs = <Q>;
          	if ($fh eq $qh) {
          		$fh =~ s/>/@/;
          		print $fh.$fs;
          		$qh =~ s/>/+/;
          		print $qh;
          		@quals = split(/\s+/,$qs);
          		for (my $i=0; $i<=$#quals; $i++) {
          			$qv = $quals[$i] + 33;
          			print chr($qv);
          		}
          		print "\n";
          	}
          	else { print "PROBLEM\n" }
          }
          
          close F; close Q;
          A note on the above ... I tend to write all my code depending on having each sequence on one line ... even when I'm working with huge scaffolds or chromosomes ... can be hard to read, but generally doesn't cause problems unless the sequences are really huge ... and it makes coding a lot easier. So here's some perl for converting to the 'oneline' format:

          Code:
          #!/usr/bin/perl
          
          # AUTHOR: Joseph Fass
          # LAST REVISED: October 2009
          # 
          # The Bioinformatics Core at UC Davis Genome Center
          # http://bioinformatics.ucdavis.edu
          # Copyright (c) 2009 The Regents of University of California, Davis Campus.
          # All rights reserved.
          
          use strict;
          
          my $usage = "\nusage: cat <input fasta file> | $0 > <output fasta file>\n\n".
                      "For each sequence, puts nt/aa all on one line following header line.\n".
                      "Reads from and writes to STDOUT, so use redirection ... \n\n\n";
          
          my $block=""; my $line;
          
          while (<>) {
          	$line = $_;
          	if (m/^>/) {  # header line
          		print $block;  # which is either empty, or capped by a newline
          		$block = $line."\n";  # adds another newline, because it will be chomped below
          	}
          	else {
          		chomp $line;
          		if (substr($line,-1) =~ m/[0-9]/) {
          			chomp $block;  # remove newline before adding more sequence
          			$block .= $line." \n";  # must be fasta qualities, and needs a space
          		}
          		else {
          			chomp $block;  # remove newline before adding more sequence
          			$block .= $line."\n";  # not fasta qualities, so no space needed
          		}
          	}
          }
          print $block;  # print last sequence block

          Comment

          • Francesco Lescai
            Junior Member
            • Mar 2010
            • 5

            #6
            I have the same problem.
            I got a segmentation error in indexing the hg19.fasta sequence of the human genome.
            then I changed machine (actually to one with much less memory) and it worked.. magic?
            then I tried to bwa align my reads and I get this typical pattern, independently of the computer, and indipendently of the number of reads I'm trying to align
            [bwa_aln] 17bp reads: max_diff = 2
            [bwa_aln] 38bp reads: max_diff = 3
            [bwa_aln] 64bp reads: max_diff = 4
            [bwa_aln] 93bp reads: max_diff = 5
            [bwa_aln] 124bp reads: max_diff = 6
            [bwa_aln] 157bp reads: max_diff = 7
            [bwa_aln] 190bp reads: max_diff = 8
            [bwa_aln] 225bp reads: max_diff = 9
            [bwa_aln_core] calculate SA coordinate... Segmentation fault

            If I align the same reads over the chromosome X only, it seems to work but then I get reads without coordinates (as if they were not mapped).
            Of course I know they map very well because I aligned them with MAQ and I got 90% of the reads mapping.

            what about the segmentation fault?

            I have the same problem.
            I got a segmentation error in indexing the hg19.fasta sequence of the human genome.
            then I changed machine (actually to one with much less memory) and it worked.. magic?
            then I tried to bwa align my reads and I get this typical pattern, independently of the computer, and indipendently of the number of reads I'm trying to align
            [bwa_aln] 17bp reads: max_diff = 2
            [bwa_aln] 38bp reads: max_diff = 3
            [bwa_aln] 64bp reads: max_diff = 4
            [bwa_aln] 93bp reads: max_diff = 5
            [bwa_aln] 124bp reads: max_diff = 6
            [bwa_aln] 157bp reads: max_diff = 7
            [bwa_aln] 190bp reads: max_diff = 8
            [bwa_aln] 225bp reads: max_diff = 9
            [bwa_aln_core] calculate SA coordinate... Segmentation fault

            If I align the same reads over the chromosome X only, it seems to work but then I get reads without coordinates (as if they were not mapped).
            Of course I know they map very well because I aligned them with MAQ and I got 90% of the reads mapping.

            what about the segmentation fault?
            Last edited by nilshomer; 04-15-2010, 10:05 AM.

            Comment

            • Lien
              Member
              • Dec 2009
              • 47

              #7
              I also have the same segmentation fault when trying to align Illumina fastq reads to human genome.

              Anyone has an idea to solve this problem?

              Comment

              • Lien
                Member
                • Dec 2009
                • 47

                #8
                I used the latest version of BWA 0.5.7 and got the segmentation fault when aligning Illumina reads. Even when I aligned only 10 reads, this segmentation fault appeared.
                Then I changed to BWA 0.5.6 and everything works just as it should!

                Some other question:
                I aligned my Illumina data both with Bowtie and with BWA. With Bowtie, I get a nice overview of the number of reads that were processed, the number of reads with at least 1 reported alignment, ...
                Is it possible to get similar results for BWA?

                Thanks,
                Lien

                Comment

                • maubp
                  Peter (Biopython etc)
                  • Jul 2009
                  • 1544

                  #9
                  Originally posted by dpmccul View Post
                  I used sfffile to extract the data of interest to a .sff file, and then sffinfo to go to a *.fa and
                  *.qual file. I then wrote a c program to convert these to fastq (I know c better than perl),
                  There are much easier ways to do SFF to FASTQ, for example sff_extract:


                  As long as you are not working with paired end 454, you can also do this with Biopython 1.54b or later in just two lines of code (import the Bio.SeqIO module, call the Bio.SeqIO.convert function).

                  Comment

                  • lh3
                    Senior Member
                    • Feb 2008
                    • 686

                    #10
                    @Lien

                    Please use the version of bwa in SVN. You can get the statistics from samtools' flagstat. Or a simple awk command will do.

                    Comment

                    • prisnirath
                      Member
                      • May 2011
                      • 19

                      #11
                      BWA for Roche454 data.

                      Hi there..
                      I am using BWA to perform sequence alignment for the reads generated from Roche454 data.
                      I used sfffile and sffinfo to MID sort and trim off mids from SFF files.
                      So, I have got a read file in fastq format, and my reference sequences in fasta format.
                      I used BWa to index my reference fasta file.
                      Then, aligning the read fastq file against my reference fasta file.
                      Now, I am getting a SAM format file with data in this format::
                      GGDP4G001AT8TK 0 gi|237757283|ref|NM_007294.3| 1331 197 57M1I50M1I13M1I5M1I4M1I15M1I9M1I23M1I12M1I8M1I5M1I9M1I24M1I16M25S * 0 0 ACTGAAGATGTTCCTTGGATAACACTAAATAGCAGCATTCAGAAAGTTAATGAGTGGTTTTTCCAGAAGTGATGAACTGTTAGGTTCTGATGACTCACATGATGGGGAGGTCTGAATCAAATAGCCAACGGTAGGCTGATGTATTGGACGGTTCTAAATAGNGGTAGATGAATATTCTGGTTCGTTCAGAGAAAATAAGACTTACTTGGCCGAGTGATCCTCCATGAGGCTTTATTATGTAAAAGTAGAAGAAGTTCACTCCAGTCATCGCTGCGCCCCTCGACGCAC IIIIIIIIIIIIIIIDHHHIGGHHHFFFDDDDFFFFFDDDFFFFFDDC><@A???4411333==BBBBDFFFFFFFFFA@@FFFFFFFFFFFFFFFFFFFFFF;;99933998<<899666:7=;;../..4/..:<==<<<<;;../4....//..6=49!<<??AAB::555<<?>>>>40111::<<<4455A5143446332155-.////3::;9322<7442631,,-,,-172....995<<<?>><44444444333:::::57---,,,,7,----175 AS:i:139 XS:i:0 XF:i:3 XE:i:2 XN:i:0
                      I am not clearly understanding this part.
                      I am using the default parameters for all.

                      Comment

                      Latest Articles

                      Collapse

                      • GATTACAT
                        Reply to Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                        by GATTACAT
                        Love this - good data definitely starts from good input, and poor input can only give relatively poor data. I particularly like the mention of Nanodrop/absorbance based methods for quantification. It's such a toss up if you'll get an accurate reading or what amounts to a randomly generated number, and a lot of library/sequencing related issues can be traced back to poor quant.
                        07-01-2026, 11:43 AM
                      • SEQadmin2
                        Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                        by SEQadmin2


                        I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

                        Here are nine questions we think about, in roughly the order they matter, before...
                        06-18-2026, 07:11 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by SEQadmin2, 07-02-2026, 11:08 AM
                      0 responses
                      17 views
                      0 reactions
                      Last Post SEQadmin2  
                      Started by SEQadmin2, 06-30-2026, 05:37 AM
                      0 responses
                      18 views
                      0 reactions
                      Last Post SEQadmin2  
                      Started by SEQadmin2, 06-26-2026, 11:10 AM
                      0 responses
                      21 views
                      0 reactions
                      Last Post SEQadmin2  
                      Started by SEQadmin2, 06-17-2026, 06:09 AM
                      0 responses
                      54 views
                      0 reactions
                      Last Post SEQadmin2  
                      Working...