  • bwa color space error

    Hi all,
    I'm getting an error using bwa to align solid reads to hg18.

    Here are the steps I have completed, and the error

    Create fastq files from my raw solid data (no problems here)
    > /bwa-0.5.0/solid2fastq.pl ...

    Index my hg18 reference fasta into colour space (no problems)
    > /bwa-0.5.0/bwa index -a bwtsw -c human.fasta

    Align some small fastq files (500K reads ) (again no problems):
    > /bwa-0.5.0/bwa aln -c ./human.fasta read1 > read1.sai
    > /bwa-0.5.0/bwa aln -c ./human.fasta read2 > read2.sai

    Then sampe the sai files (problems here - I am showing all the output):
    > /bwa-0.5.0/bwa sampe human.fasta read1.sai read2.sai read1 read2 > out.sam
    [bwa_sai2sam_pe_core] convert to sequence coordinate...
    [infer_isize] fail to infer insert size: weird pairing
    [bwa_sai2sam_pe_core] time elapses: 18.60 sec
    [bwa_sai2sam_pe_core] change of coordinates in 0 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_sai2sam_pe_core] time elapses: 0.93 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... Segmentation fault

    here is a head of the two starting fastq files, in case this will help with debugging:

    ==> read1 <==
    @HS12521_tra19938_4:705_604_1571/2
    ATTAAAATTCCAATACTCCACTACCTCGAATTATTCTGTACTAAATTAA
    +
    ,2+262=,*9,95)9123)<2,1/-12,,2//455&+++;#0..*#$&9
    @HS12521_tra19938_4:705_604_1602/2
    GAGGCCAATCAGAATCAGTTAGAGCCGCTTCAGTCCAAAGACAGGGAAA
    +
    8<88>9<?99<3>;<<=9;=?67568=7786869;89;8;<&8683&:7
    @HS12521_tra19938_4:705_604_1651/2
    GCGCAGCTGCTTCGTCTAAAACGAGTAAAAAAAAAGAATGAATAAGGGA

    ==> read2 <==
    @HS12521_tra19938_4:705_604_1571/1
    CTATATTGCGGTATAGTAGATAGAACAATTTAACCGAATAGAGGCTTGC
    +
    A984:;<28<?4&*%*,)222&$214&3&$1;1)%/)(1/79#7++
    @HS12521_tra19938_4:705_604_1602/1
    GTCGTCTTCCTGATCTACCTTGGGGAAACCAACTCGATCCGCCAACGAC
    +
    ?78>697/=:=95;;<6==8>:=?;><,>;;3359;77;851<7:98,<
    @HS12521_tra19938_4:705_604_1651/1
    AGTATAAGCTACTTATTTGAGTACAGTGAGCGGGGTAGGCAGTTAAGGA


    Does anyone have a suggestion to help me get this working?
    Last edited by rcorbett; 09-29-2009, 09:57 AM. Reason: typo

  • #2
    The problem is that the solid2fastq.pl script included in BWA doesn't handle missing quality values. It generates -" for a Phred score of -1. Please see my post
    "bwa samse segmentation fault" for details.
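To illustrate the kind of handling described above, here is a minimal Python sketch, not the code from solid2fastq.pl. It assumes the quality line is space-separated integers, that the output uses the Phred+33 encoding, and that a missing call (-1) should be treated as quality 0; the function name `encode_solid_quals` is made up for this example.

```python
def encode_solid_quals(qual_line, offset=33):
    """Convert a space-separated SOLiD quality line to a FASTQ quality string.

    Missing calls are reported as -1. They are clamped to 0 here so that
    every output character is printable (chr(33) == '!').
    """
    chars = []
    for token in qual_line.split():
        q = max(int(token), 0)  # assumption: treat missing (-1) as quality 0
        chars.append(chr(q + offset))
    return "".join(chars)


if __name__ == "__main__":
    # The third value is a missing call and becomes '!'
    print(encode_solid_quals("11 17 -1 21 5"))
```

Clamping each parsed integer, rather than editing the text of the line, keeps the quality string the same length as the list of input values.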



    • #3
      Hi. Thanks for the quick reply. I tried using the script posted at the bottom of that thread (csfastaToFastq), but it appears to work only for single-end reads. Did you edit solid2fastq.pl? If so, could you post the changes here?

      Thanks!



      • #4
        Hi again,

        So I added a line to the read1 sub in /bwa-0.5.0/solid2fastq.pl to change the -1 qualities to 0s before converting:
        s/-1/0/g;

        I then re-ran the alignments as before, but now I get a different error when using sampe:
        > bwa-0.5.0/bwa sampe human.fasta read1.fastq.sai read2.fastq.sai read1.fastq read2.fastq > aln.sam
        [bwa_sai2sam_pe_core] convert to sequence coordinate...
        [infer_isize] (25, 50, 75) percentile: (1165, 1290, 1467)
        [infer_isize] low and high boundaries: 561 and 2071
        [infer_isize] inferred external isize from 43138 pairs: 1300.140 +/- 188.574
        [bwa_sai2sam_pe_core] time elapses: 23.01 sec
        [bwa_sai2sam_pe_core] change of coordinates in 3192 alignments.
        [bwa_sai2sam_pe_core] align unmapped mate...
        [bwa_paired_sw] 6630 reads aligned out of 88901 candidates.
        [bwa_sai2sam_pe_core] time elapses: 47.83 sec
        [bwa_sai2sam_pe_core] refine gapped alignments... Segmentation fault

        Looks like we're close but not quite there. Can anyone spot my problem?
        Thanks in advance!
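One way to narrow down a remaining segfault like this is to confirm that every FASTQ record is well-formed before aligning. The Python sketch below is only a sanity check, under the assumption that the files are plain 4-line FASTQ with Phred+33 qualities; `check_fastq` is a hypothetical helper, not part of BWA, and an incomplete trailing record is ignored.

```python
def check_fastq(lines):
    """Yield (read_name, problem) for malformed 4-line FASTQ records."""
    # Strip only the newline so stray spaces in a quality line stay visible.
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        name, seq, plus, qual = records[i:i + 4]
        if not name.startswith("@"):
            yield name, "header does not start with '@'"
        elif not plus.startswith("+"):
            yield name, "separator line does not start with '+'"
        elif len(seq) != len(qual):
            yield name, f"sequence length {len(seq)} != quality length {len(qual)}"
        elif any(ord(c) < 33 or ord(c) > 126 for c in qual):
            yield name, "quality character outside '!'..'~'"


if __name__ == "__main__":
    demo = ["@r1\n", "ACGT\n", "+\n", "II\n"]
    for name, problem in check_fastq(demo):
        print(name, problem)
```

To check a real file, pass an open file handle, for example `check_fastq(open("read1.fastq"))`. A clean report does not rule out other causes of the crash.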



        • #5
          If you haven't figured out how to solve the problem, please PM me your email address. I can forward you my version of the script.



          • #6
            My problem happens at this step:

            > /bwa-0.5.0/solid2fastq.pl ...

            I get the single.fastq file, but the read1 and read2 files are empty, and single.fastq contains only the forward reads. I checked the csfasta file and it seems fine.

            Is this a problem with the script? I didn't change the original version.
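One thing worth checking here, offered as an assumption rather than a confirmed diagnosis, is whether the forward and reverse csfasta files share read identifiers at all. The Python sketch below assumes headers of the form `>705_604_1571_F3` and `>705_604_1571_R3`; the tag names and both helper functions are hypothetical and do not reproduce what solid2fastq.pl does internally.

```python
def bead_ids(csfasta_lines, tag):
    """Collect bead IDs from csfasta header lines that end in _<tag>."""
    suffix = "_" + tag
    ids = set()
    for line in csfasta_lines:
        line = line.strip()
        # Comment lines ('#') and colour-space sequence lines are skipped.
        if line.startswith(">") and line.endswith(suffix):
            ids.add(line[1:-len(suffix)])
    return ids


def pairing_summary(f3_lines, r3_lines):
    """Count bead IDs present in both files or in only one of them."""
    f3 = bead_ids(f3_lines, "F3")
    r3 = bead_ids(r3_lines, "R3")
    return {
        "paired": len(f3 & r3),
        "f3_only": len(f3 - r3),
        "r3_only": len(r3 - f3),
    }


if __name__ == "__main__":
    f3 = [">705_604_1571_F3", "T0123", ">705_604_1602_F3", "T3210"]
    r3 = [">705_604_1571_R3", "G0011"]
    print(pairing_summary(f3, r3))
```

If "paired" comes out as zero on the real files, the empty read1 and read2 outputs may come from the inputs (naming or a missing reverse file) rather than from the conversion script.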



            • #7
              Possible alternatives

              You may be able to remove errors from the colour-space reads by first:

              1) running SAET from BioScope
              2) running Sasson and Michael's SOLiD read quality scripts (PMID: 20207696)

              and then running solid2fastq.pl ... and BWA on the mp files that pass.
