  • BWA & FASTQ or FASTA

    Hi, I'm very new to bioinformatics and I have two FASTQ files to align. I'm trying to align these sequences with BWA, but it doesn't let me.
    It says:
    [bwa_index] fail to open file '/datasets/SRR035022_1.filt.fastq'. Abort!
    Aborted

    Do I have to use FASTA format with BWA, or am I doing something wrong elsewhere?

    Thanks in advance.

  • #2
    In reply to kursuni:
    bwa 'index' is the clue. You index the reference genome with bwa index, then align your FASTQ files to it.

    Your reference (the file you index) is a FASTA file.

    Your reads are in FASTQ format.

    You posted the error but not the command you ran, so it's hard to tell what usage generated it. Failing to open the file could also mean you're simply not pointing bwa to the right location, but even so it looks like you have skipped a step in your workflow.
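
The order of operations described in this reply can be sketched as a small shell function. This is a minimal sketch, assuming a single-end run with the BWA 0.5.x subcommands `index`, `aln`, and `samse`; the function name `align_single_end`, the `BWA` variable, and all paths are placeholders, not taken from the thread. For paired files (such as `_1`/`_2` FASTQ pairs), `bwa sampe` would take the place of `samse`.

```shell
# Which bwa binary to call; override with e.g. BWA=./bwa (assumption: bwa is on PATH).
BWA="${BWA:-bwa}"

align_single_end() {
    local ref="$1"     # reference genome, FASTA
    local reads="$2"   # sequencing reads, FASTQ
    local out="$3"     # output prefix for the .sai and .sam files

    # 1. Build the index from the reference FASTA (only needed once per reference).
    #    bwtsw suits large genomes; "-a is" is the alternative for small ones.
    "$BWA" index -a bwtsw "$ref" || return 1
    # 2. Align the FASTQ reads against the indexed reference.
    "$BWA" aln "$ref" "$reads" > "$out.sai" || return 1
    # 3. Convert the alignment coordinates to SAM.
    "$BWA" samse "$ref" "$out.sai" "$reads" > "$out.sam" || return 1
}

# Example call (placeholder paths):
# align_single_end /path/to/chr1.fa /path/to/reads.fastq /path/to/output/reads
```

The FASTQ file only ever appears in steps 2 and 3; passing it to `bwa index` is what produces a `[bwa_index]` error like the one in the first post.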



    • #3
      Dear SeqAnswers Group,

      I have installed and compiled (with make) the BWA aligner, but when I go into the bwa-0.5.9 folder, this is what I see:

      COPYING bamlite.c bwa bwase.o bwt_gen bwtaln.o bwtio.c bwtsw2_aux.o bwtsw2_main.o kseq.h main.c solid2fastq.pl utils.o
      ChangeLog bamlite.h bwa.1 bwaseqio.c bwt_lite.c bwtgap.c bwtio.o bwtsw2_chain.c cs2nt.c ksort.h main.h stdaln.c
      Makefile bamlite.o bwape.c bwaseqio.o bwt_lite.h bwtgap.h bwtmisc.c bwtsw2_chain.o cs2nt.o kstring.c main.o stdaln.h
      NEWS bntseq.c bwape.o bwt.c bwt_lite.o bwtgap.o bwtmisc.o bwtsw2_core.c is.c kstring.h qualfa2fq.pl stdaln.o
      README bntseq.h bwase.c bwt.h bwtaln.c bwtindex.c bwtsw2.h bwtsw2_core.o is.o kstring.o simple_dp.c utils.c
      SRR038263_1.sai bntseq.o bwase.h bwt.o bwtaln.h bwtindex.o bwtsw2_aux.c bwtsw2_main.c khash.h kvec.h simple_dp.o utils.h

      How should I start aligning SRR038263.fastq against the reference sequence (hg18, FASTA format)? I would be grateful for your support.

      Regards,
      Momo



      • #4
        In reply to Bukowski:
        Thank you for your reply.

        Since I'm still learning, I only just realized that I need to use a FASTA file as my reference and FASTQ files as my reads.

        I think I'm pointing bwa to the right location, and the problem is my lack of knowledge about DNA sequencing and about using these programs in a Linux environment.

        However, is there a source from which I can download a FASTA file?



        • #5
          Hi,

          When I run BWA to align the SRR FASTQ reads against the chr1 FASTA sequence, I get the error below. Could you please help me out?

          Regards,

          [bwa-0.5.9]$ ./bwa aln /home/DATA/chr1.fa /home/DATA/SRA/SRA012240/SRX017837/SRR038263_1.fastq > SRR038263_1.sai
          [bwa_aln] 17bp reads: max_diff = 2
          [bwa_aln] 38bp reads: max_diff = 3
          [bwa_aln] 64bp reads: max_diff = 4
          [bwa_aln] 93bp reads: max_diff = 5
          [bwa_aln] 124bp reads: max_diff = 6
          [bwa_aln] 157bp reads: max_diff = 7
          [bwa_aln] 190bp reads: max_diff = 8
          [bwa_aln] 225bp reads: max_diff = 9
          [bwt_restore_bwt] fail to open file '/home/DATA/chr1.fa.bwt'. Abort!



          • #6
            Index your reference first!
            ./bwa index -a bwtsw /home/DATA/chr1.fa



            • #7
              I indexed my reference first:

              $ bwa index -p indexed_chr1 -a is -c ~/fasta/chr1.fa
              [bwa_index] Pack nucleotide FASTA... 4.97 sec
              [bwa_index] Convert nucleotide PAC to color PAC... 1.48 sec
              [bwa_index] Reverse the packed sequence... 1.57 sec
              [bwa_index] Construct BWT for the packed sequence...
              [bwa_index] 110.54 seconds elapse.
              [bwa_index] Construct BWT for the reverse packed sequence...
              [bwa_index] 110.08 seconds elapse.
              [bwa_index] Update BWT... 1.03 sec
              [bwa_index] Update reverse BWT... 1.06 sec
              [bwa_index] Construct SA from BWT and Occ... 49.80 sec
              [bwa_index] Construct SA from reverse BWT and Occ... 50.04 sec


              Then I tried to align using the command below, and it gives an error when opening the FASTQ file:

              $ bwa aln ~/fasta/chr1.fa ~/datasets/SRR035022_1.filt.fastq > aln_sa.sai
              [bwa_aln] 17bp reads: max_diff = 2
              [bwa_aln] 38bp reads: max_diff = 3
              [bwa_aln] 64bp reads: max_diff = 4
              [bwa_aln] 93bp reads: max_diff = 5
              [bwa_aln] 124bp reads: max_diff = 6
              [bwa_aln] 157bp reads: max_diff = 7
              [bwa_aln] 190bp reads: max_diff = 8
              [bwa_aln] 225bp reads: max_diff = 9
              [bwa_seq_open] fail to open file '/home/ukursuncu/datasets/SRR035022_1.filt.fastq'. Abort!
              Aborted

              What am I doing wrong?
              Last edited by kursuni; 09-26-2011, 01:56 PM.
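
One thing worth checking in the commands of this post, apart from the FASTQ path: `bwa index -p indexed_chr1` writes the index files under the prefix `indexed_chr1`, whereas `bwa aln` is then given `~/fasta/chr1.fa` as its prefix, so it would look for `~/fasta/chr1.fa.bwt` (the same kind of file named in the `[bwt_restore_bwt]` error earlier in the thread). A minimal sketch of a helper that picks the prefix for which an index actually exists; the name `first_indexed_prefix` is hypothetical, and only the `.bwt` file is checked:

```shell
# Print the first candidate prefix that has a readable <prefix>.bwt file.
# Returns non-zero if none of the candidates has been indexed.
first_indexed_prefix() {
    local candidate
    for candidate in "$@"; do
        if [ -r "$candidate.bwt" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "no BWA index found for: $*" >&2
    return 1
}

# Example with the two prefixes from the post (the reads path is a placeholder):
# prefix=$(first_indexed_prefix ~/fasta/chr1.fa ./indexed_chr1) &&
#     bwa aln "$prefix" /path/to/reads.fastq > aln_sa.sai
```

Note also that the `-c` option of `bwa index` builds a color-space index, as the "Convert nucleotide PAC to color PAC" line in the log shows; that option is meant for SOLiD reads.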



              • #8
                kursuni, just a general remark, I would always put the complete path to your reference file, read file and output file, just to make sure, like:

                ./bwa aln /path/to/folder/with/fasta/chr1.fa /path/to/folder/with/datasets/SRR035022_1.filt.fastq > /path/to/output/aln_sa.sai

                If you are not so familiar with the command line yet, you can also look up the file names in your graphical file browser and copy/paste them into your command. That way, usually, the whole path is copied.

                cheers,
                Sophia
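
The full-path advice in this post can also be checked mechanically before running bwa. A minimal sketch, assuming a POSIX-style shell; `require_readable` is a hypothetical helper that fails if the file cannot be read and otherwise prints its absolute path:

```shell
# Fail if the file is missing or unreadable; otherwise print its absolute path.
require_readable() {
    local file="$1" dir
    if [ ! -f "$file" ] || [ ! -r "$file" ]; then
        echo "cannot read: $file" >&2
        return 1
    fi
    # Absolute path = directory resolved via cd/pwd, plus the file name.
    dir=$(cd "$(dirname "$file")" && pwd) || return 1
    printf '%s/%s\n' "$dir" "$(basename "$file")"
}

# Example (path from the thread; the variable name is a placeholder):
# reads=$(require_readable ~/datasets/SRR035022_1.filt.fastq) || exit 1
```

As the error message in post #7 shows, `~/datasets/...` expands to `/home/ukursuncu/datasets/...`, which is a different location from the `/datasets/...` path in the first post.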



                • #9
                  Hello,

                  I would like to build an index for aligning FASTQ sequences to a FASTA reference:

                  [haojamrocky@melon bwa-0.5.9]$ ./bwa index -a bwtsw -c /home/haojamrocky/DATA/hg18chr/hg18.fasta
                  [bwa_index] Pack nucleotide FASTA... [bns_fasta2bntseq] zero length sequence. Abort!



                  • #10
                    Hello,

                    I would like to build an index so that I can align FASTQ sequences to the hg18 FASTA reference sequence. The FASTQ sample sequences are SOLiD reads. Could you please assist me? Below is the error message I get from BWA when indexing the reference hg18.fasta.

                    [haojamrocky@melon bwa-0.5.9]$ ./bwa index -a bwtsw -c /home/haojamrocky/DATA/hg18chr/hg18.fasta
                    [bwa_index] Pack nucleotide FASTA... [bns_fasta2bntseq] zero length sequence. Abort!

                    Regards,
                    HR



                    • #11
                      Hello,

                      Before indexing hg18.fa, I put chr1 to chrY, including chrM, into hg18.fa using the command below. When I index the chr1 FASTA on its own, it runs properly. Is the error caused by this command?

                      To cat all sequences together into one single FASTA record:
                      $ cat chr*.fa | sed -e "/^>/d" >> hg18.fa

                      Regards,
                      HR



                      • #12
                        In reply to haojam:
                        Did you check what hg18.fa looks like after this, and what size it is?
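
One way to run that check, as a minimal sketch with toy data: `count_fasta_records` is a hypothetical helper and the two tiny chromosome files are made up for the demonstration. It shows that the `sed -e "/^>/d"` step deletes every header line, so the merged file contains no FASTA records at all, while plain `cat` keeps one record per input file. Whether this is what triggers the "zero length sequence" abort is an assumption, but a headerless file is a plausible cause.

```shell
# Count FASTA records, i.e. lines starting with '>'.
count_fasta_records() {
    # grep -c prints 0 (and exits non-zero) when nothing matches; keep the 0.
    grep -c '^>' "$1" || true
}

# Toy input: two one-line "chromosomes" in a scratch directory.
demo_dir=$(mktemp -d)
printf '>chr1\nACGT\n' > "$demo_dir/chr1.fa"
printf '>chr2\nTTGA\n' > "$demo_dir/chr2.fa"

# The pipeline from the post: every header line is deleted.
cat "$demo_dir"/chr*.fa | sed -e "/^>/d" > "$demo_dir/no_headers.fa"
# Plain concatenation: the header of each chromosome is kept.
cat "$demo_dir"/chr*.fa > "$demo_dir/merged.fa"

count_fasta_records "$demo_dir/no_headers.fa"   # prints 0
count_fasta_records "$demo_dir/merged.fa"       # prints 2

rm -rf "$demo_dir"
```

On the real file, `ls -lh hg18.fa` and `head -n 2 hg18.fa` show the size and the first lines. Because the original command uses `>>`, running it a second time appends the whole sequence again.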



                        • #13
                          Hello,

                          Does BWA support SRR.....fastq.bz2 files for aligning against the human genome reference sequence?

                          Regards,
                          HR



                          • #14
                            Hi kursuni,

                            you may want to have a look here:
                            Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc.


                            That might help you.



                            • #15
                              In reply to ulz_peter:
                              Dear ulz_peter, that was very helpful. Thank you very much.
                              Since I'm very new to bioinformatics, I need a lot of help and am trying to learn through books, papers, and internet resources such as this forum. It takes time to learn bioinformatics, but since my thesis is about bioinformatics, I need to do this as fast as I can because of the time limit. So I really appreciate any help.
                              If you can suggest any other papers or documents, I would appreciate that as well.

                              Thanks again.
                              Best regards,
                              Ugur
