  • wisosonic
    Member
    • Mar 2011
    • 14

    BWA fail to open file

    Hi all,
    I am trying to align a .fq file to a reference genome (.fa).
    I started by building the index like this:

    bwa index -a bwtsw exome/exome.fa -p exome

    and I got the 8 index files.

    Then I tried the "aln" command:

    bwa aln exome s_1_1_sequence.fq > s_1_1_sequence.sai

    and I got this error: [bwa_seq_open] fail to open file 's_1_1_sequence.fq' . Abort!

    I don't know what is wrong, so I would be very thankful if someone could give me the right command line for this.
    Thanks in advance,
    wassim
  • Jon_Keats
    Senior Member
    • Mar 2010
    • 279

    #2
    Likely a relative path issue, or the file name given is not the true file name.

    Are both the exome index files you created and the s_1_1_sequence.fq file in the same directory, as the command you ran would require? If so, it should work as long as s_1_1_sequence.fq is the true file name. It's not really a .txt file, is it?
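
The two checks suggested above (is the path right relative to the working directory, and is the name really what it appears to be) can be sketched in shell. This is a minimal sketch, not part of BWA: `check_reads_file` is a hypothetical helper name, and the near-match listing only assumes the real file might carry an extra suffix such as `.txt`.

```shell
#!/usr/bin/env bash
# check_reads_file PATH (hypothetical helper): report whether PATH resolves,
# from the current working directory, to a readable regular file.
check_reads_file() {
    local path="$1"
    echo "working directory: $(pwd)"
    if [ ! -e "$path" ]; then
        echo "not found: $path"
        # List near matches, e.g. a hidden extra suffix like s_1_1_sequence.fq.txt
        ls -1 -- "$path"* 2>/dev/null
        return 1
    fi
    if [ ! -f "$path" ]; then
        echo "not a regular file: $path"
        return 1
    fi
    if [ ! -r "$path" ]; then
        echo "not readable: $path"
        return 1
    fi
    echo "ok: $path"
}

# Example call, run from the directory where "bwa aln" is run:
# check_reads_file s_1_1_sequence.fq
```

A "not found" result followed by a listed near match would point at the file-name explanation rather than a BWA problem.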

    • wisosonic
      Member
      • Mar 2011
      • 14

      #3
      Yes, they are in the same directory, and s_1_1_sequence.fq is a .txt file whose extension I changed because I saw in the manual that "aln" supports .fq files ... but I really don't know why BWA cannot open the file!
      NOTE: I am working on an Ubuntu machine.

      • Jon_Keats
        Senior Member
        • Mar 2010
        • 279

        #4
        It should work then. I'm guessing the file opens with "less s_1_1_sequence.fq". If so, the only thing I can suggest is to look at the index files: are they named "exome.ann, exome.amb, etc."? Maybe the error message is not identifying the true issue. I've never used the -p option during indexing, so in my case exome.fa is indexed to "exome.fa.ann, exome.fa.amb, etc.", and trying that would be my next suggestion. The file extension really doesn't matter, so if it started as a .txt file there is no need to change the extension unless you have a pipeline that requires it.
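
To see which prefix the index files actually carry, a small loop over the expected extensions is enough. This is a sketch under assumptions: `check_bwa_index` is a hypothetical helper, and it only looks for the five extensions `amb`, `ann`, `bwt`, `pac` and `sa`; older BWA releases write additional files, which would be consistent with the 8 files mentioned in the first post.

```shell
#!/usr/bin/env bash
# check_bwa_index PREFIX (hypothetical helper): report which core index files
# exist and are non-empty for PREFIX. Returns non-zero if any is missing.
check_bwa_index() {
    local prefix="$1" ext missing=0
    for ext in amb ann bwt pac sa; do
        if [ -s "$prefix.$ext" ]; then
            echo "found:   $prefix.$ext"
        else
            echo "missing: $prefix.$ext"
            missing=1
        fi
    done
    return "$missing"
}

# Example calls: the prefix passed to "bwa aln" must match one that passes.
# check_bwa_index exome            # index built with -p exome
# check_bwa_index exome/exome.fa   # index built without -p
```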

        • swbarnes2
          Senior Member
          • May 2008
          • 910

          #5
          Eyeball your fq, and see if it looks like it should. Perhaps I'm mixing up bwa with some other application, but I had a similar error come up with a fq file that had a bunch of "--" in it from grep.
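
Eyeballing a multi-gigabyte file is slow, so the "--" check can also be done with grep itself. A minimal sketch with hypothetical helper names; it treats any line consisting solely of `--` as a grep group separator, which assumes no 2 bp read has `--` as its quality string.

```shell
#!/usr/bin/env bash
# find_grep_separators FILE (hypothetical helper): print "LINE:--" for every
# line that consists solely of "--". Returns non-zero if there are none.
find_grep_separators() {
    grep -n -x -e '--' -- "$1"
}

# strip_grep_separators FILE (hypothetical helper): write FILE to stdout
# without those separator lines.
strip_grep_separators() {
    grep -v -x -e '--' -- "$1"
}

# Example:
# find_grep_separators s_1_1_sequence.fq | head
# strip_grep_separators s_1_1_sequence.fq > s_1_1_sequence.clean.fq
```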

          • wisosonic
            Member
            • Mar 2011
            • 14

            #6
            In fact, I have noticed that the problem is with the size of my reads file.
            The size is 7.1 GB; I made a file with the first 50000 lines and it works fine.
            So I searched for a maximum number of lines or a maximum file size, but I didn't find anything.
            Does anyone know the maximum file size supported by BWA?
            Thanks,
            wassim

            • dp05yk
              Member
              • Dec 2010
              • 66

              #7
              Probably 4GB (2^32 bytes). The source code can be modified to handle larger files but you're probably best just splitting it in half.

              • Jon_Keats
                Senior Member
                • Mar 2010
                • 279

                #8
                I've used files from HiSeq runs in the 7-9Gb range with no problem. The 4Gb limitation is for indexing; there should not be a limit on the size of the read files, as bwa only processes 200k reads at a time. If the first 50,000 lines work fine, my best guess is that you have an issue in the fastq file. Do you still have the original (.txt) file from the sequencer? Maybe go back to that, or do a line count (wc -l) and make sure the count is a multiple of 4 to start. If it is, try cutting out the first 50% or 75%, and/or start with everything but the last 10 reads (40 lines). Most often the issue will be a messed-up newline or something similar at the end of the file.
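
The line-count check and the "everything but the last 10 reads" trim can be sketched as two shell functions. The helper names are hypothetical, and the structure test assumes plain four-line FASTQ records (header starting with `@`, third line starting with `+`), not multi-line records. Since `wc -l` counts newline characters, a final line without a trailing newline also shows up as a count that is not a multiple of 4.

```shell
#!/usr/bin/env bash
# fastq_line_check FILE (hypothetical helper): verify that the line count is a
# multiple of 4 and that every record has '@' on line 1 and '+' on line 3.
fastq_line_check() {
    local file="$1" lines
    lines=$(( $(wc -l < "$file") ))
    if [ $((lines % 4)) -ne 0 ]; then
        echo "line count $lines is not a multiple of 4"
        return 1
    fi
    # Stop at the first malformed record and report its line number.
    awk 'NR % 4 == 1 && $0 !~ /^@/  { print "line " NR ": expected @ header";    bad = 1; exit }
         NR % 4 == 3 && $0 !~ /^\+/ { print "line " NR ": expected + separator"; bad = 1; exit }
         END { exit bad }' "$file"
}

# drop_last_reads FILE N (hypothetical helper): print FILE without its last
# N reads (4 * N lines). Fails if nothing would be left.
drop_last_reads() {
    local file="$1" reads="$2" lines keep
    lines=$(( $(wc -l < "$file") ))
    keep=$((lines - 4 * reads))
    [ "$keep" -gt 0 ] || return 1
    head -n "$keep" "$file"
}

# Example:
# fastq_line_check s_1_1_sequence.fq
# drop_last_reads s_1_1_sequence.fq 10 > trimmed.fq
```

Computing the number of lines to keep avoids relying on `head -n -40`, which works with GNU head but not with every other implementation.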

                • dp05yk
                  Member
                  • Dec 2010
                  • 66

                  #9
                  Good point Jon. Given that the error happens prior to any processing (I assume the author would have stated it runs fine for a while first), the error likely occurs within the first 262144 reads.

                  • wisosonic
                    Member
                    • Mar 2011
                    • 14

                    #10
                    Hi all,

                    Thanks for the suggestions, but the file works fine on a server that isn't more powerful than my machine! In fact, the error appears as soon as I type the "aln" command line, before any part of the file (the first 262144 reads) is processed.

                    I am wondering why it works fine on another machine and not on mine!
                    Even the other command, "sampe", gives the same problem.

                    I tried another .fq file and it still doesn't work.
                    It's probably a problem with permission to use physical memory, because I am working on a PC in my lab and I think there's a limit on using all the RAM (4 GB).

                    • wencanh
                      Junior Member
                      • Mar 2012
                      • 5

                      #11
                      Hi, everyone! I encountered a similar problem recently. The command is shown here.

                      echo "bwa aln -t 15 /data/hg19/human_g1k_v37.fasta.gz /data/lane1.R1.clean.fq.gz > lane1.aln1.sai" | qsub -l nodes=node8:ppn=15

                      And I got the following error:

                      [bwa_aln] 17bp reads: max_diff = 2
                      [bwa_aln] 38bp reads: max_diff = 3
                      [bwa_aln] 64bp reads: max_diff = 4
                      [bwa_aln] 93bp reads: max_diff = 5
                      [bwa_aln] 124bp reads: max_diff = 6
                      [bwa_aln] 157bp reads: max_diff = 7
                      [bwa_aln] 190bp reads: max_diff = 8
                      [bwa_aln] 225bp reads: max_diff = 9
                      [bwa_seq_open] fail to open file '/data/lane1.R1.clean.fq.gz'. Abort!

                      /opt/gridview/pbs/dispatcher/mom_priv/jobs/671.node1.SC: line 1: 31281 Aborted

                      bwa aln -t 15 /data/hg19/human_g1k_v37.fasta.gz /data/lane1.R1.clean.fq.gz > lane1.aln1.sai

                      Thanks a million!

                      • wisosonic
                        Member
                        • Mar 2011
                        • 14

                        #12
                        Hi wencanh,
                        I found the cause of this problem (in my case):

                        1) If you take a look at the system requirements for the BWA algorithm: http://bio-bwa.sourceforge.net/bwa.shtml
                        they say that it requires a minimum of 3.5 GB of RAM.

                        2) If you check the link from "sourceforge": http://sourceforge.net/apps/mediawik...e=SAM_protocol
                        they say that you should have "A computer with 64-bit CPU, 8GB or more memory and 100GB free disk space".

                        So I think the problem is with the 64-bit CPU ... I suggest you check your system specs.

                        Wassim
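
Checking the system specs mentioned above takes two standard commands. The sketch below also compares the reads file against 2^31 - 1 bytes, the largest file a 32-bit program built without large-file support can open. That this limit is what triggered the error in this thread is an assumption, not something confirmed here; it would merely be consistent with a 7.1 GB file failing while a 50000-line excerpt works. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# report_bitness (hypothetical helper): print the machine architecture and the
# word size of the installed userland (32 or 64).
report_bitness() {
    echo "machine: $(uname -m)"
    echo "userland bits: $(getconf LONG_BIT)"
}

# exceeds_limit FILE [LIMIT] (hypothetical helper): succeed if FILE is larger
# than LIMIT bytes. LIMIT defaults to 2^31 - 1 = 2147483647.
exceeds_limit() {
    local file="$1" limit="${2:-2147483647}" size
    size=$(( $(wc -c < "$file") ))
    [ "$size" -gt "$limit" ]
}

# Example:
# report_bitness
# exceeds_limit s_1_1_sequence.fq && echo "larger than 2 GiB - 1 byte"
```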

                        • wencanh
                          Junior Member
                          • Mar 2012
                          • 5

                          #13
                          Thank you for your prompt response.

                          I don't think it is the same in my case. When one of my workmates (A) submitted the job to the server, it worked, which means the system requirements for BWA are met. But when I submitted it to the server, I got the same error I mentioned before.

                          BTW, the file "lane1.R1.clean.fq.gz" mentioned before was submitted by my workmate A. But we are in the same group and I can open and edit the file.
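
When the same command works for one user and not another, one thing that can be ruled out is access along the whole path as seen from the compute node. The sketch below is a hypothetical helper, not a diagnosis of this case: it reports the user and host it runs as, then checks read permission on the file and search (execute) permission on each parent directory.

```shell
#!/usr/bin/env bash
# check_path_access FILE (hypothetical helper): report any component of FILE's
# path that the current user cannot read (the file itself) or traverse (its
# parent directories). Returns non-zero if a problem is found.
check_path_access() {
    local path="$1" dir status=0
    echo "user: $(id -un)  host: $(uname -n)"
    if [ ! -r "$path" ]; then
        echo "cannot read: $path"
        status=1
    fi
    dir=$(dirname -- "$path")
    while :; do
        if [ ! -x "$dir" ]; then
            echo "cannot traverse: $dir"
            status=1
        fi
        # Stop once the top of an absolute or relative path is reached.
        case "$dir" in
            /|.) break ;;
        esac
        dir=$(dirname -- "$dir")
    done
    return "$status"
}

# Example, to be run inside the submitted job so that it executes on the
# compute node as the submitting user:
# check_path_access /data/lane1.R1.clean.fq.gz
```

Running it on the login node alone is not conclusive, since a file system may be mounted or mapped differently on the node that executes the job.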
