  • Bowtie can't read index files

    Dear all,

    I'm having a recurring problem with Bowtie: it fails to read the indexes it has just built.

    Here are some details about my configuration: I'm using Bowtie 0.12.5 (but 0.12.3 gave the exact same error), on a Linux x86_64 computer.

    I get error messages of this type:

    Error reading _plen[] array: 4194272, 55604484

    Error reading ebwt array: returned 41750080, length was 168445184

    The index had been previously built by the same version of Bowtie. In fact these errors had occurred while running TopHat (which incidentally does not catch the errors thrown by Bowtie and finishes the run with "success", but does not give correct or complete results).

    The worst thing is that this error does not occur every time: as a test, I've run Bowtie about 100 times on a toy dataset (with the exact same input reads and genome index), and Bowtie crashed only 6 times. But it does seem to crash more often when the input is larger.

    I don't understand what might be the problem. I'm starting to wonder if it might be because the filesystem structure is somehow corrupt on the computers I'm using. This is why I would like to know if anyone else has encountered this problem.

    Any comments or suggestions would be much appreciated. Thank you for your help!

    Best,

    Anamaria

  • #2
    Hi Anamaria,

    These types of errors occur when the files are genuinely either corrupt or incomplete (e.g. if the disk becomes full during the index-building process). Can you send detailed output from one example where this happens, including an 'ls -l' of the index files after bowtie-build completes?

    Thanks,
    Ben
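Ben's example of a disk filling up during index building can be made concrete. With buffered stdio, a write that runs out of space is not always reported by fwrite itself; the error can surface only at fflush or fclose. Below is a minimal sketch in C, assuming a POSIX system, of a write that checks every step. The helper name write_file_checked is hypothetical and is not bowtie-build's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>   /* fsync */

/* Hypothetical helper (not from bowtie-build): write len bytes to path and
 * report failure if any step fails, including the final flush.
 * Returns 0 on success, -1 on any error. */
int write_file_checked(const char *path, const void *buf, size_t len) {
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    /* fwrite may write fewer bytes than requested, e.g. when the disk is full. */
    if (fwrite(buf, 1, len, f) != len) { fclose(f); return -1; }
    /* Buffered data is only handed to the OS here; a full disk can surface now. */
    if (fflush(f) != 0) { fclose(f); return -1; }
    /* Ask the OS to push the data to storage before reporting success. */
    if (fsync(fileno(f)) != 0) { fclose(f); return -1; }
    /* fclose can also report a deferred write error. */
    if (fclose(f) != 0) return -1;
    return 0;
}
```

A builder that returns non-zero whenever this helper fails would at least rule out the silent "incomplete index" case.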



    • #3
      Hi Ben,

      This is what I originally thought, but I can't see how the exact same index file could be corrupt in one run and fine in the next. I've run several hundred tests, using the same index file and the same reads file, and only a few of these Bowtie jobs crash.

      Here is the ls -l of the index files:

      ##################################

      -rw-r--r-- 1 anecsule henrik 58822647 Jun 1 13:52 chr3_ensembl57.1.ebwt
      -rw-r--r-- 1 anecsule henrik 23794420 Jun 1 13:52 chr3_ensembl57.2.ebwt
      -rw-r--r-- 1 anecsule henrik 180665 Jun 1 13:47 chr3_ensembl57.3.ebwt
      -rw-r--r-- 1 anecsule henrik 47588832 Jun 1 13:47 chr3_ensembl57.4.ebwt
      -rw-r--r-- 1 anecsule henrik 205509239 May 29 15:37 chr3_ensembl57.fa
      -rw-r--r-- 1 anecsule henrik 58822647 Jun 1 13:58 chr3_ensembl57.rev.1.ebwt
      -rw-r--r-- 1 anecsule henrik 23794420 Jun 1 13:58 chr3_ensembl57.rev.2.ebwt

      ##################################

      And the output :

      ##################################

      Error reading ebwt array: returned 3953400, length was 54387328
      Your index files may be corrupt; please try re-building or re-downloading.
      A complete index consists of 6 files: XYZ.1.ebwt, XYZ.2.ebwt, XYZ.3.ebwt,
      XYZ.4.ebwt, XYZ.rev.1.ebwt, and XYZ.rev.2.ebwt. The XYZ.1.ebwt and
      XYZ.rev.1.ebwt files should have the same size, as should the XYZ.2.ebwt and
      XYZ.rev.2.ebwt files.
      Command: /home/vital-it/anecsule/Tools/bowtie-0.12.3/bowtie -p 4 -q --phred33-quals -m 1 /scratch/frt/yearly/necsulea/Orthosplice/results/tests_bowtie/index_0.12.3/chr3_ensembl57 /scratch/frt/yearly/necsulea/Orthosplice/results/tests_bowtie/reads.txt /scratch/frt/yearly/necsulea/Orthosplice/results/tests_bowtie/test_1_0.12.3/results_1.txt

      ##################################
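The error message above spells out what a complete index looks like: six files, with XYZ.1.ebwt and XYZ.rev.1.ebwt the same size, and XYZ.2.ebwt and XYZ.rev.2.ebwt the same size. That check can be scripted before each run. Below is a minimal sketch in C, assuming a POSIX system; check_ebwt_index is a hypothetical name, and passing it only rules out missing or truncated pairs, not other kinds of corruption.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Size of "<prefix><suffix>", or -1 if the file is missing. */
static long long ebwt_size(const char *prefix, const char *suffix) {
    char path[4096];
    struct stat st;
    if (snprintf(path, sizeof path, "%s%s", prefix, suffix) >= (int)sizeof path)
        return -1;  /* path too long */
    if (stat(path, &st) != 0) return -1;
    return (long long)st.st_size;
}

/* Hypothetical checker: returns 0 if all six Bowtie 1 index files exist and
 * the forward/reverse size pairs match, -1 otherwise. */
int check_ebwt_index(const char *prefix) {
    static const char *suffixes[6] = {
        ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt"
    };
    long long size[6];
    for (int i = 0; i < 6; i++) {
        size[i] = ebwt_size(prefix, suffixes[i]);
        if (size[i] < 0) {
            fprintf(stderr, "missing: %s%s\n", prefix, suffixes[i]);
            return -1;
        }
    }
    /* .1 must match .rev.1, and .2 must match .rev.2 */
    if (size[0] != size[4] || size[1] != size[5]) {
        fprintf(stderr, "size mismatch between forward and reverse files\n");
        return -1;
    }
    return 0;
}
```

For the listing above, the pairs (58822647 and 23794420 bytes) match, so this check would pass, which agrees with the files not being incomplete.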

      I'm currently testing one potential solution: I've noticed that in ebwt.h you're using the "read" function in C if BOWTIE_MM is defined (i.e. on Linux) and the "fread" function if not (i.e. on Windows). I was wondering if I would get the same errors with "fread", so I've compiled bowtie as if for Windows, and I'm doing the same tests. I'll let you know if that works ok.

      Also, I wanted to ask whether you think it's normal that TopHat does not catch this error thrown by Bowtie. I've had several TopHat runs that finished with apparent "success" but in fact gave only partial results, because reading the Bowtie index for the junction sequences had failed. This seems quite dangerous, as most users will not check the log files for Bowtie errors if TopHat has finished successfully.

      Thanks again for your help!
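Anamaria's concern is that a pipeline step which calls Bowtie should stop when Bowtie fails instead of reporting success. Below is a minimal sketch in C of that check, assuming the called program signals an error through a non-zero exit status. run_step is a hypothetical name; this is not TopHat's code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical pipeline step: run cmd through the shell and return 0 only if
 * it exited normally with status 0; otherwise print a message and return -1
 * so the caller can abort instead of reporting success. */
int run_step(const char *cmd) {
    int status = system(cmd);
    if (status == -1) {  /* the shell could not be started */
        perror("system");
        return -1;
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        fprintf(stderr, "step failed: %s\n", cmd);
        return -1;
    }
    return 0;
}
```

A driver would call run_step for each Bowtie invocation and stop at the first -1. This only helps if the aligner actually exits with a non-zero status when it cannot read its index.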



      • #4
        Originally posted by anecsulea
        Here is the ls -l of the index files:

        ##################################

        -rw-r--r-- 1 anecsule henrik 58822647 Jun 1 13:52 chr3_ensembl57.1.ebwt
        -rw-r--r-- 1 anecsule henrik 23794420 Jun 1 13:52 chr3_ensembl57.2.ebwt
        -rw-r--r-- 1 anecsule henrik 180665 Jun 1 13:47 chr3_ensembl57.3.ebwt
        -rw-r--r-- 1 anecsule henrik 47588832 Jun 1 13:47 chr3_ensembl57.4.ebwt
        -rw-r--r-- 1 anecsule henrik 205509239 May 29 15:37 chr3_ensembl57.fa
        -rw-r--r-- 1 anecsule henrik 58822647 Jun 1 13:58 chr3_ensembl57.rev.1.ebwt
        -rw-r--r-- 1 anecsule henrik 23794420 Jun 1 13:58 chr3_ensembl57.rev.2.ebwt
        Yep, looks good. The write is probably not failing and the files are probably not corrupt or incomplete.

        Originally posted by anecsulea
        I'm currently testing one potential solution: I've noticed that in ebwt.h you're using the "read" function in C if BOWTIE_MM is defined (i.e. on Linux) and the "fread" function if not (i.e. on Windows). I was wondering if I would get the same errors with "fread", so I've compiled bowtie as if for Windows, and I'm doing the same tests. I'll let you know if that works ok.
        I'd be interested to know if that works.

        Is there anything else of note about the partition/filesystem that the index files are stored on? Is it NFS? The problem seems to be that bowtie-build successfully writes the entire index, but when it then tries to read it back in *immediately*, it gets something incomplete. That *might* be Bowtie's fault, but more likely it's some combination of OS & FS.

        Originally posted by anecsulea
        Also, I wanted to ask you if you think it's normal that TopHat does not catch this error thrown by Bowtie. I've had several TopHat runs that finished with apparent "success", but which in fact only gave partial results because reading the Bowtie index for the junction sequences had failed. This seems quite dangerous, as most users will not check the log files for Bowtie errors if TopHat has finished successfully.
        If you have separate questions about Bowtie and TopHat, it's best to post them separately. Cole reads Seqanswers messages about TopHat and I read ones about Bowtie.

        If Bowtie later successfully opens and queries that same set of index files, then they're not actually corrupt; it just appeared that way immediately after they were written, due to OS wackiness. So the TopHat results could very well be fine.

        Ben



        • #5

          Is there anything else of note about the partition/filesystem that the index files are stored on? Is it NFS? The problem seems to be that bowtie-build successfully writes the entire index, but when it then tries to read it back in *immediately*, it gets something incomplete. That *might* be Bowtie's fault, but more likely it's some combination of OS & FS.
          The file system is Lustre - I'm doing my computations on a cluster. However, I should tell you that Bowtie does not crash only *immediately* after building the index - in my tests there were at least a few minutes between building the index and running Bowtie.


          If you have separate questions about Bowtie and TopHat, it's best to post them separately. Cole reads Seqanswers messages about TopHat and I read ones about Bowtie.
          Of course, I understand - however I have already posted two messages about TopHat (in the forums Bioinformatics and RNASeq), with no response yet (nor was there any response to the e-mails I've sent - sorry for insisting, I was getting a bit desperate). Plus, the questions aren't that separate, in my opinion - we're dealing with a Bowtie error that TopHat should catch but fails to do so.



          If Bowtie later successfully opens and queries that same set of index files, then they're not actually corrupt; it just appeared that way immediately after they were written, due to OS wackiness. So the TopHat results could very well be fine.
          No, they are definitely not fine. In fact, I'm running TopHat on long reads (76bp), so TopHat splits them into three segments and then tries to map the three segments onto the Bowtie index of the junction sequences. It can happen that only one of the mapping attempts fails and the others work, so TopHat can still confirm some junctions. Anyway, I will explain all this in more detail in my TopHat-specific posts.

          I'll keep in touch about the Bowtie problem, but if you have any other suggestions for things I should test, please let me know; I'm running out of ideas. Thanks!

          Best,

          Anamaria



          • #6
            Hi again,

            So, I've run several more series of tests in which I replaced the occurrences of the "read" function with "fread", and this solution seems to work fine. I haven't had any "Error reading..." messages in hundreds of tests, and the results are as expected.

            Actually, the simplest way to make this change without modifying the source code too much was to force BOWTIE_MM = 0 in the makefile. I also had to manually replace some occurrences of "lseek" in ebwt.h with MM_SEEK for correct compilation (I'm surprised that Windows users, if there are any, haven't complained about this).

            Best wishes,

            Anamaria
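The difference between the two calls is consistent with how they are specified: POSIX read() may legitimately return fewer bytes than requested, while fread() returns a short count only at end-of-file or on an error. Messages such as "returned 41750080, length was 168445184" look like exactly such a short count being treated as fatal. An alternative to switching to fread is to keep read() and loop until the full length has arrived. Below is a minimal sketch in C, assuming a POSIX system; read_fully is a hypothetical helper, not Bowtie's code.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read exactly len bytes from fd into buf, retrying
 * after short reads and after EINTR. Returns the number of bytes read,
 * which is less than len only if end-of-file came first; -1 on error. */
ssize_t read_fully(int fd, void *buf, size_t len) {
    char *p = buf;
    size_t done = 0;
    while (done < len) {
        ssize_t r = read(fd, p + done, len - done);
        if (r < 0) {
            if (errno == EINTR) continue;  /* interrupted by a signal: retry */
            return -1;
        }
        if (r == 0) break;                 /* end of file */
        done += (size_t)r;                 /* short read: ask for the rest */
    }
    return (ssize_t)done;
}
```

A caller would then treat only `read_fully(...) != len` as a truncated index, rather than any single short read().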



            • #7
              Could not find Bowtie index files ( genome.*.ebwt)

              Hi, I tried to follow the sample data (fruit fly) as suggested in the paper Trapnell et al. 2012, but it came out with this error even though the index files are already in the same directory. Thanks.


              [2012-10-16 13:39:15] Beginning TopHat run (v2.0.4)
              -----------------------------------------------
              [2012-10-16 13:39:15] Checking for Bowtie
              Bowtie 2 not found, checking for older version..
              Bowtie version: 0.12.8.0
              [2012-10-16 13:39:15] Checking for Samtools
              Samtools version: 0.1.18.0
              [2012-10-16 13:39:15] Checking for Bowtie index files
              Error: Could not find Bowtie index files ( genome.*.ebwt)



              • #8
                Why don't you update to the current version of Bowtie2 and see if the problem is resolved?



                • #9
                  Originally posted by JackieBadger
                  Why don't you update to the current version of Bowtie2 and see if the problem is resolved?
                  Thanks for the suggestion. I've already updated to the new Bowtie2 and it's working.



                  • #10
                    build my own reference

                    Hi everybody,

                    I have a problem creating my index with Bowtie on OS X. I want to use multiple FASTQ files, so first I merge all of them into one. When I run bowtie-build, I get this:

                    Writing header
                    Reserving space for joined string
                    Joining reference sequences
                    Reference file does not seem to be a FASTA file

                    Then, when I list the outputs, I only get the 4 .ebwt files, lacking the *ebwt files that are needed to run TopHat.

                    What is the solution to that?

                    Thanks
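The message "Reference file does not seem to be a FASTA file" suggests bowtie-build received something other than FASTA. A FASTA record starts with '>', while a FASTQ record starts with '@', so a merged FASTQ file of reads would fail this test. Below is a quick sketch in C for checking what a file starts with. sniff_format is a hypothetical name, and it only looks at the first non-whitespace character, so it is a heuristic, not a validator.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical heuristic: inspect the first non-whitespace character.
 * Returns 'A' for FASTA ('>'), 'Q' for FASTQ ('@'), '?' for anything else
 * (including an empty file), and 0 if the file cannot be opened. */
char sniff_format(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL) return 0;
    int c;
    while ((c = fgetc(f)) != EOF && isspace(c))
        ;  /* skip leading blank lines */
    fclose(f);
    if (c == '>') return 'A';
    if (c == '@') return 'Q';
    return '?';
}
```

If this reports 'Q' for the merged file, the input is reads rather than a FASTA reference, which would explain the bowtie-build message.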
