Bowtie can't read index files


  • Bowtie can't read index files

    Dear all,

    I'm having a recurrent problem with Bowtie: it fails reading the indexes it had just built.

    Here are some details about my configuration: I'm using Bowtie 0.12.5 (but 0.12.3 gave the exact same error), on a Linux x86_64 computer.

    I get error messages of this type:

    Error reading _plen[] array: 4194272, 55604484

    Error reading ebwt array: returned 41750080, length was 168445184

    The index had been previously built by the same version of Bowtie. In fact these errors had occurred while running TopHat (which incidentally does not catch the errors thrown by Bowtie and finishes the run with "success", but does not give correct or complete results).

    The worst thing is that this error does not occur every time: as a test, I ran Bowtie about 100 times on a toy dataset (with the exact same input reads and genome index), and it crashed only 6 times. It does seem to crash more often when the input is larger, though.

    I don't understand what might be the problem. I'm starting to wonder if it might be because the filesystem structure is somehow corrupt on the computers I'm using. This is why I would like to know if anyone else has encountered this problem.

    Any comments or suggestions would be much appreciated. Thank you for your help!

    Best,

    Anamaria

  • #2
    Hi Anamaria,

    These types of errors occur when the files are genuinely either corrupt or incomplete (e.g. if the disk becomes exhausted during the index-building process). Can you send detailed output from one example where this happens, including a 'ls -l' on the index files after bowtie-build completes?

    Thanks,
    Ben



    • #3
      Hi Ben,

      This is what I originally thought, but I can't see how the exact same index file can be corrupted for one run, and ok on the next one. I've run several hundreds of tests, using the same index file and the same reads file, and only a few of these bowtie jobs crash.

      Here is the ls -l of the index files:

      ##################################

      -rw-r--r-- 1 anecsule henrik 58822647 Jun 1 13:52 chr3_ensembl57.1.ebwt
      -rw-r--r-- 1 anecsule henrik 23794420 Jun 1 13:52 chr3_ensembl57.2.ebwt
      -rw-r--r-- 1 anecsule henrik 180665 Jun 1 13:47 chr3_ensembl57.3.ebwt
      -rw-r--r-- 1 anecsule henrik 47588832 Jun 1 13:47 chr3_ensembl57.4.ebwt
      -rw-r--r-- 1 anecsule henrik 205509239 May 29 15:37 chr3_ensembl57.fa
      -rw-r--r-- 1 anecsule henrik 58822647 Jun 1 13:58 chr3_ensembl57.rev.1.ebwt
      -rw-r--r-- 1 anecsule henrik 23794420 Jun 1 13:58 chr3_ensembl57.rev.2.ebwt

      ##################################

      And the output :

      ##################################

      Error reading ebwt array: returned 3953400, length was 54387328
      Your index files may be corrupt; please try re-building or re-downloading.
      A complete index consists of 6 files: XYZ.1.ebwt, XYZ.2.ebwt, XYZ.3.ebwt,
      XYZ.4.ebwt, XYZ.rev.1.ebwt, and XYZ.rev.2.ebwt. The XYZ.1.ebwt and
      XYZ.rev.1.ebwt files should have the same size, as should the XYZ.2.ebwt and
      XYZ.rev.2.ebwt files.
      Command: /home/vital-it/anecsule/Tools/bowtie-0.12.3/bowtie -p 4 -q --phred33-quals -m 1 /scratch/frt/yearly/necsulea/Orthosplice/results/tests_bowtie/index_0.12.3/chr3_ensembl57 /scratch/frt/yearly/necsulea/Orthosplice/results/tests_bowtie/reads.txt /scratch/frt/yearly/necsulea/Orthosplice/results/tests_bowtie/test_1_0.12.3/results_1.txt

      ##################################
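The checklist in Bowtie's error message (six files, with the forward and `.rev` files matching in size) can be verified mechanically before each run. The following is a minimal sketch in Python, not part of Bowtie; the function name `check_bowtie1_index` is hypothetical. It only inspects file presence and size, so it cannot detect a read that fails intermittently on files that are intact on disk, as appears to be the case here.

```python
import os

def check_bowtie1_index(prefix):
    """Return a list of problems found with a Bowtie 1 index; an empty list means none found."""
    suffixes = [".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt"]
    problems = []
    sizes = {}
    for suffix in suffixes:
        path = prefix + suffix
        if not os.path.isfile(path):
            problems.append(f"missing: {path}")
            continue
        sizes[suffix] = os.path.getsize(path)
        if sizes[suffix] == 0:
            problems.append(f"empty: {path}")
    # Per the error message, forward and mirror files should have the same size.
    for fwd, rev in ((".1.ebwt", ".rev.1.ebwt"), (".2.ebwt", ".rev.2.ebwt")):
        if fwd in sizes and rev in sizes and sizes[fwd] != sizes[rev]:
            problems.append(
                f"size mismatch: {fwd} is {sizes[fwd]} bytes, {rev} is {sizes[rev]} bytes"
            )
    return problems
```

Called with the index prefix used on the command line (the path ending in `chr3_ensembl57` above), it returns an empty list when all six files are present, non-empty, and consistent in size.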

      I'm currently testing one potential solution: I've noticed that in ebwt.h you're using the "read" function in C if BOWTIE_MM is defined (i.e. on Linux) and the "fread" function if not (i.e. on Windows). I was wondering if I would get the same errors with "fread", so I've compiled bowtie as if for Windows, and I'm doing the same tests. I'll let you know if that works ok.

      Also, I wanted to ask whether you think it's normal that TopHat does not catch this error thrown by Bowtie. I've had several TopHat runs that finished with apparent "success" but in fact gave only partial results, because reading the Bowtie index for the junction sequences had failed. This seems quite dangerous, as most users will not check the log files for Bowtie errors if TopHat has finished successfully.
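Until the calling pipeline checks for this itself, one workaround is to run the aligner through a small wrapper that refuses to report success when an error was printed. The following is a minimal sketch in Python; `run_checked` is a hypothetical helper, not part of TopHat or Bowtie. Because the thread does not say what exit status Bowtie returns after an "Error reading" message, the sketch checks both the exit status and the text of stderr.

```python
import subprocess

def run_checked(argv, error_markers=("Error reading",)):
    """Run a command; raise if it exits non-zero or prints a known error marker on stderr."""
    result = subprocess.run(
        argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with status {result.returncode}:\n{result.stderr}"
        )
    # Assumption: a failed index read is always announced on stderr with this text.
    for marker in error_markers:
        if marker in result.stderr:
            raise RuntimeError(
                f"{argv[0]} reported an error despite exit status 0:\n{result.stderr}"
            )
    return result
```

Here `argv` would be the full command as a list of strings, for example the Bowtie command line shown above split on spaces. Standard output is held in memory, so this suits runs like that one, where the alignments go to a file named on the command line.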

      Thanks again for your help !



      • #4
        Originally posted by anecsulea
        Here is the ls -l of the index files:

        ##################################

        -rw-r--r-- 1 anecsule henrik 58822647 Jun 1 13:52 chr3_ensembl57.1.ebwt
        -rw-r--r-- 1 anecsule henrik 23794420 Jun 1 13:52 chr3_ensembl57.2.ebwt
        -rw-r--r-- 1 anecsule henrik 180665 Jun 1 13:47 chr3_ensembl57.3.ebwt
        -rw-r--r-- 1 anecsule henrik 47588832 Jun 1 13:47 chr3_ensembl57.4.ebwt
        -rw-r--r-- 1 anecsule henrik 205509239 May 29 15:37 chr3_ensembl57.fa
        -rw-r--r-- 1 anecsule henrik 58822647 Jun 1 13:58 chr3_ensembl57.rev.1.ebwt
        -rw-r--r-- 1 anecsule henrik 23794420 Jun 1 13:58 chr3_ensembl57.rev.2.ebwt
        Yep, looks good. The write is probably not failing and the files are probably not corrupt or incomplete.

        Originally posted by anecsulea
        I'm currently testing one potential solution: I've noticed that in ebwt.h you're using the "read" function in C if BOWTIE_MM is defined (i.e. on Linux) and the "fread" function if not (i.e. on Windows). I was wondering if I would get the same errors with "fread", so I've compiled bowtie as if for Windows, and I'm doing the same tests. I'll let you know if that works ok.
        I'd be interested to know if that works.

        Is there anything else of note about the partition/filesystem that the index files are stored on? Is it NFS? The problem seems to be that bowtie-build successfully writes the entire index, but when it then tries to read it back in *immediately*, it gets something incomplete. That *might* be Bowtie's fault, but more likely it's some combination of OS & FS.

        Originally posted by anecsulea
        Also, I wanted to ask whether you think it's normal that TopHat does not catch this error thrown by Bowtie. I've had several TopHat runs that finished with apparent "success" but in fact gave only partial results, because reading the Bowtie index for the junction sequences had failed. This seems quite dangerous, as most users will not check the log files for Bowtie errors if TopHat has finished successfully.
        If you have separate questions about Bowtie and TopHat, it's best to post them separately. Cole reads Seqanswers messages about TopHat and I read ones about Bowtie.

        If Bowtie later successfully opens and queries that same set of index files, then they're not actually corrupt; it just appeared that way immediately after they were written, due to OS wackiness. So the TopHat results could very well be fine.

        Ben



        • #5

          Is there anything else of note about the partition/filesystem that the index files are stored on? Is it NFS? The problem seems to be that bowtie-build successfully writes the entire index, but when it then tries to read it back in *immediately*, it gets something incomplete. That *might* be Bowtie's fault, but more likely it's some combination of OS & FS.
          The file system is Lustre; I'm doing my computations on a cluster. However, I should tell you that Bowtie does not crash only *immediately* after building the index: in my tests there were at least a few minutes between building the index and running Bowtie.


          If you have separate questions about Bowtie and TopHat, it's best to post them separately. Cole reads Seqanswers messages about TopHat and I read ones about Bowtie.
          Of course, I understand. However, I have already posted two messages about TopHat (in the Bioinformatics and RNASeq forums), with no response yet (nor was there any response to the e-mails I've sent; sorry for insisting, I was getting a bit desperate). Plus, the questions aren't that separate, in my opinion: we're dealing with a Bowtie error that TopHat should catch but does not.



          If Bowtie later successfully opens and queries that same set of index files, then they're not actually corrupt; it just appeared that way immediately after they were written, due to OS wackiness. So the TopHat results could very well be fine.
          No, they are definitely not fine. I'm running TopHat on long reads (76 bp), so TopHat splits them into three segments and then tries to map the three segments on the Bowtie index of the junction sequences. It can happen that only one of the mapping attempts fails and the others work, so TopHat can still confirm some junctions. Anyway, I will explain all this in more detail in my TopHat-specific posts.

          I'll keep in touch about the Bowtie problem - but if you have any other suggestions for things that I should test, please let me know, I'm running out of ideas. Thanks !

          Best,

          Anamaria



          • #6
            Hi again,

            So, I've again run several series of tests in which I replaced the occurrences of the "read" function with "fread", and this solution seems to work fine. I haven't had any "Error reading..." messages in hundreds of tests, and the results are as expected.

            Actually, the simplest way to make this change without modifying the source code too much was to force BOWTIE_MM = 0 in the Makefile. I also had to manually replace some occurrences of "lseek" in ebwt.h with MM_SEEK for correct compilation (I'm surprised that Windows users, if there are any, haven't complained about this).
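One explanation consistent with these results, though not confirmed in this thread, is a short read: POSIX `read` is allowed to return fewer bytes than requested even when more data is available, whereas `fread` keeps calling the underlying read until it has the full count or hits end-of-file or an error. A caller that treats any short count from `read` as a failure would then produce messages like "returned 3953400, length was 54387328" on intact files. The following is a minimal sketch of the retry loop in Python, where `os.read` has the same short-count behaviour; `read_fully` and its `chunk_reader` parameter are hypothetical names, and this is not Bowtie's code.

```python
import os

def read_fully(fd, length, chunk_reader=os.read):
    """Read exactly `length` bytes from file descriptor `fd`, retrying after short reads."""
    parts = []
    remaining = length
    while remaining > 0:
        chunk = chunk_reader(fd, remaining)
        if not chunk:
            # An empty result means end of file: the data really is missing.
            raise EOFError(f"expected {length} bytes, got {length - remaining}")
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)
```

The design point is that only an empty result (end of file) or an exception counts as failure; a short but non-empty result simply means "ask again for the rest".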

            Best wishes,

            Anamaria



            • #7
              Could not find Bowtie index files ( genome.*.ebwt)

              Hi, I tried to follow the sample data analysis (fruit fly) as described in the paper by Trapnell et al. (2012), but I got this error even though the file in question is already in the same directory. Thank you.


              [2012-10-16 13:39:15] Beginning TopHat run (v2.0.4)
              -----------------------------------------------
              [2012-10-16 13:39:15] Checking for Bowtie
              Bowtie 2 not found, checking for older version..
              Bowtie version: 0.12.8.0
              [2012-10-16 13:39:15] Checking for Samtools
              Samtools version: 0.1.18.0
              [2012-10-16 13:39:15] Checking for Bowtie index files
              Error: Could not find Bowtie index files ( genome.*.ebwt)
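The log shows TopHat falling back to Bowtie 1 and then looking for `genome.*.ebwt`, which is the Bowtie 1 index naming; Bowtie 2 indexes use the `.bt2` extension instead. One possible cause, not confirmed in the post, is that the index in the directory was built for Bowtie 2. The following is a minimal sketch in Python for checking which kind of index files exist for a prefix; `index_flavors` is a hypothetical helper.

```python
import glob

def index_flavors(prefix):
    """Report which index file families exist for a given index prefix."""
    escaped = glob.escape(prefix)  # treat any special characters in the prefix literally
    return {
        "bowtie1": sorted(glob.glob(escaped + ".*.ebwt")),
        "bowtie2": sorted(glob.glob(escaped + ".*.bt2")),
    }
```

If `index_flavors("genome")` lists files only under `"bowtie2"`, the index and the Bowtie version found by TopHat do not match.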



              • #8
                Why don't you update to the current version of Bowtie2 and see if the problem is resolved?



                • #9
                  Originally posted by JackieBadger
                  Why don't you update to the current version of Bowtie2 and see if the problem is resolved?
                  Thanks for the suggestion. I've updated to the new Bowtie2 and it's working.



                  • #10
                    build my own reference

                    Hi everybody,

                    I have a problem creating my index with Bowtie on OS X. I want to use multiple FASTQ files, so I first merged them all into one file. When I run bowtie-build, I get this:

                    Writing header
                    Reserving space for joined string
                    Joining reference sequences
                    Reference file does not seem to be a FASTA file

                    Then, when I list the outputs, I only get the 4 .ebwt files, lacking the *ebwt files which are needed to run TopHat.

                    What is the solution to this?

                    Thanks
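The message "Reference file does not seem to be a FASTA file" fits the description above: bowtie-build indexes reference sequences given in FASTA format, whereas the merged input here is FASTQ, the format normally used for reads. FASTA records begin with `>` and FASTQ records begin with `@`. The following is a minimal sketch in Python for checking what a file looks like before passing it to bowtie-build; `sniff_sequence_format` is a hypothetical helper and inspects only the first non-blank line.

```python
def sniff_sequence_format(path):
    """Guess 'fasta', 'fastq', or 'unknown' from the first non-blank line of a text file."""
    with open(path, "r") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip leading blank lines
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            return "unknown"
    return "unknown"  # empty file
```

If the merged file is reported as `"fastq"`, it holds reads rather than a reference, and the index would normally be built from the reference genome's FASTA file instead.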
