  • jrober04
    Junior Member
    • Apr 2009
    • 4

    Truncated TopHat Files

    So I have four 32-mer Illumina sequence files, each around 9.5 GB, in FASTQ format. TopHat runs exceedingly slowly for two of them and outputs files that are usable by cufflinks and samtools. The other two run quite quickly and produce all of the same files (no errors in the logs), and the juncs and coverage files are all in good order. The only difference I can find is that accepted_hits.sam is ~3 GB for the failed files and ~1 GB for the successful files.
    If I run cufflinks on these files I get the following error:

    cufflinks /home/james/Desktop/TophatAlignments2010/SRX002556-g1/accepted_hits.sam
    Counting hits in map
    Error: this SAM file doesn't appear to be correctly sorted!
    current hit is at Chr1:7405, last one was at Chr1:66939

    If I run samtools I get this error:
    samtools sort /home/james/Desktop/TophatAlignments2010/SRX002556-g1/accepted_hits.sam /home/james/Desktop/accepted_hits.sam
    [bam_header_read] EOF marker is absent.
    [bam_sort_core] truncated file. Continue anyway.
    Segmentation fault

    Has anyone come across this problem?
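
The cufflinks message above reports a hit at Chr1:7405 arriving after one at Chr1:66939. A minimal sketch for locating the first such record in a SAM file follows. It assumes "sorted" means that records for each reference are contiguous and that positions never decrease within a reference; the exact check cufflinks performs may differ, and the function name `first_unsorted_record` is made up for this example.

```python
import sys


def first_unsorted_record(lines):
    """Return (line_no, current, previous) for the first out-of-order
    alignment, or None if the records look coordinate-sorted.

    Assumption: "sorted" means each reference name forms one contiguous
    block and positions are non-decreasing inside that block.
    """
    seen_refs = set()
    last_ref, last_pos = None, -1
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            # A cut-off final line in a truncated file typically ends up here.
            raise ValueError(f"line {line_no}: fewer than 4 tab-separated fields")
        ref, pos = fields[2], int(fields[3])  # RNAME and POS columns
        if ref != last_ref:
            if ref in seen_refs:
                # This reference already appeared earlier in the file.
                return line_no, (ref, pos), (last_ref, last_pos)
            seen_refs.add(ref)
        elif pos < last_pos:
            return line_no, (ref, pos), (last_ref, last_pos)
        last_ref, last_pos = ref, pos
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as sam:
        print(first_unsorted_record(sam))
```

Run against accepted_hits.sam, this prints either None or the line number of the first offending record together with the current and previous (reference, position) pairs, which shows where in the file the ordering breaks.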
  • Cole Trapnell
    Senior Member
    • Nov 2008
    • 213

    #2
    There are a couple of calls to 'sort' at the end of the pipeline, which can take a while on some machines and could be the difference between success and failure here. What kind of machine are you running this on? How much memory does it have?

    • jrober04
      Junior Member
      • Apr 2009
      • 4

      #3
      It is an Ubuntu box running a Core 2 Duo quad-core 9400 with 6 GB of memory and 30 GB of swap space. I am going to try running the datasets without novel discovery to see if that affects the runs. A curious aspect of the files is that the two successful runs have roughly the same-sized juncs, coverage, and accepted_hits files, and the two failed ones show the same trend. The failed runs had larger accepted_hits files but smaller juncs.bed files than the successful ones, which makes me a little suspicious that more reads are aligning under the two failed treatments.

      • Cole Trapnell
        Senior Member
        • Nov 2008
        • 213

        #4
        UNIX sort is probably doing on-disk sorting followed by merges on a machine like that. Do you have plenty of free disk?
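
For readers unfamiliar with the idea, "on-disk sorting followed by merges" can be sketched roughly as below. This is an illustration of the general external merge sort technique, not the implementation inside UNIX sort. The function name `external_sort` and the `chunk_size` and `tmp_dir` parameters are made up for the example, and lines are compared by plain Python string order rather than by locale.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(lines, chunk_size=100_000, tmp_dir=None):
    """Yield lines in sorted order, keeping about chunk_size lines in memory.

    Phase 1 sorts fixed-size chunks in memory and writes each one to a
    temporary "run" file. Phase 2 merges all runs with a k-way merge.
    """
    run_paths = []
    source = iter(lines)
    try:
        while True:
            # Normalise line endings so every record occupies exactly one line.
            chunk = sorted(
                line if line.endswith("\n") else line + "\n"
                for line in islice(source, chunk_size)
            )
            if not chunk:
                break
            fd, path = tempfile.mkstemp(suffix=".run", dir=tmp_dir, text=True)
            with os.fdopen(fd, "w") as run:
                run.writelines(chunk)
            run_paths.append(path)

        runs = [open(path) for path in run_paths]
        try:
            # heapq.merge reads one line at a time from each sorted run.
            yield from heapq.merge(*runs)
        finally:
            for run in runs:
                run.close()
    finally:
        # The run files together hold a full copy of the input.
        for path in run_paths:
            os.remove(path)
```

The temporary runs add up to roughly the size of the input, which is why free disk space matters for this kind of sort as much as memory does.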

        • jrober04
          Junior Member
          • Apr 2009
          • 4

          #5
          Yes, I have about 600 GB free on the drive that TopHat is running on.

          • Cole Trapnell
            Senior Member
            • Nov 2008
            • 213

            #6
            Hmm. Very strange. Can you send me the logs, along with the first 10k or so lines from the failed accepted_hits.sam? You'll probably have to post on the web somewhere, rather than email. If that's not possible, would you please at least email me the logs? I've not seen this before.

            • chrisbala
              Member
              • Jan 2010
              • 82

              #7
              fixed?

              has there been any resolution to this question? I've got the same problem...

              thanks!

              • Cole Trapnell
                Senior Member
                • Nov 2008
                • 213

                #8
                Originally posted by chrisbala
                has there been any resolution to this question? I've got the same problem...

                thanks!
                Not yet - I have the logs, and some ideas about what's going on, but I haven't resolved the problem. It's possible it's an issue with a misformatted FASTQ. We'll let you know.
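
One way to test the misformatted-FASTQ idea is a quick structural scan of the input reads. The sketch below assumes plain four-line FASTQ records (header, sequence, '+' separator, qualities) with no line wrapping. It checks structure only and does not validate quality encodings. The function name `fastq_problems` is made up for this example.

```python
from itertools import zip_longest


def fastq_problems(lines, max_problems=10):
    """Return a list of (line_no, message) for malformed FASTQ records.

    Assumption: every record is exactly four lines long.
    """
    problems = []
    stripped = (line.rstrip("\r\n") for line in lines)
    # Passing the same iterator four times groups the stream into 4-tuples.
    for index, record in enumerate(zip_longest(*[stripped] * 4)):
        line_no = 4 * index + 1  # 1-based line number of the header line
        if None in record:
            problems.append((line_no, "incomplete record at end of file"))
            break
        header, seq, plus, qual = record
        if not header.startswith("@"):
            problems.append((line_no, "header does not start with '@'"))
        if not plus.startswith("+"):
            problems.append((line_no + 2, "separator does not start with '+'"))
        if len(seq) != len(qual):
            problems.append((line_no + 3, "sequence and quality lengths differ"))
        if len(problems) >= max_problems:
            break
    return problems
```

Calling `fastq_problems(open(path))` on each of the four read files and comparing the results between the runs that worked and the ones that failed would show whether the inputs differ structurally.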

                • chrisbala
                  Member
                  • Jan 2010
                  • 82

                  #9
                  clarification

                  Hey Cole,

                  I should clarify, my problem is actually not with the tophat output.

                  I actually just have a .sam file, derived by other means, that I've converted to .bam with samtools. That conversion seemed to go smoothly, but when I tried to sort, I got the same error as above.

                  Maybe this info will somehow be helpful in sorting out what the issue with the tophat output described above is... or maybe it will not... but if anyone has any thoughts about what might cause that error from samtools that would be a big help.

                  chris

                  • guo
                    Junior Member
                    • Jun 2011
                    • 8

                    #10
                    I have the same problem as Chris...

                    As for the origin, I did run a FASTQ conversion step beforehand, using solid2fastq.pl from bwa. I changed the script several times; the latest change was the QV, from -1 to 0. Could this be the cause?

                    Coming back to my problem now, it looks like this:


                    chengguo@statgenpro:~/CRS/samtools-0.1.16$ samtools sort /home/chengguo/CRS/bwa-0.5.0/12F.bam 12F.sorted.bam
                    [bam_header_read] EOF marker is absent. The input is probably truncated
                    chengguo@statgenpro:~/CRS/samtools-0.1.16$
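
The "EOF marker is absent" message refers to the empty 28-byte BGZF block that the SAM/BAM specification defines as the end-of-file marker. A small sketch for checking a file by hand follows. The function name `bam_container_status` and its return strings are made up for this example. A missing marker suggests truncation but does not prove it, since files written by some older tools may lack it. A file that does not start with the gzip magic bytes is not BAM at all, as happens when a plain-text .sam is passed where a BAM is expected.

```python
import os

# Empty BGZF block used as the BAM end-of-file marker (28 bytes, per the
# SAM/BAM format specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def bam_container_status(path):
    """Classify a file as 'ok', 'not_bgzf', or 'no_eof_marker'.

    Only the first 2 and last 28 bytes are inspected; the alignment
    records themselves are not decoded or validated.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        if handle.read(2) != b"\x1f\x8b":
            return "not_bgzf"  # e.g. a plain-text SAM file
        if size < len(BGZF_EOF):
            return "no_eof_marker"
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        tail = handle.read()
    return "ok" if tail == BGZF_EOF else "no_eof_marker"
```

If 12F.bam reports `no_eof_marker`, the step that wrote it was probably interrupted or ran out of space, and regenerating the BAM is a more likely fix than retrying the sort.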
