  • Truncated TopHat Files

    I have four Illumina sequence files of 32-mer reads, each around 9.5 GB, in FASTQ format. TopHat runs exceedingly slowly for two of them and outputs files that are usable by cufflinks and samtools. The other two run quite quickly and produce all of the same files (no errors in the logs), and the juncs and coverage files are all in good order. The only difference I can find is that accepted_hits.sam is ~3 GB for the failed files and ~1 GB for the successful files.
    If I run cufflinks on these files I get the following error:

    cufflinks /home/james/Desktop/TophatAlignments2010/SRX002556-g1/accepted_hits.sam
    Counting hits in map
    Error: this SAM file doesn't appear to be correctly sorted!
    current hit is at Chr1:7405, last one was at Chr1:66939

    If I run samtools I get this error:
    samtools sort /home/james/Desktop/TophatAlignments2010/SRX002556-g1/accepted_hits.sam /home/james/Desktop/accepted_hits.sam
    [bam_header_read] EOF marker is absent.
    [bam_sort_core] truncated file. Continue anyway.
    Segmentation fault

    Has anyone come across this problem?
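The cufflinks message above reports a hit at Chr1:7405 arriving after one at Chr1:66939. A minimal sketch of that kind of ordering check follows, assuming tab-separated SAM lines with the reference name in column 3 and the position in column 4. The function name `first_unsorted_record` and the sample reads are hypothetical, and the actual test inside cufflinks may differ (for example, it may also require a particular chromosome order).

```python
def first_unsorted_record(sam_lines):
    """Return (line_number, current, previous) for the first alignment that
    breaks coordinate order, or None if no such alignment is found.

    Assumption: "sorted" here means each reference name forms one contiguous
    block and positions never decrease within a block.
    """
    seen_refs = set()
    last_ref, last_pos = None, 0
    for line_number, line in enumerate(sam_lines, start=1):
        # Header lines start with "@"; read names cannot.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, pos = fields[2], int(fields[3])
        if ref != last_ref:
            if ref in seen_refs:
                # This reference appeared earlier and was interrupted.
                return (line_number, (ref, pos), (last_ref, last_pos))
            seen_refs.add(ref)
        elif pos < last_pos:
            return (line_number, (ref, pos), (last_ref, last_pos))
        last_ref, last_pos = ref, pos
    return None


# Hypothetical records that mimic the positions in the error message.
sample = [
    "@HD\tVN:1.0\n",
    "read1\t0\tChr1\t66939\t255\t32M\t*\t0\t0\t*\t*\n",
    "read2\t0\tChr1\t7405\t255\t32M\t*\t0\t0\t*\t*\n",
]
print(first_unsorted_record(sample))
```

Running it on a real file would look like `first_unsorted_record(open(path))`, which streams the file line by line rather than loading it into memory.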

  • #2
    There are a couple of calls to 'sort' at the end of the pipeline, which can take a while on some machines and could be the difference between success and failure here. What kind of machine are you running this on? How much memory does it have?



    • #3
      It is an Ubuntu box running a Core 2 duo quad core 9400 with 6 GB of memory and 30 GB of swap space. I am going to try running the datasets without novel junction discovery to see if that affects the runs. A curious aspect of the files is that the two successful runs have roughly the same sized juncs, coverage, and accepted_hits files, and the two failed ones show the same trend. The failed runs had larger accepted_hits files but smaller juncs.bed files than the successful ones, which makes me a little suspicious that more reads are aligning under the two failed treatments.



      • #4
        UNIX sort is probably doing on-disk sorting followed by merges on a machine like that. Do you have plenty of free disk?
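To illustrate what on-disk sorting followed by merges means, here is a minimal sketch of an external merge sort. It is not how UNIX sort is actually implemented; the function name `external_sort` and the `chunk_size` parameter are hypothetical. The point it shows is that every chunk is written to a temporary file, so free disk space matters as much as memory.

```python
import heapq
import os
import tempfile


def external_sort(lines, chunk_size, key=None):
    """Yield newline-terminated lines in sorted order using bounded memory.

    At most chunk_size lines are held in memory at once: each chunk is sorted
    and spilled to a temporary file, then all chunk files are merged.
    """
    chunk_paths = []
    chunk = []

    def flush():
        # Sort the in-memory chunk and write it out as one sorted run.
        chunk.sort(key=key)
        fd, path = tempfile.mkstemp(suffix=".chunk", text=True)
        with os.fdopen(fd, "w") as handle:
            handle.writelines(chunk)
        chunk_paths.append(path)
        chunk.clear()

    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    handles = [open(path) for path in chunk_paths]
    try:
        # heapq.merge reads the sorted runs lazily, one line per run at a time.
        yield from heapq.merge(*handles, key=key)
    finally:
        for handle in handles:
            handle.close()
        for path in chunk_paths:
            os.remove(path)


print(list(external_sort(["b\n", "a\n", "d\n", "c\n", "e\n"], chunk_size=2)))
```

If the disk fills up while the runs are being written, the merged output ends early, which is one way a sorted file could come out truncated.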



        • #5
          Yes I have about 600gb free on the drive that tophat is running on.



          • #6
            Hmm. Very strange. Can you send me the logs, along with the first 10k or so lines from the failed accepted_hits.sam? You'll probably have to post on the web somewhere, rather than email. If that's not possible, would you please at least email me the logs? I've not seen this before.



            • #7
              fixed?

              has there been any resolution to this question? I've got the same problem...

              thanks!



              • #8
                Originally posted by chrisbala
                has there been any resolution to this question? I've got the same problem...

                thanks!
                Not yet - I have the logs, and some ideas about what's going on, but I haven't resolved the problem. It's possible it's an issue with a misformatted FASTQ. We'll let you know.
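For anyone who wants to rule out a misformatted FASTQ themselves, a rough check could look like the sketch below. It assumes the simple four-line record layout (header, sequence, separator, quality) with no line wrapping; the function name `find_fastq_problem` and the sample reads are hypothetical, and this is not the check TopHat performs.

```python
def find_fastq_problem(lines):
    """Return (record_number, message) for the first malformed record,
    or None if every record passes these basic checks."""
    it = iter(lines)
    record = 0
    while True:
        header = next(it, None)
        if header is None:
            return None
        if not header.strip():
            continue  # tolerate blank lines between records
        record += 1
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if qual is None:
            return (record, "record is truncated")
        header, seq, plus, qual = (s.rstrip("\r\n") for s in (header, seq, plus, qual))
        if not header.startswith("@"):
            return (record, "header does not start with '@'")
        if not plus.startswith("+"):
            return (record, "separator does not start with '+'")
        if len(seq) != len(qual):
            return (record, "sequence and quality lengths differ")


# Hypothetical reads: the second one has a quality string that is too short.
good = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]
bad = ["@read2\n", "ACGT\n", "+\n", "III\n"]
print(find_fastq_problem(good + bad))
```

Passing an open file handle, as in `find_fastq_problem(open(path))`, keeps memory use flat even for files of several gigabytes.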



                • #9
                  clarification

                  Hey Cole,

                  I should clarify, my problem is actually not with the tophat output.

                  I actually just have a .sam file, derived by other means, that I've converted to .bam with samtools. That conversion seemed to go smoothly, but when I tried to sort, I got the same error as above.

                  Maybe this info will somehow be helpful in sorting out what the issue with the tophat output described above is... or maybe it will not... but if anyone has any thoughts about what might cause that error from samtools that would be a big help.

                  chris



                  • #10
                    I have the same problem as Chris...

                    As for the origin, I do run a FASTQ conversion step beforehand, solid2fastq.pl, when using bwa1. I changed the script several times, but the latest change was setting the QV from -1 to 0. Could this be the cause?

                    Coming back to my problem, it looks like this:


                    chengguo@statgenpro:~/CRS/samtools-0.1.16$ samtools sort /home/chengguo/CRS/bwa-0.5.0/12F.bam 12F.sorted.bam
                    [bam_header_read] EOF marker is absent. The input is probably truncated
                    chengguo@statgenpro:~/CRS/samtools-0.1.16$
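The "EOF marker is absent" message refers to the 28-byte empty BGZF block that the SAM/BAM specification places at the end of a complete BAM file. A minimal sketch for checking a file by hand follows; the function name `describe_alignment_file` and its return strings are hypothetical. It also flags files that are not gzip-compressed at all, since plain-text SAM handed to a command that expects BAM (as in the first post of this thread) looks like a likely trigger for the same message with the 0.1.x releases, though that is an assumption.

```python
import os

# Empty BGZF block that terminates a complete BAM file (SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43"
    " 02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def describe_alignment_file(path):
    """Classify a file as 'not compressed', 'truncated', or 'complete'."""
    with open(path, "rb") as handle:
        # gzip and BGZF files both start with the bytes 1f 8b.
        if handle.read(2) != b"\x1f\x8b":
            return "not compressed"  # possibly plain-text SAM
        handle.seek(0, os.SEEK_END)
        if handle.tell() < len(BGZF_EOF):
            return "truncated"
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        tail = handle.read(len(BGZF_EOF))
    return "complete" if tail == BGZF_EOF else "truncated"
```

A result of "truncated" suggests the process that wrote the BAM stopped before finishing, so rerunning the conversion is probably more useful than trying to sort the file as it is.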
