Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • tophat running problem

    Hi all,

    I'm trying to run mRNA-seq for human by tophat (v1.0.12).
    I succeeded to get proper output file in the preliminary dataset (first 100K reads from each .fq file). But I failed to get proper output in the real dataset (each contains ~17M reads).

    I would appreciate any help you could give me with this.

    Thanks in advance.

    -Yasu

    ### preliminary_test ###

    $ tophat -r 10 -p 8 -o tophat_hg19_test hg19 s_1_1.head4000000.fq,s_6_1.head4000000.fq,s_7_1.head4000000.fq s_1_2.head4000000.fq,s_6_2.head4000000.fq,s_7_2.head4000000.fq

    [Mon Feb 8 13:31:03 2010] Beginning TopHat run (v1.0.12)
    -----------------------------------------------
    [Mon Feb 8 13:31:03 2010] Preparing output location tophat_hg19_test/
    [Mon Feb 8 13:31:03 2010] Checking for Bowtie index files
    [Mon Feb 8 13:31:03 2010] Checking for reference FASTA file
    [Mon Feb 8 13:31:03 2010] Checking for Bowtie
    Bowtie version: 0.11.3.0
    [Mon Feb 8 13:31:03 2010] Checking reads
    seed length: 43bp
    format: fastq
    quality scale: --phred33-quals
    [Mon Feb 8 13:31:51 2010] Mapping reads against hg19 with Bowtie
    [Mon Feb 8 13:34:24 2010] Joining segment hits
    [Mon Feb 8 13:34:59 2010] Mapping reads against hg19 with Bowtie
    [Mon Feb 8 13:37:30 2010] Joining segment hits
    [Mon Feb 8 13:38:04 2010] Searching for junctions via segment mapping
    [Mon Feb 8 13:44:59 2010] Retrieving sequences for splices
    [Mon Feb 8 13:46:42 2010] Indexing splices
    [Mon Feb 8 13:47:58 2010] Mapping reads against segment_juncs with Bowtie
    [Mon Feb 8 13:48:47 2010] Joining segment hits
    [Mon Feb 8 13:49:26 2010] Mapping reads against segment_juncs with Bowtie
    [Mon Feb 8 13:50:15 2010] Joining segment hits
    [Mon Feb 8 13:50:52 2010] Reporting output tracks
    -----------------------------------------------
    Run complete [00:33:58 elapsed]


    ### real_data ###

    tophat -r 10 -p 8 -o tophat_hg19 hg19 s_1_1.fq,s_6_1.fq,s_7_1.fq s_1_2.fq,s_6_2.fq,s_7_2.fq

    [Mon Feb 8 14:24:47 2010] Beginning TopHat run (v1.0.12)
    -----------------------------------------------
    [Mon Feb 8 14:24:47 2010] Preparing output location tophat_hg19/
    [Mon Feb 8 14:24:47 2010] Checking for Bowtie index files
    [Mon Feb 8 14:24:47 2010] Checking for reference FASTA file
    [Mon Feb 8 14:24:47 2010] Checking for Bowtie
    Bowtie version: 0.11.3.0
    [Mon Feb 8 14:24:47 2010] Checking reads
    seed length: 43bp
    format: fastq
    quality scale: --phred33-quals
    [Mon Feb 8 14:39:23 2010] Mapping reads against hg19 with Bowtie
    [Mon Feb 8 15:24:40 2010] Joining segment hits
    [Mon Feb 8 15:35:38 2010] Mapping reads against hg19 with Bowtie
    [Mon Feb 8 16:18:44 2010] Joining segment hits
    [Mon Feb 8 16:18:44 2010] Searching for junctions via segment mapping
    Warning: junction database is empty!
    [Mon Feb 8 18:01:42 2010] Joining segment hits
    [Mon Feb 8 18:11:38 2010] Joining segment hits
    [Mon Feb 8 18:11:38 2010] Reporting output tracks
    [FAILED]
    Error: Report generation failed with err = 1
    Traceback (most recent call last):
    File "/bin/tophat", line 1518, in ?
    sys.exit(main())
    File "/bin/tophat", line 1490, in main
    params.gff_annotation)
    File "/bin/tophat", line 936, in compile_reports
    exit(1)
    TypeError: 'str' object is not callable

  • #2
    I add the report.log file from real_data (failed one).

    ### Real_data (reports.log) ###

    tophat_reports v1.0.12
    ---------------------------------------
    Error: cannot open map file for reading

    #####################

    Comparing with the run.log files from preliminary_test (succeeded one) and from real_data (failed one), "/bin/segment_juncs" doesn't work well.

    Can somebody give me any help?

    Thanks,

    -Yasu

    Comment


    • #3
      The fact that TopHat thinks the seed length is 43bp is concerning. The default is 25, and it shouldn't be different unless you specified --segment-length, which you didn't. TopHat currently requires that FASTQ files have records where all of the nucleotides for each read appear on a single line. Same goes for the quality strings - all the quality characters need to be on one line. This is a limitation I haven't had time to fix yet. Can you verify that your FASTQ file is formatted this way?

      Comment


      • #4
        Thanks for your kind help!!

        My fastq file is something like this. I omitted the sequence+position id from the line after "+". Does this make the things bad?

        -Yasu

        ###########
        @HWI-EAS368:1:1:9:316#0/1
        CTGGATGATAACATTCCAGAAGATGACTCAGGTGTCCCCACCC
        +
        BB66AB9ACBB@BCBAAA><BBBAAB?BBB@@@BA?B@BB@AB
        @HWI-EAS368:1:1:9:424#0/1
        CTCCCTGCCAGATATCGAGGAGGTGAAAGACCAGAGCAGGAAC
        +
        BCBBB>?>@CBABBB;A877??.:<<B@;@@?=>?A>?6AAA?
        @HWI-EAS368:1:1:9:1060#0/1
        TGGATGGTTCAGGATAATCACCTGAGCAGTGAAGCCAGCTGCT
        +
        BBBBB=?@BBB?=@A9AA@CAA><<@:5>7?=A?A@?=A???@
        @HWI-EAS368:1:1:9:410#0/1
        CGGAGGCGGAGGCTTGGGTGCGTTCAAGATTCAGCTTCACCCG
        +
        AA9AAA=:A7'=7=?4+366=AA@:A>999B:=2,=>1014>7
        @HWI-EAS368:1:1:9:807#0/1
        CGAACATTTCTGGCCCCCAAGTGTCAGCCCATTCACGTAAAAA
        +
        BBBBBBC@@<:;6BC>:@2<B=BBB@7;BB=:C@799:BBB?%
        @HWI-EAS368:1:1:9:405#0/1
        TGTAAAGCCTGAAACAGCTGCCTGTGTGGGACTGAGATGCAGG
        +
        ?=>B=AA@AB@AAB?88>@@@BB>?B>A<=>?A81<-<@@B@@

        Comment


        • #5
          I added the '--segment-length 25', but the comment is still as follows;

          [Tue Feb 9 16:47:40 2010] Checking reads
          seed length: 43bp
          format: fastq
          quality scale: --phred33-quals

          Did I go some wrong way?

          -Yasu

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Best Practices for Single-Cell Sequencing Analysis
            by seqadmin



            While isolating and preparing single cells for sequencing was historically the bottleneck, recent technological advancements have shifted the challenge to data analysis. This highlights the rapidly evolving nature of single-cell sequencing. The inherent complexity of single-cell analysis has intensified with the surge in data volume and the incorporation of diverse and more complex datasets. This article explores the challenges in analysis, examines common pitfalls, offers...
            Yesterday, 07:15 AM
          • seqadmin
            Latest Developments in Precision Medicine
            by seqadmin



            Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

            Somatic Genomics
            “We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
            05-24-2024, 01:16 PM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, Today, 06:58 AM
          0 responses
          13 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, Yesterday, 08:18 AM
          0 responses
          19 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, Yesterday, 08:04 AM
          0 responses
          18 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 06-03-2024, 06:55 AM
          0 responses
          13 views
          0 likes
          Last Post seqadmin  
          Working...
          X