
  • #16
    Originally posted by seqfast
    not sure if you've all got this to work, but my version of tophat-fusion-post (from tophat2) points to:

    blast/

    for the blast database, not blast_human/ as listed in the tutorial webpage.

    you can change the code, or just put the blast db's into top_dir/blast

    good luck,

    sf
    Hi Guys,

    Seqfast is right. The folder name should be blast and NOT blast_human as described in the manual. For your information, using the described top_dir structure with the proposed amendment I was able to replicate the example results.
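A quick way to confirm the layout before running tophat-fusion-post is a small check like the one below. It is only a sketch: the list of expected entries is an assumption taken from the directory listings posted in this thread (blast, refGene.txt, ensGene.txt, mcl), not from the tool's documentation, and `missing_entries` is a hypothetical helper name.

```python
from pathlib import Path

# Entries seen in the directory listings in this thread.
# Assumption: your version of tophat-fusion-post may need a different set.
REQUIRED = ["blast", "refGene.txt", "ensGene.txt", "mcl"]


def missing_entries(top_dir, required=REQUIRED):
    """Return the names from `required` that do not exist under top_dir."""
    top = Path(top_dir)
    return [name for name in required if not (top / name).exists()]


if __name__ == "__main__":
    missing = missing_entries(".")
    if missing:
        print("Missing from top_dir:", ", ".join(missing))
    else:
        print("All expected entries are present.")
```

Note that a directory called blast_human alone would still be reported as a missing blast entry, which matches the naming issue described above.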

    Cheers,

    Fernando



    • #17
      tophat-fusion-post

      hi fjrosello,

      I've posted all over the place about this, but maybe you can help me too: I'm trying to replicate the tophat-fusion-post example they provide. I've tried everything and still get empty results at the filtration stage (it doesn't even BLAST anything), and I'm completely baffled as to why.

      Here is some output if you wouldn't mind taking a look, or if easier just send me a copy of your directory structure and command history to see where I diverged. Thanks a lot.

      tm

      [tankmanb01@node7-3 copy_files]$ ls -lrt
      total 251520
      drwxrwxr-x. 3 tankmanb01 tankmanb01a 32768 Oct 2 13:29 tophat_BT474_mixF.fastq
      drwxrwxr-x. 3 tankmanb01 tankmanb01a 32768 Oct 2 13:29 tophat_KPL_final
      drwxrwxr-x. 3 tankmanb01 tankmanb01a 32768 Oct 2 13:29 tophat_MCF7_final2
      drwxrwxr-x. 3 tankmanb01 tankmanb01a 32768 Oct 2 13:30 tophat_SKBR_final2
      lrwxrwxrwx. 1 tankmanb01 tankmanb01a 65 Oct 2 13:54 blast -> /scratch/tankmanb01/projects/RNA_Cholangiocarcinoma/AN/tophat/blast
      lrwxrwxrwx. 1 tankmanb01 tankmanb01a 71 Oct 2 13:54 refGene.txt -> /scratch/tankmanb01/projects/RNA_Cholangiocarcinoma/AN/tophat/refGene.txt
      lrwxrwxrwx. 1 tankmanb01 tankmanb01a 71 Oct 2 13:54 ensGene.txt -> /scratch/tankmanb01/projects/RNA_Cholangiocarcinoma/AN/tophat/ensGene.txt
      lrwxrwxrwx. 1 tankmanb01 tankmanb01a 63 Oct 2 13:54 mcl -> /scratch/tankmanb01/projects/RNA_Cholangiocarcinoma/AN/tophat/mcl
      lrwxrwxrwx. 1 tankmanb01 tankmanb01a 65 Oct 2 13:54 index -> /scratch/tankmanb01/projects/RNA_Cholangiocarcinoma/AN/tophat/index
      drwxrwxr-x. 7 tankmanb01 tankmanb01a 32768 Oct 2 13:56 tophatfusion_out
      -rw-rw-r--. 1 tankmanb01 tankmanb01a 255743631 Oct 2 14:01 combofusion
      -rw-rw-r--. 1 tankmanb01 tankmanb01a 1334123 Oct 2 14:04 fusions_example.out
      -rwxr-xr-x. 1 tankmanb01 tankmanb01a 79424 Oct 2 14:05 tophat-fusion-post_ALT
      lrwxrwxrwx. 1 tankmanb01 tankmanb01a 5 Oct 2 14:25 blast_human -> blast
      [tankmanb01@node7-3 copy_files]$ find tophat_* -name "run.log" -exec head -1 {} \;
      /packages/tophat/2.0.4/bin/tophat -p 64 -o tophat/tophat_BT474_mixF.fastq --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search -r 50 --mate-std-dev 80 --fusion-min-dist 100000 --fusion-anchor-length 13 --fusion-ignore-chromosomes chrM index/hg19 BT474_mix_1.fastq BT474_mix_2.fastq
      /packages/tophat/2.0.4/bin/tophat -p 64 -o tophat/tophat_KPL_final --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search -r 0 --mate-std-dev 80 --fusion-min-dist 100000 --fusion-anchor-length 13 --fusion-ignore-chromosomes chrM index/hg19 SRR064287_1.fastq SRR064287_2.fastq
      /packages/tophat/2.0.4/bin/tophat -p 64 -o tophat/tophat_MCF7_final2 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search -r 0 --mate-std-dev 80 --fusion-min-dist 100000 --fusion-anchor-length 13 --fusion-ignore-chromosomes chrM index/hg19 SRR064286_1.fastq SRR064286_2.fastq
      /packages/tophat/2.0.4/bin/tophat -p 45 -o tophat/tophat_SKBR_final2 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search -r 50 --mate-std-dev 80 --fusion-min-dist 100000 --fusion-anchor-length 13 --fusion-ignore-chromosomes chrM index/hg19 SKBR3_mix_1.fastq SKBR3_mix_2.fastq
      [tankmanb01@node7-3 copy_files]$ find . -name "fusions.out" -print
      ./tophat_BT474_mixF.fastq/fusions.out
      ./tophat_KPL_final/fusions.out
      ./tophat_MCF7_final2/fusions.out
      ./tophat_SKBR_final2/fusions.out
      [tankmanb01@node7-3 copy_files]$ tophat-fusion-post -p 8 --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 index/hg19
      [Tue Oct 2 14:29:37 2012] Beginning TopHat-Fusion post-processing run (v2.0.4)
      -----------------------------------------------
      [Tue Oct 2 14:29:37 2012] Extracting 23-mer around fusions and mapping them using Bowtie
      [Tue Oct 2 14:29:43 2012] Filtering fusions
      Processing: tophat_BT474_mixF.fastq/fusions.out
      Processing: tophat_KPL_final/fusions.out
      Processing: tophat_MCF7_final2/fusions.out
      Processing: tophat_SKBR_final2/fusions.out
      0 fusions are output in ./tophatfusion_out/potential_fusion.txt
      [Tue Oct 2 14:30:00 2012] Blasting 50-mers around fusions
      [Tue Oct 2 14:30:00 2012] Generating read distributions around fusions
      [Tue Oct 2 14:30:00 2012] Reporting final fusion candidates in html format
      num of fusions: 0
      -----------------------------------------------
      [Tue Oct 2 14:30:00 2012] Run complete [00:00:22 elapsed]
      [tankmanb01@node7-3 copy_files]$
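Since the run above reports 0 fusions after filtering, one thing worth ruling out is that the input fusions.out files themselves are empty. The sketch below counts the non-blank lines in each */fusions.out under the working directory; `count_fusion_lines` is a hypothetical helper, and it assumes one candidate per line, which is my reading of the file rather than something stated in this thread.

```python
from pathlib import Path


def count_fusion_lines(top_dir="."):
    """Map each sample directory name to the number of non-blank
    lines in its fusions.out file."""
    counts = {}
    for path in sorted(Path(top_dir).glob("*/fusions.out")):
        with path.open() as handle:
            counts[path.parent.name] = sum(1 for line in handle if line.strip())
    return counts


if __name__ == "__main__":
    for sample, n in count_fusion_lines(".").items():
        print(f"{sample}\t{n}")
```

If every count is well above zero, the inputs are probably fine and the problem lies in the filtering step or its annotation files.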



      • #18
        Hi Tankman,

        Find below my directory structure:

        Total 48K
        32K drwxr-xr-x 2 fernandr hpcmimr 16K Jun 22 12:33 blast
        0 -rw-r--r-- 1 fernandr hpcmimr 38M Jun 22 12:33 ensGene.txt
        0 -rw-r--r-- 1 fernandr hpcmimr 7.2M Jun 22 12:33 ensGtp.txt
        0 -rw-r--r-- 1 fernandr hpcmimr 398K Jun 22 12:33 mcl
        0 drwxr-xr-x 3 fernandr hpcmimr 161 Jun 22 15:18 old.tophat_MCF7
        0 -rw-r--r-- 1 fernandr hpcmimr 11M Jun 22 12:33 refGene_sorted.txt
        0 -rw-r--r-- 1 fernandr hpcmimr 12M Jun 22 12:33 refGene.txt
        0 -rw-r--r-- 1 fernandr hpcmimr 1.7G May 7 2011 SRR064286_1.fastq
        0 -rw-r--r-- 1 fernandr hpcmimr 1.7G Mar 15 2011 SRR064286_2.fastq
        4.0K -rw-r--r-- 1 fernandr hpcmimr 3.5K Jun 30 19:25 tophat2_fusion_post.e3330717
        0 -rw-r--r-- 1 fernandr hpcmimr 0 Jun 30 18:39 tophat2_fusion_post.o3330717
        0 -rw-r--r-- 1 fernandr hpcmimr 0 Jun 30 18:39 tophat2_fusion_post.pe3330717
        0 -rw-r--r-- 1 fernandr hpcmimr 0 Jun 30 18:39 tophat2_fusion_post.po3330717
        4.0K -rw-r--r-- 1 fernandr hpcmimr 2.0K Jun 22 15:18 tophat2_test.e3327346
        4.0K -rw-r--r-- 1 fernandr hpcmimr 854 Jun 22 12:06 tophat2_test.job
        0 -rw-r--r-- 1 fernandr hpcmimr 0 Jun 22 13:12 tophat2_test.o3327346
        0 -rw-r--r-- 1 fernandr hpcmimr 0 Jun 22 13:12 tophat2_test.pe3327346
        0 -rw-r--r-- 1 fernandr hpcmimr 0 Jun 22 13:12 tophat2_test.po3327346
        0 -rw-r--r-- 1 fernandr hpcmimr 0 Oct 3 14:34 tophat_fusion.out
        0 drwxr-xr-x 7 fernandr hpcmimr 223 Jun 30 19:25 tophatfusion_out
        4.0K -rw-r--r-- 1 fernandr hpcmimr 763 Jun 30 18:07 tophat_fusion_post.job

        As you can see, I have not used symlinks at all. I know it may sound counterintuitive, but you should try running it with all the necessary files placed directly in the directory where the program is run.

        Find below the commands used to generate the output, which are similar, if not identical, to yours:

        /nfs/home/fernandr/biotools/tophat-2.0.3.Linux_x86_64/tophat -o tophat_MCF7 -p 8 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search -r 0 --mate-std-dev 80 --fusion-min-dist 100000 --fusion-anchor-length 13 --fusion-ignore-chromosomes chrM /nfs/home/fernandr/biotools/references/iGenome/Homo_sapiens/UCSC/hg19/Sequence/BowtieIndex/genome SRR064286_1.fastq SRR064286_2.fastq

        /nfs/home/fernandr/biotools/bin/tophat-fusion-post -p 24 --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 /nfs/home/fernandr/biotools/references/iGenome/Homo_sapiens/UCSC/hg19/Sequence/BowtieIndex/genome

        I hope it helps.

        Cheers,

        Fernando



        • #19
          Hi Fernando,

          Thanks a lot for your help. 2 questions:

          1) Did you align only the MCF7 reads (SRR064286) using tophat-fusion and then run tophat-fusion-post on just that one fusions.out file (as opposed to all of KPL, SKBR3, BT474, and MCF7)?

          2) Did you also use tophat 2.0.3 to align your reads?

          I still can't get the damn thing to work, even after removing the symlinks (which really shouldn't matter!) and trying tophat-fusion-post version 2.0.3.

          ahhhhh!

          Would it be too much trouble to send me the exact output of tophat-fusion-post?

          thanks,
          tm

          tophat-fusion-post -p 8 --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 index/hg19
          [Wed Oct 3 07:06:25 2012] Beginning TopHat-Fusion post-processing run (v2.0.3)
          -----------------------------------------------
          [Wed Oct 3 07:06:25 2012] Extracting 23-mer around fusions and mapping them using Bowtie
          samples updated
          [Wed Oct 3 07:06:48 2012] Filtering fusions
          Processing: tophat_MCF7_final2/fusions.out
          0 fusions are output in ./tophatfusion_out/potential_fusion.txt
          [Wed Oct 3 07:06:56 2012] Blasting 50-mers around fusions
          [Wed Oct 3 07:06:56 2012] Generating read distributions around fusions
          [Wed Oct 3 07:06:56 2012] Reporting final fusion candidates in html format
          num of fusions: 0
          -----------------------------------------------
          [Wed Oct 3 07:06:56 2012] Run complete [00:00:31 elapsed]





          • #20
            Hi Tankman,

            Q1: Yes, I first tested with that sample alone and then processed my own data.
            Q2: Yes, I used tophat 2.0.3 for those reads and I am currently using TopHat 2.0.4. Unfortunately, I cannot send you the tophat-fusion-post output for the MCF7 sample because I deleted the tophatfusion_out folder for that sample.
            However, I could repeat the process and let you know the outcome. Is this OK?

            Cheers,

            Fernando



            • #21
              Fantastic, thank you.




              • #22

                Hi Tankman,

                Sorry for my late reply. As previously discussed, find below the output from tophat-fusion-post. It worked and I obtained the same output as before.
                Have you been able to solve the issue?

                Cheers,

                Fernando

                [Thu Oct 11 13:50:30 2012] Beginning TopHat-Fusion post-processing run (v2.0.4)
                -----------------------------------------------
                [Thu Oct 11 13:50:30 2012] Extracting 23-mer around fusions and mapping them using Bowtie
                samples updated
                [Thu Oct 11 13:52:16 2012] Filtering fusions
                Processing: tophat_MCF7/fusions.out
                14 fusions are output in ./tophatfusion_out/potential_fusion.txt
                [Thu Oct 11 13:52:32 2012] Blasting 50-mers around fusions
                1. RSBN1 exon7(114354330-114355069) AP4B1 intron5(114441423-114442523)
                2. LRP1B exon89(142237963-142238101) PLXDC1 exon12(37265499-37265643)
                3. ENSG00000233459 exon1(204499298-204500738) ZNF207 exon8(30692347-30692505)
                4. ENSG00000250859 exon1(126847154-126848533) HNRNPK exon3(86585650-86585733)
                5. FOXA1 exon1(38058755-38061915) ENSG00000254868 intron7(38184001-38194100)
                6. ENSG00000224738 exon1(57183957-57184951) VMP1 exon11(57915654-57915757)
                7. VMP1 exon12(57917127-57917951) RPS6KB1 exon4(57991994-57992063)
                8. USP32 exon26(58342771-58342834) PPM1D intron1(58678247-58700879)
                9. BCAS3 intron23(59161925-59445685) BCAS4 exon1(49411465-49411709)
                10. BCAS3 exon24(59445686-59445854) BCAS4 exon1(49411465-49411709)
                11. CARM1 exon2(11015625-11015751) SMARCA4 exon4(11096863-11097268)
                12. ARFGEF2 exon1(47538273-47538546) SULF2 exon19(46365445-46365685)
                13. SULF2 exon21(46414790-46415359) ENSG00000171940 exon6(52210293-52210377)
                14. SULF2 exon21(46414790-46415359) ENSG00000171940 exon5(52210644-52210800)
                [Thu Oct 11 14:15:47 2012] Generating read distributions around fusions
                MCF7 (1-14)
                chr1-chr1 114354329 114442495 rf
                chr2-chr17 142237963 37265642 rr
                chr2-chr17 204499953 30692348 rf
                chr5-chr9 126847434 86585718 rr
                chr14-chr14 38061534 38184710 rr
                chr17-chr17 57184951 57915655 ff
                chr17-chr17 57917128 57992063 rr
                chr17-chr17 58342772 58679978 rr
                chr17-chr20 59430948 49411709 rr
                chr17-chr20 59445687 49411709 rr
                chr19-chr19 11015626 11097268 rr
                chr20-chr20 47538546 46365685 fr
                chr20-chr20 46415148 52210294 rf
                chr20-chr20 46415148 52210645 rf
                [Thu Oct 11 14:36:21 2012] Reporting final fusion candidates in html format
                num of fusions: 11
                -----------------------------------------------
                [Thu Oct 11 14:36:31 2012] Run complete [00:46:00 elapsed]
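To compare runs like the ones posted in this thread, the two numbers that matter can be pulled out of the log automatically. This is a minimal sketch that relies only on the two message formats visible in the logs above ("N fusions are output in ..." and "num of fusions: N"); `summarize_log` is a hypothetical helper, and other versions of the tool may word these lines differently.

```python
import re

# Patterns copied from the log lines shown in this thread.
POTENTIAL_RE = re.compile(r"(\d+) fusions are output in")
FINAL_RE = re.compile(r"num of fusions:\s*(\d+)")


def summarize_log(text):
    """Return the potential and final fusion counts found in a
    tophat-fusion-post log, or None for a count that is absent."""
    potential = POTENTIAL_RE.search(text)
    final = FINAL_RE.search(text)
    return {
        "potential": int(potential.group(1)) if potential else None,
        "final": int(final.group(1)) if final else None,
    }


log = """\
Processing: tophat_MCF7/fusions.out
14 fusions are output in ./tophatfusion_out/potential_fusion.txt
num of fusions: 11
"""
print(summarize_log(log))  # {'potential': 14, 'final': 11}
```

A potential count of 0 means the run failed at the filtering stage, before BLAST was ever reached.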



                • #23
                  Make sure you have a "tophat_something" directory

                  I was suffering from the same problem: tophat-fusion-post ran without incident but produced no results. A colleague said he had the same problem and solved it by using Python 2.7, but by itself that didn't work for me (though I did not undo that change, so it may still be necessary).

                  Looking through the code exposes this line, though:

                  if string.find(dir, "tophat_") != 0:
                      continue

                  ... that is, the code *requires* the data to be in a directory named "tophat_*". If it fails to find such a directory, it will not complain, but will extract zero results.
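To see which directories would pass that test, the same condition can be applied by hand. The sketch below is not the tool's code: `find_sample_dirs` is a hypothetical helper, `str.startswith` stands in for the quoted `string.find(...) != 0` check (the `string.find` function only exists in Python 2), and requiring a fusions.out file inside each directory is an assumption based on the "Processing: .../fusions.out" log lines.

```python
import os


def find_sample_dirs(top_dir="."):
    """Return the names of subdirectories of top_dir that start with
    'tophat_' and contain a fusions.out file."""
    samples = []
    for name in sorted(os.listdir(top_dir)):
        # Equivalent to skipping when string.find(name, "tophat_") != 0
        if not name.startswith("tophat_"):
            continue
        if os.path.isfile(os.path.join(top_dir, name, "fusions.out")):
            samples.append(name)
    return samples


if __name__ == "__main__":
    found = find_sample_dirs(".")
    print(found if found else "No tophat_* directory with fusions.out found")
```

An empty list here would explain a silent "0 fusions" run.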

                  Rather than rename my project directories I just added a symlink in my non-"tophat_"-named directory:

                  ln -s ./ tophat_project

                  ... and now processing works fine. The documentation vaguely mentions this, but does not stress that it's required. Again, the python version may be relevant as well, but I'm happy enough to leave that sleeping dog alone now that the pipeline is apparently functional.
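For anyone scripting the workaround, the same symlink can be created from Python. A minimal sketch, assuming a POSIX filesystem; `add_tophat_alias` is a hypothetical helper and tophat_project is simply the name used in the `ln -s` command above.

```python
import os


def add_tophat_alias(run_dir, alias="tophat_project"):
    """Create run_dir/alias as a symlink to run_dir itself, mirroring
    `ln -s ./ tophat_project` executed inside run_dir."""
    link = os.path.join(run_dir, alias)
    if not os.path.lexists(link):
        # The target "." is resolved relative to the link's own directory.
        os.symlink(os.curdir, link)
    return link


if __name__ == "__main__":
    print("Created or found:", add_tophat_alias("."))
```

Because the link points back at its own parent, recursive directory walks will loop through it, so it is best removed once post-processing is done.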



                  • #24
                    FusionCatcher (a finder of somatic fusion genes in RNA-seq data) can also be used for finding fusion genes in RNA-seq data. It is available on GitHub as ndaniel/fusioncatcher.



                    • #25
                      Repost: tophat-fusion outputs empty result

                      Answer:
                      In my experience, there is only one reason for an empty tophat-fusion-post result: the output directory of the first command (the tophat --fusion-search run) was not named with the prefix tophat_, as in tophat_(your sample name). Without that prefix, tophat-fusion-post cannot find the output of the first run. If you are not sure, try renaming your output directory so that it lacks the prefix and rerun only tophat-fusion-post (not the first tophat run): you will get an empty result file with 0 fusion genes.
                      If there is any other problem, tophat will give you an error. If there is no error, that means it could not read your output files.

                      Let me explain in detail.

                      Follow these commands (available online at the Broad Institute): How to run Tophat-fusion? https://confluence.broadinstitute.or...ageId=46531375

                      Step 1: Install
                      bowtie (i.e. bowtie1)
                      tophat2
                      samtools
                      NCBI BLAST

                      Step 2: download
                      ensGene.txt
                      refGene.txt

                      Step 3: Make a directory for your sample. For example, if your sample name is John, then your directory is John. Place your .fastq files there (two files for paired-end data), and move the downloaded files ensGene.txt and refGene.txt into the same John directory.

                      Step 4: Log in (e.g. with PuTTY) and add the tools to your PATH:
                      PATH=$PATH:(path of bowtie)
                      PATH=$PATH:(path of samtools)
                      PATH=$PATH:(path of tophat2)
                      PATH=$PATH:(path of blast)
                      export PATH
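Before launching a long run it can help to confirm that the PATH changes took effect. The sketch below uses `shutil.which` from the standard library; the executable names in `TOOLS` are assumptions (in particular `blastn` for the BLAST step), so adjust them to whatever your installation provides, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Assumed executable names; edit to match your installation.
TOOLS = ["bowtie", "tophat", "tophat-fusion-post", "samtools", "blastn"]


def missing_tools(tools=TOOLS):
    """Return the tool names that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All tools found on PATH.")
```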

                      Step 5: Change to your sample directory, i.e. John:
                      cd (path of John)
                      Your prompt should now show your working directory, like this:
                      John$

                      Step 6: Run the tophat fusion search using the following standard command. The most important part is the -o option, which names your output directory: always make the output directory name start with tophat_, followed by your output name. For example, for the sample John I give -o tophat_John. That's all.

                      The standard command for the sample John (which has two .fastq files, John_1.fastq and John_2.fastq) is:

                      tophat -o tophat_John -p 8 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search -r 0 --mate-std-dev 80 --max-intron-length 100000 --fusion-min-dist 100000 --fusion-anchor-length 13 --fusion-ignore-chromosomes chrM (path to the hg19 Bowtie1 index) John_1.fastq John_2.fastq
                      run it....

                      Step 7: Post-fusion processing
                      tophat-fusion-post -p 8 --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 (path to the hg19 Bowtie1 index)
                      run it....
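The two commands from Steps 6 and 7 can be assembled in a small script that also enforces the tophat_ prefix, so the naming mistake cannot happen. This is only a sketch: `tophat_fusion_commands` is a hypothetical helper, the index path in the usage example is a placeholder, and the flags are copied from the commands posted in this thread rather than checked against the manual.

```python
import shlex


def tophat_fusion_commands(sample, index, reads, threads=8):
    """Build the tophat and tophat-fusion-post argument lists for one
    sample, forcing the output directory name to start with 'tophat_'."""
    out_dir = sample if sample.startswith("tophat_") else "tophat_" + sample
    align = [
        "tophat", "-o", out_dir, "-p", str(threads),
        "--fusion-search", "--keep-fasta-order", "--bowtie1",
        "--no-coverage-search", "-r", "0", "--mate-std-dev", "80",
        "--max-intron-length", "100000", "--fusion-min-dist", "100000",
        "--fusion-anchor-length", "13",
        "--fusion-ignore-chromosomes", "chrM",
        index, *reads,
    ]
    post = [
        "tophat-fusion-post", "-p", str(threads),
        "--num-fusion-reads", "1", "--num-fusion-pairs", "2",
        "--num-fusion-both", "5", index,
    ]
    return align, post


if __name__ == "__main__":
    # "path/to/hg19" is a placeholder for your Bowtie1 index prefix.
    align, post = tophat_fusion_commands(
        "John", "path/to/hg19", ["John_1.fastq", "John_2.fastq"]
    )
    print(shlex.join(align))
    print(shlex.join(post))
```

The printed lines can be pasted into a shell, or the lists passed to `subprocess.run` from the sample directory.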

                      You will get the fusion genes.
                      If there is still a problem, please reply to me.



                      • #26
                        Hi kind members of the thread...

                        After reading this thread and trying countless times with the MCF7 trial data, my attempts to run tophat-fusion-post still give 0 fusions.

                        I think my problem is with tophat-fusion-post rather than with the initial tophat fusion run.

                        Here is what was done

                        Step 1 - Installed required software (not sure about blast -see end of post)

                        Step 2- Downloaded required materials (shown in my directory -see below)

                        Step 3 - Run tophat
                        Code:
                        tophat -o tophat_MCF7_1 -p 15 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search -r 0 --mate-std-dev 80 --max-intron-length 100000 --fusion-min-dist 100000 --fusion-anchor-length 13 --fusion-ignore-chromosomes chrM Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome SRR064286_1.fastq SRR064286_2.fastq
                        Step 4 - Run tophat-fusion-post

                        Code:
                        tophat-fusion-post -p 8 --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome
                        Code:
                        # Output
                        [Tue Apr  2 11:17:59 2013] Beginning TopHat-Fusion post-processing run (v2.0.8)
                        -----------------------------------------------
                        [Tue Apr  2 11:17:59 2013] Extracting 23-mer around fusions and mapping them using Bowtie
                        [Tue Apr  2 11:18:00 2013] Filtering fusions
                                Processing: tophat_MCF7_1/fusions.out
                                0 fusions are output in ./tophatfusion_out/potential_fusion.txt
                        [Tue Apr  2 11:18:03 2013] Blasting 50-mers around fusions
                        [Tue Apr  2 11:18:03 2013] Generating read distributions around fusions
                        [Tue Apr  2 11:18:03 2013] Reporting final fusion candidates in html format
                                num of fusions: 0
                        My directory structure:
                        drwxrwxr-x 2 zaki zaki 36K Mar 12 10:08 blast
                        -rw-rw-r-- 1 zaki zaki 17M Aug 10 2009 ensGene.txt
                        -rw-rw-r-- 1 zaki zaki 694 Apr 2 09:56 fusion.pbs
                        -rw-rw-r-- 1 zaki zaki 1.9K Apr 2 08:42 fusion.pl
                        -rw-rw-r-- 1 zaki zaki 720 Apr 2 10:04 fusion_post.pl
                        drwxrwxr-x 3 zaki zaki 4.0K Jan 29 16:49 Homo_sapiens
                        -rw-rw-r-- 1 zaki zaki 415K May 7 2012 mcl
                        drwxrwxr-x 4 zaki zaki 4.0K Apr 2 11:17 misc
                        -rw-rw-r-- 1 zaki zaki 12M Mar 4 04:47 refGene.txt
                        -rw-rw-r-- 1 zaki zaki 1.7G May 7 2011 SRR064286_1.fastq
                        -rw-rw-r-- 1 zaki zaki 1.7G Mar 15 2011 SRR064286_2.fastq
                        drwxrwxr-x 7 zaki zaki 4.0K Apr 2 09:56 tophatfusion_out
                        drwxrwxr-x 6 zaki zaki 4.0K Apr 2 11:16 tophat_MCF7_1
                        I have even tried running tophat-fusion-post with the fusions.out file from the TopHat website, but I am still getting 0 fusions using the MCF7 data.

                        I made sure everything is installed. However, blastall and blastn are versions pre-installed by the system admin (from 2011). Could this be the problem?

                        Code:
                        $which blastall - /opt/bio/ncbi/bin/blastall
                        $which blastn - /usr/bin/blastn
                        blastall version - 2.2.25
                        blastn version - BLAST 2.2.27+
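
With two BLAST installations in play (legacy blastall and BLAST+ blastn), it can help to see exactly which executables the shell resolves. A minimal sketch follows; report_tools is a hypothetical helper, and which of these programs tophat-fusion-post actually calls is not established in this thread.

```shell
# Report where each named tool resolves on PATH, or that it is missing.
report_tools() {
    local tool path
    for tool in "$@"; do
        # command -v prints the resolved path and fails if nothing is found.
        if path="$(command -v "$tool")"; then
            echo "$tool: $path"
        else
            echo "$tool: not found on PATH"
        fi
    done
}
```

For example, `report_tools blastall blastn` lists both locations in one go, which makes it easier to compare against what a colleague with a working setup sees.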

                        What I find very worrying is that even when I use the fusions.out from the TopHat website, I still can't get any results from tophat-fusion-post.

                        Some additional information
                        First two records from fusions.out [downloaded from the TopHat website]
                        chr9-chrX 11194 32777963 ff 1 0 0 0 29 21 8.000000 @
                        14 26 38 52 63 @
                        AAGTCGCACGGCGCCGGGCTGGGGGCGGGGGGGGGGGGGGGGGGGGGGGC GCCGTGCACGCGCAGAAACTCACGTCACGGCGGCGCGGCGCAGAGACGGG @
                        ATACTGACTACTGGATCTTATGTTTAAGGCAGATGCAGGTTTTTCTTGGG GGGGGGGGGGGGGCGGGGGAAGACCCAGGCAGGCCCATGTGATACATTTC @
                        1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 @
                        1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 @

                        chr1-chr1 14821 40591031 fr 1 0 0 0 28 22 6.000000 @
                        7 21 35 49 64 @
                        CGCGGGCAAAGGCTCCTCCGGGCCCCTCACCAGCCCCAGGTCCTTTCCCA GAGATGCCTGGAGGGAAAAGGCTGAGTGAGGGTGGTTGGTGGGAAACCCT @
                        CCATTAAAATCCTTTGCAAAAGTTGATTCCTCAGAACTGTGTCTGACCCC GAGATGCCATTGTGCCTCATGAGCTCCTAAAGTCTCCTAAGACCTTGCAA @
                        1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 @
                        1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 @
                        First two records from fusions.out [from my own result of running Step 3]
                        1-1 564566 33245833 rr 1 0 0 195 31 19 12.000000 @
                        10 22 34 46 59 @
                        GTTCAGGGGAGAGTGCGTTATATGTTGTTCCTAGGAAGATTGTAGTGGTG AGGGTGTTTATTATAATAATGTTTGTGTATTCGGCTATGAAGAATAGGGC @
                        GGCAGCACCCAGATGCAGACAGCCTGTATGTAGAGAAGATTGACGTGGGG GAAGCTGAACCACGGACTGTGGTGAGCGGCCTGGTACAGTTCGTGCCCAA @
                        1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 @
                        1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 @


                        1-1 564656 205685445 rr 1 0 0 0 24 26 2.000000 @
                        11 22 35 50 61 @
                        GGTCGTAGCGGAATCGGGGGTATGCTGTTCGAATTCATAAGAACAGGGAG GTCAGAAGTAGGGTCTTGGTGACAAAATATGTTGTGTAGAGTTCAGGGGA @
                        TAGTAAAAGACCTATCAGTGTTTCCACCATGCACTTCTATTTTTTAGGAG TTTATAATTTTAAGTCTTACATTCCTAGTAACATTTGGGCTTTTCTTAGG @
                        1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 @
                        1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 @
                        As you can see, my fusions.out and the website's fusions.out are different even though both use the MCF7 data. Now I am not sure what I am doing wrong. Is it an incorrect BLAST installation, or was my step 3 run incorrectly?
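
One visible difference between the two excerpts is chromosome naming: the website's records use names like chr9-chrX, while the records from the Step 3 run use 1-1. Whether this is what makes the post step report 0 fusions is an assumption, not something confirmed in this thread, but it is cheap to check. The sketch below counts naming styles in a fusions.out file and in an annotation file. The function names are hypothetical, and reading the chromosome from column 3 assumes the UCSC refGene.txt/ensGene.txt layout (bin, name, chrom, ...).

```shell
# Count fusions.out records with and without "chr"-prefixed chromosome names.
# A record header is taken to be a line whose 2nd and 3rd fields are integers
# and whose 4th field is an orientation (ff, fr, rf or rr).
fusion_chrom_style() {
    awk '
        $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $4 ~ /^[fr][fr]$/ {
            n = split($1, pair, "-")
            if (n != 2) next          # skip names this simple split cannot handle
            if (pair[1] ~ /^chr/ && pair[2] ~ /^chr/) pref++
            else bare++
        }
        END { printf "chr-prefixed: %d, unprefixed: %d\n", pref, bare }
    ' "$1"
}

# Same count for a tab-separated annotation file.
# Assumption: the chromosome name is in column 3.
annotation_chrom_style() {
    awk -F '\t' '
        $3 ~ /^chr/ { pref++; next }
        { bare++ }
        END { printf "chr-prefixed: %d, unprefixed: %d\n", pref, bare }
    ' "$1"
}
```

Running `fusion_chrom_style tophat_MCF7_1/fusions.out` and `annotation_chrom_style refGene.txt` side by side shows whether the two files use the same naming. If they disagree, aligning against a UCSC-style hg19 index is one thing to try.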

                        Many thanks
                        Zaki

                        Comment


                        • #27
                          Hi zaki

                          There seem to be a few possible problems. Here is how to resolve them.

                          1. Remove all spaces from the directory names that you copy and paste to run tophat.

                          2. Do not rename any subdirectory within your working directory once you start running the TopHat pre-fusion and post-fusion steps. After the pre-fusion step completes, tophat makes a subdirectory within your pwd named "tophat_MCF7". So if you change any name after the pre-fusion step, the post-fusion step will not recognize the input files.

                          3. It seems that there is no hg19 genome file in your command:
                          tophat-fusion-post -p 8 --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome
                          I mean, what is 'genome' there? Is it hg19.fastq?
                          If not, then try keeping the same file name, hg19.fastq.

                          It will work. If not, then try using -p 8 in the pre-fusion run. If that works, modify it as you like.

                          When your mind is overloaded you can miss a space, and that will cause problems. Cool down, take a rest, and then try again.

                          Comment


                          • #28
                            Thanks for the reply Charitra,

                            Originally posted by Charitra
                            Hi zaki
                            1. Replace all the spaces from the directory name which you copy and paste to run tophat
                            I don't quite understand what you mean by this. Replace all the spaces? Would you mind elaborating?

                            2. Do not re-name any sub-directory within your working directory
                            I haven't renamed anything.

                            3. It seem that there is no hg19 genome file within your command
                            Hmm, now that you mention it, I am using an Ensembl genome from the TopHat website. I will download the hg19 UCSC genome from the TopHat-Fusion website and give it another go. Did you download and use the indexed Bowtie1 or Bowtie2 UCSC genome reference?

                            Thanks again!

                            Comment


                            • #29
                              Yes Zaki,
                              I used hg19 and its index files. This will work, I think.

                              Removing spaces means that when you make a directory or subdirectory, you should not put a space in the name, such as "Zaki MCF"; always use Zaki_MCF. I think you are not using spaces anyway.
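
The reason spaces matter is that an unquoted path such as Zaki MCF is split by the shell into two separate arguments. A quick way to spot such names in a working directory is sketched below; names_with_spaces is a hypothetical helper, and it only looks one level deep.

```shell
# List entries directly under a directory whose names contain a space.
names_with_spaces() {
    find "${1:-.}" -mindepth 1 -maxdepth 1 -name '* *'
}
```

Calling `names_with_spaces .` in the directory that holds the tophat_* folders prints nothing when all names are safe.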

                              Let's see if it works this time. If it doesn't, reply here.

                              Cheers

                              Comment


                              • #30
                                Hi Charitra,

                                TopHat-Fusion is still not working for me.

                                A question: what was your accepted_hits.bam file size after you ran the MCF7 dataset?

                                Code:
                                tophat -o tophat_MCF7 -p 8 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search -r 0 --mate-std-dev 80 --max-intron-length 100000 --fusion-min-dist 100000 --fusion-anchor-length 13 --fusion-ignore-chromosomes chrM (path to the hg19 Bowtie1 index) SRR064286_1.fastq SRR064286_2.fastq
                                Mine was:
                                -rw-rw-r-- 1 zaki zaki 1191944311 Apr 4 11:14 accepted_hits.bam
                                -rw-rw-r-- 1 zaki zaki 46798194 Apr 4 11:09 fusions.out


                                Is it the same? Mine used TopHat v2.0.8.

                                Cheers

                                Edit - Did you manage to run deFuse? (I think you did, judging from your other post!)

                                In your defuse-0.6.1/tools directory, do you have a file called calccov (not calccov.cpp)?
                                Last edited by zaki; 04-08-2013, 07:52 PM.

                                Comment
