  • cufflinks funny output, scripture comparison

    Hi all,

    I have 50 bp paired-end Illumina reads, which I aligned with tophat (default parameters).
    The alignments look reasonable in IGV and in the UCSC browser.

    I have run scripture on the tophat output, and I get a list of isoforms that look reasonable, if somewhat verbose.
    However, when I run cufflinks, I get very spotty connectivity: the transcripts come out broken into disconnected pieces.

    I am trying to attach a screenshot. From top to bottom it shows the tophat split alignments, the scripture transcript predictions, the cufflinks output, and finally the reference gene annotations.

    Has anyone else seen cufflinks output that is disconnected like this? Any ideas on how to improve the results?
    I ran scripture and cufflinks on the same file.

    (the screenshot attempt didn't work out)
    The image I tried to attach has been posted here instead:
    http://www.bcgsc.ca/downloads/rnaSeq...f3f_22ffe0.gif
    Last edited by rcorbett; 06-23-2010, 08:12 AM. Reason: screenshot too small to see

  • #2
    What parameters did you use to run cufflinks/cuffcompare? Could you be filtering out a number of reads through some parameter? For example, did you provide a -G file? If so, your GTF file could be missing exon junctions.

    I also noticed that a lot of reads land in intronic regions. Is that expected?
    Finally, could you tell me which file you used to get the cuff.23.1 track on UCSC? I would like to see whether I get similar disconnected transcripts in my data.

    • #3
      Hi thinkRNA,

      I ran cufflinks with entirely default parameters, using the precompiled 0.8.2 beta version for 64-bit Linux. I didn't provide a GTF of reference exons because I wanted to test de novo transcript assembly, which cufflinks should do well according to the paper:

      "High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks."

      The intronic reads are more or less expected; their number varies with the genes, libraries, and disease types we study. The disconnected transcripts are prevalent throughout my data set, in genes with both high and low levels of intronic reads.

      To view the cufflinks output in UCSC, just take your transcripts.gtf output file and upload it directly as a custom track.
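A UCSC custom track can optionally begin with a `track` line that sets its name and display mode. The sketch below prepends such a line to the records of a transcripts.gtf file before upload. The function name, the track name and the `visibility=pack` default are illustrative choices, and it assumes the chromosome names in the GTF already use UCSC style (chr1, chr2, ...).

```python
def gtf_custom_track(gtf_lines, name, description=None, visibility="pack"):
    """Return GTF text with a UCSC custom-track header line prepended.

    gtf_lines: any iterable of GTF records, e.g. an open transcripts.gtf
    handle. Blank lines are dropped; records are passed through unchanged.
    """
    header = f'track name="{name}"'
    if description:
        header += f' description="{description}"'
    header += f" visibility={visibility}"
    records = [line.rstrip("\n") for line in gtf_lines if line.strip()]
    return "\n".join([header] + records) + "\n"
```

For example, open transcripts.gtf, pass the handle as `gtf_lines`, and write the returned text to a new file (any name) that you then upload in the custom-track form.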

      I would be interested to hear how your data performs with this software.

      thanks!

      • #4
        So, I looked at this same gene on UCSC along with the tophat junctions, and fortunately I get the entire transcript connected.

        I have 75 bp reads sequenced to a depth of ~30 million reads.

        I ran tophat with these parameters:
        tophat -a 10 --coverage-search -p 4 -g 10 -G refFlat_RefSeq.gff -o s2_tophat mm9 ../s2.fastq

        I ran cufflinks without the -G option.

        Here is the UCSC image:
        http://picasaweb.google.com/priyamsi...08504614142194

        I have not explored many other genes systematically, but the ~5 I have looked at so far are well connected.
        I don't understand why there are two CUFF IDs (CUFF.204951 and CUFF.204952) with such different FPKM and coverage values. The only difference between them seems to be that one is 3 bases longer.

        ~/tophat/S2$ grep "Insr" transcripts.tmap 
        Insr    ENSMUST00000091291      p       CUFF.203963     CUFF.203963.1   100     1.082117        0.000000        2.612462        1.685393        89      CUFF.203963.1
        Insr    ENSMUST00000091291      p       CUFF.203965     CUFF.203965.1   100     0.687917        0.000000        1.482256        1.071429        210     CUFF.203965.1
        Insr    ENSMUST00000139504      c       CUFF.203967     CUFF.203967.1   100     2.363397        0.692223        4.034571        3.680982        163     CUFF.203967.1
        Insr    ENSMUST00000139504      j       CUFF.204951     CUFF.204951.1   44      5.694980        1.526815        9.863144        8.869908        9073    CUFF.204952.2
        Insr    ENSMUST00000139504      j       CUFF.204952     CUFF.204952.2   100     13.057124       8.871707        17.242540       20.336418       9076    CUFF.204952.2
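To compare the two CUFF IDs numerically, the rows can be parsed into named fields. This sketch assumes the 12-column .tmap layout (ref_gene_id, ref_id, class_code, cuff_gene_id, cuff_id, FMI, FPKM, FPKM_conf_lo, FPKM_conf_hi, cov, len, major_iso_id). That layout fits the rows above, but it is an assumption to check against your cuffcompare version. The helper names are hypothetical.

```python
# Assumed .tmap column order; verify against your cuffcompare release.
TMAP_COLUMNS = [
    "ref_gene_id", "ref_id", "class_code", "cuff_gene_id", "cuff_id", "FMI",
    "FPKM", "FPKM_conf_lo", "FPKM_conf_hi", "cov", "len", "major_iso_id",
]
CASTS = {"FMI": int, "FPKM": float, "FPKM_conf_lo": float,
         "FPKM_conf_hi": float, "cov": float, "len": int}


def parse_tmap_line(line):
    """Split one .tmap row (tab- or space-separated) into typed fields."""
    fields = line.split()
    if len(fields) != len(TMAP_COLUMNS):
        raise ValueError(f"expected {len(TMAP_COLUMNS)} columns, got {len(fields)}")
    row = dict(zip(TMAP_COLUMNS, fields))
    for column, cast in CASTS.items():
        row[column] = cast(row[column])
    return row


def compare_rows(a, b):
    """Summarise how transcript b differs from transcript a."""
    return {
        "len_diff": b["len"] - a["len"],
        "fpkm_ratio": b["FPKM"] / a["FPKM"],
        "cov_ratio": b["cov"] / a["cov"],
        # True when the two FPKM confidence intervals overlap.
        "ci_overlap": (a["FPKM_conf_lo"] <= b["FPKM_conf_hi"]
                       and b["FPKM_conf_lo"] <= a["FPKM_conf_hi"]),
    }


if __name__ == "__main__":
    short = parse_tmap_line("Insr ENSMUST00000139504 j CUFF.204951 CUFF.204951.1 44 "
                            "5.694980 1.526815 9.863144 8.869908 9073 CUFF.204952.2")
    long_ = parse_tmap_line("Insr ENSMUST00000139504 j CUFF.204952 CUFF.204952.2 100 "
                            "13.057124 8.871707 17.242540 20.336418 9076 CUFF.204952.2")
    print(compare_rows(short, long_))
```

On the two rows above, this gives a length difference of 3 and an FPKM ratio of about 2.29, with essentially the same coverage ratio. The two FPKM confidence intervals (1.53 to 9.86 and 8.87 to 17.24) also overlap.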
        How did you get your BAM file to display on UCSC? Did you just upload it to an https server? I don't have access to a server, so I doubt I can upload mine.

        Maybe you should check whether tophat is picking up those junctions for this gene, although from your image I can already see that a lot of your reads cross junctions. You should also check systematically how many genes show this behavior; you may just be unlucky with this one.
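One way to check this systematically is to compare the introns tophat reports with the introns used by cufflinks transcripts. This sketch assumes tophat's junctions.bed layout: one BED12 record per junction, whose two blocks are the read overhangs, with the read count in the score column. It also assumes cufflinks exon records in transcripts.gtf. The function names are hypothetical, and `min_reads=2` is an arbitrary illustrative threshold.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def tophat_introns(bed_lines, min_reads=1):
    """Map (chrom, start, end) -> read count for junctions.bed records.

    The intron runs from the end of the first block to the start of the
    second block, in 0-based half-open coordinates.
    """
    introns = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("track", "#")):
            continue
        f = line.rstrip("\n").split("\t")
        chrom, chrom_start, reads = f[0], int(f[1]), int(f[4])
        sizes = [int(x) for x in f[10].rstrip(",").split(",")]
        starts = [int(x) for x in f[11].rstrip(",").split(",")]
        if reads >= min_reads:
            introns[(chrom, chrom_start + sizes[0], chrom_start + starts[1])] = reads
    return introns


def cufflinks_introns(gtf_lines):
    """Return the (chrom, start, end) introns between consecutive GTF exons.

    GTF exons are 1-based inclusive, so the gap between exons [s1, e1] and
    [s2, e2] is [e1, s2 - 1) in 0-based half-open coordinates.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(f[8])
        if match:
            exons[(f[0], match.group(1))].append((int(f[3]), int(f[4])))
    introns = set()
    for (chrom, _), spans in exons.items():
        spans.sort()
        for (_, end1), (start2, _) in zip(spans, spans[1:]):
            introns.add((chrom, end1, start2 - 1))
    return introns


def unassembled_junctions(bed_lines, gtf_lines, min_reads=2):
    """tophat junctions with >= min_reads reads that no cufflinks transcript uses."""
    assembled = cufflinks_introns(gtf_lines)
    return {key: reads
            for key, reads in tophat_introns(bed_lines, min_reads).items()
            if key not in assembled}
```

Pass open handles for junctions.bed and transcripts.gtf. A large result, dominated by well-supported junctions, would point to a systematic assembly problem rather than bad luck with one gene.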

        • #5
          I'm pretty jealous of your nice results! I have played with cufflinks quite a bit and haven't seen a transcript as clean as that anywhere in my data.

          Is it possible I am not getting such good results because I am using 50 bp reads? I just don't know at this point. The tophat results certainly show enough consistent junction reads that cufflinks should be able to assemble the transcript correctly (after all, scripture does a fine job).

          To show the BAM on UCSC, you need to index it with samtools and then, as you suggest, upload it (together with its .bai index) to a publicly accessible site. Then you just point the UCSC browser at your BAM and it works! If you are using Picasa, you could probably (though I'm not sure) host your BAM file somewhere on Google and point UCSC to that.
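After `samtools index sample.bam` (which writes sample.bam.bai next to a coordinate-sorted BAM), both files go on a public server. You then paste a track line with `type=bam` and `bigDataUrl` into the UCSC custom-track box. The sketch below builds that line. The function names are hypothetical, and the example.org URL in the tests is a placeholder for your own host.

```python
from urllib.parse import urlparse


def bam_track_line(bam_url, name, description=None):
    """Build a UCSC custom-track line pointing at a remote BAM file.

    UCSC fetches the index from the same URL with ".bai" appended, so the
    sorted BAM and its index must sit side by side on a public server.
    """
    parsed = urlparse(bam_url)
    if parsed.scheme not in ("http", "https", "ftp"):
        raise ValueError("UCSC needs a public http, https or ftp URL")
    if not parsed.path.endswith(".bam"):
        raise ValueError("expected a URL that ends in .bam")
    parts = ["track", "type=bam", f'name="{name}"']
    if description:
        parts.append(f'description="{description}"')
    parts.append(f"bigDataUrl={bam_url}")
    return " ".join(parts)


def bai_url(bam_url):
    """URL at which UCSC expects the index written by `samtools index`."""
    return bam_url + ".bai"
```

Checking that `bai_url(...)` is reachable before loading the track saves a round of confusing browser errors.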

          Unfortunately, I have been looking at many genes, and they all show exactly the same behaviour.

          Can you tell me exactly which version of cufflinks you are using, and on which OS? For extra points, I could share a small part of my SAM file with you; I would love to see whether you get the same results on my data.

          • #6
            Trust me, I have had my share of bad luck with these programs; right now I am stuck trying to make sense of the output and the dozens of files they produce. I used the 64-bit Linux build and, of course, the latest version of every program, given that this forum is full of bug reports against the older versions. It is bizarre that tophat reports those junctions but cufflinks does not connect them. Email Cole Trapnell and hope that he replies.

            • #7
              If anyone is interested, Cole is working on a new version (0.8.3) that will improve these results.

              • #8
                Do you know when it will be out? Could he release temporary fixes for the critical bugs that have already been reported?

                • #9
                  How do you make refFlat_RefSeq.gff for mm9?

                  Can somebody tell me where to get the refFlat_RefSeq.gff for mm9? I have found GFF3 files for each chromosome (reference assembly MGSCv37, mouse build 37.1). Do these correspond to mm9? If so, do you have to combine the per-chromosome GFF3 files into one file, adding a chromosome column (chr1, chr2, etc.) to each before merging?
                  Thanks
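One possible route is to start from the UCSC refFlat table for mm9 (available from the UCSC Table Browser or download server) and convert it to GTF yourself. This is not necessarily how the refFlat_RefSeq.gff in post #4 was produced. The sketch assumes the standard 11-column refFlat layout (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds), with 0-based, comma-terminated exon starts and ends. It writes exon records only, and the function name is hypothetical.

```python
def refflat_to_gtf(refflat_lines, source="refFlat"):
    """Convert UCSC refFlat rows into GTF exon records (one string per exon)."""
    records = []
    for line in refflat_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        gene, transcript, chrom, strand = f[0], f[1], f[2], f[3]
        starts = [int(x) for x in f[9].rstrip(",").split(",")]
        ends = [int(x) for x in f[10].rstrip(",").split(",")]
        attributes = f'gene_id "{gene}"; transcript_id "{transcript}";'
        for start, end in zip(starts, ends):
            # refFlat is 0-based half-open; GTF is 1-based inclusive.
            records.append("\t".join([chrom, source, "exon", str(start + 1),
                                      str(end), ".", strand, ".", attributes]))
    return records
```

Some RefSeq accessions map to more than one locus, so transcript_id values can repeat across chromosomes. You may want to make them unique before passing the file to tophat -G.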

                  • #10
                    Has anyone resolved this issue? I am still seeing disconnected transcripts in many genes using tophat v1.1.2 and cufflinks 0.9.2. I ran tophat without the -G option, so junctions are discovered de novo, and I DO see reads spanning junctions that are then not reflected in the cufflinks transcripts.gtf output file.

                    • #11
                      Does anyone know how I can plot RNA-seq differential expression results from Tophat, Cufflinks and Cuffdiff for visualization?
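A common starting point is a volcano plot of fold change against significance. The sketch below assumes a tab-separated cuffdiff table such as gene_exp.diff, whose header includes `gene`, `log2(fold_change)`, `p_value` and `significant` columns. Those names come from later cuffdiff releases, and older versions may label the fold-change column differently, so the names are parameters. The function names are hypothetical, and plotting needs matplotlib installed separately.

```python
import math


def volcano_points(diff_lines, fc_col="log2(fold_change)", p_col="p_value",
                   sig_col="significant"):
    """Return (gene, fold change, -log10 p, significant) tuples from a cuffdiff table.

    diff_lines: the table's lines, header first. Genes whose fold change is
    not finite (e.g. expressed in only one condition) are skipped.
    """
    lines = list(diff_lines)
    header = lines[0].rstrip("\n").split("\t")
    idx = {name: i for i, name in enumerate(header)}
    points = []
    for line in lines[1:]:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        fold_change = float(f[idx[fc_col]])
        if not math.isfinite(fold_change):
            continue
        # Cap p at a tiny positive value so that p = 0 does not break log10.
        p_value = max(float(f[idx[p_col]]), 1e-300)
        gene = f[idx["gene"]] if "gene" in idx else f[0]
        points.append((gene, fold_change, -math.log10(p_value),
                       f[idx[sig_col]] == "yes"))
    return points


def plot_volcano(points, path="volcano.png"):
    """Save a volcano plot to path, drawing significant genes in red."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    colours = ["red" if sig else "grey" for _, _, _, sig in points]
    plt.scatter([p[1] for p in points], [p[2] for p in points], c=colours, s=4)
    plt.xlabel("fold change (log scale)")
    plt.ylabel("-log10(p value)")
    plt.savefig(path, dpi=150)
    plt.close()
```

Keeping the parsing separate from the plotting makes it easy to switch to q-values (pass `p_col="q_value"`) or to another plotting tool.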

                      • #12
                        @rgejman

                        This issue has been resolved in v0.9.3.
