  • blindtiger454
    Member
    • Oct 2010
    • 30

    CuffDiff w/ novel transcriptome

    Fairly new to RNA-Seq analysis here. We have assembled a novel transcriptome from 454 and Illumina reads with Newbler 2.5. Now we wish to use the Cufflinks package (in particular Cuffdiff) to measure expression changes. The package seems geared towards aligning reads to a genomic reference: Cufflinks outputs the transcripts.gtf file after the reads have been aligned to a genome with TopHat. If our dataset is purely mRNA, do I still need to run Cufflinks to get the transcripts.gtf file, which is required as input to Cuffdiff?
  • adarob
    Member
    • Jul 2010
    • 71

    #2
    Cuffdiff is only meant for RNA-Seq analysis where a reference genome is available. You can probably make it work for you by altering your alignment file to give your transcripts "dummy" genomic coordinates. Alternatively, I recommend you look at RSEM (http://deweylab.biostat.wisc.edu/rsem/) which is meant to estimate abundances purely from transcript sequences.


    • adarob
      Member
      • Jul 2010
      • 71

      #3
      I just re-read your question, and it's not clear whether or not a reference genome is available. If it is, you can map your reads to the reference using your assembled transcriptome (GTF) as an input to TopHat. You can then use Cuffdiff with your GTF without first running Cufflinks, if you do not wish to use its assembly function. Alternatively, you can use the Cufflinks assembler and Cuffcompare to combine the Cufflinks assembly with your own.

      Let me know if you have any further questions!


      • blindtiger454
        Member
        • Oct 2010
        • 30

        #4
        We do not have a reference genome. So if I wanted to use Cuffdiff, would I have to manually create a "transcript.gtf" file for my assembled transcripts, just creating an entry in the GTF file for each assembled contig/isotig, and adding in fake or irrelevant genome coordinates?


        • adarob
          Member
          • Jul 2010
          • 71

          #5
          Yes, but it will probably be somewhat involved. You will need to build your GTF so that isoforms correctly overlap in the genome coordinates. You will then need to adjust your SAM alignments so that they are converted from transcriptome coordinates to your new "genomic" coordinates. If you decide to go this route, please let me know how it goes! I can also help if you get stuck along the way.
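
The SAM adjustment described in #5 can be sketched as follows. This is a minimal illustration, not a tested Cuffdiff workflow. It assumes each transcript has been placed as one contiguous block on a pseudo-chromosome. The function name `shift_sam_line` and the `placement` table are hypothetical. Spliced placements would also need the CIGAR string rewritten, and the `@SQ` header lines would need to describe the pseudo-chromosomes; neither is handled here.

```python
def shift_sam_line(line, placement):
    """Move one SAM record from transcript to pseudo-genome coordinates.

    placement maps transcript name -> (pseudo_chromosome, 1-based start).
    Header lines, unmapped reads, and unknown references are returned as-is.
    The returned string has no trailing newline.
    """
    line = line.rstrip("\n")
    if line.startswith("@"):
        return line  # headers are passed through (assumption: fixed elsewhere)
    fields = line.split("\t")
    rname, pos = fields[2], int(fields[3])
    if rname == "*" or rname not in placement:
        return line
    chrom, start = placement[rname]
    fields[2] = chrom
    fields[3] = str(start + pos - 1)  # both coordinates are 1-based
    # Shift the mate position too, when the mate is mapped.
    rnext, pnext = fields[6], int(fields[7])
    if pnext > 0:
        if rnext == "=":
            fields[7] = str(start + pnext - 1)
        elif rnext in placement:
            mate_chrom, mate_start = placement[rnext]
            fields[6] = "=" if mate_chrom == chrom else mate_chrom
            fields[7] = str(mate_start + pnext - 1)
    return "\t".join(fields)


# Hypothetical placement: isotig00001 starts at position 1001 of "pseudo1".
placement = {"isotig00001": ("pseudo1", 1001)}
record = "read1\t0\tisotig00001\t5\t255\t4M\t*\t0\t0\tACGT\tIIII"
shifted = shift_sam_line(record, placement)
```

The hard part that #5 points at is building `placement` so that isoforms of the same gene overlap correctly; the sketch takes that table as given.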


          • blindtiger454
            Member
            • Oct 2010
            • 30

            #6
            Do you have any other ideas? Or should I look into RSEM or Bioconductor packages?


            • adarob
              Member
              • Jul 2010
              • 71

              #7
              I would look into RSEM if I were you.


              • blindtiger454
                Member
                • Oct 2010
                • 30

                #8
                Could I just use the "-G" option in Cufflinks with a reference annotation based on my assembled transcriptome? Also, can I use the transcripts.gtf produced by Cufflinks directly as Cuffdiff input? I read that Cuffdiff requires tss_id and p_id attributes, but it looks like Cufflinks does not write these IDs to its output GTF file.


                • honey
                  Senior Member
                  • Feb 2010
                  • 151

                  #9
                  p_id issue

                  I think Cole or Adam may like to add a few sentences to the manual about getting p_id. I have used a GTF file with CDS records, but there is no p_id in the output of the most recent version of Cufflinks. It is quite a problem and not straightforward; otherwise it is a great tool.


                  • peromhc
                    Senior Member
                    • Sep 2009
                    • 108

                    #10
                    I have been testing an approach where I use TopHat to align reads back to the contigs (in essence treating them like many, many small chromosomes) and then run Cufflinks, Cuffcompare, and Cuffdiff to look at differential expression. This is in a novel transcriptome, i.e. no reference. It seems to work well enough, although there may be some issues with the statistical detection of differences that I have yet to explore.
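
The contigs-as-chromosomes approach in #10 might be laid out roughly as below. This is a sketch under stated assumptions, not the poster's actual commands: the helper `build_commands`, the output directory names, and the `accepted_hits.bam` file name (which depends on the TopHat version) are all illustrative. The function only builds the command lines and runs nothing.

```python
def build_commands(contigs_fasta, samples, index_base="contigs_index"):
    """Return argv lists for a contigs-as-reference Cuffdiff run.

    samples maps a sample name to its FASTQ file (hypothetical layout).
    """
    # Index the assembled contigs as if they were a genome.
    cmds = [["bowtie-build", contigs_fasta, index_base]]
    gtfs, alignments = [], []
    for name, fastq in samples.items():
        tophat_dir = "tophat_" + name
        cufflinks_dir = "cufflinks_" + name
        hits = tophat_dir + "/accepted_hits.bam"  # assumed output name
        cmds.append(["tophat", "-o", tophat_dir, index_base, fastq])
        cmds.append(["cufflinks", "-o", cufflinks_dir, hits])
        gtfs.append(cufflinks_dir + "/transcripts.gtf")
        alignments.append(hits)
    # Cuffcompare merges the per-sample assemblies into one annotation.
    cmds.append(["cuffcompare", "-o", "merged"] + gtfs)
    cmds.append(["cuffdiff", "-o", "cuffdiff_out", "merged.combined.gtf"]
                + alignments)
    return cmds


commands = build_commands(
    "contigs.fa", {"control": "control.fq", "treated": "treated.fq"}
)
for argv in commands:
    print(" ".join(argv))
```

Each list could be passed to `subprocess.run` once the paths are checked against the installed versions of the tools.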


                    • lpachter
                      Member
                      • Feb 2010
                      • 40

                      #11
                      That's definitely of interest to us. I'd really like feedback on how well this works and any suggestions you may have on optimizing such a strategy.
                      Lior


                      • blindtiger454
                        Member
                        • Oct 2010
                        • 30

                        #12
                        I realize there are other programs, such as RSEM, that easily handle novel transcriptomes; however, the Cuffdiff package produces more detailed output. We used Newbler to assemble the transcriptome, and its output takes the form of isogroups, isotigs, and contigs, where an isogroup is considered a gene, an isotig a transcript, and a contig an exon. Ideally this would hold for all of the output, but it does not. Many isotigs within an isogroup are simply assembly goofs, retrotransposon events, or indel genes from sister chromosomes.
                        An idea above suggested treating each gene as if it were on its own separate chromosome. It should make no difference to the program whether it thinks there are 10 or 1000000 chromosomes. This might make it easier to create a fake transcript.gtf file, where the gene_ids would be the isogroup numbers and the transcript_ids the isotig names. Then possibly treat the whole isotig as a CDS region? With published results showing that Newbler 2.5 is such a great assembler, there has to be someone who has assembled a novel transcriptome with Newbler and mapped reads through TopHat/Cufflinks. Anyone??? lol
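
The fake GTF proposed in #12 could look something like the sketch below. It rests on assumptions that are not settled in this thread: each isotig is used as its own reference sequence (so the reads would be aligned to the isotigs themselves), each isotig becomes a single exon covering its full length, the strand is written as "." because it is unknown, and "newbler" is just a placeholder for the source column. The function name `make_gtf_lines` is hypothetical. Whether Cuffdiff behaves sensibly when one gene_id spans several reference sequences is not established here.

```python
def make_gtf_lines(isotigs):
    """Build one GTF exon line per isotig.

    isotigs is an iterable of (isogroup, isotig, length) tuples.
    """
    lines = []
    for isogroup, isotig, length in isotigs:
        attributes = 'gene_id "%s"; transcript_id "%s";' % (isogroup, isotig)
        columns = [
            isotig,       # seqname: the isotig acts as its own "chromosome"
            "newbler",    # source (placeholder label)
            "exon",       # feature
            "1",          # start, 1-based
            str(length),  # end, inclusive
            ".",          # score
            ".",          # strand unknown for a de novo assembly
            ".",          # frame
            attributes,
        ]
        lines.append("\t".join(columns))
    return lines


# Hypothetical isotigs: two transcripts of one gene, one of another.
example = [
    ("isogroup00001", "isotig00001", 1200),
    ("isogroup00001", "isotig00002", 950),
    ("isogroup00002", "isotig00003", 400),
]
gtf_lines = make_gtf_lines(example)
print("\n".join(gtf_lines))
```

This layout says nothing about where isotigs of the same isogroup overlap, so reads shared between isoforms will still align to several references.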


                        • adarob
                          Member
                          • Jul 2010
                          • 71

                          #13
                          I apologize for giving you inaccurate information earlier, but it turns out that my idea for making a pseudo-genome will not work unless your transcriptome is aligned so that you know which transcripts overlap and where. Otherwise, you will need multi-read support, which Cufflinks currently lacks (although it is coming).

                          Short of knowing these overlaps, RSEM is your best option (to my knowledge).


                          • honey
                            Senior Member
                            • Feb 2010
                            • 151

                            #14
                            Allele-specific expression

                            Can we use Cufflinks/Cuffdiff to find allele-specific expression? Any suggestions, please.


                            • blindtiger454
                              Member
                              • Oct 2010
                              • 30

                              #15
                              I forgot to mention that our transcriptome is from a plant. That means it is polyploid and littered with retrotransposons/transposons. The RSEM paper said 52% of reads in maize were multi-reads. I'm not sure even RSEM can handle the number of multi-read alignments a novel plant transcriptome will produce. We are still brainstorming ways to shortcut and streamline this pipeline. Will removing any transcripts and reads that map to retrotransposons affect the statistics of RSEM or Cufflinks? I have been favoring Cufflinks because of its better documentation and output. The RSEM Google group doesn't have any posts yet. Has anyone used RSEM for a novel transcriptome?
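
One way to check how severe the multi-read problem from #15 is for a given assembly is to count it directly from the alignments. The sketch below assumes SAM-format text in which the aligner reported every alignment of each read (not just one per read). The function name `multiread_fraction` is illustrative. It gives the fraction of mapped reads with more than one alignment, counting the two mates of a pair separately.

```python
from collections import Counter


def multiread_fraction(sam_lines):
    """Fraction of mapped reads that have more than one alignment."""
    hits = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x4 or flag & 0x800:
            continue  # skip unmapped reads and supplementary records
        # Key on read name plus first/second-mate bits (0x40, 0x80).
        hits[(fields[0], flag & 0xC0)] += 1
    if not hits:
        return 0.0
    multi = sum(1 for count in hits.values() if count > 1)
    return multi / len(hits)


# Tiny hypothetical input: r1 aligns twice, r2 once, r3 is unmapped.
demo = [
    "@SQ\tSN:isotig00001\tLN:1200",
    "r1\t0\tisotig00001\t10\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tisotig00002\t22\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tisotig00003\t5\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
fraction = multiread_fraction(demo)
```

On a real file the same function can be fed an open file handle, since it only iterates over lines.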

