  • CuffDiff w/ novel transcriptome

    Fairly new to RNA-Seq analysis here. We have assembled a novel transcriptome from 454 and Illumina reads with Newbler 2.5. Now we wish to use the Cufflinks package (in particular Cuffdiff) to measure expression changes. The Cufflinks package seems geared towards reads aligned to a genomic reference: TopHat aligns the reads to a genome, and Cufflinks then outputs the transcripts.gtf file. If our dataset is purely mRNA, do I still need to run Cufflinks to get the transcripts.gtf file, which is required as input to Cuffdiff?

  • #2
    Cuffdiff is only meant for RNA-Seq analysis where a reference genome is available. You can probably make it work for you by altering your alignment file to give your transcripts "dummy" genomic coordinates. Alternatively, I recommend you look at RSEM (http://deweylab.biostat.wisc.edu/rsem/) which is meant to estimate abundances purely from transcript sequences.



    • #3
      I just re-read your question, and it's not clear whether or not a reference genome is available. If it is, you can map your reads to the reference using your assembled transcriptome (GTF) as an input to TopHat. You can then use Cuffdiff with your GTF without first running Cufflinks, if you do not wish to use its assembly function. Alternatively, you can use the Cufflinks assembler and Cuffcompare to combine the Cufflinks assembly with your own.

      Let me know if you have any further questions!



      • #4
        We do not have a reference genome. So if I wanted to use Cuffdiff, would I have to manually create a "transcript.gtf" file for my assembled transcripts, just creating an entry in the GTF file for each assembled contig/isotig, and adding in fake or irrelevant genome coordinates?



        • #5
          Yes, but it will probably be somewhat involved. You will need to build your GTF so that isoforms correctly overlap in the genome coordinates. You will then need to adjust your SAM alignments so that they are converted from transcriptome coordinates to your new "genomic" coordinates. If you decide to go this route, please let me know how it goes! I can also help if you get stuck along the way.
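
The SAM adjustment step described above could look roughly like the following. This is only a minimal sketch under a strong simplifying assumption: each transcript is laid down as one contiguous block on a pseudo-chromosome, so the conversion is a plain offset shift. It does not split alignments across exons, which is what correctly overlapping isoforms would require. The `layout` mapping, the function name `remap_sam_line`, and the names `isotig1`, `isotig2`, and `pseudo1` are hypothetical and not part of any tool.

```python
from typing import Dict, Tuple

# Hypothetical layout: transcript name -> (pseudo-chromosome name,
# number of bases that precede the transcript on that pseudo-chromosome).
Layout = Dict[str, Tuple[str, int]]


def remap_sam_line(line: str, layout: Layout) -> str:
    """Shift one SAM record from transcript to pseudo-genome coordinates."""
    line = line.rstrip("\n")
    if line.startswith("@"):
        # Header lines pass through unchanged; the @SQ entries would still
        # need to be regenerated for the pseudo-chromosomes separately.
        return line
    fields = line.split("\t")
    rname = fields[2]
    if rname not in layout:
        # Unmapped reads ("*") or unknown references are left untouched.
        return line
    chrom, offset = layout[rname]
    fields[2] = chrom
    fields[3] = str(int(fields[3]) + offset)  # POS is 1-based in SAM
    # Keep the mate fields (RNEXT, PNEXT) consistent with the new coordinates.
    rnext = fields[6]
    if rnext == "=":
        fields[7] = str(int(fields[7]) + offset)
    elif rnext in layout:
        mate_chrom, mate_offset = layout[rnext]
        fields[6] = "=" if mate_chrom == chrom else mate_chrom
        fields[7] = str(int(fields[7]) + mate_offset)
    return "\t".join(fields)


if __name__ == "__main__":
    layout = {"isotig1": ("pseudo1", 0), "isotig2": ("pseudo1", 1000)}
    record = "\t".join(
        ["read1", "0", "isotig2", "5", "255", "50M", "*", "0", "0",
         "A" * 50, "I" * 50]
    )
    print(remap_sam_line(record, layout))
```

Optional tags and the CIGAR string are passed through as they are, which is only valid because nothing is being spliced in this simplified layout.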



          • #6
            Do you have any other ideas? Or should I look into RSEM or bioconductor packages?



            • #7
              I would look into RSEM if I were you.



              • #8
                Could I just use the "-G" option in Cufflinks with a reference annotation based on my assembled transcriptome? Also, can I use the transcripts.gtf produced by Cufflinks directly as Cuffdiff input? I read that Cuffdiff requires tss_id and p_id attributes, but it looks like Cufflinks does not write these IDs to its output GTF file.



                • #9
                  p_id issue

                  I think Cole or Adam may like to add a few sentences to the manual about getting p_id. I have used a GTF file with CDS records, but there is no p_id in the output of the most recent version of Cufflinks. It is quite a problem and not straightforward; otherwise it is a great tool.



                  • #10
                    I have been testing an approach where I use TopHat to align reads back to the contigs (in essence treating them like many, many small chromosomes) and then run Cufflinks, Cuffcompare, and Cuffdiff to look at differential expression. This is in a novel transcriptome, i.e. no reference. It seems to work well enough, although there may be some issues with the statistical detection of differences that I have yet to explore.



                    • #11
                      That's definitely of interest to us. I'd really like feedback on how well this works and any suggestions you may have on optimizing such a strategy.
                      Lior



                      • #12
                        I realize there are other programs such as RSEM that easily handle novel transcriptomes, but the Cuffdiff package produces more detailed output. We used Newbler to assemble the transcriptome, and its output is organized into isogroups, isotigs, and contigs, where an isogroup is considered a gene, an isotig a transcript, and a contig an exon. Ideally this would hold for all the output, but it does not. Many isotigs within an isogroup are simply assembly goofs, retrotransposon events, or indel genes from sister chromosomes.
                        An idea above suggested treating each gene as if it were on its own separate chromosome. It should make no difference to the program whether it thinks there are 10 or 1,000,000 chromosomes. This might make it easier to create a fake transcript.gtf file, where the gene_id values would be the isogroup numbers and the transcript_id values the isotig names. Then possibly treat the whole isotig as a CDS region? With published results showing how good an assembler Newbler 2.5 is, there has to be someone who has assembled a novel transcriptome with Newbler and mapped reads through TopHat/Cufflinks. Anyone? lol
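
A fake GTF along the lines described above could be generated with something like the following. This is a rough sketch, not a tested Cuffdiff recipe. It assumes each isotig serves as its own reference sequence and is covered by a single `exon` feature, with `gene_id` set to the isogroup and `transcript_id` set to the isotig. The `Isotig` record, the function names, the `newbler` source label, and the example identifiers are all hypothetical, and the strand is written as "." because the orientation of an assembled isotig is generally unknown. Whether Cuffdiff accepts such a file as is would need to be checked.

```python
from typing import Iterable, NamedTuple


class Isotig(NamedTuple):
    name: str      # isotig identifier: used as seqname and transcript_id
    isogroup: str  # isogroup identifier: used as gene_id
    length: int    # isotig length in bases


def isotig_to_gtf_line(isotig: Isotig, source: str = "newbler") -> str:
    """Return one tab-separated GTF exon line spanning the whole isotig."""
    attributes = (
        f'gene_id "{isotig.isogroup}"; transcript_id "{isotig.name}";'
    )
    fields = [
        isotig.name,         # seqname: the isotig is its own "chromosome"
        source,              # source label (arbitrary, an assumption here)
        "exon",              # feature type
        "1",                 # start: GTF is 1-based and inclusive
        str(isotig.length),  # end: last base of the isotig
        ".",                 # score: not used
        ".",                 # strand: unknown for an assembled isotig
        ".",                 # frame: not applicable to exon features
        attributes,
    ]
    return "\t".join(fields)


def build_gtf(isotigs: Iterable[Isotig]) -> str:
    """Return the GTF text for all isotigs, one exon line per isotig."""
    return "".join(isotig_to_gtf_line(i) + "\n" for i in isotigs)


if __name__ == "__main__":
    example = [
        Isotig("isotig00001", "isogroup00001", 1520),
        Isotig("isotig00002", "isogroup00001", 1388),
        Isotig("isotig00003", "isogroup00002", 960),
    ]
    print(build_gtf(example), end="")
```

The isotig lengths and the isogroup membership would have to come from the Newbler output. Parsing those files is left out here.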



                        • #13
                          I apologize for giving you inaccurate information earlier, but it turns out that my idea for making a pseudo-genome will not work unless your transcriptome is aligned so that you know which transcripts overlap and where. Otherwise, you will need multi-read support, which Cufflinks currently lacks (although it is coming).

                          Short of knowing these overlaps, RSEM is your best option (to my knowledge).



                          • #14
                            Allele-specific expression

                            Can we use Cufflinks/Cuffdiff to find allele-specific expression? Any suggestions, please?



                            • #15
                              I forgot to mention that our transcriptome is from a plant. That means it is polyploid and littered with retrotransposons/transposons. The RSEM paper said 52% of reads in maize were multi-reads. I'm not sure whether even RSEM can handle the number of multi-read alignments a novel plant transcriptome will produce. We are still brainstorming ways to shortcut and streamline this pipeline. Will removing any transcripts and reads that map to retrotransposons affect the statistics of RSEM or Cufflinks? I have been favoring Cufflinks because of its better documentation and output. The RSEM Google group doesn't have any posts yet. Has anyone used RSEM for a novel transcriptome?
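
One way to gauge how severe the multi-read problem is for a given assembly is to measure the multi-read fraction directly from an alignment file. The sketch below is only an illustration and is not taken from RSEM or Cufflinks. It assumes the aligner was run so that every alignment of a read is reported as its own SAM record, and it counts a read (or a mate, for paired-end data) as a multi-read when it has more than one mapped record. The function name `multiread_fraction` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def multiread_fraction(sam_lines: Iterable[str]) -> float:
    """Fraction of mapped reads that have more than one alignment record."""
    hits: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag = int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped
        if flag & 0x800:
            continue  # supplementary record, not an extra mapping location
        # Tell the two mates of a pair apart so they are counted separately.
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        hits[(fields[0], mate)] += 1
    if not hits:
        return 0.0
    multi = sum(1 for count in hits.values() if count > 1)
    return multi / len(hits)


if __name__ == "__main__":
    import sys

    # Usage: python multiread_fraction.py alignments.sam
    with open(sys.argv[1]) as handle:
        print(f"{multiread_fraction(handle):.3f}")
```

Comparing this number before and after removing retrotransposon-derived transcripts would at least show how much of the ambiguity they account for.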
