  • CuffDiff w/ novel transcriptome

    Fairly new to RNA-Seq analysis here. We have assembled a novel transcriptome from 454 and Illumina reads using Newbler 2.5. Now we wish to use the Cufflinks package (in particular Cuffdiff) to measure expression changes. The Cufflinks package seems geared towards reads aligned to a genomic reference: TopHat aligns the reads to a genome, and Cufflinks then outputs the transcripts.gtf file. If our dataset is purely mRNA, do I still need to run Cufflinks to get the transcripts.gtf file, which is required as input to Cuffdiff?

  • #2
    Cuffdiff is only meant for RNA-Seq analysis where a reference genome is available. You can probably make it work for you by altering your alignment file to give your transcripts "dummy" genomic coordinates. Alternatively, I recommend you look at RSEM (http://deweylab.biostat.wisc.edu/rsem/) which is meant to estimate abundances purely from transcript sequences.

    • #3
      I just re-read your question, and it's not clear whether or not a reference genome is available. If it is, you can map your reads to the reference using your assembled transcriptome (GTF) as an input to TopHat. You can then use Cuffdiff with your GTF without first running Cufflinks, if you do not wish to use its assembly function. Alternatively, you can use the Cufflinks assembler and Cuffcompare to combine the Cufflinks assembly with your own.

      Let me know if you have any further questions!

      • #4
        We do not have a reference genome. So if I wanted to use Cuffdiff, would I have to manually create a "transcripts.gtf" file for my assembled transcripts, creating an entry in the GTF file for each assembled contig/isotig and adding fake or irrelevant genome coordinates?

        • #5
          Yes, but it will probably be somewhat involved. You will need to build your GTF so that isoforms correctly overlap in the genome coordinates. You will then need to adjust your SAM alignments so that they are converted from transcriptome coordinates to your new "genomic" coordinates. If you decide to go this route, please let me know how it goes! I can also help if you get stuck along the way.
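To make the SAM adjustment described above a bit more concrete, here is a minimal Python sketch of the coordinate shift only. It assumes the simplest possible layout, in which every transcript occupies one contiguous, non-overlapping block on a single made-up reference. The names `remap_sam_line`, `layout`, and `pseudo1` are hypothetical. It does not rewrite CIGAR strings, so it does not handle isoforms that share exons, and the @SQ header lines would still need to be rewritten to describe the new reference.

```python
def remap_sam_line(line, layout, pseudo_chrom="pseudo1"):
    """Shift one SAM record from transcript to pseudo-genome coordinates.

    line   : a SAM line without the trailing newline
    layout : dict mapping transcript name -> 1-based start of its block
             on the pseudo-chromosome (hypothetical, built by the user)
    """
    if line.startswith("@"):
        # Header lines are passed through unchanged; @SQ needs separate handling.
        return line

    fields = line.split("\t")
    rname, pos = fields[2], int(fields[3])
    if rname == "*":
        # Unmapped read: nothing to shift.
        return line

    # Columns 3 and 4 (RNAME, POS): POS is 1-based in SAM.
    fields[2] = pseudo_chrom
    fields[3] = str(layout[rname] + pos - 1)

    # Columns 7 and 8 (RNEXT, PNEXT): shift the mate position the same way.
    rnext, pnext = fields[6], int(fields[7])
    if rnext != "*" and pnext > 0:
        mate_ref = rname if rnext == "=" else rnext
        fields[6] = "="  # everything now lives on the same pseudo-chromosome
        fields[7] = str(layout[mate_ref] + pnext - 1)

    # Column 9 (TLEN) is left untouched; it stays valid only when both
    # mates were on the same transcript.
    return "\t".join(fields)
```

For example, with `layout = {"isotig1": 1, "isotig2": 1001}`, a read at position 10 on isotig2 ends up at position 1010 on `pseudo1`.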

          • #6
            Do you have any other ideas? Or should I look into RSEM or bioconductor packages?

            • #7
              I would look into RSEM if I were you.

              • #8
                Could I just use the "-G" option in Cufflinks and supply a reference annotation based on my assembled transcriptome? Also, can I directly use the transcripts.gtf produced by Cufflinks as Cuffdiff input? I read that Cuffdiff requires tss_id and p_id attributes, but it looks like Cufflinks does not produce these IDs in its output GTF file.

                • #9
                  p_id issue

                  I think Cole or Adam may like to add a few sentences to the manual about getting p_id. I have used a GTF file with CDS records, but there is no p_id in the output of the most recent version of Cufflinks. This is quite a problem and not straightforward to solve; otherwise it is a great tool.

                  • #10
                    I have been testing an approach where I use TopHat to align reads back to the contigs (in essence treating them like many, many small chromosomes) and then use Cufflinks, Cuffcompare, and Cuffdiff to look at differential expression. This is in a novel transcriptome, i.e. no reference. It seems to work OK, although there may be some issues with the statistical detection of differences that I have yet to explore.

                    • #11
                      That's definitely of interest to us. I'd really like feedback on how well this works and any suggestions you may have on optimizing such a strategy.
                      Lior ([email protected])

                      • #12
                        I realize there are other programs such as RSEM that easily handle novel transcriptomes; however, the Cuffdiff package produces more detailed output. We used Newbler to assemble the transcriptome, and the output of this program is in the form of isogroups, isotigs, and contigs, where an isogroup is considered a gene, an isotig a transcript, and contigs exons. Ideally this would be true for all the output, but that is not the case. Many isotigs within an isogroup are simply assembly goofs, retrotransposon events, or indel genes from sister chromosomes.
                        An idea above suggested treating each gene as if it were on its own separate chromosome. It should make no difference to the program whether it thinks there are 10 or 1,000,000 chromosomes. This might make it easier to create a fake transcripts.gtf file, where the gene_id values would be the isogroup numbers and the transcript_id values the isotig names. Then possibly treat the whole isotig as a CDS region?? With published results showing that Newbler 2.5 is such a great assembler, there has to be someone who has assembled a novel transcriptome via Newbler and mapped reads through TopHat/Cufflinks? Anyone??? lol
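The fake GTF idea above could be sketched roughly as follows in Python. This is only an illustration of the file layout, under several assumptions that are mine and not from any manual: each isotig is its own "chromosome", each gets a single exon feature spanning its full length, the strand is set to "+", and the source column is labeled "newbler". The function name `isotigs_to_gtf` is hypothetical. Whether Cuffdiff produces meaningful statistics from such a file is not something this sketch establishes.

```python
def isotigs_to_gtf(isotigs, source="newbler"):
    """Build GTF text with one single-exon transcript per isotig.

    isotigs : iterable of (isotig_id, isogroup_id, length_in_bp) tuples
    Returns the GTF content as one string (nine tab-separated columns per line).
    """
    lines = []
    for isotig_id, isogroup_id, length in isotigs:
        # gene_id = isogroup, transcript_id = isotig, as proposed in the post.
        attributes = f'gene_id "{isogroup_id}"; transcript_id "{isotig_id}";'
        fields = [
            isotig_id,    # seqname: the isotig acts as its own reference
            source,       # source label (arbitrary)
            "exon",       # feature type
            "1",          # start (1-based, inclusive)
            str(length),  # end
            ".",          # score
            "+",          # strand (assumed)
            ".",          # frame
            attributes,
        ]
        lines.append("\t".join(fields))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    example = [
        ("isotig00001", "isogroup00001", 1500),
        ("isotig00002", "isogroup00001", 900),
    ]
    print(isotigs_to_gtf(example), end="")
```

Reads would then have to be aligned against the isotig sequences themselves, so that the reference names in the alignment file match the first GTF column.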

                        • #13
                          I apologize for giving you inaccurate information earlier, but it turns out that my idea for making a pseudo-genome will not work unless your transcriptome is aligned so that you know which transcripts overlap and where. Otherwise, you will need multi-read support, which Cufflinks currently lacks (although it is coming).

                          Short of knowing these overlaps, RSEM is your best option (to my knowledge).
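For anyone taking the RSEM route with Newbler output: rsem-prepare-reference accepts a --transcript-to-gene-map file with one tab-separated gene_id/transcript_id pair per line. Below is a minimal Python sketch that builds such a map, with the isogroup as the gene and the isotig as the transcript. It assumes the isotig FASTA headers look like `>isotig00001  gene=isogroup00001  length=1500  numContigs=2`; that header layout is an assumption, so check your own file first. The function name is hypothetical.

```python
import re

# Captures the isotig name right after '>' and the value of the gene= tag.
HEADER_RE = re.compile(r"^>(\S+).*?\bgene=(\S+)")


def transcript_to_gene_map(fasta_lines):
    """Return 'gene_id<TAB>transcript_id' lines for every matching FASTA header.

    fasta_lines : iterable of lines from an isotig FASTA file
    Headers without a gene= tag and all sequence lines are skipped.
    """
    rows = []
    for line in fasta_lines:
        match = HEADER_RE.match(line)
        if match:
            isotig, isogroup = match.groups()
            rows.append(f"{isogroup}\t{isotig}\n")
    return "".join(rows)


if __name__ == "__main__":
    demo = [
        ">isotig00001  gene=isogroup00001  length=1500  numContigs=2",
        "ACGTACGT",
        ">isotig00002  gene=isogroup00001  length=900  numContigs=1",
        "TTGACCA",
    ]
    print(transcript_to_gene_map(demo), end="")
```

The resulting text can be written to a file and passed to rsem-prepare-reference along with the isotig FASTA, so that RSEM reports gene-level estimates per isogroup as well as per-isotig estimates.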

                          • #14
                            Allele-specific expression

                            Can we use Cufflinks/Cuffdiff to find allele-specific expression? Any suggestions, please.

                            • #15
                              I forgot to mention that our transcriptome is from a plant. That means polyploid and littered with retrotransposons/transposons. The RSEM paper said 52% of reads in maize were multi-reads. I'm not sure if even RSEM can handle the amount of multi-read alignments a novel plant transcriptome will produce. We are still brainstorming ways to shortcut and streamline this pipeline. Will removing any transcripts and reads that map to retrotransposons affect the statistics of RSEM or Cufflinks? I have been favoring Cufflinks because of its better documentation and output. The RSEM Google group doesn't have any posts yet. Has anyone used RSEM for a novel transcriptome?
