
  • cufflinks -- new transcripts detection

    I have run tophat2 to align my paired-end RNA-seq data to a reference genome:
    /usr/bin/tophat2 -G ASgenome.gff3 -r 300 asgenome2 seq1.fa seq2.fa

    and got results as ./tophat_out/accepted_hits.bam


    Now I am reading the manual of cufflinks and have some questions about the option parameters:

    1. What is the difference between using -g ASgenome.gff3 and omitting it? That is, will the results differ between running

    cufflinks -p8 -o clout ./tophat_out/accepted_hits.bam

    and

    cufflinks -p8 -g ASgenome.gff3 -o clout ./tophat_out/accepted_hits.bam


    My goal is to detect novel transcripts.

    2. cuffcompare results: class_code e

    Should I remove those transcripts from the final assembled gene set?

    Any hints are welcome. Thanks a lot!

  • #2
    1. Yes, they will be different. Without -g, the assembly is completely blind to the existing annotation. When you supply an annotation with the -g flag, it is used to help guide the assembly. More details here.

    2. It depends on what you are looking for. If you are focusing on novel isoforms, you might want to keep them: they most likely originate from pre-mRNA, but they can also originate from mRNA whose exon lengths differ from the reference. Keep in mind that this only applies if you used polyA selection in library preparation. If you used ribosomal depletion, you should definitely ignore them (unless you are investigating pre-mRNA itself, of course).
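
For anyone who wants to see how many transcripts fall into each class code before deciding, here is a minimal shell sketch. It assumes the cuffcompare combined GTF (for example cuffcmp.combined.gtf) is tab-separated and carries transcript_id and class_code attributes in column 9. The names GTF_ATTR_FN, class_code_counts and drop_class_e are made up for illustration and are not part of Cufflinks.

```shell
# Hypothetical awk helper: attr(s, key) returns the value of `key "value"`
# from a GTF attribute string, or "" if the key is absent.
GTF_ATTR_FN='function attr(s, key,    re, klen) {
  re = key " \"[^\"]+\""
  if (match(s, re)) { klen = length(key) + 2; return substr(s, RSTART + klen, RLENGTH - klen - 1) }
  return ""
}'

# Print "class_code<TAB>number of distinct transcripts" for one GTF file.
class_code_counts() {
  awk -F'\t' "$GTF_ATTR_FN"'
    {
      t = attr($9, "transcript_id"); c = attr($9, "class_code")
      if (t != "" && c != "" && !(t in seen)) { seen[t] = 1; n[c]++ }
    }
    END { for (c in n) print c "\t" n[c] }
  ' "$1" | sort
}

# Print the GTF without any record belonging to a class_code "e" transcript.
# The file is read twice: first to collect the transcript IDs, then to filter.
drop_class_e() {
  awk -F'\t' "$GTF_ATTR_FN"'
    NR == FNR { if (attr($9, "class_code") == "e") drop[attr($9, "transcript_id")] = 1; next }
    !(attr($9, "transcript_id") in drop)
  ' "$1" "$1"
}
```

Possible usage, with the file name taken from this thread: `class_code_counts cuffcmp.combined.gtf` to inspect the tally, then `drop_class_e cuffcmp.combined.gtf > no_e.gtf` if you decide to discard them (no_e.gtf is just an example output name).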



    • #3
      Thank you very much for your reply. I reran cufflinks with -g.

      Then I did cuffmerge and cuffcompare.


      I have a further question about cuffcompare, which I ran with the following command line:
      cuffcompare -o ./merged_asm -i gtf_out_list.genome.2.txt -r ASgenome.gff3

      gtf_out_list.genome.2.txt contains the path to the merged .gtf


      A bunch of files were generated in this step. I expected the .loci file to give the number of gene loci assembled, and that number to match the number of genes derived from cuffmerge (genes.fpkm_tracking). However, the two numbers are quite different. Which one should I go with? Could anyone explain why they differ?

      Thanks for the help!



      • #4
        I am also confused about the .gtf files from the cuffmerge (transcripts.gtf, merged.gtf) and cuffcompare (cuffcmp.combined.gtf) steps.

        When I ran cuffcompare, my input list (-i gtf_out_list.genome.2.txt) contained only one file, the transcripts.gtf from the previous cuffmerge. During the cuffmerge step, skipped.gtf was empty.

        What I observe now is that these files have different numbers of records.

        Hope I can get some help here. Thanks a lot for your time!



        • #5
          When you run cufflinks with -g, it performs a RABT (reference annotation based transcript) assembly, which is helpful for discovering novel transcripts and isoforms.
          But I am confused about the cuffmerge and cuffcompare files, too. So if you have any new insight into these files, please let me know. Thanks!
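
One way to start narrowing down the discrepancy, without assuming why the files differ, is to count records, distinct gene_id values and distinct transcript_id values in each GTF side by side. The sketch below assumes tab-separated GTF files with gene_id and transcript_id attributes in column 9. The function name gtf_id_counts is hypothetical and not part of the Cufflinks suite.

```shell
# Print one summary line per GTF file:
#   <file> records=<N> genes=<distinct gene_id> transcripts=<distinct transcript_id>
gtf_id_counts() {
  for f in "$@"; do
    awk -F'\t' '
      # attr(s, key): value of `key "value"` in a GTF attribute string, or "".
      function attr(s, key,    re, klen) {
        re = key " \"[^\"]+\""
        if (match(s, re)) { klen = length(key) + 2; return substr(s, RSTART + klen, RLENGTH - klen - 1) }
        return ""
      }
      /^#/ { next }                      # skip comment/header lines
      {
        rows++
        g = attr($9, "gene_id"); t = attr($9, "transcript_id")
        if (g != "" && !(g in genes)) { genes[g] = 1; ng++ }
        if (t != "" && !(t in tx))    { tx[t] = 1; nt++ }
      }
      END { printf "%s\trecords=%d\tgenes=%d\ttranscripts=%d\n", FILENAME, rows, ng, nt }
    ' "$f"
  done
}
```

Possible usage with the file names mentioned in this thread: `gtf_id_counts merged.gtf cuffcmp.combined.gtf`. If the transcript counts agree but the gene counts do not, the difference lies in how loci are grouped rather than in which transcripts were kept.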
