  • syintel87
    Member
    • Dec 2012
    • 81

    DEG analysis without a GFF/GTF file

    My goal is to see differentially expressed genes across different time points.
    However, I want to map all reads based solely on their sequence, not on annotated features, because I am not certain that the annotation of the genome is correct or complete. So I do not want to use an annotation.

    In this case, after running TopHat without the "-G" (GTF annotation) option,
    what approaches could I use in the next step other than HTSeq or Cufflinks/Cuffdiff?

    I have been told that Cufflinks/Cuffdiff is not very powerful for detecting DEGs, and I have been advised to use HTSeq with edgeR or DESeq. However, HTSeq requires a GFF file as input, so I need another approach. Could you give me tips on what other programs I could use in my case?

    Thanks in advance.
    Last edited by syintel87; 01-07-2013, 09:06 AM.
  • bernardo_bello
    Member
    • May 2012
    • 49

    #2
    Hi syintel87,

    I have been recently looking for a pipeline for RNA-Seq analysis and had the same doubt as you. As far as I know, in all cases (whether de novo assembly or reference-based mapping) you are going to need a GFF3/GTF file.

    Bernardo


    • syintel87
      Member
      • Dec 2012
      • 81

      #3
      How to get DEGs without a GTF/GFF file?

      Is there a way to achieve my goal, which is to see differentially expressed genes across different time points, without a GFF/GTF file?

      If I use the annotation file, reads will only be assigned to annotated genes. This will exclude any reads from genes that have yet to be annotated.


      • bernardo_bello
        Member
        • May 2012
        • 49

        #4
        Originally posted by syintel87
        Is there a way to achieve my goal, which is to see differentially expressed genes across different time points, without a GFF/GTF file?

        If I use the annotation file, reads will only be assigned to annotated genes. This will exclude any reads from genes that have yet to be annotated.
        Well, at some point programs like rQuant, rDiff, DESeq or Cuffdiff are going to need a file with transcripts in order to quantify them in the *.bam files.
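To illustrate why quantification needs a feature file, here is a minimal pure-Python sketch of annotation-based read counting. It assumes alignments have already been reduced to hypothetical `(chrom, start, end)` tuples (0-based, half-open) instead of being read from a BAM file, and the helper names `parse_gtf_exons` and `count_reads_per_gene` are made up for this example. The "__no_feature"/"__ambiguous" buckets are loosely modeled on htseq-count's special counters, but this is a simplification, not htseq-count's actual algorithm.

```python
from collections import defaultdict


def parse_gtf_exons(gtf_lines):
    """Collect exon intervals per gene_id from GTF text lines.

    GTF coordinates are 1-based inclusive; they are converted here to
    0-based half-open (start - 1, end) so they match the read tuples.
    """
    genes = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features define countable regions here
        chrom, start, end, attrs = fields[0], int(fields[3]), int(fields[4]), fields[8]
        gene_id = None
        for attr in attrs.split(";"):
            attr = attr.strip()
            if attr.startswith("gene_id "):
                gene_id = attr.split(" ", 1)[1].strip('"')
        if gene_id is not None:
            genes[gene_id].append((chrom, start - 1, end))
    return dict(genes)


def count_reads_per_gene(reads, genes):
    """Assign each (chrom, start, end) read to the single gene it overlaps.

    Reads overlapping no annotated exon go to "__no_feature": this is exactly
    where reads from unannotated genes end up. Reads overlapping exons of
    more than one gene go to "__ambiguous".
    """
    counts = defaultdict(int)
    for chrom, rstart, rend in reads:
        hits = {
            gene
            for gene, exons in genes.items()
            if any(c == chrom and s < rend and rstart < e for c, s, e in exons)
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif hits:
            counts["__ambiguous"] += 1
        else:
            counts["__no_feature"] += 1
    return dict(counts)
```

Any read falling outside the annotation is lumped into "__no_feature", which is the limitation syintel87 is worried about.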

        Maybe there are other GTF/GFF3-independent tools that I don't know of yet.


        Bernardo


        • adumitri
          Member
          • Jan 2010
          • 27

          #5
          Even if you do not use Cuffdiff for the DE analysis, you can run Cufflinks on your samples to get sample-specific .gtf files. These annotations (which can contain novel transcripts/genes) can be merged afterwards with a reference .gtf file that you prefer (e.g. Ensembl's) using Cuffmerge, and you can use the resulting merged .gtf file for the DESeq/edgeR analyses.
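The workflow adumitri describes can be sketched as a list of commands built in Python. This is only a sketch under several assumptions: the file names (`ref.gtf`, `genome.fa`, the output directory layout) are hypothetical, the library is assumed to be unstranded (`-s no` for htseq-count), and the commands are only assembled here, not executed. Cufflinks writes `transcripts.gtf` into its output directory, Cuffmerge takes a text file listing one assembly GTF per line and writes `merged.gtf`, and htseq-count then counts reads per `gene_id` against that merged annotation for DESeq/edgeR.

```python
from pathlib import Path


def build_pipeline(bams, ref_gtf, genome_fa, threads=4, outdir="de_pipeline"):
    """Return (assembly_gtfs, manifest_path, commands) for a
    Cufflinks -> Cuffmerge -> htseq-count workflow.

    The caller is expected to write assembly_gtfs to manifest_path (one path
    per line) after the Cufflinks runs and before running Cuffmerge.
    """
    out = Path(outdir)
    commands = []
    assembly_gtfs = []

    # 1. Per-sample assembly with Cufflinks (no reference annotation given,
    #    so novel transcripts/genes can be reported).
    for bam in bams:
        sample_dir = out / "cufflinks" / Path(bam).stem
        commands.append(["cufflinks", "-p", str(threads), "-o", str(sample_dir), bam])
        assembly_gtfs.append(str(sample_dir / "transcripts.gtf"))

    # 2. Merge all sample assemblies together with the reference annotation.
    manifest = out / "assemblies.txt"
    merged_dir = out / "merged_asm"
    commands.append([
        "cuffmerge", "-g", ref_gtf, "-s", genome_fa,
        "-p", str(threads), "-o", str(merged_dir), str(manifest),
    ])
    merged_gtf = str(merged_dir / "merged.gtf")

    # 3. Count reads per gene_id in the merged annotation (assumes unstranded
    #    reads); htseq-count prints counts to stdout.
    for bam in bams:
        commands.append([
            "htseq-count", "-f", "bam", "-s", "no",
            "-t", "exon", "-i", "gene_id", bam, merged_gtf,
        ])
    return assembly_gtfs, str(manifest), commands
```

Each command could then be run with `subprocess.run(cmd, check=True)`, redirecting htseq-count's stdout to a per-sample count file that DESeq or edgeR can read.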


          • syintel87
            Member
            • Dec 2012
            • 81

            #6
            Originally posted by adumitri
            Even if you do not use Cuffdiff for the DE analysis, you can run Cufflinks on your samples to get sample-specific .gtf files. These annotations (which can contain novel transcripts/genes) can be merged afterwards with a reference .gtf file that you prefer (e.g. Ensembl's) using Cuffmerge, and you can use the resulting merged .gtf file for the DESeq/edgeR analyses.
            Oh! That is so helpful!
            Thank you so much!
            That GTF file is exactly what I want!


            • syfo
              Just a member
              • Nov 2012
              • 103

              #7
              Originally posted by syintel87
              Is there a way to achieve my goal, which is to see differentially expressed genes across different time points, without a GFF/GTF file?
              Well... it may sound silly, but to identify *differentially expressed* genes you need to identify *genes*.
              Either you provide them as known data in the form of an annotation file (GTF/GFF/BED/etc.), or you have to infer them from the reads, which is very challenging if you expect complete gene models. You typically get differentially expressed "genomic regions" (a.k.a. "transcribed fragments" (transfrags), "transcriptionally active regions" (TARs), etc.), not complete "genes".

              As adumitri indicated, you can use Cufflinks (or BEDTools) to extract those transcribed regions from the mapped reads and merge them with a reference annotation so that you can probe both known and unknown regions.
              I would just recommend merging the reads from all the samples together (along with the reference annotation) so that the statistical method you choose next considers the exact same set of regions across samples/conditions. You should then find differentially expressed regions. Whether two transcribed regions belong to the same gene/transcript is another question.
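The "pool all samples, define regions once, then count per sample" idea can be sketched in pure Python. This is a simplified stand-in for what Cufflinks or BEDTools would do, not their actual algorithms: reads are assumed to be hypothetical `(chrom, start, end)` tuples (0-based, half-open), and `min_depth` is an arbitrary illustrative threshold. Regions are coverage islands on the pooled reads, so every sample is counted against the identical region set.

```python
from collections import defaultdict


def pooled_coverage_regions(samples, min_depth=2):
    """Find regions where coverage pooled over all samples is >= min_depth.

    samples: dict mapping sample name -> list of (chrom, start, end) reads.
    Uses a sweep over start (+1) and end (-1) events per chromosome; after
    applying the events at a position, `depth` is the coverage from that
    position up to the next event.
    """
    events = defaultdict(lambda: defaultdict(int))
    for reads in samples.values():
        for chrom, start, end in reads:
            events[chrom][start] += 1
            events[chrom][end] -= 1

    regions = []
    for chrom in sorted(events):
        depth = 0
        open_start = None
        for pos in sorted(events[chrom]):
            depth += events[chrom][pos]
            if depth >= min_depth and open_start is None:
                open_start = pos  # coverage rises to the threshold here
            elif depth < min_depth and open_start is not None:
                regions.append((chrom, open_start, pos))  # coverage drops below it
                open_start = None
        # depth returns to 0 after the last event, so no region is left open
    return regions


def count_reads_per_region(samples, regions):
    """Count, per sample, the reads overlapping each pooled region.

    A read overlapping two regions is counted in both; a real pipeline would
    need a rule for such reads.
    """
    counts = {}
    for name, reads in samples.items():
        row = [0] * len(regions)
        for chrom, start, end in reads:
            for i, (rchrom, rstart, rend) in enumerate(regions):
                if rchrom == chrom and rstart < end and start < rend:
                    row[i] += 1
        counts[name] = row
    return counts
```

The resulting per-sample count vectors share the same region order, which is what a count-based DE method such as DESeq or edgeR expects as input.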
