  • LeonDK
    Member
    • Sep 2014
    • 69

    Preferred method for differential gene expression analysis?

    Hi all,

    So based on [1] I have implemented edgeR, except using the robust method of estimating the dispersion [2].

    However, having previously worked with non-parametric statistical methods, my attention has fallen on [3].

    I am new to the whole NGS/RNA-seq world, so it would be great to get some input on which method you prefer for differential gene expression analysis, and why.

    Cheers,
    Leon

    1. http://www.ncbi.nlm.nih.gov/pubmed/23975260
    2. http://www.ncbi.nlm.nih.gov/pubmed/24753412
    3. http://www.ncbi.nlm.nih.gov/pubmed/23981227
  • bastianwur
    Member
    • Feb 2014
    • 98

    #2
    Review: http://www.ncbi.nlm.nih.gov/pubmed/24020486
    Slightly related review: http://www.ncbi.nlm.nih.gov/pubmed/22988256

    Personally: I use cuffdiff, mainly because it works without a hassle, is fast, and doesn't require much additional effort for the input.
    And I don't know R ^^.


    • LeonDK
      Member
      • Sep 2014
      • 69

      #3
      Hi bastianwur,

      Thanks for the references, it seems that the consensus is that no method is better than another, each having its unique strengths and weaknesses.

      Which basically means that scientists are likely to choose the one they "like" for whatever reason, ease of use, prior experience etc... Not a very scientific approach imho...

      Cheers,
      Leon


      • LeonDK
        Member
        • Sep 2014
        • 69

        #4
        No offence, btw. It's just frustrating not being able to get a clear-cut answer regarding which method is better.

        Cheers,
        Leon


        • kopi-o
          Senior Member
          • Feb 2008
          • 319

          #5
          That's because there *is* no clear cut answer

          - If you have a complex design (more than one experimental factor varies), you want to use limma, edgeR or DESeq2. All reviews tend to agree that these are OK. I tend to favor DESeq2.

          - Most reviews agree that CuffDiff is pretty bad.

          - When you have many biological replicates, SAMSeq (a nonparametric method) is a good alternative.

          - Ballgown can do DE analysis on novel transcripts and isoforms (because it's run on an assembly).

          - Single-cell RNA-seq needs special considerations.

          And so on ...


          • LeonDK
            Member
            • Sep 2014
            • 69

            #6
            I understand and accept that "there *is* no clear cut answer"; my acceptance, however, does not dampen my frustration.

            My setup is the following: 96 RNA-seq samples run on an Illumina HiSeq 2000 using the Illumina TruSeq Stranded mRNA Sample Prep Kit, with ~1 case per 4 controls.

            I have done automated QC using Trim Galore!, followed by mapping to UCSC hg19 using TopHat2, and then counted mapped reads using HTSeq. Each raw FASTQ file contains ~60-80 million reads.

            My current analysis of differentially expressed genes has been performed using edgeR Robust by Zhou et al. (doi:10.1093/nar/gku310) and the workflow described by Anders et al. (doi:10.1038/nprot.2013.099).
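
The step between counting and the DE tool is usually a merge of the per-sample count files into one gene-by-sample matrix. A minimal Python sketch of that merge follows, assuming the usual two-column htseq-count output (gene id, tab, count). The sample names, gene ids, and counts are made up for illustration, and the unprefixed summary-row names are an assumption about older HTSeq output.

```python
import io

# Summary rows that htseq-count appends after the gene rows. Recent versions
# prefix them with "__"; the unprefixed names below are an assumption about
# older output and can be removed if they never occur in your files.
SUMMARY_ROWS = {"no_feature", "ambiguous", "too_low_aQual",
                "not_aligned", "alignment_not_unique"}

def read_counts(handle):
    """Parse one two-column (gene_id <TAB> count) file into a dict."""
    counts = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        gene, value = line.split("\t")
        if gene.startswith("__") or gene in SUMMARY_ROWS:
            continue  # skip summary rows, keep gene rows only
        counts[gene] = int(value)
    return counts

def merge_counts(per_sample):
    """Combine {sample: {gene: count}} into (genes, samples, matrix).

    Rows follow sorted gene ids, columns follow sorted sample names.
    Genes missing from a sample are filled with 0.
    """
    samples = sorted(per_sample)
    genes = sorted({g for c in per_sample.values() for g in c})
    matrix = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, matrix

# Toy input standing in for real files (hypothetical ids and counts).
case_01 = io.StringIO("GENE_A\t10\nGENE_B\t0\n__no_feature\t500\n")
ctrl_01 = io.StringIO("GENE_A\t7\nGENE_B\t3\n__no_feature\t420\n")
genes, samples, matrix = merge_counts(
    {"case_01": read_counts(case_01), "ctrl_01": read_counts(ctrl_01)}
)
```

With real data, each `io.StringIO` would be replaced by an opened count file, and the resulting matrix written out as a table for edgeR or DESeq2 to read.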

            Newbie here... Am I all good or do you have "crucial" input? ...and could you elaborate on the "complex" versus "simple" design?

            Cheers,
            Leon
            Last edited by LeonDK; 09-24-2014, 01:08 AM. Reason: Forgot to include mapper


            • bastianwur
              Member
              • Feb 2014
              • 98

              #7
              Originally posted by LeonDK:
              No offence, btw. It's just frustrating not being able to get a clear-cut answer regarding which method is better.

              Cheers,
              Leon
              No offence taken, because as kopi-o said: There's no science yet in that part.
              Just use the things which work for you and which give you the results you want (yes, I'm maybe naive, lazy, and a bad scientist, but well...in that case I'm fine with it).

              The workflow seems normal so far and incorporates everything (no idea if Trim Galore! also does adapter trimming, but if so: good)... wait... except rRNA filtering.
              That seems to be missing.

              Complex design probably means time series and complicated relations between the conditions. No idea what your 96 samples are, but it probably qualifies.


              • dpryan
                Devon Ryan
                • Jul 2011
                • 3478

                #8
                Originally posted by bastianwur:
                (no idea if trimgalore also does adapter trimming, but if so: good)
                It trims adapters.


                • kopi-o
                  Senior Member
                  • Feb 2008
                  • 319

                  #9
                  Originally posted by LeonDK:
                  Newbie here... Am I all good or do you have "crucial" input? ...and could you elaborate on the "complex" versus "simple" design?
                  I think your approach seems sound.

                  By "complex" design I mean that there is more than one experimental factor that varies. For example, let's say you are looking at RNA-seq of tumor and paired normal tissue samples in several individuals. You *could* just compare the tumor vs normal groups, but you would get more statistical power by also considering which patient each sample is from - in other words, you'd model two factors, "individual" and "tumor/normal". This particular case would be a paired design ("paired" because you have "paired" tumor and normal samples from the same patient). This can be done in edgeR, limma, DESeq2, and in fact SAMSeq as well. If you do not use a paired analysis, natural variation between individuals can easily overwhelm the specific signal from the tumor vs normal differences.

                  A more complex design could be a case where you have tumor cultures that have been treated with 3 different drugs at 3 different time points, with matched normals. Here you would perhaps want to model three different factors. It's this kind of scenario that you really need edgeR/DESeq2/limma for.
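
The paired-design point can be illustrated with a small simulation. The following is only a sketch in Python with NumPy, using ordinary least squares on made-up log-expression values for a single gene. It is not how edgeR, limma, or DESeq2 fit their models (those use count-based or moderated statistics); the function names, effect sizes, and patient count are all assumptions chosen for illustration.

```python
import numpy as np

def design_matrix(condition, individual=None):
    """Build a treatment-coded design matrix.

    condition: sequence of 0/1 flags (0 = normal, 1 = tumor).
    individual: optional sequence of integer patient ids.
    Columns: intercept, condition, then one indicator per patient
    except the first (the reference patient).
    """
    condition = np.asarray(condition, dtype=float)
    cols = [np.ones_like(condition), condition]
    if individual is not None:
        individual = np.asarray(individual)
        for pid in np.unique(individual)[1:]:
            cols.append((individual == pid).astype(float))
    return np.column_stack(cols)

def condition_t_stat(y, X):
    """Fit OLS and return the t statistic of the condition column (index 1)."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

# Simulated log-expression: 6 patients, each with a normal and a tumor sample.
rng = np.random.default_rng(0)
n_patients = 6
individual = np.repeat(np.arange(n_patients), 2)
condition = np.tile([0, 1], n_patients)
patient_effect = rng.normal(0.0, 2.0, n_patients)  # large between-patient variation
noise = rng.normal(0.0, 0.2, 2 * n_patients)       # small within-patient noise
y = 5.0 + patient_effect[individual] + 1.0 * condition + noise

X_simple = design_matrix(condition)              # tumor vs normal only
X_paired = design_matrix(condition, individual)  # tumor vs normal + patient
t_simple = condition_t_stat(y, X_simple)
t_paired = condition_t_stat(y, X_paired)
print(f"unpaired t = {t_simple:.2f}, paired t = {t_paired:.2f}")
```

In this balanced toy case both models estimate the same tumor effect, but the paired model moves the between-patient variation out of the residual, so its t statistic is larger. In R, the corresponding design would typically be written as a formula with both factors, along the lines of `~ individual + condition`.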


                  • LeonDK
                    Member
                    • Sep 2014
                    • 69

                    #10
                    Super description - Appreciate it!

                    My setup is RNA-seq data from two groups: group A from sick individuals and group B from healthy individuals. I want to profile any gene expression differences between them.

                    Cheers,
                    Leon


                    • dpryan
                      Devon Ryan
                      • Jul 2011
                      • 3478

                      #11
                      Keep in mind that batch effects (they're inevitable) are easily handled by more complex designs as well. You might also benefit from stratifying patients and controls by gender, ethnicity, age, and so on. An initially simple design can quickly become more complicated if someone forgot to take care of a factor during sample acquisition.
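
Adding batch as a factor only helps if batch is not confounded with the condition of interest. A minimal Python/NumPy sketch of that check follows: it builds an additive design matrix (intercept + condition + batch) and tests whether it has full column rank. The helper names, sample labels, and batch labels are hypothetical.

```python
import numpy as np

def indicator_columns(labels):
    """Indicator columns for every level except the first (reference) level."""
    labels = np.asarray(labels)
    levels = sorted(set(labels.tolist()))
    cols = [(labels == lv).astype(float) for lv in levels[1:]]
    if not cols:  # a factor with a single level contributes no columns
        return np.empty((labels.size, 0))
    return np.column_stack(cols)

def additive_design(*factors):
    """Intercept plus treatment-coded indicators for each factor."""
    n = len(factors[0])
    blocks = [np.ones((n, 1))] + [indicator_columns(f) for f in factors]
    return np.hstack(blocks)

def is_estimable(X):
    """True if all coefficients can be estimated (full column rank)."""
    return np.linalg.matrix_rank(X) == X.shape[1]

condition = ["case", "case", "control", "control",
             "case", "case", "control", "control"]

# Balanced: each batch contains both cases and controls.
batch_ok = ["b1", "b1", "b1", "b1", "b2", "b2", "b2", "b2"]
# Confounded: all cases in one batch, all controls in the other.
batch_bad = ["b1", "b1", "b2", "b2", "b1", "b1", "b2", "b2"]

X_ok = additive_design(condition, batch_ok)
X_bad = additive_design(condition, batch_bad)
print(is_estimable(X_ok), is_estimable(X_bad))
```

When the check fails, as in the confounded layout, no model can separate the batch effect from the case/control effect, which is why the factor needs attention at sample acquisition and not only at analysis time.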
