  • Preferred method for differential gene expression analysis?

    Hi all,

    So based on [1] I have implemented edgeR, except using the robust method of estimating the dispersion [2].

    However, having previously worked with non-parametric statistical methodology, my attention has fallen on [3].

    I am new to the whole NGS/RNA-seq world, so it would be great to get some input regarding which method you prefer for differential gene expression analysis, and why.

    Cheers,
    Leon

    1. http://www.ncbi.nlm.nih.gov/pubmed/23975260
    2. http://www.ncbi.nlm.nih.gov/pubmed/24753412
    3. http://www.ncbi.nlm.nih.gov/pubmed/23981227

  • #2
    Review: http://www.ncbi.nlm.nih.gov/pubmed/24020486
    Slightly related review: http://www.ncbi.nlm.nih.gov/pubmed/22988256

    Personally: I use cuffdiff, mainly because it works without a hassle, is fast, and doesn't require much additional effort for the input.
    And I don't know R ^^.


    • #3
      Hi bastianwur,

      Thanks for the references. The consensus seems to be that no method is better than another, each having its own strengths and weaknesses.

      That basically means scientists are likely to choose the one they "like" for whatever reason: ease of use, prior experience, etc. Not a very scientific approach imho...

      Cheers,
      Leon


      • #4
        No offence, btw. It's just frustrating not being able to get a clear-cut answer regarding which method is better.

        Cheers,
        Leon


        • #5
          That's because there *is* no clear cut answer.

          - If you have a complex design (more than one experimental factor varies), you want to use limma, edgeR or DESeq2. All reviews tend to agree that these are OK. I tend to favor DESeq2.

          - Most reviews agree that CuffDiff is pretty bad.

          - When you have many biological replicates, SAMSeq (a nonparametric method) is a good alternative.

          - Ballgown can do DE analysis on novel transcripts and isoforms (because it's run on an assembly).

          - Single-cell RNA-seq needs special considerations.

          And so on ...


          • #6
            I understand and accept that "there *is* no clear cut answer"; my acceptance, however, does not dampen my frustration.

            My setup is the following: 96 RNA-seq samples run on an Illumina HiSeq 2000 using the Illumina TruSeq Stranded mRNA Sample Prep Kit, with ~1 case per 4 controls.

            I have done automated QC using Trim Galore!, followed by mapping to UCSC hg19 using TopHat2, and then counted mapped reads using HTSeq. Each raw FASTQ file contains ~60-80 million reads.

            My current analysis of differentially expressed genes has been performed using edgeR Robust by Zhou et al. (doi:10.1093/nar/gku310) and the workflow described by Anders et al. (doi:10.1038/nprot.2013.099).

            Newbie here... Am I all good or do you have "crucial" input? ...and could you elaborate on the "complex" versus "simple" design?

            Cheers,
            Leon
            Last edited by LeonDK; 09-24-2014, 01:08 AM. Reason: Forgot to include mapper


            • #7
              Originally posted by LeonDK
              No offence, btw. It's just frustrating not being able to get a clear-cut answer regarding which method is better.

              Cheers,
              Leon
              No offence taken because, as kopi-o said, there's no science yet in that part.
              Just use the things that work for you and that give you the results you want (yes, I'm maybe naive, lazy, and a bad scientist, but well... in that case I'm fine with it).

              The workflow seems normal so far and incorporates everything (no idea if Trim Galore! also does adapter trimming, but if so: good)... wait... everything except rRNA filtering.
              That seems to be missing.

              "Complex design" probably means time series and complicated relations between the conditions. No idea what your 96 samples are, but it probably qualifies.


              • #8
                Originally posted by bastianwur
                (no idea if trimgalore also does adapter trimming, but if so: good)
                It trims adapters.


                • #9
                  Newbie here... Am I all good or do you have "crucial" input? ...and could you elaborate on the "complex" versus "simple" design?
                  I think your approach seems sound.

                  By "complex" design I mean that there is more than one experimental factor that varies. For example, let's say you are looking at RNA-seq of tumor and paired normal tissue samples in several individuals. You *could* just compare the tumor vs normal groups, but you would get more statistical power by also considering which patient each sample is from. In other words, you'd model two factors, "individual" and "tumor/normal". This particular case would be a paired design ("paired" because you have tumor and normal samples from the same patient). This can be done in edgeR, limma, DESeq2, and in fact SAMSeq as well. If you do not use a paired analysis, natural variation between individuals can easily overwhelm the specific signal from the tumor vs normal differences.
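To make the point about pairing concrete, here is a minimal sketch in plain Python. The expression values and the four patients are made up for illustration, and it uses ordinary t statistics rather than the count-based models that edgeR or DESeq2 fit, so treat it only as an intuition aid. The patients differ a lot from each other, but each tumor sample sits about one unit above its own matched normal.

```python
from math import sqrt
from statistics import mean, variance

# Made-up log2 expression of one gene in four hypothetical patients.
# Index i in both lists refers to the same patient.
normal = [5.0, 7.0, 9.0, 11.0]
tumor = [6.0, 8.1, 9.9, 12.0]


def unpaired_t(x, y):
    """Welch t statistic: ignores which patient each sample came from."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(y) - mean(x)) / se


def paired_t(x, y):
    """Paired t statistic: works on within-patient differences."""
    diffs = [b - a for a, b in zip(x, y)]
    se = sqrt(variance(diffs) / len(diffs))
    return mean(diffs) / se


print(f"unpaired t = {unpaired_t(normal, tumor):.2f}")
print(f"paired t   = {paired_t(normal, tumor):.2f}")
```

The mean difference is the same in both calculations. Only the noise term changes: the unpaired version carries all of the between-patient spread, while the paired version removes it, which is why the paired statistic comes out far larger on these toy numbers.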

                  A more complex design could be a case where you have tumor cultures that have been treated with 3 different drugs at 3 different time points, with matched normals. Here you would perhaps want to model three different factors. It's this kind of scenario that you really need edgeR/DESeq2/limma for.
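As a rough illustration of what "modelling three factors" means, the sketch below builds an additive, treatment-coded design matrix in plain Python. The factor names (`drug`, `time`, `tissue`), their levels, and the helper `design_matrix` are all hypothetical and chosen only for this example. In R, the same idea would typically be written as a model formula such as `~ drug + time + tissue` and handed to the package doing the fitting.

```python
from typing import Dict, List, Sequence, Tuple


def design_matrix(
    samples: Sequence[Dict[str, str]], factors: Sequence[str]
) -> Tuple[List[str], List[List[int]]]:
    """Build an additive, treatment-coded design matrix.

    For each factor, the alphabetically first level is the reference and
    gets no column; every other level gets one 0/1 indicator column.
    """
    columns = ["(Intercept)"]
    encoders = []  # one (factor, level) pair per non-intercept column
    for factor in factors:
        levels = sorted({s[factor] for s in samples})
        for level in levels[1:]:  # skip the reference level
            columns.append(f"{factor}:{level}")
            encoders.append((factor, level))
    rows = [[1] + [int(s[f] == lvl) for f, lvl in encoders] for s in samples]
    return columns, rows


# Hypothetical experiment: 3 drugs x 3 time points x tumor/normal.
samples = [
    {"drug": d, "time": t, "tissue": c}
    for d in ("drugA", "drugB", "drugC")
    for t in ("t1", "t2", "t3")
    for c in ("normal", "tumor")
]

columns, rows = design_matrix(samples, ["drug", "time", "tissue"])
print(columns)
for sample, row in zip(samples[:4], rows[:4]):
    print(sample, row)
```

Each row describes one sample and each column one coefficient the model would estimate. This sketch is purely additive; interaction terms (for example, a drug effect that changes over time) would need extra columns and are left out here.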


                  • #10
                    Super description - Appreciate it!

                    My setup is RNA-seq data from two groups: group A from sick individuals and group B from healthy individuals. I want to profile any gene expression differences between them.

                    Cheers,
                    Leon


                    • #11
                      Keep in mind that batch effects (they're inevitable) are easily handled by more complex designs as well. You also might benefit from stratifying patients and controls by gender, ethnicity, age, and so on. An initially simple design can quickly become more complicated if someone forgot to take care of a factor during sample acquisition.
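One practical check before adding a batch term is whether batch and group can be separated at all. The sketch below, in plain Python with made-up sample sheets and hypothetical column names (`group`, `batch`), cross-tabulates the two factors and flags the worst case, where every batch contains only one group. In that situation no design can tell the batch effect apart from the group effect.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def cross_tab(samples: Sequence[Dict[str, str]], a: str, b: str) -> Counter:
    """Count samples in every (level of a, level of b) cell."""
    return Counter((s[a], s[b]) for s in samples)


def fully_confounded(
    samples: Sequence[Dict[str, str]], factor: str, nuisance: str
) -> bool:
    """True if each level of `nuisance` contains a single level of `factor`."""
    seen = defaultdict(set)
    for s in samples:
        seen[s[nuisance]].add(s[factor])
    return all(len(levels) == 1 for levels in seen.values())


# Hypothetical sample sheets: cases and controls spread over two runs ...
balanced = [
    {"group": g, "batch": b}
    for g in ("case", "control")
    for b in ("run1", "run2")
    for _ in range(2)
]
# ... versus all cases in one run and all controls in the other.
confounded = [{"group": "case", "batch": "run1"}] * 4 + [
    {"group": "control", "batch": "run2"}
] * 4

for name, sheet in (("balanced", balanced), ("confounded", confounded)):
    print(name, dict(cross_tab(sheet, "group", "batch")))
    print("  fully confounded:", fully_confounded(sheet, "group", "batch"))
```

This only catches complete confounding. Strongly unbalanced but not fully confounded layouts are still estimable, though with less power, so the cross-tabulation itself is worth reading as well.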
