  • wbb1813
    Member
    • May 2014
    • 12

    How to deal with normalized RNA seq data?

    Hi everyone !
    I have TCGA prostate cancer data that has already been normalized. Now I want to analyze it and find differentially expressed genes.
    My question is: what methods can I use other than limma and t.test?
    Thank you !
  • mbblack
    Senior Member
    • Aug 2009
    • 245

    #2
    Starting with normalized data, you can use LIMMA, T-Tests or ANOVA. Of course, without biological replicates, you cannot do any statistical analysis of differential gene expression at all, so I'm assuming you have at least 2 biological replicates for each condition (and really, 3 should be the bare minimum acceptable for any reliable stats).
    Michael Black, Ph.D.
    ScitoVation LLC. RTP, N.C.


    • wbb1813
      Member
      • May 2014
      • 12

      #3
      Thank you for your reply!
      The data set I have contains 180 samples: 141 tumor samples and 39 normal samples. I have analyzed it with LIMMA and t-tests. With LIMMA, when I chose 3 tumor and 3 normal samples, I found about 80 differentially expressed genes (diffgenes). When I chose 10 tumor and 10 normal samples, I got about 2,000 diffgenes. Finally, when I used all the tumor and normal samples, I got, to my confusion, about 10,000 diffgenes.
      I don't know why this happens.
      Should I do some clustering or PCA before the analysis? (Actually, I have tried both, but clustering and PCA didn't work well: I can't cluster all 180 samples perfectly.) Is there any other method for this?
      I'm a junior, thank you for your help!


      • mbblack
        Senior Member
        • Aug 2009
        • 245

        #4
        Any time that you increase the number of replicates, you will likely detect more significant results. That is the whole point of replication. The more replicates, the more precise is your estimate of the population mean and variance, and hence the smaller the change you can now detect as significant (i.e. unlikely to be observed by chance). Statistical significance is all about your ability to estimate population mean and variance, and the more replicates you have, the better your estimates of those parameters.

        This is the very reason why biological replicates are so important in detecting differential gene expression. The more replicates, the greater your ability to detect ever more subtle changes in gene expression.
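As a rough illustration of why the gene count grew from 3 vs 3 to 10 vs 10 to 141 vs 39, here is a minimal sketch using a normal approximation for a single gene. The effect size (0.5 standard deviations), the unadjusted alpha of 0.05, and the function name are assumptions chosen for illustration; a real analysis would use moderated t-statistics and multiple-testing correction, and the normal approximation is optimistic for very small groups.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n_tumor, n_normal, effect_sd=0.5, alpha=0.05):
    """Normal-approximation power of a two-sided, two-group comparison.

    effect_sd is the true mean difference in units of the per-gene
    standard deviation (an illustrative value, not taken from the thread).
    """
    std_norm = NormalDist()
    # Standard error of the difference in group means, with the SD fixed at 1.
    se = sqrt(1 / n_tumor + 1 / n_normal)
    z_crit = std_norm.inv_cdf(1 - alpha / 2)
    shift = effect_sd / se
    # Probability that the test statistic lands beyond either critical value.
    return std_norm.cdf(shift - z_crit) + std_norm.cdf(-shift - z_crit)

# The three designs described in post #3.
designs = [(3, 3), (10, 10), (141, 39)]
powers = [approx_power(t, n) for t, n in designs]
for (t, n), p in zip(designs, powers):
    print(f"{t} tumor vs {n} normal: approximate power {p:.2f}")
```

The same modest difference that is almost never detected with 3 samples per group is detected most of the time with all 180 samples, which is consistent with the growing gene lists.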

        Unless you have some rational reason to reject a sample, you should always use all of your biological replicates when testing for differential gene expression - you have the most power to discriminate differences then.

        When selecting differentially expressed genes though, you should NOT rely purely on statistical significance. Many published studies have clearly shown that you will get your most reliable results if you simultaneously use both a statistical threshold (e.g. FDR<0.05 is common), AND a magnitude threshold (e.g. absolute value of estimated fold change >1.5, or >2.0 are common cutoffs).

        Genes selected by simultaneously applying a statistical, AND a magnitude threshold are the most likely to validate via an independent method such as RT-qPCR.
        Michael Black, Ph.D.
        ScitoVation LLC. RTP, N.C.
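The combined filter described in post #4 can be sketched as follows. This is only a minimal illustration: the gene names, fold changes, and p-values are made up, the function names are hypothetical, and the Benjamini-Hochberg adjustment is one common way to obtain an FDR (the thread does not say which procedure to use).

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def select_genes(results, fdr_cutoff=0.05, fold_cutoff=1.5):
    """Keep genes passing both the FDR and the fold-change threshold.

    results is a list of (gene, log2_fold_change, raw_p_value) tuples.
    """
    adjusted = benjamini_hochberg([p for _, _, p in results])
    min_abs_lfc = math.log2(fold_cutoff)
    return [
        gene
        for (gene, lfc, _), q in zip(results, adjusted)
        if q < fdr_cutoff and abs(lfc) > min_abs_lfc
    ]

# Made-up results for five hypothetical genes.
results = [
    ("GENE_A", 2.1, 0.0001),   # significant and large change
    ("GENE_B", 0.2, 0.0002),   # significant but tiny change
    ("GENE_C", -1.3, 0.004),   # significant, large decrease
    ("GENE_D", 0.9, 0.05),     # large change, fails FDR after adjustment
    ("GENE_E", 0.1, 0.6),      # fails both
]
selected = select_genes(results)
print(selected)
```

With very large sample sizes, genes like GENE_B (statistically significant, negligible change) become common, which is where the magnitude threshold does most of its work.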


        • dpryan
          Devon Ryan
          • Jul 2011
          • 3478

          #5
          The more samples you use, the more power you have. That's a completely expected result. With enough samples you'd likely find almost everything to be at least slightly different between the two groups.

          "PCA didn't work well" is meaningless. Perhaps the samples clustered by group or perhaps not, but in neither case can one say that the PCA itself didn't work well.


          • wbb1813
            Member
            • May 2014
            • 12

            #6
            Thanks, all! I think I now understand the importance of biological replicates, and I will use a magnitude threshold, or some biological methods, to find the genes I am interested in.
            On the other hand, I want to get the raw counts for prostate cancer from TCGA and then use DESeq and edgeR to find differentially expressed genes, but I don't know whether I can get the raw data from TCGA.
            I have a new question: my 180 samples from TCGA come from different batches. Does that have an impact on the analysis?


            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              #7
              Yes, batch effects can be quite large on occasion (or quite small, you never know). Have a look at the SVA package on Bioconductor.
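Before any batch correction, it can help to check how batches line up with the tumor/normal groups, since a batch that contains only one group cannot be separated from the biological effect. The following is a minimal sketch of that check; the sample IDs, batch labels, and function names are hypothetical, and it does not perform the correction itself.

```python
from collections import Counter

# Hypothetical sample annotations: (sample_id, batch, group).
samples = [
    ("S01", "batch1", "tumor"), ("S02", "batch1", "tumor"),
    ("S03", "batch1", "normal"), ("S04", "batch2", "tumor"),
    ("S05", "batch2", "tumor"), ("S06", "batch3", "normal"),
]

def batch_by_group(annotations):
    """Count samples for every (batch, group) combination."""
    return Counter((batch, group) for _, batch, group in annotations)

def single_group_batches(annotations):
    """List batches that contain only one group (batch confounded with group)."""
    groups_per_batch = {}
    for _, batch, group in annotations:
        groups_per_batch.setdefault(batch, set()).add(group)
    return sorted(b for b, g in groups_per_batch.items() if len(g) == 1)

counts = batch_by_group(samples)
confounded = single_group_batches(samples)

for (batch, group), n in sorted(counts.items()):
    print(f"{batch}\t{group}\t{n}")
print("Batches with a single group:", confounded)
```

In this made-up layout only batch1 contains both groups, so differences involving batch2 and batch3 could reflect either biology or batch.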
