  • lucer105
    Member
    • Nov 2013
    • 12

    Am I wrong by doing this?

    Hey all~

    I am not new to bioinformatics, but I am very new to analyzing RNA-seq data sets. I work in a purely wet lab, and I am the only person there who does any dry-lab work.

    Usually I calculate gene expression with HTSeq and Cufflinks after aligning the reads with TopHat and Bowtie. The unusual part is that I never run a differential expression analysis with edgeR or Cuffdiff. After calculating expression, I work directly on the raw read counts: I eliminate the genes that get too few reads and ignore the genes whose change in quantity is too small. Then I run a t-test on every gene that meets these requirements and discard the genes whose t-test p-value is greater than 0.05. Obviously I don't get RFP, but could anyone tell me which way (the common way or my way) leads to more reliable results?
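To make the question concrete, here is a minimal Python sketch of the workflow described above, run on a made-up count table. The function names and thresholds (`filter_then_t_test`, `min_mean=10`, `min_fold=2.0`) are hypothetical choices for illustration. The t-test is the classic pooled (equal-variance) two-sample test, with exact p-values for integer degrees of freedom computed from the closed-form series in Abramowitz & Stegun 26.7.3/26.7.4, so only the standard library is needed. It mirrors the question, not a recommended method: there is no library-size normalization and no multiple-testing correction.

```python
import math


def t_two_sided_p(t, df):
    """Exact two-sided p-value of Student's t for an integer df >= 1
    (closed-form series, Abramowitz & Stegun 26.7.3/26.7.4)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        term = total = 1.0
        for k in range(2, df - 1, 2):
            term *= c * c * (k - 1) / k
            total += term
        inside = s * total  # P(|T| < |t|)
    else:
        total = 0.0
        if df > 1:
            term = total = c
            for k in range(3, df - 1, 2):
                term *= c * c * (k - 1) / k
                total += term
        inside = 2.0 / math.pi * (theta + s * total)
    return max(0.0, 1.0 - inside)


def pooled_t_test(x, y):
    """Equal-variance two-sample t-test; returns (t, two-sided p)."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    df = n1 + n2 - 2
    se = math.sqrt(ss / df * (1 / n1 + 1 / n2))
    if se == 0:  # every replicate identical within both groups
        return (0.0, 1.0) if m1 == m2 else (math.copysign(math.inf, m1 - m2), 0.0)
    t = (m1 - m2) / se
    return t, t_two_sided_p(t, df)


def filter_then_t_test(counts, min_mean=10, min_fold=2.0, alpha=0.05):
    """counts maps gene -> (raw counts in group A, raw counts in group B)."""
    hits = {}
    for gene, (a, b) in counts.items():
        if (sum(a) + sum(b)) / (len(a) + len(b)) < min_mean:
            continue  # step 1: too few reads
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        fold = (mean_b + 1) / (mean_a + 1)  # +1 pseudocount avoids division by zero
        if max(fold, 1 / fold) < min_fold:
            continue  # step 2: change too small
        _, p = pooled_t_test(a, b)
        if p < alpha:
            hits[gene] = p  # step 3: keep p < alpha, no multiple-testing correction
    return hits


counts = {
    "geneA": ([100, 110, 105], [300, 320, 310]),
    "geneB": ([2, 3, 1], [8, 9, 7]),
    "geneC": ([100, 120, 110], [130, 125, 135]),
    "geneD": ([50, 200, 20], [400, 30, 500]),
}
print(filter_then_t_test(counts))
```

On this toy table geneB is dropped for low counts, geneC for a small fold change, and geneD fails the t-test because its replicates are noisy, so only geneA is reported.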

    You might think this is too simple, even naive, but could any of you tell me whether what I did is badly wrong?

    Thanks~

    Y.L.
  • swbarnes2
    Senior Member
    • May 2008
    • 910

    #2
    I don't think that's horribly wrong, though it's not as sophisticated an approach as using the software everyone else uses. If all you want to do is to flag genes for further testing, it might be okay.

    One small problem: when you test multiple things at the same time, like a few thousand genes, well, see the link below.

    http://xkcd.com/882/
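The arithmetic behind that warning is easy to sketch. Assume every gene is truly unchanged and the tests are independent (a simplification; gene-level tests are usually correlated, so treat this as a rough guide): each test then has probability alpha of giving p < alpha purely by chance. The helper name `null_false_positives` is made up for this example.

```python
def null_false_positives(m, alpha=0.05):
    """m independent tests in which no effect is real: return the expected
    number of p < alpha results and the chance of getting at least one."""
    expected = m * alpha
    at_least_one = 1 - (1 - alpha) ** m
    return expected, at_least_one


for m in (20, 5000):
    expected, at_least_one = null_false_positives(m)
    print(f"{m} tests: ~{expected:g} false positives expected, "
          f"P(at least one) = {at_least_one:.3f}")
```

For 20 tests (the number of jelly bean colors in the comic) one false positive is expected and the chance of at least one is about 64%; for 5,000 genes about 250 are expected to fall below 0.05 by chance alone.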


    • SNPsaurus
      Registered Vendor
      • May 2013
      • 525

      #3
      Let's imagine that none of your genes have actually changed expression. You will eliminate the genes without much of a fold-change, enriching for the false positives that have unlikely distributions of expression. Now you t-test the enriched set, and find some set with P values less than 0.05. Of course, just from the multiple testing you'd expect 5% of the genes to be significant, even though they actually have no difference in expression (just variation that distributes in unlikely ways). The elimination of low-fold changers will just add to the problem, and should likely be counted as a test even if you don't explicitly do so.
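This argument can be checked with a small simulation in which no gene changes at all. It is an illustrative sketch under stated assumptions: normally distributed log2-scale expression, 3 replicates per group, a common SD of 0.5, and a hypothetical |log2 fold change| >= 0.5 filter. Real read counts are not normal, and `simulate_null_genes` is just a name for this example. The t-test is a pooled Student's t-test with exact p-values for integer degrees of freedom, written with the standard library only.

```python
import math
import random


def t_two_sided_p(t, df):
    """Exact two-sided p-value of Student's t for an integer df >= 1
    (closed-form series, Abramowitz & Stegun 26.7.3/26.7.4)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        term = total = 1.0
        for k in range(2, df - 1, 2):
            term *= c * c * (k - 1) / k
            total += term
        inside = s * total
    else:
        total = 0.0
        if df > 1:
            term = total = c
            for k in range(3, df - 1, 2):
                term *= c * c * (k - 1) / k
                total += term
        inside = 2.0 / math.pi * (theta + s * total)
    return max(0.0, 1.0 - inside)


def pooled_t_test(x, y):
    """Equal-variance two-sample t-test; returns (t, two-sided p)."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    df = n1 + n2 - 2
    se = math.sqrt(ss / df * (1 / n1 + 1 / n2))
    if se == 0:
        return (0.0, 1.0) if m1 == m2 else (math.copysign(math.inf, m1 - m2), 0.0)
    t = (m1 - m2) / se
    return t, t_two_sided_p(t, df)


def simulate_null_genes(n_genes=10000, n_rep=3, sd=0.5, min_log2fc=0.5,
                        alpha=0.05, seed=1):
    """Both groups come from the same distribution for every gene. Returns the
    fraction of all genes with p < alpha and the fraction among genes that
    pass the fold-change filter."""
    rng = random.Random(seed)
    hits_all = kept = hits_kept = 0
    for _ in range(n_genes):
        mu = rng.uniform(2.0, 12.0)  # gene-specific level, same in both groups
        a = [rng.gauss(mu, sd) for _ in range(n_rep)]
        b = [rng.gauss(mu, sd) for _ in range(n_rep)]
        _, p = pooled_t_test(a, b)
        significant = p < alpha
        hits_all += significant
        if abs(sum(b) / n_rep - sum(a) / n_rep) >= min_log2fc:
            kept += 1
            hits_kept += significant
    return hits_all / n_genes, hits_kept / kept


frac_all, frac_kept = simulate_null_genes()
print(f"p < 0.05 among all genes:          {frac_all:.3f}")
print(f"p < 0.05 after fold-change filter: {frac_kept:.3f}")
```

About 5% of all genes come out below 0.05, as expected from multiple testing alone, while the fraction among genes that pass the fold-change filter is markedly higher: under the null, a large observed fold change and a large t statistic tend to come from the same chance fluctuation. The filter does not create false positives, it concentrates them, which is why it behaves like an uncounted test.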

      Throwing out genes with few reads is more acceptable, I would think. But your way is going to lead to headaches down the road as you try to make sense of the genes you found.
      Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com


      • gringer
        David Eccles (gringer)
        • May 2011
        • 845

        #4
        Obviously I don't get RFP, but could anyone tell me which way (common way or my way) could lead to more reliable result(s)?
        You already mentioned it in your description: use edgeR, cuffdiff (or DESeq) to adjust for multiple testing based on the transcript read distribution.
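One piece of that advice can be tried directly on the t-test p-values: a false discovery rate adjustment. Below is a minimal sketch of the Benjamini-Hochberg procedure (the default `adjust.method` of edgeR's `topTags`), assuming a plain list of p-values; the function name is hypothetical. It only corrects for multiple testing. It does not reproduce the count models that edgeR, DESeq and Cuffdiff use, so it is not a substitute for those tools.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        # p * m / rank, kept monotone so a smaller p never gets a larger q
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
print([round(q, 4) for q in adj])
print([p < 0.05 for p in raw], [q < 0.05 for q in adj])
```

Three of the four raw p-values are below 0.05, but only the first one stays below 0.05 after adjustment.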


        • lucer105
          Member
          • Nov 2013
          • 12

          #5
          Originally posted by SNPsaurus
          Let's imagine that none of your genes have actually changed expression. You will eliminate the genes without much of a fold-change, enriching for the false positives that have unlikely distributions of expression. Now you t-test the enriched set, and find some set with P values less than 0.05. Of course, just from the multiple testing you'd expect 5% of the genes to be significant, even though they actually have no difference in expression (just variation that distributes in unlikely ways). The elimination of low-fold changers will just add to the problem, and should likely be counted as a test even if you don't explicitly do so.
          Correct me if I am wrong. Do you mean my t-test actually enriches the false-positive readout? As far as I know, edgeR takes the inter-replicate variance into account when calculating the p-value, and I think the t-test also returns a high p-value if the variance is large. I think part of the answer would come from how edgeR, Cuffdiff, etc. distinguish false-positive results. Do you know?
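On the variance question: the t-test does return a large p-value when replicates disagree, but with only a few replicates the opposite failure is common too. A gene whose replicates agree closely by chance gets a tiny variance estimate, and therefore a tiny p-value for a trivial change. edgeR and DESeq guard against this by sharing information across genes when they estimate negative binomial dispersions. The simplest version of that idea is limma's moderated t-statistic, sketched below with made-up numbers. This is not edgeR's actual computation; `prior_s2` and `prior_df` are hypothetical values here (limma estimates them from all genes), and the function names are illustrative.

```python
import math


def pooled_variance(x, y):
    """Within-group variance pooled over both groups (df = n1 + n2 - 2)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    return ss / (len(x) + len(y) - 2)


def t_statistic(x, y, variance):
    """Difference in means divided by its standard error under `variance`."""
    se = math.sqrt(variance * (1 / len(x) + 1 / len(y)))
    return (sum(y) / len(y) - sum(x) / len(x)) / se


def moderated_variance(s2, df, prior_s2, prior_df):
    """limma-style shrinkage: df-weighted average of the gene's own variance
    and a prior variance shared by all genes."""
    return (prior_df * prior_s2 + df * s2) / (prior_df + df)


# Hypothetical log2 expression for one gene: a small change (0.2), but
# replicates that happen to agree almost perfectly.
a = [5.00, 5.01, 5.02]
b = [5.20, 5.21, 5.22]
s2 = pooled_variance(a, b)
t_plain = t_statistic(a, b, s2)
# Hypothetical prior: typical gene-level SD of 0.2 (variance 0.04), worth 4 df.
s2_mod = moderated_variance(s2, df=4, prior_s2=0.04, prior_df=4)
t_mod = t_statistic(a, b, s2_mod)
print(f"ordinary t = {t_plain:.1f} (df 4), moderated t = {t_mod:.2f} (df 8)")
```

The ordinary t of about 24.5 is far beyond the two-sided 5% cutoff of about 2.78 for 4 degrees of freedom, while the moderated t of about 1.73 falls below the cutoff of about 2.31 for 8 degrees of freedom.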

          Thanks for your comments, sincerely~

          Y.L


          • lucer105
            Member
            • Nov 2013
            • 12

            #6
            Originally posted by gringer
            You already mentioned it in your description: use edgeR, cuffdiff (or DESeq) to adjust for multiple testing based on the transcript read distribution.
            Thanks for the comments. What I will do is combine my approach with the differential expression programs. The reason I was asking is to make sure that the first small analysis I did is not P.O.S.

            Again, thanks.


            • lucer105
              Member
              • Nov 2013
              • 12

              #7
              Originally posted by swbarnes2
              I don't think that's horribly wrong, though it's not as sophisticated an approach as using the software everyone else uses. If all you want to do is to flag genes for further testing, it might be okay.

              One small problem: when you test multiple things at the same time, like a few thousand genes, well, see the link below.

              http://xkcd.com/882/
              Thanks for the comments. That was exactly the purpose: to find a few hundred genes well beyond the range of noise. I was thinking that by discarding low-read genes and small-fold-change genes I would get more solid results, although I sacrifice a lot of real readout. But I am not quite sure whether I was right about this, because I have no way to compare my method with the common method.

              Y.L.
