  • Am I wrong by doing this?

    Hey all~

    I am not new to bioinformatics, but I am very new to analyzing RNA-seq data sets. I work in a pure wet lab, and I am the only member who is partially dry.

    Usually what I do is calculate gene expression using HTSeq and Cufflinks after TopHat/Bowtie alignment. The unusual part is that I never test for differentially expressed genes using edgeR or cuffdiff. After the gene expression calculation, I just work on the raw read counts, eliminate the genes that get too few reads, and ignore the genes whose change in quantity is too small. Then I run a t-test on all the genes that meet these requirements and eliminate the ones whose t-test p-value is bigger than 0.05. Obviously I don't get RFP, but could anyone tell me which way (common way or my way) could lead to more reliable result(s)?

    You might find this too simple, even naive, but could any of you tell me whether what I did is very wrong?

    Thanks~

    Y.L.

  • #2
    I don't think that's horribly wrong, though it's not as sophisticated an approach as using the software everyone else uses. If all you want to do is flag genes for further testing, it might be okay.

    One small problem: when you are testing multiple things at the same time, like a few thousand genes, well, see the link below.

    http://xkcd.com/882/

    • #3
      Let's imagine that none of your genes have actually changed expression. You will eliminate the genes without much of a fold-change, enriching for the false positives that have unlikely distributions of expression. Now you t-test the enriched set, and find some set with P values less than 0.05. Of course, just from the multiple testing you'd expect 5% of the genes to be significant, even though they actually have no difference in expression (just variation that distributes in unlikely ways). The elimination of low-fold changers will just add to the problem, and should likely be counted as a test even if you don't explicitly do so.
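The scenario described above can be checked with a small simulation. This is only a sketch under assumptions that are not from the thread: 3 replicates per group, log-scale expression values drawn from one normal distribution for both groups (so no gene has truly changed), a pooled two-sample t-test, and a hypothetical cutoff `MIN_ABS_DIFF` standing in for the fold-change filter.

```python
import random
import statistics

# Assumptions (illustrative only): 3 replicates per group, both groups drawn
# from the same normal distribution, pooled two-sample t-test with 4 degrees
# of freedom.
T_CRIT = 2.776       # two-sided 5% critical value of Student's t, df = 4
N_GENES = 20000
N_REPS = 3
MIN_ABS_DIFF = 1.0   # hypothetical "fold-change" cutoff on the log scale


def simulate(seed=0):
    """Return (false-positive rate over all genes, rate among filtered genes)."""
    rng = random.Random(seed)
    n_sig = n_kept = n_sig_kept = 0
    for _ in range(N_GENES):
        a = [rng.gauss(0.0, 1.0) for _ in range(N_REPS)]
        b = [rng.gauss(0.0, 1.0) for _ in range(N_REPS)]
        diff = statistics.mean(a) - statistics.mean(b)
        pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
        t = diff / (pooled_var * 2 / N_REPS) ** 0.5
        significant = abs(t) > T_CRIT       # "p < 0.05" for this gene
        kept = abs(diff) > MIN_ABS_DIFF     # gene survives the fold-change filter
        n_sig += significant
        n_kept += kept
        n_sig_kept += significant and kept
    return n_sig / N_GENES, n_sig_kept / n_kept


overall_rate, filtered_rate = simulate()
print(f"significant among all genes:      {overall_rate:.3f}")
print(f"significant among filtered genes: {filtered_rate:.3f}")
```

Since every gene here is null, every "significant" gene is a false positive. The first rate should sit near 5%, and the second should come out noticeably higher, because the filter keeps exactly the genes whose group means happened to drift apart.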

      Throwing out genes with few reads is more acceptable, I would think. But your way is going to lead to headaches down the road as you try to make sense of the genes you found.

      • #4
        Obviously I don't get RFP, but could anyone tell me which way (common way or my way) could lead to more reliable result(s)?
        You already mentioned it in your description: use edgeR, cuffdiff (or DESeq) to adjust for multiple testing based on the transcript read distribution.
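To make "adjust for multiple testing" concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment, which is the kind of false discovery rate correction these packages report alongside their raw p-values. The function name `benjamini_hochberg` and the example p-values are made up for illustration; the packages also model the count data in their own ways, which this sketch does not attempt.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f}  adjusted = {q:.3f}")
```

With thousands of genes, the adjusted values are what you would compare against a cutoff such as 0.05, rather than the raw t-test p-values.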

        • #5
          Originally posted by SNPsaurus View Post
          Let's imagine that none of your genes have actually changed expression. You will eliminate the genes without much of a fold-change, enriching for the false positives that have unlikely distributions of expression. Now you t-test the enriched set, and find some set with P values less than 0.05. Of course, just from the multiple testing you'd expect 5% of the genes to be significant, even though they actually have no difference in expression (just variation that distributes in unlikely ways). The elimination of low-fold changers will just add to the problem, and should likely be counted as a test even if you don't explicitly do so.
          Correct me if I am wrong. Do you mean my t-test actually enriches for false positives? As far as I know, edgeR takes the inter-replicate variance into consideration when calculating the p-value, and I think the t-test also returns a high p-value if the variance is big. I think part of the answer would come from how edgeR, cuffdiff, etc. distinguish false positive results. Do you know?

          Thanks for your comments, sincerely~

          Y.L

          • #6
            Originally posted by gringer View Post
            You already mentioned it in your description: use edgeR, cuffdiff (or DESeq) to adjust for multiple testing based on the transcript read distribution.
            Thanks for the comments. What I will do is combine my way with the differentially expressed gene programs. The reason I was asking is to make sure that the first small set of data analysis I did is not a P.O.S....

            Again, thanks.

            • #7
              Originally posted by swbarnes2 View Post
              I don't think that's horribly wrong, though it's not as sophisticated an approach as using the software everyone else uses. If all you want to do is flag genes for further testing, it might be okay.

              One small problem: when you are testing multiple things at the same time, like a few thousand genes, well, see the link below.

              http://xkcd.com/882/
              Thanks for the comments. That was exactly the purpose: to find a few hundred genes well beyond the range of noise. I was thinking that by dumping low-read genes and small-fold-change genes I would get a more solid result, although I sacrifice lots of real readouts. But I am not quite sure whether I was right about this, because I have no way to compare my method with the common method.

              Y.L.
