  • #16
    RNA-Seq differential gene results seem to be much more variable amongst normalization techniques, in my experience, than microarray data is. Or, put another way, microarray data shows more DE genes in common across normalizations than RNA-Seq does, even when the actual statistical test is identical for all. While I cannot claim to have exhaustively tried every possible normalization available for either technology, for the ones I have tried (with data from the same samples), that seems to hold true. That may just be a matter of time - microarray analysis being at a more mature state than sequence data analysis.
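One way to put a number on "DE genes in common across normalizations" is a pairwise Jaccard index between the DE lists that each normalization produces. The following is a minimal Python sketch of that idea only. The gene IDs and the three lists are invented for illustration and are not results from any dataset discussed in this thread.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index: shared items divided by all distinct items."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty lists are treated as identical
    return len(a & b) / len(a | b)

def pairwise_overlap(de_lists):
    """Return {(name1, name2): jaccard} for every pair of DE lists."""
    return {
        (n1, n2): jaccard(de_lists[n1], de_lists[n2])
        for n1, n2 in combinations(sorted(de_lists), 2)
    }

# Hypothetical DE calls from the same samples under three normalizations.
de_lists = {
    "TMM": {"g1", "g2", "g3", "g4"},
    "upper_quartile": {"g2", "g3", "g4", "g5"},
    "RPKM": {"g3", "g4", "g6"},
}

overlaps = pairwise_overlap(de_lists)
for pair, score in overlaps.items():
    print(pair, round(score, 2))
```

Running the same comparison on array-derived and sequence-derived lists from the same samples would give a like-for-like view of how sensitive each platform's DE calls are to the normalization step.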

    Whether one needs qPCR results is really dependent on the study in my mind. We rarely do so in our toxicology work, since we are primarily interested in characterizing genomic changes and using them to infer possible functional biology differences. Since we are focused primarily on the functional changes based on multiple gene effects, confirming a particular gene is not of concern to us.

    However, in those instances where we may be focused on a single specific functional pathway, then it might be deemed necessary to confirm the (relatively smaller set of) genes associated with that particular functional network.

    But the study design pretty much drives that sort of decision.
    Michael Black, Ph.D.
    ScitoVation LLC. RTP, N.C.



    • #17
      I think verification is always necessary...why wouldn't you do it? Your story can only be stronger if you can show the result in different ways. My background with this type of data only goes back to 2009 but that's enough to have seen the technology and analysis come a long way for expression and differential expression. It is much better now and continues to improve but this tech is still in its infancy. There is no test that has not been shown to have false positives and false negatives. If you're after very small fold changes then it's pretty likely that unless you have 20 replicates of each condition no DE test will capture those changes. Sometimes you can see genes that seem like they could be DE but for whatever reason the estimated variance causes them to be excluded. That's another reason...there's still no other area where a t-like test is going to be believable with an N of 3...try 10 to 20. The models used for gene expression variation are good but are mostly selected out of convenience. In truth the variation of every gene may be unique, and the importance of the amount of change of a gene is most certainly unique to many. We know the math for DE cannot know these things so we must look closer than simply being happy with a DE list from a 3 vs 3 test. At least be aware that it's likely there are lots of genes left out of that result and maybe some that have been reported that should not be.
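The point about an N of 3 versus 10 to 20 can be illustrated with a small power simulation. This is a sketch under stated assumptions, not a model of any real DE package: one gene, normally distributed log2 expression, a plain pooled two-sample t-test, an unadjusted 5% threshold, and an assumed effect of about 1.4-fold (log2 difference 0.5) with a per-gene standard deviation of 0.5. The critical values are standard t-table entries, hard-coded so the example needs only the standard library.

```python
import random
import statistics

# Two-sided 5% critical values of Student's t for df = 2n - 2
# (standard table values, hard-coded to avoid a SciPy dependency).
T_CRIT = {3: 2.776, 10: 2.101, 20: 2.024}

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance (equal group sizes)."""
    n = len(x)
    pooled_var = (statistics.variance(x) + statistics.variance(y)) / 2
    return (statistics.mean(y) - statistics.mean(x)) / (2 * pooled_var / n) ** 0.5

def estimated_power(n, log2_fc, sd, trials=2000, seed=0):
    """Fraction of simulated experiments in which |t| exceeds the critical value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, sd) for _ in range(n)]
        treated = [rng.gauss(log2_fc, sd) for _ in range(n)]
        if abs(pooled_t(control, treated)) > T_CRIT[n]:
            hits += 1
    return hits / trials

# Assumed scenario: log2 fold change 0.5 with a per-gene SD of 0.5.
power = {n: estimated_power(n, log2_fc=0.5, sd=0.5) for n in T_CRIT}
for n, p in power.items():
    print(f"n = {n:2d} per group: estimated power {p:.2f}")
```

Because this ignores multiple-testing correction across thousands of genes, the real detection rate for a 3 vs 3 design would be lower still than what the simulation reports.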

      Maybe my opinion stems from the type of projects I've worked on. The lab I work with pushes for Cell/Nature level publication and maybe it's my PI or just the type of people he hires but when we see something interesting in RNA seq results they design primers and verify it. It's just thorough, to me. The same goes for mutation studies and splicing studies. I'm pretty far from believing in the isoforms or differential splicing information pretty much any software gives me. Those things need to be verified.

      I suppose I'm also more familiar with people doing non standard things with RNA seq which, at this point, is still almost anything other than differential expression. My instinct is to be cautious and to verify results.

      Illumina has created an entire new assay technology for candidate list confirmation. It's clearly of great enough concern that initial RNA seq studies should be verified if they believe there's profit to be made.
      /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
      Salk Institute for Biological Studies, La Jolla, CA, USA */



      • #18
        Originally posted by sdriscoll
        I think verification is always necessary..why wouldn't you do it?
        Simple: cost and throughput limitations. In our case, a single study may end up using hundreds to thousands of genes selected as differentially expressed by statistical and fold change thresholds. And we always run at least 5 biological replicates, but may have 5, 10 or more doses, and/or multiple time points (not to mention often multiple chemicals or drugs). Given the initial cost of the studies, the time involved in rearing animals, necropsy, and data generation and analyses, there is rarely the time or money to validate all DE genes selected from a given microarray or sequence study.

        And, again in terms of the focus of the study, doing so actually would be highly unlikely to add much if anything of value. When in a publication you are talking about multiple functional ontologies being enriched, and where enrichment is estimated based on many (often hundreds) of DE genes, it is understood that no matter what or how many tests you perform, some of those genes will always be false positives. However, the idea that all or most of them will be false positives is highly unlikely, since we selected them based on FDR and Fold Change, in which case the functional interpretation based on DE gene analysis will not change (a pathway enriched with say, 10-15 genes is still likely to be considered statistically enriched if you drop out one, two or even three of those genes).
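The claim that a pathway stays enriched after losing a few genes can be checked with a one-sided hypergeometric test. The sketch below uses only the standard library. The genome size, DE list size and pathway size are assumed round numbers chosen for illustration, not figures from any study mentioned here, and the p-values are for a single pathway with no multiple-testing correction.

```python
from math import comb

def enrichment_p(k, pathway_size, de_total, genome_size):
    """Upper-tail hypergeometric P(X >= k): chance of seeing at least k
    pathway genes among de_total DE genes drawn from genome_size genes."""
    denom = comb(genome_size, de_total)
    upper = min(pathway_size, de_total)
    numer = sum(
        comb(pathway_size, i) * comb(genome_size - pathway_size, de_total - i)
        for i in range(k, upper + 1)
    )
    return numer / denom

# Assumed setup: 20,000 genes, 1,000 called DE, a pathway of 100 genes
# (so about 5 pathway genes would be expected among the DE genes by chance).
GENOME, DE_TOTAL, PATHWAY = 20_000, 1_000, 100

p_all = enrichment_p(15, PATHWAY, DE_TOTAL, GENOME)      # 15 pathway genes are DE
p_dropped = enrichment_p(12, PATHWAY, DE_TOTAL, GENOME)  # 3 of them turn out false

print(f"15 DE genes in pathway: p = {p_all:.2e}")
print(f"12 DE genes in pathway: p = {p_dropped:.2e}")
```

Strictly, removing three false positives also shrinks the DE list from 1,000 to 997, but that changes the result very little, so the sketch keeps the list size fixed.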

        The only time we would be concerned about qPCR validation is if the underlying biological interpretation was dependent on individual specific genes, or if the study began with a predetermined focus on a small subset of candidate genes of interest. If there are to be follow-up studies that do wish to focus on specific genes, then obviously, validation of those specific genes would be one of the first priorities. But until or unless there is cause to focus on specific genes or much smaller subsets of genes, performing validation would simply burn up money and time with little if any benefit.
        Last edited by mbblack; 05-31-2013, 07:48 AM.
        Michael Black, Ph.D.
        ScitoVation LLC. RTP, N.C.



        • #19
          then i will rephrase - depending on the experiment, validation is sometimes very important. mbblack you might be interested in this new tech from illumina - they say they'll be ready to start taking orders within a month. apparently they can provide qPCR level validation for 1000's of targets with 10's to 100's of replicates. the hands on time is only 4.5 hours, analysis is so straightforward it's done on their instrument and the run-time is only about 1.5 days. i just learned about it yesterday at a small presentation here at Salk. they have 1000's of pre-arranged assays and you can create custom ones. they can also verify alternative splicing by applying this targeted sequencing technology specifically at the exon junctions in question. It looks pretty good...and it's a way for people to run a verification on very long candidate lists. If it catches on it seems likely that people will start asking why you DIDN'T do it.
          /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
          Salk Institute for Biological Studies, La Jolla, CA, USA */



          • #20
            Thanks, I will take a look into that. We do not currently use Illumina technology (we have an ABI system for sequence, and Affy and Agilent for microarrays, using mostly Affy titan arrays). We usually use LifeTechnologies Taqman assays for validation.

            From our perspective, the issue always will still come down to money, time and effort. When dealing with private corporate and government contracts, the question still will come up first and foremost: "does doing this tell us anything we don't already know about the toxicology of compound X". If it is not clearly adding to or enhancing our understanding of the toxicology in question, they will want to spend their money on something else that will do so. And as I mentioned, for a lot of our kind of work, gene validation does not really add anything to the biological interpretation.

            I agree with what you said about "depending on the study". Validation may well be a crucial component, but it is just not something that always needs to be done nor should be done (if it does little more than waste resources). The nature of the study should form the basis for that decision - if the scientific question(s) being asked require it, then so be it. But if they don't then it is pointless to spend the money and time on it.

            P.S. For what it is worth, in the several DGE papers we've published in the past couple of years, the issue of validation has never once been raised by any reviewer or editor, as the studies, interpretation and presentation simply did not call for it.
            Last edited by mbblack; 05-31-2013, 10:17 AM.
            Michael Black, Ph.D.
            ScitoVation LLC. RTP, N.C.



            • #21
              Interesting - I wonder if the difference in our opinion has to do with our industries. For example I'm in an academic lab and the projects I work on lead toward publication which, in turn, has a significant impact on the chances these post-docs have at getting faculty positions. These days people with 3 Cell papers are having trouble finding jobs. The typical time duration for the type of publications we put out is more than 3 years (mostly because of the time it takes to establish the mouse lines). At that point a verification step can be very reassuring and is a relatively small thing compared to the overall project. When they're staking their careers on years' worth of work - it's pretty important they get it right.

              FYI here's the new kit illumina is making which doesn't appear to be commercially available yet. it runs on their MiSeq systems.

              Targeted RNA-Seq enables researchers to sequence specific transcripts of interest, and provides both quantitative and qualitative information.
              /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
              Salk Institute for Biological Studies, La Jolla, CA, USA */



              • #22
                Here are a couple of nice examples relating to this discussion. The first is an evaluation of normalization methods and the second is an evaluation of DE performance versus sequencing depth and replicate count.


                Background: RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available, these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods.
                Results: Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices.
                Conclusions: This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates.


                the most fantastic thing about this second paper is their demonstration of true positive rates. < 40% with 12 replicates. not too inspiring.
                /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                Salk Institute for Biological Studies, La Jolla, CA, USA */



                • #23
                  Well, pretty much everything we do is intended for publication, and we have the same issues in terms of timelines. Many of our studies are 3, 4, or even 5 years from start to finish and we have several large rat colonies on the go right now. We also always have numerous studies going on simultaneously, so we have to manage our time on any one pretty efficiently or we risk missing deliverable deadlines. I've worked here a bit less than three years and have been on 7 publications out now, a few currently in press, and several to be submitted by year's end. We certainly publish, and publish a lot and regularly. At least within toxicology, many of our senior scientists are globally recognized as leaders in their fields, so reputation certainly counts here too.

                  Academics may well offer more freedom in how you spend your money. We will have more restrictions on that based on the contract. Also, we have very limited freedom to go beyond what was committed to at the beginning of a study as that would cut into other ongoing or future commitments.

                  But, this company also has a 36 year history of annually producing a large number of very highly regarded toxicology publications. So we can hardly be viewed as skimping on the quality of our science, especially since much of what we do has implications for public health and safety. However, we make our decisions based on the scientific need for particular data and analyses. And most large scale toxicogenomic characterization studies using whole genome differential gene expression (array or sequence) rarely publish gene validation data or bother to even collect it. Again, the question or study design will dictate if that data is necessary, adds to or aids in understanding the science and unless it does, we don't do it. So it is a deliberate decision to include or exclude validation, based on the scientific need for it, or not, as the case may be.

                  P.S. The Hamner Institutes is a non-profit research corporation. Up until 2007, it was the CIIT (Chemical Industry Institute for Toxicology), funded by the American Chemical Council but operated as an independent research center. In 2007, they re-incorporated to become a fully private corporation. Prior to coming here though, I spent 10 years providing bioinformatics support at the University of Virginia's School of Medicine.
                  Last edited by mbblack; 05-31-2013, 12:00 PM.
                  Michael Black, Ph.D.
                  ScitoVation LLC. RTP, N.C.



                  • #24
                    Originally posted by mbblack
                    Differential gene expression analysis is (usually) a population level question. If you don't adequately sample the population, you cannot address the question of what genes are differentially expressed amongst the population(s) under study.
                    This is a great thread, I've been thinking about this issue for a while. I agree completely that biological replicates are the better way to go if it's feasible.

                    However, what about experiments in which RNAs from two populations are compared, in order to look for differentially expressed genes, but multiple individuals from each population (treated vs. untreated, mutant vs. wild-type, etc.) must be pooled to get enough biological material to do the experiment?

                    Could that be considered "adequately sampling the two populations"? If one is extremely stringent with statistical analysis, wouldn't the problem of variation of gene expression WITHIN one population be contained and overcome in that analysis?

                    I'm curious as to what people think about this, since pooling individuals (flies, plants, you name it) is a very common practice.



                    • #25
                      Originally posted by mbblack
                      Well, pretty much everything we do is intended for publication, and we have the same issues in terms of timelines. Many of our studies are 3, 4, or even 5 years from start to finish and we have several large rat colonies on the go right now. We also always have numerous studies going on simultaneously, so we have to manage our time on any one pretty efficiently or we risk missing deliverable deadlines. I've worked here a bit less than three years and have been on 7 publications out now, a few currently in press, and several to be submitted by year's end. We certainly publish, and publish a lot and regularly. At least within toxicology, many of our senior scientists are globally recognized as leaders in their fields, so reputation certainly counts here too.

                      Academics may well offer more freedom in how you spend your money. We will have more restrictions on that based on the contract. Also, we have very limited freedom to go beyond what was committed to at the beginning of a study as that would cut into other ongoing or future commitments.

                      But, this company also has a 36 year history of annually producing a large number of very highly regarded toxicology publications. So we can hardly be viewed as skimping on the quality of our science, especially since much of what we do has implications for public health and safety. However, we make our decisions based on the scientific need for particular data and analyses. And most large scale toxicogenomic characterization studies using whole genome differential gene expression (array or sequence) rarely publish gene validation data or bother to even collect it. Again, the question or study design will dictate if that data is necessary, adds to or aids in understanding the science and unless it does, we don't do it. So it is a deliberate decision to include or exclude validation, based on the scientific need for it, or not, as the case may be.

                      P.S. The Hamner Institutes is a non-profit research corporation. Up until 2007, it was the CIIT (Chemical Industry Institute for Toxicology), funded by the American Chemical Council but operated as an independent research center. In 2007, they re-incorporated to become a fully private corporation. Prior to coming here though, I spent 10 years providing bioinformatics support at the University of Virginia's School of Medicine.
                      I agree with you - but I'll bet you'd be hard pressed to find someone knocking someone for doing a verification. While it may be commonplace to skip it, I'll maintain the opinion that depending on the project it can be very important - it will take nothing away except for some extra time/money. In our projects this is evaluated based on how the story sounds and how important the seq data is overall to the story. If the seq data and results have become a significant anchor then we'll perform verification.
                      /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                      Salk Institute for Biological Studies, La Jolla, CA, USA */



                      • #26
                        Originally posted by kerhard
                        This is a great thread, I've been thinking about this issue for a while. I agree completely that biological replicates are the better way to go if it's feasible.

                        However, what about experiments in which RNAs from two populations are compared, in order to look for differentially expressed genes, but multiple individuals from each population (treated vs. untreated, mutant vs. wild-type, etc.) must be pooled to get enough biological material to do the experiment?

                        Could that be considered "adequately sampling the two populations"? If one is extremely stringent with statistical analysis, wouldn't the problem of variation of gene expression WITHIN one population be contained and overcome in that analysis?

                        I'm curious as to what people think about this, since pooling individuals (flies, plants, you name it) is a very common practice.
                        The problem still remains that to do any statistical analysis properly, you have to be able to assess the variation in your samples, and that is impossible if you do not have independent samples.

                        Now it is often the case that one may need to pool material into one sample to even get enough material for that one sample. Depending on how this is done, it has the effect of reducing the variation within that sample, but if you are going to be pooling, then you should have multiple samples consisting of independent pools. That way you can still assess variation.

                        In the instances where I have had to do this working in plants, I will pool material from multiple plants into one sample and for my biological replicates, pool material from a different set of plants grown alongside those.
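The difference between one big pool and several independent pools can be shown with a short simulation. This is a sketch under simplifying assumptions: a pool is idealized as the plain average of its members (equal RNA contribution from each individual), expression is normally distributed, and the mean of 8.0 and SD of 1.0 are arbitrary illustrative values.

```python
import random
import statistics

def make_pool(rng, n_individuals, mean, sd):
    """Expression of one pooled sample, idealized as the average of its members."""
    return statistics.mean([rng.gauss(mean, sd) for _ in range(n_individuals)])

rng = random.Random(1)

# One pool of 10 individuals: a single number, so no spread can be estimated.
single_pool = [make_pool(rng, 10, mean=8.0, sd=1.0)]
try:
    statistics.variance(single_pool)
    single_pool_ok = True
except statistics.StatisticsError:
    single_pool_ok = False  # variance needs at least two data points

# Three independent pools of 10 different individuals each: variance is estimable.
pools = [make_pool(rng, 10, mean=8.0, sd=1.0) for _ in range(3)]
between_pool_var = statistics.variance(pools)

# Pooling shrinks the spread: across many pools of 10, the variance of the
# pool values should sit near sd**2 / 10 = 0.1 rather than 1.0.
many_pools = [make_pool(rng, 10, mean=8.0, sd=1.0) for _ in range(2000)]
pool_var = statistics.variance(many_pools)

print("variance estimable from one pool:", single_pool_ok)
print("variance across 3 independent pools:", round(between_pool_var, 3))
print("variance across 2000 pools of 10:", round(pool_var, 3))
```

The last figure is why any test built on pooled replicates is working with pool-to-pool variation, which is smaller than individual-to-individual variation, and why that variation still has to be measured from more than one pool.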



                        • #27
                          chadn that's an interesting subject. researchers in my lab have sometimes pooled many animals into single samples and run single replicate runs in the past. they assumed that the single sample would be all they'd need because it was actually made up of maybe 10 animals. another kind of misconception. while I'd expect that single sample to be pretty robust and complete we've totally lost the biological variation between all of those animals so....stats kinda go out the window.
                          /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                          Salk Institute for Biological Studies, La Jolla, CA, USA */
