
  • DESeq: strange dispersion plot and using shorth

    I am a newbie to bioinformatics and am trying to analyse RNA-Seq data with DESeq. The dispersion plot for my data looks different from the typical plots. I have attached two sample plots; the issue is that there is a sharp upper boundary, and I don't know how to interpret this!

    I have tried the various options for sharingMode and fitType for my data. Sometimes the default works, but with a different sample list I have to try a local fit or a different sharing mode. I have four different conditions with 6-16 biological replicates each, so sample size is not a problem.

    Also, when I try to change the size-factor estimation with the following command, it does not work:
    cds = estimateSizeFactors(cds,locfunc=shorth)

    Am I doing anything wrong here?
    Thanks for your help
    Last edited by tellsparck; 01-10-2013, 12:12 PM. Reason: Spelling mistake in title
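One plausible reason the call above fails is that `shorth` is not part of DESeq: it comes from the Bioconductor package genefilter, which has to be loaded first (`library("genefilter")`). To show what `locfunc` changes, here is a base-R sketch of the median-of-ratios size factors that DESeq computes, with the location function swappable. `sizeFactorsSketch` and `shorthSketch` are hypothetical names written for illustration; `shorthSketch` is a simplified stand-in for genefilter's `shorth` (mean of the values inside the shortest interval covering half the data), not its actual implementation.

```r
# Sketch of median-of-ratios size factors with a pluggable location function,
# mirroring the idea behind estimateSizeFactors(cds, locfunc = ...)
sizeFactorsSketch <- function(counts, locfunc = stats::median) {
  logGeoMeans <- rowMeans(log(counts))  # log geometric mean per gene (-Inf if any count is 0)
  apply(counts, 2, function(cnts) {
    keep <- is.finite(logGeoMeans) & cnts > 0  # skip genes with a zero anywhere
    exp(locfunc((log(cnts) - logGeoMeans)[keep]))
  })
}

# Hypothetical stand-in for genefilter::shorth: mean of the values that lie in
# the shortest interval containing (about) half of the data
shorthSketch <- function(x) {
  x <- sort(x[is.finite(x)])
  n <- length(x)
  h <- floor(n / 2) + 1                         # window size covering half the points
  widths <- x[h:n] - x[seq_len(n - h + 1)]      # width of every window of h points
  i <- which.min(widths)
  mean(x[i:(i + h - 1)])
}

# Toy counts (made up): sample s2 is sequenced twice as deeply as s1
counts <- cbind(s1 = c(10, 20, 40, 80, 0), s2 = c(20, 40, 80, 160, 5))
sizeFactorsSketch(counts)                          # s2 / s1 is 2
sizeFactorsSketch(counts, locfunc = shorthSketch)  # same ratio here
```

The shorth is less sensitive than the median when the log-ratios are skewed or bimodal, which is why it is sometimes suggested as an alternative `locfunc`.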

  • #2
    I have run into the same problem as you; looking forward to the answers.


    • #3
      Hi - you are getting huge dispersions, of the order of 10, indicating that the counts between your different "replicate" samples are very, very different. Have you tried looking at pairwise scatterplots of the data? I.e. something like (replace pasilla with your own data):

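      library("DESeq"); library("pasilla"); data("pasillaGenes")  # pasillaGenes: example CountDataSet from the pasilla package
      trsf = function(x, c=1) log2(x+c)   # log2 with pseudocount c, so zero counts stay finite
      pairs(trsf(counts(pasillaGenes)), pch=".")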

      Best wishes
      Wolfgang Huber
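The scatterplot matrix can be complemented by a single number per pair of samples. The sketch below computes Pearson correlations on the same log2(x + 1) scale; `logCorrelations` is a hypothetical helper and the toy counts are made up. With real data you would pass `counts(cds)` instead; "replicates" whose correlations are clearly lower than the rest are the ones worth inspecting in the plot.

```r
# Hypothetical helper: pairwise Pearson correlations of log2(count + c) between
# samples, a numeric summary of what the pairs() scatterplot matrix shows
logCorrelations <- function(counts, c = 1) {
  cor(log2(counts + c))
}

# Toy counts (made up): s1 and s2 agree closely, s3 does not
counts <- cbind(s1 = c(5, 50, 500, 5000, 10),
                s2 = c(6, 45, 520, 4800, 12),
                s3 = c(300, 2, 900, 40, 0))
round(logCorrelations(counts), 2)
```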


      • #4
        Thanks! The high dispersions are somewhat expected because the data are from single cells whose RNA underwent amplification. So there is inherent cell-to-cell variability as well as technical variability coming from the amplification. The question now is how badly this will affect the subsequent statistical analysis. Do you think using per-gene estimates (or any other deviation from the defaults) may help? Have you tried to modify DESeq for this type of data?


        • #5
          First: if you ask for advice on this forum, please always mention all relevant facts. Asking a question such as yours without mentioning that you are not talking about standard RNA-Seq but about something unusual and very experimental, namely single-cell RNA-Seq, just wastes everybody's time, as you will only get wrong advice.

          Now: the fit (red line) is indeed not very good, and we have some ways to improve the fit in situations such as yours. This won't help much, however, because the raw estimates (black dots) show that nearly all of your genes have dispersions above one and hence vary by a factor of two or more between cells of the same cell type. Unless the differences between different cell types are really drastic (at least, say, ten-fold), you cannot see them in this noise. This is not a problem of the statistical analysis, but one of the experimental protocol.
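The link between a dispersion near one and such large cell-to-cell differences can be made explicit. Assuming the negative-binomial model used by DESeq, where a count K with mean μ has dispersion α, the squared coefficient of variation approaches α for well-expressed genes:

```latex
\operatorname{Var}(K) = \mu + \alpha\,\mu^{2}
\quad\Longrightarrow\quad
\mathrm{CV}^{2} = \frac{\operatorname{Var}(K)}{\mu^{2}} = \frac{1}{\mu} + \alpha
\;\approx\; \alpha \qquad (\mu \gg 1/\alpha)
```

So α ≥ 1 means the standard deviation between cells of the same type is at least as large as the mean expression itself.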


          • #6
            Sorry for not mentioning the nature of the data. Yes, it is noisy, but the differences between groups are also drastic: many genes are close to zero in one group and in the thousands in another. But of course there are also some with less drastic differences. Given this, how can I extract the maximum information from it? I have tried using only samples that look similar in the PCA, have similar Q3 values, and so on.
            Can you suggest any modifications in DESeq that could improve the analysis?
            Many thanks


            • #7
              "many genes are close to zero in one group and thousands in another" -- yes, this is a drastic difference, but have a look at your replicates: I guess you will see equally drastic changes between two cells of the same type. This is a quite common problem in single-cell RNA-Seq, and you are not the first one to find this out the hard way, sorry.
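A small simulation illustrates this point. It draws two "replicate" counts per gene from the same negative-binomial distribution (R's `rnbinom` with `size = 1/alpha`, so the variance is mu + alpha*mu^2) and reports how often the two replicates differ by more than two-fold. The mean of 500 and the two dispersion values are arbitrary choices for illustration; `fracTwofold` is a hypothetical helper.

```r
# Two draws per gene from the SAME negative binomial: any difference is pure noise
set.seed(42)
mu <- 500
fracTwofold <- function(alpha, nGenes = 10000) {
  a <- rnbinom(nGenes, mu = mu, size = 1 / alpha)  # Var = mu + alpha * mu^2
  b <- rnbinom(nGenes, mu = mu, size = 1 / alpha)
  # fraction of genes whose two replicates differ by more than two-fold
  mean(abs(log2((a + 1) / (b + 1))) > 1)
}
fracTwofold(0.01)  # low dispersion: essentially no gene differs two-fold
fracTwofold(1)     # dispersion of one: roughly two thirds of genes do
```

With a dispersion of 1 the distribution is close to exponential, and the chance that one of two independent draws exceeds twice the other is about 2/3, even though there is no true difference between the two cells.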

              As you have many samples, you could try to switch to 'sharingMode="gene-est-only"'. This might be a little bit anticonservative, and if even this does not give you anything, there might simply be nothing in your data.
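To see why "gene-est-only" can be anticonservative, it helps to spell out how the three `sharingMode` values of DESeq's `estimateDispersions` combine the per-gene estimate with the fitted trend. The function below is a simplified sketch of that rule, not DESeq's code (`combineDispersions` is a hypothetical name, and the numbers are made up): "maximum" (the default) never lets a gene's dispersion fall below the trend, while "gene-est-only" keeps genes whose raw estimate happens to be low, which makes their p-values smaller.

```r
# Simplified sketch of how the sharingMode options pick the final dispersion
combineDispersions <- function(geneEst, fitted,
                               sharingMode = c("maximum", "fit-only", "gene-est-only")) {
  sharingMode <- match.arg(sharingMode)
  switch(sharingMode,
         "maximum"       = pmax(geneEst, fitted),  # never below the fitted trend
         "fit-only"      = fitted,                 # ignore per-gene estimates
         "gene-est-only" = geneEst)                # ignore the fit entirely
}

geneEst <- c(0.2, 1.5, 3.0)  # hypothetical per-gene estimates
fitted  <- c(0.8, 0.8, 0.8)  # hypothetical fitted values
combineDispersions(geneEst, fitted, "maximum")        # 0.8 1.5 3.0
combineDispersions(geneEst, fitted, "gene-est-only")  # 0.2 1.5 3.0
```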


              • #8
                Thanks Simon. Yes, using the 'gene-est-only' mode works and outputs a good-sized list of differentially expressed genes, among them internal control genes which we know should be differentially expressed. (I can get this even with default settings in some comparisons.) Two of my groups are from very closely related cells, and it is here that I have to change the sharing mode.
                Do you think I should fix the fit as you mentioned before? If so, how can I do this?
                Thanks again!


                • #9
                  No, the whole point of "gene-est-only" is that it instructs DESeq to ignore the fit, so it doesn't matter any more if it's bad.


                  • #10
                    Thanks Simon. But I could still benefit from improving the fit in the comparisons where I do not use gene-est-only. Do you have code for this that you can share with me?
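In DESeq the alternative to the default parametric fit is `fitType="local"`, which was already mentioned at the top of the thread. To judge whether the parametric form suits your data, it helps to know what it assumes: DESeq's parametric fit models the dispersion as asymptDisp + extraPois / mean, fitted with a gamma-family GLM. The sketch below reproduces that idea on simulated per-gene estimates with a single `glm` call; DESeq additionally iterates and discards outlying genes, which this sketch omits. All data here are simulated, and the true coefficients (0.5 and 4) are arbitrary.

```r
# Simulated per-gene means and raw dispersion estimates (hypothetical data)
set.seed(1)
nGenes <- 2000
means <- exp(runif(nGenes, log(5), log(5000)))
trueDisp <- 0.5 + 4 / means                              # asymptDisp = 0.5, extraPois = 4
rawDisp <- trueDisp * rgamma(nGenes, shape = 5, rate = 5)  # multiplicative noise, mean 1

# Parametric trend dispersion = a0 + a1 / mean, fitted by a gamma GLM with identity link
fit <- glm(rawDisp ~ I(1 / means), family = Gamma(link = "identity"),
           start = c(0.1, 1))
coef(fit)  # should land near 0.5 and 4
```

If the raw estimates of your data clearly do not follow a curve of this shape, the local fit is the option to try.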


                    • #11
                      Hello community!

                      I have 3 conditions with 5 replicates each, i.e. 15 samples in total, so my residual degrees of freedom (DOF) are 15 samples - 3 conditions = 12. Therefore I am using gene-est-only for estimating my dispersions.

                      cds <- estimateDispersions(cds, method= "per-condition", sharingMode="gene-est-only" )

                      Can anyone point me to where using gene-est-only once there are enough DOF is discussed? I saw a post in which Simon mentions that once you get to 10 to 15 DOF you can use gene-est-only, but I'm presenting my proposal next week and want to be able to point to something more formal.

                      Thanks very much!!
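The degrees-of-freedom arithmetic depends on the `method` argument. With `method="pooled"` a single dispersion is estimated from all samples, leaving 15 - 3 = 12 residual degrees of freedom; with `method="per-condition"`, as in the call above, each condition's dispersion is estimated from its own 5 samples, i.e. with 5 - 1 = 4 degrees of freedom each. The snippet below just spells this out (the condition labels A, B, C are hypothetical):

```r
# Residual degrees of freedom available to the per-gene dispersion estimate
nPerCondition <- c(A = 5, B = 5, C = 5)  # hypothetical labels, 5 samples each

dofPooled <- sum(nPerCondition) - length(nPerCondition)  # one estimate from all samples: 12
dofPerCondition <- nPerCondition - 1                     # separate estimate per condition: 4 each
dofPooled
dofPerCondition
```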

