  • pval/padj equals 0 using DESeq

    Hello,

    I am using DESeq for RNA-Seq (with an Illumina sequencer) and I have the following questions:

    - I have some genes with pval and padj equal to 0. Apologies for my ignorance of statistical issues, but does this make any sense? The study comprises two biological replicates for each condition. Below are the results obtained.

           id     baseMean     baseMeanA    baseMeanB    foldChange   log2FoldChange  pval      padj      resVarA      resVarB
    9841   Gene1  32.96240454  30.32514971  35.59965936  1.173931859  0.231348669     0         0         0.01117127   0.946634653
    12536  Gene2  36.90885144  18.14390996  55.67379292  3.068456195  1.617512988     0         0         0.462602614  0.979230062
    18679  Gene3  32.58350986  28.71130018  36.45571954  1.269734192  0.344526513     0         0         0.147790741  2.158419888
    3391   Gene4  2228038.336  41791.39502  4414285.277  105.6266553  6.72283014      2.88E-59  1.33E-55  0.001232907  21.91637034

    - On the other hand: I am aware that the "counts" given to DESeq must be reads. However, when CASAVA (the Illumina analysis software I'm running) calculates RPKM, it also generates the sum of bases that fall into the exonic regions of each gene. I wonder if I can use those values (number of bases instead of reads) for DESeq. My argument is: the Illumina read size is fixed (38 bp, in this case) and, therefore, the number of reads would be approximately* equal to this value divided by 38. Is this OK?


    Thanks in advance


    *NOTE:
    Not exactly the same, because only the bases that fall within exons are counted, but I understand that this value would be very close.
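
The arithmetic in the note can be sketched as follows. The per-read overlaps are hypothetical numbers invented for illustration (they are not CASAVA output); only the read length of 38 comes from the post. The sketch shows that the base sum divided by 38 falls below the true read count as soon as some reads only partly overlap an exon.

```python
READ_LENGTH = 38  # read length stated in the post

# Hypothetical exonic overlaps (in bases) for five reads assigned to one gene.
# Reads lying fully inside an exon contribute 38 bases; reads that extend
# past an exon boundary contribute fewer.
exonic_bases_per_read = [38, 38, 38, 21, 9]

true_read_count = len(exonic_bases_per_read)
approx_read_count = sum(exonic_bases_per_read) / READ_LENGTH

print("true read count:  ", true_read_count)
print("bases / 38:       ", approx_read_count)
```

The result is also not an integer, so it would need rounding before it could even be treated as a count.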

  • #2
    Dear Jorge

    Thank you. The values given as input to DESeq really need to be read counts, and not something else. I think Section 1 of DESeq's vignette is pretty clear about that. In the statistical model of DESeq, the absolute values matter, e.g. a ratio of 20/10 has a different significance from one of 2000/1000. Thus, you cannot apply arbitrary multiplications to the data and still hope that the result is valid. An ugly fix for your problem might be to divide the numbers you obtain from CASAVA by 38; but it is not clear whether you will not run into silly artefacts by doing so. The counting scripts provided with DESeq (see Section 1 of the vignette) are the way to go.
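
To illustrate why the absolute values matter, here is a minimal sketch. It does not use DESeq's negative binomial model. As a simplifying assumption it treats the counts as Poisson with equal library sizes, so that under the null hypothesis the count in one condition, given the total, follows Binomial(n, 0.5). The function name `two_sided_binomial_p` is made up for this illustration.

```python
from math import comb


def two_sided_binomial_p(count_a: int, count_b: int) -> float:
    """Exact two-sided p-value for count_a vs count_b under a 50/50 split.

    Simplifying assumption: Poisson counts with equal library sizes, which is
    NOT the negative binomial model that DESeq actually fits.
    """
    n = count_a + count_b
    larger = max(count_a, count_b)
    # Number of outcomes at least as extreme as the larger count (one tail).
    tail = sum(comb(n, k) for k in range(larger, n + 1))
    # Exact integer arithmetic until the final division; doubled for two sides.
    return min(1.0, 2 * tail / 2**n)


p_small = two_sided_binomial_p(20, 10)             # ratio 2, few reads
p_large = two_sided_binomial_p(2000, 1000)         # ratio 2, many reads
p_scaled = two_sided_binomial_p(20 * 38, 10 * 38)  # bases instead of reads

print(p_small, p_large, p_scaled)
```

The ratio is 2 in all three cases, yet only the first p-value is unremarkable (roughly 0.1). Multiplying the same two counts by 38 makes the difference look overwhelmingly significant, which is the kind of artefact that base sums would introduce.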

    In principle, p-values of 0 can happen (and if they do, the multiple-testing adjusted p-value would usually also be zero); this is a consequence of the finite precision of floating-point arithmetic. It would happen if the biological difference is really strong compared to the between-replicates variability, so that the true p-value is too small to be represented. In your case, however, it rather seems that you get these absurd p-values because of the incorrect data input.
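
The finite-precision effect can be reproduced in a few lines. This is a sketch, not DESeq code: the helper `bh_adjust` is a hand-written Benjamini-Hochberg adjustment, used here on the assumption that padj is a Benjamini-Hochberg adjusted p-value, and the input p-values are made up.

```python
import math
import sys


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Smallest positive normalised double; much smaller values underflow to 0.0.
print("smallest normal double:", sys.float_info.min)

# The true value of exp(-800) is positive but far too small for a double.
p_underflow = math.exp(-800.0)
print("exp(-800) as a double:", p_underflow)

# A p-value of exactly 0 stays 0 after adjustment, since 0 * m / rank == 0.
adjusted = bh_adjust([p_underflow, 0.01, 0.5])
print("adjusted:", adjusted)
```

Working with log p-values avoids the underflow, because -800 itself is perfectly representable even though exp(-800) is not.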

    Hope this helps
    Wolfgang
    Wolfgang Huber
    EMBL


    • #3
      Dear Wolfgang,

      Thank you very much for your explanation.

      I was thinking of using that value (base counts instead of read counts) because CASAVA creates two short reads for each read crossing a splice site (in the BAM file), and I was trying to avoid writing a custom script to count reads. But I must overcome my laziness and get down to work.

      Soon I am going to move from ELAND (CASAVA) to BWA but, for the moment, I have to deal with it.

      Thank you very much again.
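
A counting script of the kind mentioned above could be sketched like this. It assumes, without confirmation from CASAVA's documentation, that both fragments of a split read carry the same read identifier. The record layout (pairs of read id and gene id) and the function name `count_reads_per_gene` are hypothetical.

```python
from collections import defaultdict


def count_reads_per_gene(alignments):
    """Count distinct reads per gene.

    alignments: iterable of (read_id, gene_id) pairs, one per aligned fragment.
    A read that was split into two fragments is counted only once per gene.
    """
    reads_by_gene = defaultdict(set)
    for read_id, gene_id in alignments:
        reads_by_gene[gene_id].add(read_id)
    return {gene: len(ids) for gene, ids in reads_by_gene.items()}


# Hypothetical alignment records, invented for illustration.
alignments = [
    ("read1", "Gene1"),
    ("read2", "Gene1"),  # first fragment of a spliced read
    ("read2", "Gene1"),  # second fragment of the same read
    ("read3", "Gene2"),
]

counts = count_reads_per_gene(alignments)
print(counts)
```

Collecting read identifiers in a set per gene is what prevents the two fragments of a spliced read from being counted twice.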

      Jorge


      • #4
        Dear Jorge

        In addition to what Wolfgang wrote: Please update to a current version of DESeq.

        Your gene with p value 0 is what we call a variance outlier: The variance residual is way too high (21.9) and you should have filtered this out, as described in the vignette to the old version.

        In the current version, this is handled differently and there is no need to take care of variance outliers manually any more. Please see the vignette of the new version for details.
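
The manual filtering described in the old vignette is not reproduced here. The following is only a sketch of the idea, using the resVarA and resVarB values from the table in the first post. The cutoff `MAX_RES_VAR` is a hypothetical value chosen for illustration, not a number taken from the vignette.

```python
# Variance residuals copied from the table in the first post: (resVarA, resVarB).
res_var = {
    "Gene1": (0.01117127, 0.946634653),
    "Gene2": (0.462602614, 0.979230062),
    "Gene3": (0.147790741, 2.158419888),
    "Gene4": (0.001232907, 21.91637034),
}

MAX_RES_VAR = 15.0  # hypothetical cutoff, for illustration only

# Flag a gene if either condition's variance residual exceeds the cutoff.
flagged = [g for g, (a, b) in res_var.items() if max(a, b) > MAX_RES_VAR]
kept = [g for g in res_var if g not in flagged]

print("flagged as variance outliers:", flagged)
print("kept:", kept)
```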

        Simon


        • #5
          Dear Simon,

          Thank you. I'll upgrade DESeq as you suggested.

          Not only have you created a useful tool, but you also selflessly support its users. Sincerely, thank you very much.

          Regards

          Jorge
