  • Negative p-value in DESeq analysis

    Dear all,
    I'm trying to run a differential expression analysis on RNA-seq data. I have two conditions and, unfortunately, no replicates.
    I decided to use DESeq for the analysis, but the results don't convince me.
    I obtained negative p-values. Is this possible?
    Also, the genes with pval < 0.01 all have a padj equal to 1. Which p-value should I consider?

    This is the script I used:

    count_table <- read.table("counts_17dpf_21dpf.txt", header=T, sep="\t", row.names=1)
    head(count_table)
    expt_design <- data.frame(row.names = colnames(count_table), condition = c("17dpf","21dpf"))
    expt_design
    conditions = expt_design$condition
    conditions
    library("DESeq")
    data <- newCountDataSet(count_table, conditions)
    head(counts(data))
    data <- estimateSizeFactors(data)
    sizeFactors(data)
    data2 <- data[,c ("X17dpf_rep1","X21dpf_rep1")]
    data2 <- estimateSizeFactors(data2)
    data2 <- estimateDispersions(data2, method="blind", sharingMode="fit-only", fitType="local")
    results2 <- nbinomTest(data2, "17dpf", "21dpf")
    write.table(results2,file="DESeq_results.txt",sep="\t",row.names=rownames(results2),col.names=colnames(results2),quote=F)


    Any suggestions would be much appreciated!

    Marianna

  • #2
    Can you post an example?



    • #3
      Here you can find some of the significant results.


      • #4
        Nothing is wrong with your results; you are just reading them wrong. You need to shift the header row over by one column.

        Code:
        shiftOver->	id	baseMean	baseMeanA	baseMeanB	foldChange	log2FoldChange	pval	padj	
        25916	Contig25916	93.49866325	93.49325776	93.50406873	1.000115634	0.000166814	1	1
        18832	Contig18832	94.49867786	94.48786689	94.50948883	1.000228833	0.000330098	1	1
        60472	larve17dpf_CGATGT_L002_R1_001_(paired)_trimmed_(paired)_contig_9475	188.9973557	188.9757338	189.0189777	1.000228833	0.000330098	1	1
        125763	larve48dpf_ACAGTG_L002_R1_001_(paired)_trimmed_(paired)_contig_301	116.3746731	231.7439262	1.005420094	0.004338496	-7.84858929	0.000687227	0.971172987
        218221	rud_dec2_c2353	301.3827814	599.7493025	3.016260282	0.005029202	-7.635454836	7.03E-05	0.24657811
        56313	larve17dpf_CGATGT_L002_R1_001_(paired)_trimmed_(paired)_contig_642	393.3895309	782.7573815	4.021680376	0.005137838	-7.60462297	3.66E-05	0.177915006
        130823	larve48dpf_ACAGTG_L002_R1_001_(paired)_trimmed_(paired)_contig_11355	82.55796287	164.1105056	1.005420094	0.006126482	-7.350725359	0.001972149	1
        159688	onemoutholdseed_CTTGTA_L002_R1_001_(paired)_trimmed_(paired)_contig_5425	329.7345469	655.4474135	4.021680376	0.006135779	-7.3485378	7.80E-05	0.250145666
        Notice that the very last column is your padj and the column immediately to its left is the pval. The negative values are the log2 fold changes, not the p-values.

        Easy solution, open your results in Excel and move the top row over one column and I think your results will be far more meaningful to you.
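
The shifted header can also be avoided when the file is written, rather than repaired in Excel afterwards. What follows is only a rough sketch: `res` is a made-up stand-in for the nbinomTest() output and its values are invented. It assumes the cause is that write.table() emits the row names as an extra first column without a matching header field; passing row.names = FALSE drops that column so the header and the data rows have the same number of fields.

```r
# Hypothetical stand-in for the nbinomTest() result; all values are made up.
res <- data.frame(
  id = c("geneA", "geneB"),
  log2FoldChange = c(-7.85, 0.0002),
  pval = c(0.0007, 1),
  padj = c(0.97, 1)
)

out <- tempfile(fileext = ".txt")

# row.names = FALSE drops the unnamed row-index column, so the header
# line has exactly as many fields as each data row.
write.table(res, file = out, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm the columns line up with their names.
back <- read.delim(out, stringsAsFactors = FALSE)
```

If the row names are wanted in the file, col.names = NA (with row.names left at TRUE) writes a blank header field for that first column instead, which spreadsheet programs also read correctly.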



        • #5
          AHHHHHHHHHHH
          There was probably a problem in the conversion from txt to Excel...
          Now it's clear. Unfortunately, I only have 10 genes with a padj < 0.01.

          Thank you very much!



          • #6
            With no replicates, you have no statistical discrimination. All you are doing is effectively comparing the difference between single pairs of numbers, so even those 10 with a "significant" adjusted P-value are highly suspect.

            Differential gene expression without replication simply cannot be done. Would you do population genetics on allele frequencies from samples with N of 1? Of course not. You may be able to say which of your genes had different count values between your samples, but you cannot assign any significance (statistical or biological) to those differences.

            Back in the early days of microarrays, it seems we went through the same thing. Everyone trying to squeeze meaning out of array experiments with single samples in each comparison group. It did not work then, and it does not work now.
            Michael Black, Ph.D.
            ScitoVation LLC. RTP, N.C.



            • #7
              mBblack, I agree with you. I know that without replicates no reliable differences can be detected.
              The two libraries are pools of several larvae, and the aim of the study was mainly to obtain information about the transcriptome. I read that it might be possible to run a differential expression analysis even without replicates, so I decided to try. The only thing I can reliably say is that a few genes show a high fold change between the two samples, but further evaluation is needed.

              Thank you



              • #8
                I understand, but even fold change in that instance is highly suspect. Fold change is just relative difference. In your case, you have single pairs of numbers, with no idea at all of how much variation there would normally be around those numbers, so your estimates of fold change tell you nothing about what the actual differences might or might not be between your treatments.

                Again, this is the same debate that went on 10-12 years ago with arrays, when people argued for using fold change as the basis for biological interpretation of non-replicated experiments. But fold change as an estimate of difference depends just as much on properly characterizing the variation within each population as statistical tests of difference do. Without replicates to measure variation within each group, you have no idea what the true degree of difference between them is. Your large fold changes may merely be due to pure chance in your selection of specimens for sequencing. Had you done replicates, those "large" fold changes, when based on a robust mean difference, might have turned out to be trivial (especially if those genes vary as much within any treatment or strain as they do between them).
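
The point about chance can be illustrated with a small simulation. This is only a sketch under assumed numbers, not an analysis of the data in this thread: the gene count, the mean `mu`, and the negative binomial `size` are all arbitrary choices. Both "conditions" are drawn from the same distribution, so every true fold change is 1, yet single-sample comparisons still produce apparent fold changes of 2 or more.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

n_genes <- 10000
mu <- 100    # assumed true mean count, identical in both conditions
size <- 2    # assumed NB size parameter (dispersion = 1/size), illustrative

# One library per condition, both drawn from the SAME distribution,
# so no gene is truly differentially expressed.
a <- rnbinom(n_genes, mu = mu, size = size)
b <- rnbinom(n_genes, mu = mu, size = size)

# A pseudocount of 1 avoids division by zero for genes with zero counts.
lfc <- log2((b + 1) / (a + 1))

# Fraction of genes with an apparent >= 2-fold change by chance alone.
frac_2fold <- mean(abs(lfc) >= 1)
print(frac_2fold)
```

How large that fraction is depends entirely on the within-group variability assumed here, which is exactly the quantity that cannot be estimated without replicates.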
                Michael Black, Ph.D.
                ScitoVation LLC. RTP, N.C.
