Hi all,
I am using the DESeq2 package to analyse my RNA-Seq data set from the fruit fly (D. melanogaster). Unfortunately there are no replicates.
I know this is not optimal and one can't really rely on the statistical strength of the results, but we can still look into the data and rely on the fold-induction differences between the samples.
This is also the reason for my question.
I know the variance might be over-estimated, but what I do not understand is why I get strange baseMean and log2FoldChange results.
This is how I run DESeq2:
Code:
cds <- DESeqDataSetFromMatrix(countData = Comp, colData = colData, design = ~condition)
fit = DESeq(cds)
res = results(fit)
But when I look at the results, I get the wrong numbers. These are the raw values from my samples:
Code:
> Comp[13696:13706,]
            sample1 sample2
FBgn0085379       1       4
FBgn0085380     104     117
FBgn0085382     101     137
FBgn0085383      88     187
FBgn0085384      90     275
FBgn0085385      18      55
FBgn0085386      40      40
FBgn0085387      16     310
FBgn0085388     910    3333
FBgn0085390     192     179
FBgn0085391      96     359
And this is a snippet of the results from the "differential expression" analysis:
Code:
> res[13696:13706,]
log2 fold change (MAP): condition sample2 vs sample1
Wald test p-value: condition sample2 vs sample1
DataFrame with 11 rows and 6 columns
              baseMean log2FoldChange     lfcSE       stat    pvalue      padj
             <numeric>      <numeric> <numeric>  <numeric> <numeric> <numeric>
FBgn0085379   2.047768      0.1776917  1.656357  0.1072786 0.9145679  0.999346
FBgn0085380 119.997967     -1.0010375  1.365438 -0.7331255 0.4634819  0.999346
FBgn0085382 123.804622     -0.7832908  1.339541 -0.5847457 0.5587187  0.999346
FBgn0085383 128.899132     -0.2415351  1.299186 -0.1859127 0.8525132  0.999346
FBgn0085384 157.869569      0.2069421  1.275569  0.1622352 0.8711206  0.999346
...                ...            ...       ...        ...       ...       ...
FBgn0085386   44.59838     -1.0634868  1.528435 -0.6958011 0.4865534  0.999346
FBgn0085387  109.25461      2.2342308  1.536826  1.4537959 0.1460029  0.999346
FBgn0085388 1768.01176      0.4434720  1.179468  0.3759932 0.7069220  0.999346
FBgn0085390  210.03007     -1.2372886  1.341767 -0.9221335 0.3564590  0.999346
FBgn0085391  188.81235      0.4581024  1.270345  0.3606124 0.7183892  0.999346
My question concerns the values in the rows "FBgn0085380" and "FBgn0085386", just as an example.
In the raw data, the first gene shows a slightly higher read count in sample2, while the count is equal in both samples for the second gene. But the differential expression results give a different picture.
For the first gene I get a baseMean of ~120, although the read counts in both samples are lower than that (104 and 117). The second gene shows a similar picture: a baseMean of ~44.6 for counts of 40 and 40. The log2FoldChange values are off in the same way.
For both genes I get a negative log2 fold change (sample2 vs sample1), i.e. a downregulation in sample2, although the read count in sample2 is higher or equal, respectively.
Is there an explanation for this behaviour? Are the numbers off because I have no replicates and all the samples are treated as replicates (though this still wouldn't explain the baseMean values)?
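To make the mismatch concrete, here is a minimal base R sketch of what I would naively expect for these two genes: the plain mean of the raw counts and the raw log2 ratio, next to the values DESeq2 reported. The object names (raw, naive, reported) are just illustrative, and the naive numbers deliberately ignore any normalization or shrinkage that DESeq2 may apply.

```r
# Raw counts copied from the two rows in question (sample1, sample2).
raw <- rbind(
  FBgn0085380 = c(sample1 = 104, sample2 = 117),
  FBgn0085386 = c(sample1 = 40,  sample2 = 40)
)

# Naive expectation: plain mean of the raw counts and raw log2 ratio
# (sample2 vs sample1), with no normalization and no shrinkage.
naive <- data.frame(
  rawMean   = rowMeans(raw),
  rawLog2FC = log2(raw[, "sample2"] / raw[, "sample1"])
)

# Values DESeq2 reported for the same rows (copied from the results above).
reported <- data.frame(
  baseMean       = c(119.997967, 44.59838),
  log2FoldChange = c(-1.0010375, -1.0634868),
  row.names      = rownames(raw)
)

# Side-by-side comparison of naive and reported values.
print(cbind(naive, reported))
```

If the gap comes from library-size normalization, comparing these numbers with sizeFactors(fit) and counts(fit, normalized = TRUE) for the same rows should show it, but that is only an assumption on my side.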
Thanks in advance
Assa