The sequencing core here includes a control sample (a technical replicate) on a multiplexed lane for every RNA-seq run. The idea is that if a run goes well, the control data should correlate highly between runs. For past runs, I have looked at Spearman's correlation and scatter plots to check for high concordance. However, the results varied and were not easy to interpret.
The Marioni et al. paper focuses on the technical reproducibility of samples within a single flow cell. Could this be extended to within-run sample concordance?
If so, I can think of three ways in which sample concordance could be measured.
a) Spearman's correlation of counts
b) Poisson modelling: the counts should show little differential expression, as shot noise should be modelled by a Poisson distribution
c) Use a hypergeometric model to compute a P-value testing whether the counts differ from what random sampling would produce.
Are any of the above methods appropriate for answering the question? If not, what would be a possible alternative?
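For illustration, options (a) and (b) could be sketched roughly as follows. This is a minimal sketch, not a tested pipeline: the function names and the example counts are made up, and the Poisson check assumes the two control libraries differ only by sequencing depth. Under that assumption, the per-gene chi-square statistic (1 degree of freedom) should average about 1 for pure shot noise, and values well above 1 suggest extra variability between runs.

```python
import math
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Option (a): Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def poisson_fit_index(x, y):
    """Option (b): mean per-gene chi-square statistic for two replicates.

    Expected counts split each gene's total in proportion to library size.
    Genes with zero counts in both runs carry no information and are skipped.
    """
    share_x = sum(x) / (sum(x) + sum(y))
    stats = []
    for a, b in zip(x, y):
        total = a + b
        if total == 0:
            continue
        expected_a = total * share_x
        expected_b = total - expected_a
        stats.append((a - expected_a) ** 2 / expected_a
                     + (b - expected_b) ** 2 / expected_b)
    return mean(stats)


# Hypothetical counts for the control sample in two runs (not real data).
control_run1 = [0, 5, 12, 40, 100, 250, 1000]
control_run2 = [1, 4, 15, 38, 110, 240, 980]
print("Spearman rho:", spearman(control_run1, control_run2))
print("Poisson fit index:", poisson_fit_index(control_run1, control_run2))
```

The rank correlation alone can look fine even when the counts are overdispersed, which is why the two numbers are reported side by side here.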
-
If you compare a single sample to another single sample, you have to use DESeq's "blind" dispersion estimation method. For the comparison between conditions, you probably did not use the blind setting. Comparing the number of DE genes from two analyses performed with such completely different settings makes no sense.
-
Originally posted by Simon Anders:
I am not sure what you mean by "doing differential expression within replicates", but if you really have such strong variance, the DE analysis between conditions had better not give many hits.
For example, for condition 1 I have 3 replicates. Comparing rep1 to rep2 with DESeq, I get 26 DE genes.
In contrast, comparing cond1 rep1 vs. cond2 rep1, I get 300 DE genes.
I hope that is clearer.
Thanks
-
I am not sure what you mean by "doing differential expression within replicates", but if you really have such strong variance, the DE analysis between conditions had better not give many hits.
-
We have to remember that the higher the resolution, the higher the variance as well.
-
In my experiment we did amplification, which causes many genes to have an expression of 0, even within the same condition.
For the same gene I can have:
condition 1 replicate A count of 10
condition 1 replicate B count of 100
condition 1 replicate C count of 1000
or,
condition 1 replicate A count of 0
condition 1 replicate B count of 0
condition 1 replicate C count of 150
That is why I get a very low correlation (0.2) when I correlate the replicates.
But when I run differential expression within replicates and between conditions (both with DESeq), I get many more DE genes between the conditions.
To conclude:
Differential expression seems to work fine.
Correlation does not (I also tried removing outliers: zeros or counts of 10000+).
Any advice?
I made a scatter plot and it does not look good.
Any help will be appreciated!
-
Yes, sure.
I thought maybe there was a shortcut that I was missing.
Thanks!
-
I have been using HTSeq for my counts, but the most obvious thing to do, it seems, is to rerun cuffdiff with each replicate as a separate input instead of grouping the replicates.
-
Originally posted by vyellapa:
Counts can be obtained in two ways:
-Using htseq-count
htseq-count -m intersection-strict -s no queryNameSorted.sam ~/GRCh37_E64_1kg.gtf > output
-Using cuffdiff. cuffdiff now gives an additional output file with count information.
It is possible to convert FPKMs to counts, but the above methods are more straightforward and tested.
But when I looked into the cuffdiff output (genes.count_tracking), I found only a mean rather than a value for each replicate.
If I want to plot 2 replicates against each other, I need the raw counts for each.
Am I missing something?
Thanks
-
Counts can be obtained in two ways:
-Using htseq-count
htseq-count -m intersection-strict -s no queryNameSorted.sam ~/GRCh37_E64_1kg.gtf > output
-Using cuffdiff. cuffdiff now gives an additional output file with count information.
It is possible to convert FPKMs to counts, but the above methods are more straightforward and tested.
Last edited by vyellapa; 07-17-2012, 09:26 AM.
-
Originally posted by vyellapa:
Thank you Simon. Plotting a scatter plot of the raw counts showed a tight plot at 45 degrees from the origin. The correlation value was .92 too. Using the count values does make sense; however, why such a pattern is not found in the FPKM values is something I cannot make sense of.
-
Originally posted by Simon Anders:
re #1 and #3: It is generally more helpful to look at scatter plots than only at correlation values. You want to know whether it is only the genes with low counts or all genes that differ a lot. For this, make sure to plot raw counts, not RPKM values.
One possibility is that cufflinks produces some FPKM values that are outliers, so I filtered out all rows with a value in exponential notation (such as 1.78989e+06, 8.1667e-05, etc.) and used the remaining values for the scatter plot.
If the FPKM values are indeed normalized, should the differences between them (FPKM_sample1 - FPKM_sample2) be close to 0?
-
re #1: You could try to run a standard DESeq analysis on your data, and see if it finds anything. Maybe, you have only very few differentially expressed genes, which a DE analysis can find but which are not enough to change the overall correlation coefficient. More likely, though, a DE analysis will confirm what your comparison of correlation coefficients suggests, namely, that the effect of the differences between your conditions is weaker than your variation between replicates, i.e., that your experiment has failed.
re #1 and #3: It is generally more helpful to look at scatter plots than only at correlation values. You want to know whether it is only the genes with low counts or all genes that differ a lot. For this, make sure to plot raw counts, not RPKM values.
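A rough numerical companion to the scatter plot is to bin genes by count level and look at the replicate-to-replicate spread in each bin. The sketch below is only an illustration under stated assumptions: the bin edges (10 and 100), the pseudocount of 1, the function name, and the example counts are all arbitrary choices, and the two libraries are assumed to have comparable sequencing depth (otherwise the counts would need scaling first).

```python
import math
from statistics import median


def spread_by_count_level(rep_a, rep_b, edges=(10, 100)):
    """Median |log2 ratio| between two replicates, per mean-count bin.

    Bin 0 holds genes with mean count below edges[0], the last bin holds
    genes at or above edges[-1]. Empty bins are reported as None.
    """
    bins = [[] for _ in range(len(edges) + 1)]
    for a, b in zip(rep_a, rep_b):
        avg = (a + b) / 2
        index = sum(avg >= edge for edge in edges)
        # Pseudocount of 1 keeps zero counts from producing log(0).
        bins[index].append(abs(math.log2((a + 1) / (b + 1))))
    return [median(values) if values else None for values in bins]


# Hypothetical raw counts for six genes in two replicates (not real data).
rep_a = [1, 3, 50, 60, 1000, 2000]
rep_b = [4, 0, 55, 50, 1100, 1900]
for label, value in zip(["<10", "10-100", ">=100"],
                        spread_by_count_level(rep_a, rep_b)):
    print(label, value)
```

If the spread is large only in the lowest bin, the disagreement is mostly counting noise at low expression; if it stays large in the high-count bins too, the replicates differ across the board.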
-
I am trying to find the correlation between two biological replicate RNA-seq runs.
I ran Tophat and Cufflinks, extracted the FPKM values, plotted them against each other, and computed the correlation in R.
Code:
library(car)
reg1 <- lm(FPKM_1 ~ FPKM_2)  # linear fit of one replicate against the other
cor(FPKM_1, FPKM_2)          # Pearson correlation (the default method)
-
As a first step, you should plot the data against each other. For instance, if you have one dataset (A) and one biological replicate (B), plot the expression value of each gene in A against its value in B. In the best case, when the correlation is 1, you should get a straight line through your coordinate system. There will certainly be some bias from amplification, but in my opinion it should not matter that much (at least for biological replicates). Comparing conditions might give a correlation of around 0.6 if the conditions are more or less totally different!