Hi Everyone,
I have Control and Treated data sets that I ran through HTSeq-count, followed by differential gene expression analysis with edgeR, DESeq, and DESeq2.
My pipeline is:
TopHat2 > HTSeq-count > edgeR, DESeq, or DESeq2
Samples:
Control: Control_rep1; Control_rep2; Control_rep3
Treated: Treated_rep1; Treated_rep2; Treated_rep3
I have generated heatmaps in R, but the key along the bottom shows each of the triplicates for the Control and Treated groups separately.
It looks like this:
Control_rep1 | Control_rep2 | Control_rep3 | Treated_rep1 | Treated_rep2 | Treated_rep3
Is there a method I should follow to combine the Control triplicates and the Treated triplicates into two respective columns, so that the bottom of the heatmap shows just Control and Treated?
I hope to make it look like this in the heatmap:
Control | Treated
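One way I could imagine doing this is to average the replicate columns within each group before plotting. Below is a minimal sketch of that idea in plain Python (the same arithmetic could be done with rowMeans in R). The gene names and expression values are made up for illustration, and it assumes the values are already normalized (e.g. log2 scale) and that sample names follow the Group_repN pattern shown above.

```python
from statistics import mean

# Hypothetical normalized expression values (e.g. log2 scale); not real data.
samples = ["Control_rep1", "Control_rep2", "Control_rep3",
           "Treated_rep1", "Treated_rep2", "Treated_rep3"]
expr = {
    "geneA": [5.0, 5.2, 4.8, 7.1, 6.9, 7.0],
    "geneB": [3.0, 3.1, 2.9, 1.0, 1.2, 0.8],
}

def collapse_replicates(expr, samples):
    """Average replicate columns that share a group name (text before '_rep')."""
    groups = []
    for s in samples:
        g = s.split("_rep")[0]
        if g not in groups:
            groups.append(g)  # keep first-seen order: Control, then Treated
    collapsed = {}
    for gene, values in expr.items():
        collapsed[gene] = [
            mean(v for s, v in zip(samples, values) if s.split("_rep")[0] == g)
            for g in groups
        ]
    return groups, collapsed

groups, collapsed = collapse_replicates(expr, samples)
print(groups)
for gene, row in collapsed.items():
    print(gene, [round(x, 2) for x in row])
```

The collapsed table has one column per group, which is what the heatmap would then be drawn from. Note that averaging hides the replicate-to-replicate variability, so it may still be worth looking at the six-column version as well.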
Many thanks.
EDIT: I was thinking of the following strategy:
1) I have the log2 fold change values from all three methods.
2) Would converting to Z scores work better?
Source: https://www.biostars.org/p/144765/
Reading the literature and comments, my understanding of the z-score:
1. Convert the count/RPKM values of each gene into log values.
2. Calculate the mean and standard deviation of the log values of gene X across 20 lung tissues (suppose I have data for 20 samples).
3. For the first lung tissue sample: (gene X log value - mean of log values across the 20 lung tissues) / standard deviation of log values across the 20 lung tissues.
4. Now I have the z-score for gene X in the first lung tissue sample. Using the above protocol, I can convert the log values of all genes into z-scores.
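The four steps above can be sketched as follows. This is only a minimal illustration in plain Python, not a vetted pipeline: the function name gene_z_scores, the pseudocount of 1 (added so that a value of 0 can be logged), the use of log2, and the example values are all assumptions, and the sample standard deviation (n - 1 denominator) is used.

```python
import math
from statistics import mean, stdev

def gene_z_scores(values, pseudocount=1.0):
    """Z-scores for one gene across samples, computed on log2(value + pseudocount).

    The pseudocount is an assumption to avoid log2(0); it is not part of
    the protocol described in the text.
    """
    logs = [math.log2(v + pseudocount) for v in values]   # step 1
    mu = mean(logs)                                       # step 2
    sd = stdev(logs)                                      # step 2 (n - 1 denominator)
    if sd == 0:
        # Gene does not vary across samples; z-scores are undefined, return zeros.
        return [0.0 for _ in logs]
    return [(x - mu) / sd for x in logs]                  # steps 3 and 4

# Hypothetical RPKM values for one gene in four samples (not real data).
rpkm = [1.0, 3.0, 7.0, 15.0]
z = gene_z_scores(rpkm)
print([round(x, 3) for x in z])
```

Each z-score says how many standard deviations a sample lies from that gene's own mean across the samples, so the values are only comparable within the set of samples used to compute the mean and standard deviation.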
My question is whether the above protocol is correct; please advise.
Should I calculate the z-score using read counts or RPKM values?
Do these z-scores really have meaning? The z-scores COSMIC provides:
ID_SAMPLE SAMPLE_NAME GENE_NAME REGULATION Z_SCORE ID_STUDY
1337808 TCGA-02-2483-01 SFMBT1 over 2.416 329
1337808 TCGA-02-2483-01 SGCE normal -0.274 329
If I calculate the z-score using the above approach, should I be able to determine whether the gene is over-regulated or normally regulated?
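As a rough sketch of how such a label could be derived from a z-score: the cutoff of 2 used here is an assumption (it is consistent with the two rows shown above, but the COSMIC documentation should be checked for the actual rule), and classify_regulation is a hypothetical helper name.

```python
def classify_regulation(z, cutoff=2.0):
    """Label a z-score as 'over', 'under', or 'normal'.

    The cutoff of 2.0 is an assumed threshold, not taken from the text.
    """
    if z > cutoff:
        return "over"
    if z < -cutoff:
        return "under"
    return "normal"

# The two z-scores from the table above.
for gene, z_score in [("SFMBT1", 2.416), ("SGCE", -0.274)]:
    print(gene, z_score, classify_regulation(z_score))
```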
Please advise how to proceed.
Thanks
Would something like this work?
Source 2: http://stats.stackexchange.com/quest...used-instead-o