DEGseq - SEQanswers

Sol replied

10-27-2010, 11:09 AM
What does the MA plot tell me about the fold change?
The graph shows that relationship in the MA-plot, no?
thanks
Xi Wang replied

10-27-2010, 09:39 AM
RPKM is the read count for a region (say a gene or an exon), normalized by the region length (measured in kilobases) and by the sequencing depth (measured in millions of reads). So RPKM is short for Reads Per Kilobase per Million reads.
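The definition above can be written out as a small calculation. This is only a sketch in base R: the function name `rpkm` and its argument names are made up for illustration and are not part of DEGseq.

```r
# Hypothetical helper (not a DEGseq function): RPKM from a raw read count.
#   counts             - reads falling in the region (gene or exon)
#   region_length_bp   - region length in base pairs
#   total_mapped_reads - sequencing depth of the sample
rpkm <- function(counts, region_length_bp, total_mapped_reads) {
  kilobases <- region_length_bp / 1e3        # length in kilobases
  million_reads <- total_mapped_reads / 1e6  # depth in millions of reads
  counts / kilobases / million_reads
}

# 500 reads on a 2,000 bp gene in a library of 10 million reads:
rpkm(counts = 500, region_length_bp = 2000, total_mapped_reads = 10e6)
# 500 / 2 / 10 = 25
```

The function is vectorized, so a whole column of counts and a matching vector of gene lengths can be passed at once.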
Sol replied

10-27-2010, 09:31 AM
But what does RPKM mean?
thanks
Xi Wang replied

10-27-2010, 09:20 AM
Actually, we recommend that users feed the raw read counts (that is, the number of reads falling in a gene's exonic regions) to DEGseq. DEGseq will normalize the data according to the sequencing depth of each sample.
Sol replied

10-27-2010, 09:01 AM
Thanks

And the RPKM: how do I normalize the data?
Do I divide the number of reads by the size of the gene and then by the total number of reads?
How is it calculated?
thanks
Xi Wang replied

10-27-2010, 08:29 AM
The z-score is also based on your input data. We assume that most genes are not differentially expressed. Please refer to the supplementary material of our DEGseq paper: http://bioinformatics.oxfordjournals...28-File001.pdf
Search "Z-score" for details.

q-value is a kind of corrected p-value for multiple testing. Please refer to Section 2.3 of our DEGseq paper:
"2.3 Multiple testing correction
For the above methods, the P-values calculated for each gene are adjusted to Q-values for multiple testing corrections by two alternative strategies (Benjamini and Hochberg, 1995; Storey and Tibshirani, 2003). Users can set either a P-value or a false discovery rate (FDR) threshold to identify differentially expressed genes."

If it is still unclear, please let me know. Thanks.
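To make the p-value versus q-value distinction concrete, here is a minimal sketch using base R's `p.adjust` with the Benjamini-Hochberg method, one of the two strategies named in the quoted section. The p-values are made-up illustration data, and this is not DEGseq's internal code.

```r
# Made-up p-values for six genes (illustration only).
p_values <- c(0.001, 0.01, 0.02, 0.04, 0.045, 0.5)

# Benjamini-Hochberg adjustment from base R (stats package).
q_values <- p.adjust(p_values, method = "BH")

# Compare how many genes pass the same 0.05 cutoff before and after correction.
n_pass_p <- sum(p_values < 0.05)
n_pass_q <- sum(q_values < 0.05)

print(data.frame(p_value = p_values, q_value = q_values))
cat("genes with p < 0.05:", n_pass_p, "\n")  # 5
cat("genes with q < 0.05:", n_pass_q, "\n")  # 3
```

With thousands of genes tested at once, a raw p-value cutoff lets through many false positives by chance alone; thresholding on the q-value controls the expected fraction of false discoveries among the genes called significant.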
Sol replied

10-27-2010, 08:14 AM
But what data is the z-score based on? And what is the difference between the q-value and the p-value? I don't understand.
Thanks
Xi Wang replied

10-26-2010, 06:37 PM
Hi Sol, Thanks for using DEGseq.

In the output file, there are two columns for fold change: "log2(Fold_change)" and "log2(Fold_change) normalized". log2(Fold_change) = log2(value1/value2), and the normalized version is computed from the normalized value1 and value2. From the fold change you can judge whether a gene is up-regulated or down-regulated. For example, if a gene has log2(Fold_change) > 0, which means value1 > value2, and its signature = TRUE, the gene is significantly down-regulated in condition 2. You can also look at the z-scores.
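The rule above can be sketched in a few lines of base R. The function name `classify_regulation`, its arguments, and the example values are hypothetical; in practice the two values and the signature flag would come from the columns of the DEGseq output table.

```r
# Hypothetical helper: label each gene relative to condition 2, following
# the rule log2(value1 / value2) > 0  =>  value1 > value2  =>  down in condition 2.
classify_regulation <- function(value1, value2, signature) {
  log2_fc <- log2(value1 / value2)
  direction <- ifelse(
    !signature, "not significant",
    ifelse(log2_fc > 0, "down in condition 2",
           ifelse(log2_fc < 0, "up in condition 2", "no change"))
  )
  data.frame(log2_fc = log2_fc, direction = direction, stringsAsFactors = FALSE)
}

# Three made-up genes: the last one is not flagged as significant.
result <- classify_regulation(
  value1    = c(80, 10, 50),
  value2    = c(20, 40, 55),
  signature = c(TRUE, TRUE, FALSE)
)
print(result)
```

The first gene has log2(80/20) = 2 and is labeled as down in condition 2, the second has log2(10/40) = -2 and is labeled as up in condition 2, and the third is left as not significant regardless of its fold change.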

Hope this helps.
Sol replied

10-26-2010, 10:54 AM
Hi
I ran the DEGexp program from DEGseq. The output file contains a table with values for log2 fold change, z-score, p-value, q-value and signature (p-value < 0.001). How do I interpret gene up-regulation and down-regulation? What does each column mean? The input file contained genes and their RPKM values. I have no replicates, but I'm comparing two conditions.
Thanks
Xi Wang replied

10-11-2010, 05:30 PM
Please read the respective articles for the two tools.
luoruicd replied

10-08-2010, 12:06 PM
Hi Xi Wang,
Thanks for your software. I am wondering what the difference is between DESeq and DEGseq?
Xi Wang replied

10-01-2010, 01:00 AM
Hi,

Thanks for using DEGseq in differential expression analysis.

As your samples have no biological replicates (right?), the statistical model in DEGseq only captures the measurement uncertainty of the RNA-seq technology. So some of the genes picked up as differentially expressed may really differ between the samples (for several reasons, such as individual differences) without having biological significance. It is often said that statistical significance does not equal biological significance.
Another thing you may try is a more stringent p-value (or q-value) cutoff, say by specifying pValue=1e-3. Better still, use thresholdKind=3 or thresholdKind=4. The q-values are adjusted from the p-values for multiple testing correction.

Hope this helps.

Last edited by Xi Wang; 10-01-2010, 01:05 AM.
osvaldoreis replied

09-30-2010, 08:11 AM
Starting to use DEGseq

Hey all, I've just started using DEGseq and I'm having some trouble. I have many libraries without replicates and I want to look at differential expression over the time course of a disease. I have Solexa reads MP ~35 bp of RNA-seq, the genome, and a gene prediction for this species. I aligned the reads to the predicted genes using bowtie, then I wrote a script to make a file like this:

"Gene" "LB1" "LB2"
"g1" 45 103
... ... ...

where the numbers are how many reads aligned to that gene in each library. Then I ran DEGexp like this:

library(DEGseq)

geneExpMatrix1 <- readGeneExp(file="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/reads_por_gene.txt", geneCol=1, valCol=c(2))
geneExpMatrix2 <- readGeneExp(file="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/reads_por_gene.txt", geneCol=1, valCol=c(3))

layout(matrix(c(1,2), 3, 2, byrow=TRUE))

par(mar=c(2, 2, 2, 2))

DEGexp(geneExpMatrix1=geneExpMatrix1, geneCol1=1, expCol1=2, groupLabel1="GPB1", geneExpMatrix2=geneExpMatrix2, geneCol2=1, expCol2=2, groupLabel2="DPB1", method="MARS", outputDir="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/", pValue=1e-2, thresholdKind=1)

I got results without errors, but a lot of genes are reported as differentially expressed: I have ~18k genes and ~12k of them are called differentially expressed. Looking at the file, I can see that many of them aren't significant. Am I doing something wrong?

Thanks for any help!
Xi Wang replied

08-16-2010, 04:49 PM
Dear Sma,

Thanks for using DEGseq. I suspect your problem is caused by the format of the input file read into geneExpMatrix. Could you paste the first 15 lines of the input file here, or email them to me via [email protected]? Thanks.
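An error such as "line 10 did not have 4 elements" usually means some lines have a different number of fields than the header. A quick way to check this before pasting the file is base R's `count.fields`. The sketch below assumes a tab-separated file; the helper name `find_ragged_lines` and the temporary demo file are made up for illustration.

```r
# Hypothetical helper: report lines whose field count differs from the first line.
# Assumes a tab-separated file; change `sep` if the file uses another delimiter.
find_ragged_lines <- function(path, sep = "\t") {
  n_fields <- count.fields(path, sep = sep, quote = "\"", comment.char = "")
  expected <- n_fields[1]
  # NA marks lines that fall inside an unterminated quoted string.
  bad <- which(is.na(n_fields) | n_fields != expected)
  data.frame(line = bad, fields = n_fields[bad],
             expected = rep(expected, length(bad)))
}

# Demo on a small temporary file whose third line is missing one column.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("Gene\tS1\tS2\tS3",
             "g1\t45\t103\t12",
             "g2\t7\t9"), demo_file)
ragged <- find_ragged_lines(demo_file)
print(ragged)
```

Typical causes are gene names or annotations that contain the separator or an unmatched quote character, and trailing separators on some lines only.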
sma replied

08-16-2010, 07:22 AM
Dear WANG Xi,

I am running into trouble using DEGseq. Admittedly, I am a novice at analyzing this kind of data and at using R; in fact, I only installed R yesterday in order to run your DEGseq package.

I am trying to compare three separate sets of 454 data (from 3 stages of development). When we get the data back from our company in Shanghai, it is complete with the annotations and the reads are already counted. There are no replicates (just 3 samples). So, I think I can simply use DEGexp with the MARS method.

I have followed your recent suggestions to "(1) download the most recent version with its “reference manual”, (2) following the examples in the manual, apply DEGseq’s functions to the test data, (3) replace the data set, and apply DEGseq to your own data." I can successfully run the test data without problem. I can also replace the data with my own data and I can successfully show the filepath to my data (so I know I have entered it correctly). However, when I try to set up the geneExpMatrix, I run into problems. I always get an error that says :
"Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 10 did not have 4 elements"

I've tried everything I can think of and I cannot seem to remedy this problem. Any suggestions?

Thank you!