  • Sol
    replied
    What does the MA plot have to do with the fold change?
    The graph shows that relationship in the MA plot, doesn't it?
    thanks



  • Xi Wang
    replied
    Originally posted by Sol
    But what does RPKM mean?
    Thanks
    RPKM is the read count for a region (say, a gene or an exon), normalized by the region length (measured in kilobases) and by the sequencing depth (measured in millions of reads). So RPKM is short for Reads Per Kilobase per Million reads.
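As a minimal sketch of the definition above, RPKM can be computed in R as follows. It assumes the region length is given in base pairs and the sequencing depth is the total number of reads in the sample. The function name `rpkm` and the example numbers are made up for illustration; they are not part of DEGseq.

```r
# Hypothetical helper (not a DEGseq function): reads per kilobase per million reads.
rpkm <- function(read_count, region_length_bp, total_reads) {
  kilobases <- region_length_bp / 1e3   # region length in kilobases
  millions  <- total_reads / 1e6        # sequencing depth in millions of reads
  read_count / kilobases / millions
}

# Illustrative numbers: 500 reads on a 2,000 bp gene in a library of 10 million reads.
example_rpkm <- rpkm(500, 2000, 1e7)
print(example_rpkm)
```

Because the function is vectorized, it can also be applied to whole columns of counts and lengths at once.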



  • Sol
    replied
    But what does RPKM mean?
    Thanks



  • Xi Wang
    replied
    Originally posted by Sol
    Thanks

    And what about RPKM? How should I normalize the data?
    Do I divide the number of reads by the size of the gene and then by the total number of reads?
    How is it calculated?
    Thanks
    Actually, we recommend that users feed the raw read counts (that is, the number of reads falling in a gene's exonic regions) to DEGseq. DEGseq will normalize the data according to the sequencing depth of each sample.



  • Sol
    replied
    Thanks

    And what about RPKM? How should I normalize the data?
    Do I divide the number of reads by the size of the gene and then by the total number of reads?
    How is it calculated?
    Thanks



  • Xi Wang
    replied
    Originally posted by Sol
    But what data is the z-score based on? And what is the difference between the q-value and the p-value? I don't understand.
    Thanks
    The z-score is also based on your input data. We assume that most genes are not differentially expressed. Please refer to the supplementary material of our DEGseq paper: http://bioinformatics.oxfordjournals...28-File001.pdf
    Search for "Z-score" for details.

    A q-value is a p-value corrected for multiple testing. Please refer to Section 2.3 of our DEGseq paper:
    "2.3 Multiple testing correction
    For the above methods, the P-values calculated for each gene are adjusted to Q-values for multiple testing corrections by two alternative strategies (Benjamini and Hochberg, 1995; Storey and Tibshirani, 2003). Users can set either a P-value or a false discovery rate (FDR) threshold to identify differentially expressed genes."

    If it is still unclear, please let me know. Thanks.
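
For illustration, the Benjamini and Hochberg (1995) adjustment mentioned in the quote is available in base R as `p.adjust(..., method = "BH")`. The sketch below uses made-up p-values and gene names; it is not DEGseq's own code, and this thread does not state how DEGseq computes the adjustment internally.

```r
# Made-up p-values for five genes (illustration only).
p_values <- c(geneA = 0.0001, geneB = 0.004, geneC = 0.04, geneD = 0.2, geneE = 0.8)

# Benjamini-Hochberg adjusted p-values, used here as q-values.
q_values <- p.adjust(p_values, method = "BH")

# Genes passing a 5% false discovery rate threshold.
significant <- names(q_values)[q_values < 0.05]
print(round(q_values, 4))
print(significant)
```

An adjusted value is never smaller than the raw p-value, so a q-value cutoff is at least as strict as the same numeric cutoff applied to p-values.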



  • Sol
    replied
    But what data is the z-score based on? And what is the difference between the q-value and the p-value? I don't understand.
    Thanks



  • Xi Wang
    replied
    Originally posted by Sol
    Hi
    I ran the DEGexp function of DEGseq. The output file contains a table with values of log2 fold change, z-score, p-value, q-value and signature (p-value < 0.001). How do I interpret up-regulation and down-regulation of a gene? What does each column mean? The input file contained genes and RPKM values. I have no replicates, but I am comparing two conditions.
    Thanks
    Hi Sol, Thanks for using DEGseq.

    In the output file, there are 2 columns for fold change: "log2(Fold_change)" and "log2(Fold_change) normalized". log2(Fold_change) = log2(value1/value2), and the normalized column is computed in the same way from the normalized value1 and value2. From the fold change, you can judge whether a gene is up-regulated or down-regulated. For example, if a gene's log2(Fold_change) > 0, then value1 > value2; if its signature is also TRUE, the gene is significantly down-regulated in condition 2. You can also look at the z-scores.
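
The rule above can be sketched in R as follows. This is an illustration with made-up values, not DEGseq output; the data frame and its column names (`value1`, `value2`, `signature`) are placeholders for the two conditions and the significance flag.

```r
# Made-up expression values for two conditions and a significance flag.
genes <- data.frame(
  gene      = c("g1", "g2", "g3"),
  value1    = c(200, 50, 100),
  value2    = c(50, 200, 110),
  signature = c(TRUE, TRUE, FALSE)
)

genes$log2_fold_change <- log2(genes$value1 / genes$value2)

# Direction is stated relative to condition 2, as in the explanation above.
# A log2 fold change of exactly 0 is not treated specially in this sketch.
genes$call <- ifelse(!genes$signature, "not significant",
              ifelse(genes$log2_fold_change > 0,
                     "down-regulated in condition 2",
                     "up-regulated in condition 2"))
print(genes)
```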

    Hope this helps.



  • Sol
    replied
    Hi
    I ran the DEGexp function of DEGseq. The output file contains a table with values of log2 fold change, z-score, p-value, q-value and signature (p-value < 0.001). How do I interpret up-regulation and down-regulation of a gene? What does each column mean? The input file contained genes and RPKM values. I have no replicates, but I am comparing two conditions.
    Thanks



  • Xi Wang
    replied
    Originally posted by luoruicd
    Hi Xi Wang,
    Thanks for your software. I am wondering what the difference is between DESeq and DEGseq?
    Please read the respective articles for the two tools.



  • luoruicd
    replied
    Hi Xi Wang,
    Thanks for your software. I am wondering what the difference is between DESeq and DEGseq?



  • Xi Wang
    replied
    Originally posted by osvaldoreis
    Hey all, I've just started using DEGseq and I'm having some trouble. I have many libraries without replicates and I want to look at differential expression over the time course of a disease. I have Solexa MP reads (~35 bp) from RNA-seq, the genome, and a gene prediction for this species. I aligned the reads to the predicted genes using Bowtie, then I wrote a script to make a file like this:

    "Gene" "LB1" "LB2"
    "g1" 45 103
    ... ... ...

    where the numbers are how many reads align with that gene for each library. Then I ran DEGexp like this:


    library(DEGseq)

    geneExpMatrix1 <- readGeneExp(file="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/reads_por_gene.txt", geneCol=1, valCol=c(2))
    geneExpMatrix2 <- readGeneExp(file="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/reads_por_gene.txt", geneCol=1, valCol=c(3))

    layout(matrix(c(1,2), 3, 2, byrow=TRUE))

    par(mar=c(2, 2, 2, 2))

    DEGexp(geneExpMatrix1=geneExpMatrix1, geneCol1=1, expCol1=2, groupLabel1="GPB1", geneExpMatrix2=geneExpMatrix2, geneCol2=1, expCol2=2, groupLabel2="DPB1", method="MARS", outputDir="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/", pValue=1e-2, thresholdKind=1)

    I got results without errors but the results have a lot of differentially expressed genes. I have ~18k genes and ~12k are differentially expressed. Looking at the file I can see that many of them aren't significant. I wanted to know if I'm doing something wrong?

    Thanks for any help!
    Hi,

    Thanks for using DEGseq in differential expression analysis.

    As your samples have no biological replicates (right?), the statistical model in DEGseq only captures the measurement uncertainty of the RNA-seq technology. So some genes picked up as differentially expressed may really differ between the samples (for reasons such as individual differences) without being biologically significant. As is often said, statistical significance does not equal biological significance.
    Another thing you may try is a more stringent p-value (or q-value) cutoff, say pValue=1e-3. Better still, use thresholdKind=3 or thresholdKind=4, which threshold on q-values. The q-values are adjusted from p-values for multiple testing correction.
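
To illustrate why a model that only captures sampling noise can flag many genes, here is a small sketch in base R. It assumes two libraries of equal sequencing depth and uses a plain binomial test as a stand-in; it is not the MARS method used by DEGseq, and the counts are made up.

```r
# Made-up read counts for one highly expressed gene in two libraries of equal depth.
count1 <- 10000
count2 <- 10600

# Under pure sampling noise, each read is equally likely to come from either library.
test <- binom.test(count1, count1 + count2, p = 0.5)
log2_fold_change <- log2(count1 / count2)

print(test$p.value)       # very small: statistically significant
print(log2_fold_change)   # close to 0: only about a 6% difference in counts
```

With large counts, even a small difference gives a tiny p-value, which is one reason a fold-change or q-value threshold can be a useful extra filter.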

    Hope this helps.
    Last edited by Xi Wang; 10-01-2010, 01:05 AM.



  • osvaldoreis
    replied
    Starting to use DEGseq

    Hey all, I've just started using DEGseq and I'm having some trouble. I have many libraries without replicates and I want to look at differential expression over the time course of a disease. I have Solexa MP reads (~35 bp) from RNA-seq, the genome, and a gene prediction for this species. I aligned the reads to the predicted genes using Bowtie, then I wrote a script to make a file like this:

    "Gene" "LB1" "LB2"
    "g1" 45 103
    ... ... ...

    where the numbers are how many reads align with that gene for each library. Then I ran DEGexp like this:


    library(DEGseq)

    geneExpMatrix1 <- readGeneExp(file="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/reads_por_gene.txt", geneCol=1, valCol=c(2))
    geneExpMatrix2 <- readGeneExp(file="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/reads_por_gene.txt", geneCol=1, valCol=c(3))

    layout(matrix(c(1,2), 3, 2, byrow=TRUE))

    par(mar=c(2, 2, 2, 2))

    DEGexp(geneExpMatrix1=geneExpMatrix1, geneCol1=1, expCol1=2, groupLabel1="GPB1", geneExpMatrix2=geneExpMatrix2, geneCol2=1, expCol2=2, groupLabel2="DPB1", method="MARS", outputDir="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/", pValue=1e-2, thresholdKind=1)

    I got results without errors but the results have a lot of differentially expressed genes. I have ~18k genes and ~12k are differentially expressed. Looking at the file I can see that many of them aren't significant. I wanted to know if I'm doing something wrong?

    Thanks for any help!



  • Xi Wang
    replied
    Originally posted by sma
    Dear WANG Xi,

    I am running into troubles using DEGseq. Admittedly, I am a novice to analyzing this kind of data and using R---in fact I only loaded R just yesterday in order to run your DEGseq package.

    I am trying to compare three separate sets of 454 data (from 3 stages of development). When we get the data back from our company in Shanghai, it is complete with the annotations and the reads are already counted. There are no replicates (just 3 samples). So, I think I can simply use DEGexp with the MARS method.

    I have followed your recent suggestions to "(1) download the most recent version with its “reference manual”, (2) following the examples in the manual, apply DEGseq’s functions to the test data, (3) replace the data set, and apply DEGseq to your own data." I can successfully run the test data without problem. I can also replace the data with my own data and I can successfully show the filepath to my data (so I know I have entered it correctly). However, when I try to set up the geneExpMatrix, I run into problems. I always get an error that says :
    "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
    line 10 did not have 4 elements"

    I've tried everything I can think of and I cannot seem to remedy this problem. Any suggestion?

    Thank you!
    Dear Sma,

    Thanks for using DEGseq. I suspect your problem is caused by the format of the input file used to build the geneExpMatrix. Could you paste the first 15 lines of the input file here, or email them to me via [email protected]? Thanks.
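
One way to check for this kind of format problem, sketched in base R: `count.fields()` reports how many fields each line of a file has, so lines that differ from the rest stand out. The file below is a made-up example written to a temporary path, and whitespace-separated fields are an assumption; for a tab-delimited file, pass `sep = "\t"` instead.

```r
# Write a small made-up table in which line 3 is missing a field.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Gene S1 S2 S3",
             "g1 45 103 12",
             "g2 7 30",
             "g3 0 5 9"), tmp)

# Number of whitespace-separated fields on each line.
n_fields <- count.fields(tmp, sep = "")

# Lines whose field count differs from the header line.
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)
```

Annotation text that contains spaces, quote characters, or `#` can change the field count of a line, so those lines are worth inspecting first.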



  • sma
    replied
    Dear WANG Xi,

    I am running into troubles using DEGseq. Admittedly, I am a novice to analyzing this kind of data and using R---in fact I only loaded R just yesterday in order to run your DEGseq package.

    I am trying to compare three separate sets of 454 data (from 3 stages of development). When we get the data back from our company in Shanghai, it is complete with the annotations and the reads are already counted. There are no replicates (just 3 samples). So, I think I can simply use DEGexp with the MARS method.

    I have followed your recent suggestions to "(1) download the most recent version with its “reference manual”, (2) following the examples in the manual, apply DEGseq’s functions to the test data, (3) replace the data set, and apply DEGseq to your own data." I can successfully run the test data without problem. I can also replace the data with my own data and I can successfully show the filepath to my data (so I know I have entered it correctly). However, when I try to set up the geneExpMatrix, I run into problems. I always get an error that says :
    "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
    line 10 did not have 4 elements"

    I've tried everything I can think of and I cannot seem to remedy this problem. Any suggestion?

    Thank you!
