
  • Xi Wang
    replied
    Originally posted by m!x View Post
    I successfully ran the SamWrapper, but would appreciate further explanation about the output columns.

    What do the "numerator" and "denominator" columns represent?
    How do I tell whether the genes are upregulated or downregulated?
    eg. if I set the min. foldchange = 2, will this include the genes that are downregulated by 2 times?

    I also got a different result when I used "filtered" data as input.
    First, I used "getGeneExp" to get the raw read counts for each gene.
    From the output, I removed all genes that have fewer than 2 reads. Then, I filtered for the genes that are common between the biological replicates for each sample.
    I used this filtered set as the input for samWrapper.
    Is this a valid approach?
    Or, should I have used the unfiltered set?

    Thank you!
    Hi,

    For the samWrapper function, the output file contains several columns related to the T-statistic: score(d) is the T-statistic value, numerator(r) is the numerator of the T-statistic, and denominator(s + s0) is its denominator. For more details, please see Section 12.2 of the SAM manual.


    The Signature column indicates whether each gene is differentially expressed. If it is "TRUE", the gene can be either upregulated or downregulated. The default foldchange = 2 covers both cases, i.e., fold change > 2 and fold change < 0.5. You can verify this easily by comparing the two columns "Fold Change" and "Signature".
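
    As a rough illustration of the two-sided rule, here is a minimal R sketch on made-up numbers. The data frame, its column names (gene, fold_change) and the variable min_fc are hypothetical and do not reproduce samWrapper's actual output format. The sketch also assumes the fold change is a plain ratio and looks at it alone, ignoring the statistical test.

```r
# Hypothetical fold changes (a plain ratio between two conditions);
# these are invented numbers, not real samWrapper output.
res <- data.frame(
  gene        = c("g1", "g2", "g3", "g4"),
  fold_change = c(4.0, 0.25, 1.5, 0.8)
)
min_fc <- 2  # minimum fold change, as in the question

# Two-sided rule: a gene passes if it changes by more than min_fc
# in either direction (ratio > 2 or ratio < 0.5 when min_fc = 2).
res$passes <- res$fold_change > min_fc | res$fold_change < 1 / min_fc

# The direction follows from which side of 1 the ratio falls on.
res$direction <- ifelse(!res$passes, "unchanged",
                        ifelse(res$fold_change > 1, "up", "down"))
print(res)
```

    Which condition counts as "up" depends on which sample is in the numerator of the ratio, so that should be checked against the output file itself.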

    What do you mean by "the genes that are common"? I think there is no need to do this filtering. Also, you can compare the two results, pick out the differences (genes that appear in only one of the results), and analyze what causes them. If you could show me an example, I may be able to give you some clues. Thanks.



  • m!x
    replied
    I successfully ran the SamWrapper, but would appreciate further explanation about the output columns.

    What do the "numerator" and "denominator" columns represent?
    How do I tell whether the genes are upregulated or downregulated?
    eg. if I set the min. foldchange = 2, will this include the genes that are downregulated by 2 times?

    I also got a different result when I used "filtered" data as input.
    First, I used "getGeneExp" to get the raw read counts for each gene.
    From the output, I removed all genes that have fewer than 2 reads. Then, I filtered for the genes that are common between the biological replicates for each sample.
    I used this filtered set as the input for samWrapper.
    Is this a valid approach?
    Or, should I have used the unfiltered set?

    Thank you!



  • qqsvery
    replied
    Originally posted by svl View Post
    I seem to be unable to install the package... has anyone had success?
    ----
    source("http://bioconductor.org/biocLite.R")
    biocLite("DEGseq")
    ----

    Also their site is unavailable: http://bioinfo.au.tsinghua.edu.cn/software/degseq
    Here is the page at Bioc: http://www.bioconductor.org/packages...ml/DEGseq.html
    Thanks for the information! It's useful!



  • Xi Wang
    replied
    Hi Steffen,

    Thanks for using DEGseq.

    First I tried the method "CTR" to check the variation between the replicates, but I couldn't find a code example for this method in package material.
    Actually, the "CTR" example code is similar to the example for the DEGexp function. The only modification is to specify method="CTR". So, a code example is:
    Code:
    DEGexp(geneExpFile1=geneExpFile, geneCol1=1, expCol1=2, groupLabel1="R1", geneExpFile2=geneExpFile, geneCol2=1, expCol2=3, groupLabel2="R2",
    method="CTR", outputDir=outputDir)
    Note: geneExpFile contains the expression values for the two replicates, where gene names are listed in column 1, expression values for replicate 1 are listed in column 2, and expression values for replicate 2 are listed in column 3.


    On the last output plot produced by the method "CTR" one can see the difference between the standard deviation of M according to the RSM and the theoretical four-fold local standard deviation of M by the comparison of technical replicates. But what does it mean when there is a distance between these two lines (red and blue)? Can I use the method "MATR", which is based on technical replicates, anyway?
    This phenomenon means the two replicates do not match well. Yes, you can use the MATR method anyway. You may also run the other methods and compare their results. An extra validation step should be done (if feasible), and then you can judge which method is better.

    Because I have these 4 datasets per condition, I wanted to use them in the correct way, not simply add the raw counts of each gene. Which method would you propose in this case, and how should the correct code of the function DEGexp(..?..) look like?
    There is another function, "samWrapper", for comparing two samples with biological replicates. You can try this on your 4 datasets. But, theoretically, technical replicates cannot be treated as biological replicates.



  • steffenp
    replied
    Hi,
    I wanted to use DEGseq to identify differentially expressed genes between wildtype and mutant experiments. I have 4 datasets for each condition (WT,mutant) : 2 biological replicates and for each of them 2 technical replicates.

    First I tried the method "CTR" to check the variation between the replicates, but I couldn't find a code example for this method in the package material. On the last output plot produced by the method "CTR" one can see the difference between the standard deviation of M according to the RSM and the theoretical four-fold local standard deviation of M by the comparison of technical replicates. But what does it mean when there is a distance between these two lines (red and blue)? Can I use the method "MATR", which is based on technical replicates, anyway?

    Because I have these 4 datasets per condition, I wanted to use them in the correct way, not simply add the raw counts of each gene. Which method would you propose in this case, and how should the correct code of the function DEGexp(..?..) look like?

    Many thanks for your help!
    Steffen



  • Xi Wang
    replied
    Originally posted by m!x View Post
    Hi,

    I am trying to use samWrapper to analyze my RNA-seq data on Mac OS X.
    Is there a simple way to specify the path to the files?

    I noticed that you can use the following on Windows:
    >geneExpFile <- "D:/data/sample1.txt"

    Thanks!
    It's not very difficult. For example:
    Code:
    >geneExpFile <- "/PATH/TO/FILE"
    But you need to know where your file is. The "pwd" command may help you.
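
    A small sketch of how the path could be checked from within R on Mac OS X or Linux. The directory and file names ("data", "sample1.txt") are placeholders, and the snippet uses only base R functions (getwd, path.expand, file.path, file.exists).

```r
# Show R's current working directory (the R counterpart of the shell's "pwd").
print(getwd())

# Build an absolute path; "~" expands to your home directory.
# "data" and "sample1.txt" are placeholder names.
geneExpFile <- file.path(path.expand("~"), "data", "sample1.txt")
print(geneExpFile)

# Check that the file is really there before passing it on.
if (!file.exists(geneExpFile)) {
  message("File not found: ", geneExpFile)
}
```

    Forward slashes work in R paths on every platform, which is why the Windows example above also uses them.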



  • m!x
    replied
    Hi,

    I am trying to use samWrapper to analyze my RNA-seq data on Mac OS X.
    Is there a simple way to specify the path to the files?

    I noticed that you can use the following on Windows:
    >geneExpFile <- "D:/data/sample1.txt"

    Thanks!



  • maria.b
    replied
    Hello,

    It finally finished last night (maybe I didn't wait patiently enough). I will recalculate the expression values as you suggested, and I will let you know if it works better.

    Thanks for your advice

    Maria
    Last edited by maria.b; 02-17-2010, 01:01 AM.



  • Xi Wang
    replied
    Hi, Maria

    Maybe the values are too large for DEGseq to handle. But I need to check whether that is the reason or whether it is a bug in DEGseq.

    Given how we model the RNA-seq data, we strongly recommend that you use the read counts (instead of the sum of read counts over every base) as the gene expression level estimate. You can simply try the DEGseq function in the package.

    Thanks,



  • maria.b
    replied
    Ok thanks for your reply,

    I have three values per gene:
    - the sum of reads over each base (count)
    - the average coverage per base (mean)
    - the expression value in RPKM (rpkm) (rpkm = count * 10⁹ / (length * nbreads)) (maybe this is wrong; I will change my calculation of count to use the number of reads mapped per gene, but that's another problem)

    I have 13661 genes and three samples with 2 replicates each.
    Here are the min and max values for each replicate and for each type of expression value (I don't know if it's important):
    count values :
    's1_1': [0, 5983478],
    's1_2': [0, 17697854],
    's2_2': [0, 14879008],
    's2_1': [0, 14369451],
    's3_2': [0, 11717714],
    's3_1': [0, 11696411]

    mean_values:
    's1_1': [0.0, 5942.0],
    's1_2': [0.0, 65791.0],
    's2_2': [0.0, 14776.0],
    's2_1': [0.0, 14270.0],
    's3_2': [0.0, 11636.0],
    's3_1': [0.0, 11615.0]

    rpkm values:
    's1_1': [0.0, 1075393.0],
    's1_2': [0.0, 4577530.0],
    's2_2': [0.0, 1072475.0],
    's2_1': [0.0, 1145468.0],
    's3_2': [0.0, 1064802.0],
    's3_1': [0.0, 869385.0]

    I want to run the FET method to compare S1 and S2, and S1 and S3, using the different expression value types.
    Is the DEGexp() command different when we use the MARS method than when we use the FET method?

    My output looks like this (example comparing s1 and s2):

    #############analyse differentielle, methode FET, s2 vs s1, count#############
    Please wait...

    geneExpFile1: fileEXPR
    gene id column in geneExpFile1: 1
    expression value column(s) in geneExpFile1: 9 10
    total number of reads uniquely mapped to genome obtained from sample1: 468022379 515938474

    geneExpFile2: fileEXPR
    gene id column in geneExpFile2: 1
    expression value column(s) in geneExpFile2: 7 8
    total number of reads uniquely mapped to genome obtained from sample2: 204283936 231306719

    method to identify differentially expressed genes: FET
    pValue threshold: 0.001
    output directory: out

    Please wait ...
    Identifying differentially expressed genes ...
    Please wait patiently ...

    and it never ends.

    Thanks for your help.

    Maria
    Last edited by maria.b; 02-16-2010, 08:56 AM.



  • Xi Wang
    replied
    Hi, Maria

    Originally posted by maria.b View Post
    Hi everybody,
    I'm using DEGseq to identify differentially expressed genes from expression values that I already have.
    Thanks for using DEGseq.

    Originally posted by maria.b View Post
    I would like to know how much time it takes to run the DEGexp function with the FET method. I receive the results for the LRT and MARS methods in a few minutes, but for the FET method I let it run for more than one night and it was still running. Is this normal?
    What is your data size? At first glance, I don't think this is normal, but we need to confirm what caused the long run time.

    Originally posted by maria.b View Post
    I have another question concerning the expression values. For the moment, I calculate these values as the sum of reads over each base of a gene, not as the number of reads mapped to the gene, and then I transform these values into RPKM. Do you think that this will change anything in the differential expression analysis? What do you use to calculate these expression values?
    The values calculated your way roughly equal (read count) * (read length) * 10⁹ / (gene length) / (total reads) = RPKM * (read length)
    It is OK to use your method, but when computing RPKM, you additionally need to divide the values by the read length.
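
    To make the relation concrete, here is a small R check with invented numbers (the read count, read length, gene length and total read count are all hypothetical). It assumes every read lies fully inside the gene, so that the per-base sum is exactly the read count times the read length.

```r
read_count  <- 200    # reads mapped to the gene (hypothetical)
read_length <- 36     # bases per read (hypothetical)
gene_length <- 1500   # gene length in bases (hypothetical)
total_reads <- 5e6    # total mapped reads in the sample (hypothetical)

# Summing coverage over every base counts each read once per base it covers.
base_sum <- read_count * read_length

# RPKM computed from read counts.
rpkm <- read_count * 1e9 / (gene_length * total_reads)

# The same formula applied to the per-base sum is inflated by the read length.
rpkm_from_base_sum <- base_sum * 1e9 / (gene_length * total_reads)

print(rpkm)                              # RPKM from read counts
print(rpkm_from_base_sum / read_length)  # same value after dividing by read length
```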
    Last edited by Xi Wang; 02-16-2010, 06:54 PM. Reason: a typo corrected



  • maria.b
    replied
    FET method

    Hi everybody,

    I'm using DEGseq to identify differentially expressed genes from expression values that I already have.

    I would like to know how much time it takes to run the DEGexp function with the FET method. I receive the results for the LRT and MARS methods in a few minutes, but for the FET method I let it run for more than one night and it was still running. Is this normal?

    I have another question concerning the expression values. For the moment, I calculate these values as the sum of reads over each base of a gene, not as the number of reads mapped to the gene, and then I transform these values into RPKM. Do you think that this will change anything in the differential expression analysis? What do you use to calculate these expression values?

    Thanks for your help

    Maria



  • Xi Wang
    replied
    Originally posted by AmyL View Post
    Hi,

    I was wondering what density is a measure of in the first output graph of DEGseq,

    thanks,
    Amy
    The plot is generated by:

    Code:
    hist(LogVal(Sample1),main=label1,xlab="log2(Number of reads mapped to a gene)",col=4,breaks=100,freq=FALSE,ylim=c(0,0.5))
    Using "freq=FALSE" means that probability densities are plotted, so that the histogram has a total area of one: sum(density * bin_width) = 1
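
    The area property can be checked numerically with base R. The sketch below uses simulated values as a stand-in for log2 read counts (it does not call DEGseq, and the variable names are made up for illustration).

```r
set.seed(1)
# Simulated log2 read counts; stand-in data, not DEGseq output.
x <- log2(rpois(1000, lambda = 50) + 1)

# plot = FALSE returns the histogram object without drawing it.
h <- hist(x, breaks = 100, plot = FALSE)

# Bin widths are the differences between consecutive break points.
bin_width <- diff(h$breaks)

# Density times bin width, summed over all bins, gives the total area.
total_area <- sum(h$density * bin_width)
print(total_area)
```

    So a density value on the y-axis is not a gene count; it is the fraction of genes in a bin divided by the bin width.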



  • AmyL
    replied
    Hi,

    I was wondering what density is a measure of in the first output graph of DEGseq,

    thanks,
    Amy



  • Xi Wang
    replied
    Hi lix,

    Originally posted by lix View Post
    My mapped reads are the "eland" format like this:
    26 CCTTTCCACATCTTTCTCCCTCGCT U1 0 1 1 chr12 81865484 R

    So, my data should convert to the "eland" format that DEGseq supports like this:
    26 CCTTTCCACATCTTTCTCCCTCGCT 81865484 U1 R

    I'm just wondering whether my conversion was right.
    How did you convert the format? If you used a script for the conversion, you can check the converted output directly. In any case, you need to get this step working.

    BTW, after I used the getGeneExp() function, if all of the RPKM values in the expression value files are "0", does it mean that the DEGexp() will fail to read the expCol1 or expCol2 value?
    Sure, even if DEGexp() successfully reads the values, the values are all equal to 0.

    And, is there any difference between the "valCol" in readGeneExp() and the "expCol" in DEGexp()?
    You can treat them as the same. But "valCol" can be any column, while "expCol" should only refer to expression columns.
