Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • lix
    replied
    Hi Xi,

    Thank you for your so kind reply.

    My mapped reads are the "eland" format like this:
    26 CCTTTCCACATCTTTCTCCCTCGCT U1 0 1 1 chr12 81865484 R

    So, my data should convert to the "eland" format that DEGseq supports like this:
    26 CCTTTCCACATCTTTCTCCCTCGCT 81865484 U1 R

    I'm just wondering whether my conversion was right.

    BTW, after I used the getGeneExp() function, if all of the RPKM values in the expression value files are "0", does it mean that the DEGexp() will fail to read the expCol1 or expCol2 value?
    And, is there any difference between the "valCol" in readGeneExp() and the "expCol" in DEGexp()?

    Thanks.

    lix

    Leave a comment:


  • Xi Wang
    replied
    Sorry for the late reply. I was just back home.

    You can try the following script. If the problem remains, just let me know.

    Code:
    library(DEGseq)
    sample1 <- "D:/data/sample1.txt"
    sample2 <- "D:/data/sample2.txt"
    refFlat <- "D:/data/refFlat.txt"
    mapResultBatch1 <- c(sample1)
    mapResultBatch2 <- c(sample2)
    outputDir <- "D:/data/DEGseqExample"
    DEGseq(mapResultBatch1, mapResultBatch2, fileFormat = "eland",refFlat = refFlat, outputDir = outputDir, method = "LRT")
    if you suspect the eland format problem, show a few lines of your eland file. thanks.

    Leave a comment:


  • lix
    replied
    DEGseq err

    Hi Xi,

    I'm trying to use DEGseq and my RNA-seq reads have already mapped. The results are the "eland" format. Does DEGseq support this kind of format?

    I just follow these commands as the manual metioned:

    library(DEGseq)
    sample1 <- system.file("data","D:/data/sample1.txt",package = "DEGseq")
    sample2 <- system.file("data","D:/data/sample2.txt",package = "DEGseq")
    refFlat <- system.file("data","D:/data/refFlat.txt",package = "DEGseq")
    mapResultBatch1 <- c(sample1)
    mapResultBatch2 <- c(sample2)
    outputDir <- file.path(tempdir(),"D:/data/DEGseqExample")
    DEGseq(mapResultBatch1, mapResultBatch2, fileFormat = "eland",refFlat = refFlat, outputDir = outputDir, method = "LRT")

    However, it reported:

    Loading required package: qvalue
    Loading required package: tcltk
    Loading Tcl/Tk interface ... done
    Loading required package: samr
    Loading required package: impute
    Error in file(file, "r") : cannot open the connection
    Calls: DEGseq -> DEGexp -> read.table -> file
    In addition: Warning message:
    In file(file, "r") :
    cannot open file 'D:/data/DEGseqExample/group1.exp': No such file or directory
    Execution halted


    Is my CMD all right? Any kind of suggetion will be appreciated.
    Thanks.

    lix
    Last edited by lix; 01-28-2010, 04:55 AM.

    Leave a comment:


  • Xi Wang
    replied
    Hi Abhi,

    Thanks for your questions.

    Originally posted by apratap View Post
    While I understand if we ignore the isoforms of the same gene, normalizing for the gene len wont make a difference. I am not sure why we should not normalize for the difference in reads counts. Also I am not able to generate the RPKM result using the read count /gene length. Could you please explain.
    Yes, if the isoforms between samples are changed, we should count all the reads mapping to those isoforms and then normalize against the isoform lengths, respectively. However, our DEGseq package is highly dependent on the annotated gene model, and is not able to infer (novel) isoforms with isoform expression levels at the current stage. We count the read number for each gene only the read mapped to gene exons based on gene annotation. In other words, we only take the reads from the same region with the same length for comparison.

    Originally posted by apratap View Post
    1. why we are not accounting for diff read numbers in the output_score.txt
    Could you please show me the line for the gene you mentioned in the output_score.txt and what command you used to generate the output_score.txt? Thanks.
    Originally posted by apratap View Post
    2. How is the RPKM being calculated here. Somehow I am not able to find the same number shown in the example taken above.
    We calculate RPKM following its definition: reads per kilo-base perl million reads. Take your control data for example: RPKM = 811 / 87082676 / 5930 * 1e+9 = 1.57049.

    Hope this helps. If further questions, just let me know. I am glad to solve all the problems.

    Best,
    Xi

    Leave a comment:


  • apratap
    replied
    Hi Xi

    Thanks for putting your work in the open source. I am quite happy to see many different methods at my hand to play around with numbers.

    I think I have some repeated questions that have been asked on this thread in some way or the other. I may be getting down to very basics and hope that will help some others also.

    As a pilot DEgseq usage run, I used the LRT method to find expression values for two samples(control/mutant). I am taking one specific gene count example to ask my questions. The input was bed format, alignment was done using TopHat.

    HTML Code:
    Control
    
    Gene    Count      RPKM         total no. reads      gene length
    ABAT    811         1.57049      87082676          5930
    
    Mutant
    ABAT     4264      10.3046        69779755           5930
    While I understand if we ignore the isoforms of the same gene, normalizing for the gene len wont make a difference. I am not sure why we should not normalize for the difference in reads counts. Also I am not able to generate the RPKM result using the read count /gene length. Could you please explain.

    1. why we are not accounting for diff read numbers in the output_score.txt
    2. How is the RPKM being calculated here. Somehow I am not able to find the same number shown in the example taken above.

    Thanks!
    -Abhi

    Leave a comment:


  • ngcrawford
    replied
    Thanks Likun,

    I was confused by the q-value. It all makes sense now.

    - Nick

    Leave a comment:


  • Likun Wang
    replied
    Originally posted by Xi Wang View Post
    AmyL,

    Thanks for your question.
    If you only care the fold changes, you can use a normalization as you mentioned. Or, in DEGseq, you can use the option normalMethod="median".

    Xi
    BTW: For the fold changes you can do normalization as you methioned or use the option normalMethod="median". But for the methods "LRT", "FET(fisher's exact test)" and "MARS", the row count and normalMethod="none" are recommended for your example. .

    Leave a comment:


  • Likun Wang
    replied
    Originally posted by ngcrawford View Post
    How do you set it? It's mentioned in the paper, but I can only find ways to adjust the p-value cut-off.

    Thanks in advance.

    - Nick
    Hi ngcrawford and Xi,
    I think ngcrawford want to find a way to set the fdr cut-off.
    The following is an example to set it.
    DEGexp(geneExpFile1=geneExpFile,geneExpFile2=geneExpFile2,thresholdKind=4,qValue=0.001)
    Please type ?DEGexp for detail.
    ---------------
    Likun

    Leave a comment:


  • Xi Wang
    replied
    Originally posted by AmyL View Post
    Question:

    I am attempting to analyze samples that do not have the same number of reads. For example, one has 800K and another has 1.3M. With the analysis from DEGseq, it is obvious that the fold changes between samples are due to the difference in total read numbers. With this particular example, would you recommend using a reads/million normalization?

    Thanks in advance!
    AmyL,

    Thanks for your question.
    If you only care the fold changes, you can use a normalization as you mentioned. Or, in DEGseq, you can use the option normalMethod="median".

    Xi

    Leave a comment:


  • Xi Wang
    replied
    Originally posted by ngcrawford View Post
    How do you set it? It's mentioned in the paper, but I can only find ways to adjust the p-value cut-off.

    Thanks in advance.

    - Nick
    Sorry, I cannot catch what your meaning. What "it" refers to? Thanks.

    Xi

    Leave a comment:


  • AmyL
    replied
    Question:

    I am attempting to analyze samples that do not have the same number of reads. For example, one has 800K and another has 1.3M. With the analysis from DEGseq, it is obvious that the fold changes between samples are due to the difference in total read numbers. With this particular example, would you recommend using a reads/million normalization?

    Thanks in advance!

    Leave a comment:


  • ngcrawford
    replied
    Fdr?

    How do you set it? It's mentioned in the paper, but I can only find ways to adjust the p-value cut-off.

    Thanks in advance.

    - Nick

    Leave a comment:


  • ngcrawford
    replied
    Xi,

    That worked like a charm. Thanks!

    - Nick

    Leave a comment:


  • Xi Wang
    replied
    Originally posted by ngcrawford View Post
    DEGseq look really nice, but I'm having trouble getting my data file read. Do I just need to substitute:



    with



    and then run DEGexp(commands)?

    I'm getting the following error:



    Thanks in advance,
    Nick
    Nick,

    You can specify the gene expression file in this way:

    Suppose the file path is "D:/data/MyData.txt" (windows platform), then
    Code:
    geneExpFile <- "D:/data/MyData.txt"
    Xi

    Leave a comment:


  • ngcrawford
    replied
    I can't get DEGseq to run my data

    DEGseq look really nice, but I'm having trouble getting my data file read. Do I just need to substitute:

    >geneExpFile <- system.file("data", "GeneExpExample5000.txt", package = "DEGseq")
    with

    >geneExpFile <- system.file("data", "MyData.txt", package = "DEGseq")
    and then run DEGexp(commands)?

    I'm getting the following error:

    Error in read.table(geneExpFile1, header = header, sep = sep) :
    no lines available in input
    In addition: Warning message:
    In file(file, "rt") :
    file("") only supports open = "w+" and open = "w+b": using the former
    Thanks in advance,
    Nick

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Essential Discoveries and Tools in Epitranscriptomics
    by seqadmin




    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
    04-22-2024, 07:01 AM
  • seqadmin
    Current Approaches to Protein Sequencing
    by seqadmin


    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
    04-04-2024, 04:25 PM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 04-25-2024, 11:49 AM
0 responses
20 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-24-2024, 08:47 AM
0 responses
20 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-11-2024, 12:08 PM
0 responses
62 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 10:19 PM
0 responses
61 views
0 likes
Last Post seqadmin  
Working...
X