Hi Xi,
Thank you for your so kind reply.
My mapped reads are the "eland" format like this:
26 CCTTTCCACATCTTTCTCCCTCGCT U1 0 1 1 chr12 81865484 R
So, my data should convert to the "eland" format that DEGseq supports like this:
26 CCTTTCCACATCTTTCTCCCTCGCT 81865484 U1 R
I'm just wondering whether my conversion was right.
BTW, after I used the getGeneExp() function, if all of the RPKM values in the expression value files are "0", does it mean that the DEGexp() will fail to read the expCol1 or expCol2 value?
And, is there any difference between the "valCol" in readGeneExp() and the "expCol" in DEGexp()?
Thanks.
lix
Unconfigured Ad
Collapse
X
-
Sorry for the late reply. I was just back home.
You can try the following script. If the problem remains, just let me know.
if you suspect the eland format problem, show a few lines of your eland file. thanks.Code:library(DEGseq) sample1 <- "D:/data/sample1.txt" sample2 <- "D:/data/sample2.txt" refFlat <- "D:/data/refFlat.txt" mapResultBatch1 <- c(sample1) mapResultBatch2 <- c(sample2) outputDir <- "D:/data/DEGseqExample" DEGseq(mapResultBatch1, mapResultBatch2, fileFormat = "eland",refFlat = refFlat, outputDir = outputDir, method = "LRT")
Leave a comment:
-
-
DEGseq err
Hi Xi,
I'm trying to use DEGseq and my RNA-seq reads have already mapped. The results are the "eland" format. Does DEGseq support this kind of format?
I just follow these commands as the manual metioned:
library(DEGseq)
sample1 <- system.file("data","D:/data/sample1.txt",package = "DEGseq")
sample2 <- system.file("data","D:/data/sample2.txt",package = "DEGseq")
refFlat <- system.file("data","D:/data/refFlat.txt",package = "DEGseq")
mapResultBatch1 <- c(sample1)
mapResultBatch2 <- c(sample2)
outputDir <- file.path(tempdir(),"D:/data/DEGseqExample")
DEGseq(mapResultBatch1, mapResultBatch2, fileFormat = "eland",refFlat = refFlat, outputDir = outputDir, method = "LRT")
However, it reported:
Loading required package: qvalue
Loading required package: tcltk
Loading Tcl/Tk interface ... done
Loading required package: samr
Loading required package: impute
Error in file(file, "r") : cannot open the connection
Calls: DEGseq -> DEGexp -> read.table -> file
In addition: Warning message:
In file(file, "r") :
cannot open file 'D:/data/DEGseqExample/group1.exp': No such file or directory
Execution halted
Is my CMD all right? Any kind of suggetion will be appreciated.
Thanks.
lixLast edited by lix; 01-28-2010, 04:55 AM.
Leave a comment:
-
-
Hi Abhi,
Thanks for your questions.
Yes, if the isoforms between samples are changed, we should count all the reads mapping to those isoforms and then normalize against the isoform lengths, respectively. However, our DEGseq package is highly dependent on the annotated gene model, and is not able to infer (novel) isoforms with isoform expression levels at the current stage. We count the read number for each gene only the read mapped to gene exons based on gene annotation. In other words, we only take the reads from the same region with the same length for comparison.Originally posted by apratap View PostWhile I understand if we ignore the isoforms of the same gene, normalizing for the gene len wont make a difference. I am not sure why we should not normalize for the difference in reads counts. Also I am not able to generate the RPKM result using the read count /gene length. Could you please explain.
Could you please show me the line for the gene you mentioned in the output_score.txt and what command you used to generate the output_score.txt? Thanks.Originally posted by apratap View Post1. why we are not accounting for diff read numbers in the output_score.txt
We calculate RPKM following its definition: reads per kilo-base perl million reads. Take your control data for example: RPKM = 811 / 87082676 / 5930 * 1e+9 = 1.57049.Originally posted by apratap View Post2. How is the RPKM being calculated here. Somehow I am not able to find the same number shown in the example taken above.
Hope this helps. If further questions, just let me know. I am glad to solve all the problems.
Best,
Xi
Leave a comment:
-
-
Hi Xi
Thanks for putting your work in the open source. I am quite happy to see many different methods at my hand to play around with numbers.
I think I have some repeated questions that have been asked on this thread in some way or the other. I may be getting down to very basics and hope that will help some others also.
As a pilot DEgseq usage run, I used the LRT method to find expression values for two samples(control/mutant). I am taking one specific gene count example to ask my questions. The input was bed format, alignment was done using TopHat.
While I understand if we ignore the isoforms of the same gene, normalizing for the gene len wont make a difference. I am not sure why we should not normalize for the difference in reads counts. Also I am not able to generate the RPKM result using the read count /gene length. Could you please explain.HTML Code:Control Gene Count RPKM total no. reads gene length ABAT 811 1.57049 87082676 5930 Mutant ABAT 4264 10.3046 69779755 5930
1. why we are not accounting for diff read numbers in the output_score.txt
2. How is the RPKM being calculated here. Somehow I am not able to find the same number shown in the example taken above.
Thanks!
-Abhi
Leave a comment:
-
-
Thanks Likun,
I was confused by the q-value. It all makes sense now.
- Nick
Leave a comment:
-
-
BTW: For the fold changes you can do normalization as you methioned or use the option normalMethod="median". But for the methods "LRT", "FET(fisher's exact test)" and "MARS", the row count and normalMethod="none" are recommended for your example.Originally posted by Xi Wang View PostAmyL,
Thanks for your question.
If you only care the fold changes, you can use a normalization as you mentioned. Or, in DEGseq, you can use the option normalMethod="median".
Xi
.
Leave a comment:
-
-
Hi ngcrawford and Xi,Originally posted by ngcrawford View PostHow do you set it? It's mentioned in the paper, but I can only find ways to adjust the p-value cut-off.
Thanks in advance.
- Nick
I think ngcrawford want to find a way to set the fdr cut-off.
The following is an example to set it.
DEGexp(geneExpFile1=geneExpFile,geneExpFile2=geneExpFile2,thresholdKind=4,qValue=0.001)
Please type ?DEGexp for detail.
---------------
Likun
Leave a comment:
-
-
AmyL,Originally posted by AmyL View PostQuestion:
I am attempting to analyze samples that do not have the same number of reads. For example, one has 800K and another has 1.3M. With the analysis from DEGseq, it is obvious that the fold changes between samples are due to the difference in total read numbers. With this particular example, would you recommend using a reads/million normalization?
Thanks in advance!
Thanks for your question.
If you only care the fold changes, you can use a normalization as you mentioned. Or, in DEGseq, you can use the option normalMethod="median".
Xi
Leave a comment:
-
-
Question:
I am attempting to analyze samples that do not have the same number of reads. For example, one has 800K and another has 1.3M. With the analysis from DEGseq, it is obvious that the fold changes between samples are due to the difference in total read numbers. With this particular example, would you recommend using a reads/million normalization?
Thanks in advance!
Leave a comment:
-
-
Fdr?
How do you set it? It's mentioned in the paper, but I can only find ways to adjust the p-value cut-off.
Thanks in advance.
- Nick
Leave a comment:
-
-
Nick,Originally posted by ngcrawford View PostDEGseq look really nice, but I'm having trouble getting my data file read. Do I just need to substitute:
with
and then run DEGexp(commands)?
I'm getting the following error:
Thanks in advance,
Nick
You can specify the gene expression file in this way:
Suppose the file path is "D:/data/MyData.txt" (windows platform), then
XiCode:geneExpFile <- "D:/data/MyData.txt"
Leave a comment:
-
-
I can't get DEGseq to run my data
DEGseq look really nice, but I'm having trouble getting my data file read. Do I just need to substitute:
with>geneExpFile <- system.file("data", "GeneExpExample5000.txt", package = "DEGseq")
and then run DEGexp(commands)?>geneExpFile <- system.file("data", "MyData.txt", package = "DEGseq")
I'm getting the following error:
Thanks in advance,Error in read.table(geneExpFile1, header = header, sep = sep) :
no lines available in input
In addition: Warning message:
In file(file, "rt") :
file("") only supports open = "w+" and open = "w+b": using the former
Nick
Leave a comment:
-
Latest Articles
Collapse
-
by SEQadmin2
Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.
The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
...-
Channel: Articles
06-02-2026, 10:05 AM -
-
by SEQadmin2
With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.
Introduction
Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...-
Channel: Articles
05-22-2026, 06:42 AM -
ad_right_rmr
Collapse
News
Collapse
| Topics | Statistics | Last Post | ||
|---|---|---|---|---|
|
Started by SEQadmin2, Today, 10:09 AM
|
0 responses
8 views
0 reactions
|
Last Post
by SEQadmin2
Today, 10:09 AM
|
||
|
Started by SEQadmin2, Yesterday, 08:59 AM
|
0 responses
14 views
0 reactions
|
Last Post
by SEQadmin2
Yesterday, 08:59 AM
|
||
|
Started by SEQadmin2, 06-02-2026, 12:03 PM
|
0 responses
22 views
0 reactions
|
Last Post
by SEQadmin2
06-02-2026, 12:03 PM
|
||
|
Started by SEQadmin2, 06-02-2026, 11:40 AM
|
0 responses
19 views
0 reactions
|
Last Post
by SEQadmin2
06-02-2026, 11:40 AM
|
Leave a comment: