Hi RSK,
The default FDR used for the Benjamini or the Storey correction is 0.001. The documents for DEGseq can be found at:
Thanks.
Unconfigured Ad
Collapse
X
-
Default FDR
Hi Xi,
What is the default FDR % used for the Benjamini and Storey corrections for DEGexp? Also where can I find the vignette for the DEGseq package that member Simon Anders refers to in another post?
Thanks,
RSK
Leave a comment:
-
-
Hi m!x,Originally posted by m!x View PostHi Xi,
I got a question regarding the p-value generated by MARS method. Is it corrected for multiple comparisons by any methods, eg. Bonferroni correction?
I have zero background about this statistical analysis and would appreciate if you could give some insight into this.
Actually, in our package, for the methods MARS, FET, LRT, and MATR, both p-values and q-values are calculated. Q-value is the corrected p-value with respect to the multiple testing. Two kinds of q-values are given in DEGseq package, one is given by Benjamini correction, and the other by Storey correction. The two methods take different aspects for correction. The Benjamini correction gives more false negatives, while the Storey correction more false positives.
More insight of multiple testing correction, please refer to http://en.wikipedia.org/wiki/Multiple_comparisons
Leave a comment:
-
-
Hi Xi,
I got a question regarding the p-value generated by MARS method. Is it corrected for multiple comparisons by any methods, eg. Bonferroni correction?
I have zero background about this statistical analysis and would appreciate if you could give some insight into this.
Leave a comment:
-
-
Sorry for the late reply.Originally posted by gen2prot View PostHi Xi,
Thank you for the explanation. I'll try plotting the p-values. Another question. I have the absolute read counts for 4 male and 4 female drosophila flies. I tried both the MARS and the FET method on them. The results are similar (the FET method catches maybe 50-100 more genes than MARS) Is there a reason why u would use the MARS method over FET or vice-versa?
Thanks
Abhijit
Its quite right that those methods give different results. This is because different methods build on different statistical models, and will reflect different aspects of statistical significance. The p-value derived from different methods are not comparable. If the gene ranks are the same, the tradeoff between sensitivity and specificity will be the same between two methods. Then only thing he needs to do it choose the appropriate threshold based on the empirical distribution of p-value.
Leave a comment:
-
-
Hi Xi,
Thank you for the explanation. I'll try plotting the p-values. Another question. I have the absolute read counts for 4 male and 4 female drosophila flies. I tried both the MARS and the FET method on them. The results are similar (the FET method catches maybe 50-100 more genes than MARS) Is there a reason why u would use the MARS method over FET or vice-versa?
Thanks
Abhijit
Leave a comment:
-
-
Number of absolute difference can not reflect the significance of difference expressions levels. For example, suppose we have two genes and measured their expression levels at two conditions using read count, one is 5001:1 and the other is 200,000:205,000. In this case, both genes have the same difference of read count, but we are more confident that the first gene is expressed differentially than the second. So p-value can give you a rank the extend that observed difference read counts deviate from expectation if gene is not differentially expressed. As for your second question, you should first plot the distribution of p-values, and choose an appropriate cut-off, and remove genes fall below the chosen significance level.Originally posted by gen2prot View PostHi Xi,
I am using the DEGseq package and the DEGexp utility to find differentially expressed genes between male and female drosophila flies. If I leave the default values and use the MARS method or the FET method, I get a almost 7000 genes which are differentially expressed. I found that genes with a read count difference of 5000 has similar p-values to genes with a difference of 300. Is there a way to select only the most significant of these differences? How is the threshold for gene expression difference decided. I intend to remove from my dataset all genes (as much as computationally possible) that are sex-associated and I am not sure if 7000 genes qualify as sex-associated.
Thanks
Abhijit
Hope this helps.
Leave a comment:
-
-
Hi Xi,
I am using the DEGseq package and the DEGexp utility to find differentially expressed genes between male and female drosophila flies. If I leave the default values and use the MARS method or the FET method, I get a almost 7000 genes which are differentially expressed. I found that genes with a read count difference of 5000 has similar p-values to genes with a difference of 300. Is there a way to select only the most significant of these differences? How is the threshold for gene expression difference decided. I intend to remove from my dataset all genes (as much as computationally possible) that are sex-associated and I am not sure if 7000 genes qualify as sex-associated.
Thanks
Abhijit
Leave a comment:
-
-
Alright, I have fixed the problem! Thanks very much for all your help Xi.Originally posted by Xi Wang View PostIt just because you can't use R display for figures. Have nothing to do with DEGseq itself. So the installation should be ok.
It is clear now that the formats are not recognized. I think it is related to the line field mark. There is something wrong when switching the Windows and Linux platform. Please try to use the Linux command
to check if the file dose appear in one line. I'll get back to you later if you still can't solve this problem.Code:cat /auto/cmb-01/ikremsky/mouse_liver/supershort.bed.txt
Leave a comment:
-
-
It just because you can't use R display for figures. Have nothing to do with DEGseq itself. So the installation should be ok.Originally posted by ikremsky View PostEver since I installed R version 2.10.0 a couple days ago in parallel with versoin 2.10.1, I have been getting the message: "In fun(...) : no DISPLAY variable so Tk is not available" when loading the DEGseq package. (I think the problem was caused when I attempted to uninstall R 2.10.1 by going into the head directory and typing "make uninstall"; it did not uninstall as I had expected it to, and afterward I decided to keep it installed. However, the message occurs when I use both version 2.10.0 and 2.10.1, so maybe that wasn't the cause afterall.) I'm not sure if that is the problem though, because I wasn't getting that message earlier but was still having the same exact problem.
It is clear now that the formats are not recognized. I think it is related to the line field mark. There is something wrong when switching the Windows and Linux platform. Please try to use the Linux commandOriginally posted by ikremsky View PostI tried running the first 4 lines like you suggested (I loaded the full files into MS Excel and then copied and pasted the top 4 lines into a new file and saved as tab-delimited text.) Here are my input and output messages:
getGeneExp("/auto/cmb-01/ikremsky/mouse_liver/supershort.bed.txt", refFlat = "/auto/cmb-01/ikremsky/mouse_liver/refFlat_mm9short.txt")
Please wait...
SampleFiles:
/auto/cmb-01/ikremsky/mouse_liver/supershort.bed.txt
Count the number of reads mapped to each gene.
This will take several minutes.
Please wait ...
total 1 unique genes
processed 0 reads (supershort.bed.txt)Wrong Format!
There is something wrong!Please check...
to check if the file dose appear in one line. I'll get back to you later if you still can't solve this problem.Code:cat /auto/cmb-01/ikremsky/mouse_liver/supershort.bed.txt
Leave a comment:
-
-
Ever since I installed R version 2.10.0 a couple days ago in parallel with versoin 2.10.1, I have been getting the message: "In fun(...) : no DISPLAY variable so Tk is not available" when loading the DEGseq package. (I think the problem was caused when I attempted to uninstall R 2.10.1 by going into the head directory and typing "make uninstall"; it did not uninstall as I had expected it to, and afterward I decided to keep it installed. However, the message occurs when I use both version 2.10.0 and 2.10.1, so maybe that wasn't the cause afterall.) I'm not sure if that is the problem though, because I wasn't getting that message earlier but was still having the same exact problem.Originally posted by Xi Wang View PostIt works well on my machine. Did you try the 4-line files on your cluster? (copying the 4 lines directly from the web does not work, as TAB separator is required in the input files.)
And how about the example in the manual for getGeneExp? Was there any warning message when you install DEGseq package?
I tried running the first 4 lines like you suggested (I loaded the full files into MS Excel and then copied and pasted the top 4 lines into a new file and saved as tab-delimited text.) Here are my input and output messages:
getGeneExp("/auto/cmb-01/ikremsky/mouse_liver/supershort.bed.txt", refFlat = "/auto/cmb-01/ikremsky/mouse_liver/refFlat_mm9short.txt")
Please wait...
SampleFiles:
/auto/cmb-01/ikremsky/mouse_liver/supershort.bed.txt
Count the number of reads mapped to each gene.
This will take several minutes.
Please wait ...
total 1 unique genes
processed 0 reads (supershort.bed.txt)Wrong Format!
There is something wrong!Please check...Last edited by ikremsky; 04-03-2010, 01:26 PM.
Leave a comment:
-
-
Originally posted by ikremsky View PostThere are a little over 11 million reads.
Here are the first few lines of the bed file:
chr7 33062200 33062233 SRR001357.7181352 0 +
chr7 33062200 33062233 SRR001357.7208189 0 +
chr4 91653375 91653408 SRR001357.758827 0 -
chrX 47443645 47443678 SRR001357.1304368 0 +
and the refFlat file:
Xkr4 NM_001011874 chr1 - 3204562 3661579 3206102 3661429 3 3204562,3411782,3660632, 3207049,3411982,3661579,
Rp1 NM_011283 chr1 - 4334223 4350473 4334680 4342906 4 4334223,4341990,4342282,4350280, 4340172,4342162,4342918,4350473,
Sox17 NM_011441 chr1 - 4481008 4486494 4481796 4483487 5 4481008,4483180,4483852,4485216,4486371, 4482749,4483547,4483944,4486023,4486494,
Mrpl15 NM_025300 chr1 - 4764014 4775768 4764532 4775758 5 4764014,4767605,4772648,4774031,4775653, 4764597,4767729,4772814,4774186,4775768,
It works well on my machine. Did you try the 4-line files on your cluster? (copying the 4 lines directly from the web does not work, as TAB separator is required in the input files.)
And how about the example in the manual for getGeneExp? Was there any warning message when you install DEGseq package?
Leave a comment:
-
-
There are a little over 11 million reads.Originally posted by Xi Wang View PostHow many reads are there in your bed file? DEGseq can deal with millions of reads up on our test. Actually, C/C++ code is embeded in DEGseq to deal with read-counting. It will be much faster than R.
Could you please paste a few lines of your BED file and your REFFLAT file here, or email to me ([email protected])? I will look into them and test them on my machine. Thanks.
Here are the first few lines of the bed file:
chr7 33062200 33062233 SRR001357.7181352 0 +
chr7 33062200 33062233 SRR001357.7208189 0 +
chr4 91653375 91653408 SRR001357.758827 0 -
chrX 47443645 47443678 SRR001357.1304368 0 +
and the refFlat file:
Xkr4 NM_001011874 chr1 - 3204562 3661579 3206102 3661429 3 3204562,3411782,3660632, 3207049,3411982,3661579,
Rp1 NM_011283 chr1 - 4334223 4350473 4334680 4342906 4 4334223,4341990,4342282,4350280, 4340172,4342162,4342918,4350473,
Sox17 NM_011441 chr1 - 4481008 4486494 4481796 4483487 5 4481008,4483180,4483852,4485216,4486371, 4482749,4483547,4483944,4486023,4486494,
Mrpl15 NM_025300 chr1 - 4764014 4775768 4764532 4775758 5 4764014,4767605,4772648,4774031,4775653, 4764597,4767729,4772814,4774186,4775768,Last edited by ikremsky; 04-02-2010, 07:07 PM.
Leave a comment:
-
-
How many reads are there in your bed file? DEGseq can deal with millions of reads up on our test. Actually, C/C++ code is embeded in DEGseq to deal with read-counting. It will be much faster than R.Originally posted by ikremsky View PostI am starting to think the problem has to do with the large file size. I tried simply scanning the bed file into R using function read.delim(), but it took quite about 20 minutes. Accessing parts of the table was also slow.
What is the best way to deal with very large bed files with millions of reads? R seems to be inherently very slow in dealing with this type of file.
Thanks!
Could you please paste a few lines of your BED file and your REFFLAT file here, or email to me ([email protected])? I will look into them and test them on my machine. Thanks.
Leave a comment:
-
-
I think it has nothing to do with the read ids. The numbers are just related to the sequencing platform.Originally posted by ikremsky View PostOk I reinstalled DEGseq. I was able to run the example, but I am still unable to run it on the files I have produced. I noticed that on the readID section in your sample bed file, that all id's have the '>' symbol before the id, whereas this symbol is not in the bed files I have produced. For example, the entry in your sample file looks like ">CMLIVERKIDNEY_7:1:1:67:406" (without quotes) whereas in the file I've created, they look like "SRR001357.7181352". Could that be the problem? What do those numbers mean on your sample id's?
Thanks!
Leave a comment:
-
Latest Articles
Collapse
-
by SEQadmin2
Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.
The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
...-
Channel: Articles
06-02-2026, 10:05 AM -
-
by SEQadmin2
With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.
Introduction
Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...-
Channel: Articles
05-22-2026, 06:42 AM -
ad_right_rmr
Collapse
News
Collapse
| Topics | Statistics | Last Post | ||
|---|---|---|---|---|
|
Sequencing the Two-Toed Sloth Genome Reveals Jumping Genes Tied to Its Extreme Metabolism
by SEQadmin2
Started by SEQadmin2, 06-09-2026, 11:58 AM
|
0 responses
25 views
0 reactions
|
Last Post
by SEQadmin2
06-09-2026, 11:58 AM
|
||
|
Started by SEQadmin2, 06-05-2026, 10:09 AM
|
0 responses
30 views
0 reactions
|
Last Post
by SEQadmin2
06-05-2026, 10:09 AM
|
||
|
Started by SEQadmin2, 06-04-2026, 08:59 AM
|
0 responses
39 views
0 reactions
|
Last Post
by SEQadmin2
06-04-2026, 08:59 AM
|
||
|
Started by SEQadmin2, 06-02-2026, 12:03 PM
|
0 responses
62 views
0 reactions
|
Last Post
by SEQadmin2
06-02-2026, 12:03 PM
|
Leave a comment: