Unconfigured Ad

**pr0t3us** · 05-31-2010, 08:07 AM

Hi there,

I've a problem with DEGseq command.... I'm using solid data and I have prepared all files....
I have this error

> DEGseq(MAP_SANI, MAP_NT, fileFormat = "bed",refFlat = REFFLAT, outputDir = outputDir, method = "LRT")
Please wait...

mapResultBatch1:
/local/database/HUMAN_GENOME_GRCh37/PROJECTS/TOMANIN_RNA/RESULTS/BED_FORMAT/sani_BED.txt
mapResultBatch2:
/local/database/HUMAN_GENOME_GRCh37/PROJECTS/TOMANIN_RNA/RESULTS/BED_FORMAT/NT_BED.txt
file format: bed
refFlat: /home/alex/PROGETTI/TOMANIN_RESULT/PROVA_DAG/Total_human_annotation.ucsc
Ignore the strand information when count the reads mapped to genes!
Count the number of reads mapped to each gene ...
This will take several minutes, please wait patiently!
Please wait...

SampleFiles:
/local/database/HUMAN_GENOME_GRCh37/PROJECTS/TOMANIN_RNA/RESULTS/BED_FORMAT/sani_BED.txt
Count the number of reads mapped to each gene.
This will take several minutes.
Please wait ...

*** caught segfault ***
address 0xfffffffffffffff8, cause 'memory not mapped'

Traceback:
1: .C(".getGeneExp", as.character(refFlat), as.character(mapResultBatch), as.integer(length(mapResultBatch)), as.character(output), as.character(fileFormat), as.integer(readLength), as.integer(needStrand), as.numeric(min.overlapPercent))
2: getGeneExp(mapResultBatch1, fileFormat, readLength, strandInfo, refFlat, GeneExp1)
3: DEGseq(MAP_SANI, MAP_NT, fileFormat = "bed", refFlat = REFFLAT, outputDir = outputDir, method = "LRT")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 1
aborting ...
Segmentation fault

Can someone help me ?!?

Alessandro

**Xi Wang** · 06-07-2010, 04:30 AM

Dear DEGseq users and DEGseq potential users,

Thank you for taking into account of using DEGseq. As the update of R platform, DEGseq also updates to its version 1.2.2, which changed quite a few from its version 1.1.x. Here, I’d like to suggest the users (1) download the most recent version with its “reference manual”, (2) following the examples in the manual, apply DEGseq’s functions to the test data, (3) replace the data set, and apply DEGseq to your own data. This procedure also gives a rapid way to diagnose commonly encountered mistakes: if step (2) works well and (3) not, please check the formats of the input files by comparing your files with the test files.

For further bugs or questions regarding the programming, please send emails to my colleague Likun Wang via [email protected] in a more direct and rapid manner. For further scientific questions or more general questions, you may leave messages here seeking for helps from the community.

Thanks again for your attention on DEGseq.

**m!x** · 06-09-2010, 11:43 AM

Hi,

Is the output for number of reads in each from getGeneExp function specific to the exons only? I used the refFlat format for the gene annotation.

Thanks!

**Xi Wang** · 06-09-2010, 06:23 PM

Originally posted by m!x View Post

Hi,

Is the output for number of reads in each from getGeneExp function specific to the exons only? I used the refFlat format for the gene annotation.

Thanks!

Yes, only reads in exons are counted. If you want to count reads in introns, make your own refFlat files.

**m!x** · 06-19-2010, 12:19 PM

Thanks for the quick reply. I tried to put the results of gene and number of reads in the UCSC custom track, and found out that there are a lot of reads that are mapping to 3' UTR when I viewed the track in genome browser. This becomes the problem when I started to do the subsequent statistical analysis.
There are a lot of genes that are said to be significantly induced/repressed, but actually most of the reads are in the 3' UTR, and almost none of the reads are in the exons.

Have you found this kind of phenomenon before? What do you suggest to correct this?

Thanks!

**Xi Wang** · 06-19-2010, 07:33 PM

Originally posted by m!x View Post

Thanks for the quick reply. I tried to put the results of gene and number of reads in the UCSC custom track, and found out that there are a lot of reads that are mapping to 3' UTR when I viewed the track in genome browser. This becomes the problem when I started to do the subsequent statistical analysis.
There are a lot of genes that are said to be significantly induced/repressed, but actually most of the reads are in the 3' UTR, and almost none of the reads are in the exons.

Have you found this kind of phenomenon before? What do you suggest to correct this?

Thanks!

Yes, we have observed the similar phenomenon in our data, that in the 3'UTR the read coverage was significantly higher than other regions. We think this non-uniform read distribution is largely due to sample degradation during the library preparation.

As we don't have a biology backgroud, we are wondering whether this explanation is right or not. Only you know what caused this phenomenon, you can correct this distribution bias correctly.

We are also seeking for biology guys to explain the reasons. Thanks!

**sma** · 08-16-2010, 07:22 AM

Dear WANG Xi,

I am running into troubles using DEGseq. Admittedly, I am a novice to analyzing this kind of data and using R---in fact I only loaded R just yesterday in order to run your DEGseq package.

I am trying to compare three separate sets of 454 data (from 3 stages of development). When we get the data back from our company in Shanghai, it is complete with the annotations and the reads are already counted. There are no replicates (just 3 samples). So, I think I can simply use DEGexp with the MARS method.

I have followed your recent suggestions to "(1) download the most recent version with its “reference manual”, (2) following the examples in the manual, apply DEGseq’s functions to the test data, (3) replace the data set, and apply DEGseq to your own data." I can successfully run the test data without problem. I can also replace the data with my own data and I can successfully show the filepath to my data (so I know I have entered it correctly). However, when I try to set up the geneExpMatrix, I run into problems. I always get an error that says :
"Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 10 did not have 4 elements"

I've tried everything I can think of and I cannot seem to remedy this problem. Any suggestion?

Thank you!

**Xi Wang** · 08-16-2010, 04:49 PM

Originally posted by sma View Post

Dear WANG Xi,

I am running into troubles using DEGseq. Admittedly, I am a novice to analyzing this kind of data and using R---in fact I only loaded R just yesterday in order to run your DEGseq package.

I am trying to compare three separate sets of 454 data (from 3 stages of development). When we get the data back from our company in Shanghai, it is complete with the annotations and the reads are already counted. There are no replicates (just 3 samples). So, I think I can simply use DEGexp with the MARS method.

I have followed your recent suggestions to "(1) download the most recent version with its “reference manual”, (2) following the examples in the manual, apply DEGseq’s functions to the test data, (3) replace the data set, and apply DEGseq to your own data." I can successfully run the test data without problem. I can also replace the data with my own data and I can successfully show the filepath to my data (so I know I have entered it correctly). However, when I try to set up the geneExpMatrix, I run into problems. I always get an error that says :
"Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 10 did not have 4 elements"

I've tried everything I can think of and I cannot seem to remedy this problem. Any suggestion?

Thank you!

Dear Sma,

Thanks for using DEGseq. I am wondering your problem may be caused by the format of the input file to geneExpMatrix. Could you paste the first 15 lines in the input file here, or email to me via [email protected]? Thanks.

**osvaldoreis** · 09-30-2010, 08:11 AM

Starting using Degseq

Hey all, I'm just start using Degseq and I'm having some troubles. I have many library without replicats and I want to see the differential expression in the time course of a disease. I have solexa reads MP ~35 bp og RNA-seq, the genome and a gene prediction of this specie. I align the reads with the predicted genes using bowtie then I made and script to make a file like this:

"Gene" "LB1" "LB2"
"g1" 45 103
... ... ...

where the numbers are how many reads align with that gene for each library. Then I ran DEGexp like this:

library(DEGseq)

geneExpMatrix1 <- readGeneExp(file="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/reads_por_gene.txt", geneCol=1, valCol=c(2))
geneExpMatrix2 <- readGeneExp(file="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/reads_por_gene.txt", geneCol=1, valCol=c(3))

layout(matrix(c(1,2), 3, 2, byrow=TRUE))

par(mar=c(2, 2, 2, 2))

DEGexp(geneExpMatrix1=geneExpMatrix1, geneCol1=1, expCol1=2, groupLabel1="GPB1", geneExpMatrix2=geneExpMatrix2, geneCol2=1, expCol2=2, groupLabel2="DPB1", method="MARS", outputDir="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/", pValue=1e-2, thresholdKind=1)

I got results without errors but the results have a lot of differentially expressed genes. I have ~18k genes and ~12k are differentially expressed. Looking at the file I can see that many of them aren't significant. I wanted to know if I'm doing something wrong?

Thanks for any help!

**Xi Wang** · 10-01-2010, 01:00 AM

Originally posted by osvaldoreis View Post

Hey all, I'm just start using Degseq and I'm having some troubles. I have many library without replicats and I want to see the differential expression in the time course of a disease. I have solexa reads MP ~35 bp og RNA-seq, the genome and a gene prediction of this specie. I align the reads with the predicted genes using bowtie then I made and script to make a file like this:

"Gene" "LB1" "LB2"
"g1" 45 103
... ... ...

where the numbers are how many reads align with that gene for each library. Then I ran DEGexp like this:

library(DEGseq)

geneExpMatrix1 <- readGeneExp(file="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/reads_por_gene.txt", geneCol=1, valCol=c(2))
geneExpMatrix2 <- readGeneExp(file="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/reads_por_gene.txt", geneCol=1, valCol=c(3))

layout(matrix(c(1,2), 3, 2, byrow=TRUE))

par(mar=c(2, 2, 2, 2))

DEGexp(geneExpMatrix1=geneExpMatrix1, geneCol1=1, expCol1=2, groupLabel1="GPB1", geneExpMatrix2=geneExpMatrix2, geneCol2=1, expCol2=2, groupLabel2="DPB1", method="MARS", outputDir="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/", pValue=1e-2, thresholdKind=1)

I got results without errors but the results have a lot of differentially expressed genes. I have ~18k genes and ~12k are differentially expressed. Looking at the file I can see that many of them aren't significant. I wanted to know if I'm doing something wrong?

Thanks for any help!

Hi,

Thanks for using DEGseq in differential expression analysis.

As your samples are without biological replicates (right?), the statistical model in DEGseq only depicts the measurement uncertainty in RNA-seq technology, so there could be some genes, which are picked up as differentially expressed genes, do appear to be differentially expressed in samples (for several reasons such as individual differences) but don't have biological significance. It is often said that statistical significance doesn't equal to biological significance.
Another thing is you may try more stringent p-value (or q-value) cutoff, say specifying pValue=1e-3. Or you'd better use thresholdKind=3 or thresholdKind=4. The q-values are adjusted from p-values for multiple testing correction.

Hope this helps.

**luoruicd** · 10-08-2010, 12:06 PM

Hi Xi Wang,
Thanks for your software. I am wondering what's the difference between DEseq and DEGseq?

**Xi Wang** · 10-11-2010, 05:30 PM

Originally posted by luoruicd View Post

Hi Xi Wang,
Thanks for your software. I am wondering what's the difference between DEseq and DEGseq?

Please read the respective articles for the two tools.

**Sol** · 10-26-2010, 10:54 AM

Hi
I ran the program DEGexp of the DEGseq. The output file generated a table with values of log2 fold change, z-score, p-value, q-value and signature (p-value <0.001). How to interpret the gene upregulation and downregulation? what each column means? the input file was RPKM and genes. I have not replicate, but I'm comparing two conditions.
Thanks

**Xi Wang** · 10-26-2010, 06:37 PM

Originally posted by Sol View Post

Hi
I ran the program DEGexp of the DEGseq. The output file generated a table with values of log2 fold change, z-score, p-value, q-value and signature (p-value <0.001). How to interpret the gene upregulation and downregulation? what each column means? the input file was RPKM and genes. I have not replicate, but I'm comparing two conditions.
Thanks

Hi Sol, Thanks for using DEGseq.

In the output file, there are 2 columns for fold-change: "log2(Fold_change)" and "log2(Fold_change) normalized". log2(Fold_change) = log(value1/value2), and the normalized value is got from the normalized value1 and value2. From the value of fold-change, you can judge this gene is up-regulated or down-regulated. For example, for a gene if its log2(Fold_change) > 0, which means value1 > value2, and if its signature = TRUE, this gene is significantly down-regulated in condition 2. Also, you can look into z-scores.

Hope this helps.

**Sol** · 10-27-2010, 08:14 AM

but, o z-score, is based on what data? and what is difference between q-value and p-value. I don't understand.
Thanks

Topics	Statistics	Last Post
Sequencing the Two-Toed Sloth Genome Reveals Jumping Genes Tied to Its Extreme Metabolism by SEQadmin2 Started by SEQadmin2, 06-09-2026, 11:58 AM	0 responses 22 views 0 reactions	Last Post by SEQadmin2 06-09-2026, 11:58 AM
A New Method Makes Hantavirus Genome Analysis Faster and More Accessible by SEQadmin2 Started by SEQadmin2, 06-05-2026, 10:09 AM	0 responses 27 views 0 reactions	Last Post by SEQadmin2 06-05-2026, 10:09 AM
A New Single-Cell Method Maps DNA-Protein Interactions by SEQadmin2 Started by SEQadmin2, 06-04-2026, 08:59 AM	0 responses 38 views 0 reactions	Last Post by SEQadmin2 06-04-2026, 08:59 AM
Long-Read RNA Sequencing Uncovers a Hidden Layer of Immune Cell Regulation by SEQadmin2 Started by SEQadmin2, 06-02-2026, 12:03 PM	0 responses 61 views 0 reactions	Last Post by SEQadmin2 06-02-2026, 12:03 PM

Unconfigured Ad

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News