**Xi Wang** · 12-06-2010, 08:38 AM

Originally posted by newbietonextgen:

No. I have tried both ways: giving the full path to the file, and setting the working directory and then giving just the file name. I am using 64-bit R and I am not sure if that is the problem.

This is how the console looks:
>library(DEGseq)
Loading required package: qvalue
Loading Tcl/Tk interface
> sample A <- "path to the file (bed.txt)"
|

So there was no message on screen after I hit Return...

I noticed that you are not using the most recent version of DEGseq.
Please download the newest version from:

http://bioconductor.org/packages/release/bioc/html/DEGseq.html

DEGseq is an R package to identify differentially expressed genes from RNA-Seq data.

Second, in R, variable names can't contain spaces. Also, you should assign the actual location of your file, not a description of it.
E.g.,

Code:

sample_A <- "/home/username/data.bed"
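A minimal base-R sketch of the two points above: a name containing a space is not a syntactic R name, and the value assigned should be the real file path. The path is the hypothetical one from the example, and the `file.exists()` check is just a quick sanity test before handing the path to DEGseq.

```r
# `sample A <- ...` is not valid R: a name cannot contain a space.
# make.names() shows the syntactic form R would use instead.
make.names("sample A")  # "sample.A"

# Assign the actual location of the file (hypothetical path).
sample_A <- "/home/username/data.bed"

# Quick sanity check before passing the path to DEGseq.
if (!file.exists(sample_A)) {
  message("File not found: ", sample_A)
}
```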

**newbietonextgen** · 12-07-2010, 07:00 AM

Hi Xi,

I finally figured out what the problem was with running DEGseq. The R installation on Mac does not come with the Tcl/Tk libraries. Once I downloaded them, it ran fine, at least as far as loading all the needed libraries.

> library(DEGseq)
Loading required package: qvalue
Loading Tcl/Tk interface ... done
Loading required package: ShortRead
Loading required package: IRanges

Attaching package: 'IRanges'

The following object(s) are masked from 'package:base':

cbind, eval, Map, mapply, order, paste, pmax, pmax.int, pmin, pmin.int,
rbind, rep.int, table

Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: lattice
Loading required package: Rsamtools
Loading required package: samr
Loading required package: impute

Now I have run into another problem; please see the output below. First, the mapResultBatch lines don't show any file path, unlike in the example, but I am not sure whether that happens on all operating systems. Further down, it says that it cannot read the input file, and I am not sure why. All I did was take a sorted BAM file and convert it to BED format using BEDTools. Does it need any other input? Any help is appreciated.

Thanks

Please wait...

mapResultBatch1:

mapResultBatch2:

file format: bed
refFlat:
Ignore the strand information when count the reads mapped to genes!
Count the number of reads mapped to each gene ...
This will take several minutes, please wait patiently!
Please wait...

does not exist!
SampleFiles:
Count the number of reads mapped to each gene.
This will take several minutes.
Please wait ...
cannot open input file
There is something wrong!
Please check !
There is something wrong!Please check...
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file '/var/folders/Bl/BlOaI4RVFYyvhEI-W+aTz++++TI/-Tmp-//RtmpuyIAOK/DEGseqExample/group1.exp': No such file or directory
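In the log above, `mapResultBatch1:`, `mapResultBatch2:` and `refFlat:` are followed by nothing, which suggests (an inference from the output, not a confirmed diagnosis) that the path arguments were empty when DEGseq was called. One way this can happen is that base R's `system.file()` returns `""` when it cannot find the requested file. The helper below, `check_inputs()`, is a hypothetical sketch and not part of DEGseq; it simply stops early if a path is empty or missing.

```r
# Hypothetical helper, not part of DEGseq: stop early if any input path
# is empty or does not point at an existing file.
check_inputs <- function(...) {
  paths <- list(...)
  nms <- names(paths)
  if (is.null(nms) || any(!nzchar(nms))) {
    stop("please name every argument, e.g. refFlat = \"...\"")
  }
  for (nm in nms) {
    p <- paths[[nm]]
    if (length(p) == 0 || any(!nzchar(p))) {
      stop(nm, " is empty")
    }
    absent <- p[!file.exists(p)]
    if (length(absent) > 0) {
      stop(nm, ": file(s) not found: ", paste(absent, collapse = ", "))
    }
  }
  invisible(TRUE)
}

# Usage with hypothetical paths, before calling DEGseq():
# check_inputs(mapResultBatch1 = "/path/to/sampleA.bed",
#              mapResultBatch2 = "/path/to/sampleB.bed",
#              refFlat         = "/path/to/refFlat.txt")
```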

**Xi Wang** · 12-07-2010, 07:10 AM

Hi,

Please show me the R script you used to run DEGseq. You can email me at [email protected] if you don't want to post the details here.

Thanks.

**mgolo** · 07-21-2011, 03:36 AM

DEGseq and expression of novel small RNAs

Hi all!

I'm new to the NGS business, and right now I have a lot of doubts about DE analysis.

I have RNA-sequenced a bacterial transcriptome in 2 growth conditions, and I have 3 biological replicates for each condition:

Condition A : Replicate 1A, Replicate 2A, Replicate 3A
Condition B : Replicate 1B, Replicate 2B, Replicate 3B

I have the BAM and pileup files for each replicate.

Now, my aim is to compare the expression of non-annotated non-coding RNAs between conditions A and B (so I will use a custom annotation file).

I have read about DEGseq and I would like to use it for my DE analysis, but I have a number of questions about it:

1. What method would suit my analysis best? I have thought of using MARS...

2. How do I normalize my replicates? Should I use loess or median? What's the difference between them?

3. What is better: to pool the 3 replicates of each condition or to analyze DE without pooling them?

4. Since my transcripts are not annotated, I will have to use expression values based on raw read counts, right? Can I use the rawCount argument with the DEGseq function, or is it only valid with the DEGexp function? If I use the MARS method, is it automatically set to analyze raw counts?
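The difference asked about in question 2 can be illustrated with the two generic MA-plot normalizations these names usually refer to: median normalization applies one constant shift to all log-ratios, while loess normalization removes a trend that depends on expression level. This is a base-R sketch of those generic ideas, assuming M (log-ratio) and A (average log expression) values as in an MA-plot; it is not taken from DEGseq's implementation, so check the DEGseq documentation for exactly what its normalization option does.

```r
# Generic MA-based normalizations (a sketch; not DEGseq's own code).
# M = log2 ratio between two samples, A = average log2 expression.

# Median normalization: shift all M values so that their median is 0.
median_normalize <- function(M) {
  M - median(M)
}

# Loess normalization: remove an expression-dependent trend in M by
# subtracting a local regression fit of M on A.
loess_normalize <- function(M, A) {
  fit <- loess(M ~ A)
  M - fitted(fit)
}

median_normalize(c(1, 2, 4))  # -1 0 2
```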

Thanks in advance for your help!

Maria

**Xi Wang** · 07-21-2011, 05:50 PM

Hi Maria

1 & 2. The choice of DEG detection method, and of the normalization applied beforehand, should depend on how your data are distributed. You may try all of them and choose the best one.

3. For biological replicates, it's better not to pool them together.

4. Raw read counts have nothing to do with gene annotation. In our documentation, the alternative to 'raw read counts' is RPKM values. For the unannotated non-coding RNAs, you'd better determine their gene structure first and then analyze the DEGs.
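To make point 4 concrete: raw counts only require counting the reads that fall in each feature, whereas RPKM (reads per kilobase of feature per million mapped reads) also needs each feature's length, which is one reason the structure of unannotated transcripts matters. A minimal sketch of the standard RPKM formula, with hypothetical counts and lengths:

```r
# RPKM = count * 1e9 / (feature length in bp * total mapped reads)
rpkm <- function(counts, lengths_bp, total_mapped) {
  counts * 1e9 / (lengths_bp * total_mapped)
}

# Hypothetical counts and lengths for two novel ncRNAs.
counts     <- c(ncRNA_1 = 500, ncRNA_2 = 50)
lengths_bp <- c(ncRNA_1 = 250, ncRNA_2 = 100)
rpkm(counts, lengths_bp, total_mapped = 2e6)
# ncRNA_1 ncRNA_2
#    1000     250
```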

By the way, we are working on a new version of DEGseq, which will be better suited to biological replicates.

**mgolo** · 07-26-2011, 05:57 AM

Thanks for your reply, Xi.

I'll try all the methods once I have my annotation file. But what criteria should I use to decide which one is best?

Looking forward to your new version of DEGseq!

**Xi Wang** · 07-26-2011, 06:58 AM

I think one of the most important criteria is how consistent the detected DEGs are with previous knowledge, though new findings may also point to novel discoveries. From a statistical point of view, the best method is one whose assumptions your data do not violate.
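One simple, illustrative way to put a number on "consistency with previous knowledge" is to check how many previously reported genes a method recovers. The gene names below are hypothetical, and this overlap is only one heuristic for comparing methods, not a formal criterion.

```r
# Hypothetical DEG list from one method, and hypothetical genes
# previously reported as differentially expressed.
detected <- c("rnaA", "rnaB", "rnaC", "rnaD")
known    <- c("rnaB", "rnaD", "rnaE")

overlap <- intersect(detected, known)        # "rnaB" "rnaD"
recall  <- length(overlap) / length(known)   # 2 of 3 known genes recovered
```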

**townway** · 08-03-2011, 09:32 PM

Hi Xi,
My data are time-course data with 6 time points but no replicates. I wonder if I can use DEGseq.

If not, could you suggest some alternatives?

Thank you in advance!

Townway

**Xi Wang** · 08-04-2011, 10:12 PM

Sorry Townway, DEGseq is currently not suitable for time-series data. Please try Cufflinks (http://cufflinks.cbcb.umd.edu/) instead. Thanks.

**wangleibio** · 09-22-2011, 09:18 PM

DEGseq problem

Hi Xi,
I have a problem using DEGseq:
DEGexp(geneExpMatrix1 = geneExpMatrix1, geneCol1 = 1, expCol1 = 2, groupLabel1 = "roottip",
       geneExpMatrix2 = geneExpMatrix2, geneCol2 = 1, expCol2 = 2, groupLabel2 = "hypocotyl",
       outputDir = "./roothypocoty", method = "MARS")

Please wait...
gene id column in geneExpMatrix1 for sample1: 1
expression value column(s) in geneExpMatrix1: 2
total number of reads uniquely mapped to genome obtained from sample1: 62747041
gene id column in geneExpMatrix2 for sample2: 1
expression value column(s) in geneExpMatrix2: 2
total number of reads uniquely mapped to genome obtained from sample2: 69469907

method to identify differentially expressed genes: MARS
pValue threshold: 0.001
output directory: ./roothypocoty

Please wait ...
Identifying differentially expressed genes ...
Please wait patiently ...
output ...

Done ...
The results can be observed in directory: ./roothypocoty

Problem:

It creates the output directory (outputDir), but it does not produce the MA-plot.
Additionally, my two samples do not have replicates.

Hope you can help!
Thanks!
Lei
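Two hedged notes on the missing MA-plot. In the ?DEGexp help-page example, `layout()` is called before `DEGexp()`, which suggests the plots are drawn on the current graphics device rather than saved into outputDir; when R runs non-interactively (e.g. via Rscript), base R's default device writes to Rplots.pdf in the working directory. For orientation, the sketch below computes the M and A values an MA-plot shows and writes the plot to an explicitly opened PDF. The counts, the file name and the zero-count handling are assumptions of this sketch, not DEGseq's code.

```r
# Sketch: M and A values for two samples, as in an MA-plot:
#   M = log2(c1) - log2(c2),  A = (log2(c1) + log2(c2)) / 2
ma_values <- function(c1, c2) {
  keep <- c1 > 0 & c2 > 0   # log2(0) is -Inf; this sketch simply drops such genes
  l1 <- log2(c1[keep])
  l2 <- log2(c2[keep])
  data.frame(A = (l1 + l2) / 2, M = l1 - l2)
}

c1 <- c(6, 68, 1, 0, 120)   # hypothetical counts, sample 1
c2 <- c(10, 69, 1, 5, 30)   # hypothetical counts, sample 2
ma <- ma_values(c1, c2)

# Open a file-based graphics device explicitly, so the plot lands in a
# known file instead of whatever the default device is.
out_pdf <- file.path(tempdir(), "MA_plot.pdf")
pdf(out_pdf)
plot(ma$A, ma$M, xlab = "A", ylab = "M", pch = 20)
abline(h = 0, lty = 2)
dev.off()
```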

**Xi Wang** · 09-22-2011, 09:42 PM

Thanks for using DEGseq.

To track down the problem, please try the following:
(1) Run the example provided in the help page. Simply type "?DEGexp" in the R console, copy and paste the Examples at the end of the page, and check whether the example works properly.
(2) Run "sessionInfo()" in the R console, and paste the result here or, better, email it to me at "[email protected]".
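Both steps can also be run from a script. `example()` and `sessionInfo()` are standard R functions; the `requireNamespace()` guard and the output file name "sessionInfo.txt" are arbitrary choices for this sketch.

```r
# Step (1): run the Examples section of ?DEGexp, if DEGseq is installed.
if (requireNamespace("DEGseq", quietly = TRUE)) {
  library(DEGseq)
  example("DEGexp", package = "DEGseq", ask = FALSE)
} else {
  message("DEGseq is not installed in this R library")
}

# Step (2): capture sessionInfo() as text so it can be pasted or emailed.
info_text <- capture.output(sessionInfo())
writeLines(info_text, "sessionInfo.txt")
```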

Thanks.

**AsoBioInfo** · 03-25-2012, 10:45 PM

DEGseq Question

Hello,

I have a question about DEGseq. I don't understand the syntax of layout:
layout(matrix(c(1, 2, 3, 4, 5, 6), 3, 2, byrow = TRUE))

I can see my graphs, but they are not showing what I expect. For my data, only three rows were considered and had their log fold changes calculated; for the remaining data, no histogram was built.

The first part of the script reads in the whole data set, so I think something is wrong only with how I set up the layout and matrix.

Thanks for your help!
Aso

**Xi Wang** · 03-26-2012, 04:13 AM

Dear Aso, thanks for your questions.

The "layout" call only affects how the DEGseq output plots are drawn. Specifically, that command sets up a figure with 6 panels arranged in 3 rows and 2 columns, filled row by row (byrow = TRUE).
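To see what that line does without DEGseq, the base-R sketch below builds the same matrix and draws six empty, numbered panels into a temporary PDF. With byrow = TRUE the panels run left to right, then top to bottom; the output file name is arbitrary.

```r
# The matrix given to layout(): panel numbers filled row by row.
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)
print(m)
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4
# [3,]    5    6

# Draw six empty, numbered panels to a temporary PDF to see the arrangement.
pdf(file.path(tempdir(), "layout_demo.pdf"))
n_panels <- layout(m)   # layout() returns the number of panels
for (i in seq_len(n_panels)) {
  plot.new()
  box()
  title(main = paste("panel", i))
}
dev.off()
```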

For your problem, could you copy and paste the first few lines of your data and your commands here? Then I will be able to diagnose the issue. Thanks.

**ETHANol** · 03-26-2012, 04:38 AM

Are you analyzing RNA-seq data? If so, the overwhelming opinion in the community is that DEGseq's Poisson model is invalid and that you should use edgeR or DESeq instead.
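The point about the Poisson model can be checked informally on one's own replicates: under a Poisson model the variance of a gene's counts equals its mean, while edgeR and DESeq use a negative binomial model whose variance can exceed the mean. The sketch below, with made-up counts, compares per-gene variance to mean across biological replicates; it is only a rough diagnostic, not a formal test.

```r
# Hypothetical counts: 3 genes x 3 biological replicates of one condition.
counts <- rbind(
  geneA = c(100, 110,  90),
  geneB = c(100, 300,  50),
  geneC = c( 10,  12,   8)
)

mu    <- rowMeans(counts)
v     <- apply(counts, 1, var)
ratio <- v / mu   # near 1 if Poisson-like; much larger than 1 means overdispersion
round(ratio, 2)
```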

**AsoBioInfo** · 03-26-2012, 05:38 AM

Thanks for your reply, Xi!

The output score data look like this:
"GeneNames" "value1" "value2" "log2(Fold_change)"
00000000000000 6 10 -0.736 -0.643
11111111111111 68 69 -0.02 0.072
22222222222222 1 1 0 0.095
33333333333333 NA NA NA NA NA NA NA NA FALSE
44444444444444 NA NA NA NA NA NA NA NA FALSE

Note: there are other score columns as well.

The fold change is calculated for only three rows, even though the matrix contains all the values (the whole matrix is written to the output). The commands I used are:

library(DEGseq)
geneExpFile <- "D:/data/MyData.txt"
geneExpMatrix1 <- readGeneExp(file=geneExpFile, geneCol=1, valCol=c(7,9,11))
geneExpMatrix2 <- readGeneExp(file=geneExpFile, geneCol=1, valCol=c(8,10,12))
write.table(geneExpMatrix1[1:13,], row.names=FALSE)
write.table(geneExpMatrix2[1:13,], row.names=FALSE)

layout(matrix(c(1,2,3,4,5,6), 3, 2, byrow=TRUE))
par(mar=c(2, 2, 2, 2))
DEGexp(geneExpMatrix1=geneExpMatrix1, geneCol1=1, expCol1=c(2,3,4,5,6), groupLabel1="Label1",
       geneExpMatrix2=geneExpMatrix2, geneCol2=1, expCol2=c(2,3,4,5,6), groupLabel2="Label2",
       method="MARS")

Hope this helps!

Thanks!
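One thing worth checking in the commands above (an observation, not a confirmed diagnosis): each `readGeneExp()` call selects three value columns. Assuming `readGeneExp()` returns the gene column followed by the selected value columns, geneExpMatrix1 then has four columns, and only 2, 3 and 4 are valid `expCol1` values, whereas the ?DEGexp example these commands appear to follow reads five value columns, which is where `expCol1 = c(2,3,4,5,6)` comes from. The base-R sketch below uses a made-up 12-column table to show the column arithmetic.

```r
# Made-up stand-in for MyData.txt: gene names in column 1, values in 2..12.
tab <- data.frame(gene = paste0("g", 1:5),
                  matrix(1:55, nrow = 5))

# Roughly what readGeneExp(..., geneCol = 1, valCol = c(7, 9, 11)) keeps
# (assumed): the gene column plus the three requested value columns.
sub1 <- tab[, c(1, 7, 9, 11)]
ncol(sub1)                                # 4

valid_expCol <- 2:ncol(sub1)              # 2 3 4
setdiff(c(2, 3, 4, 5, 6), valid_expCol)   # 5 6: columns that do not exist
```

Under that assumption, `expCol1 = c(2, 3, 4)` (and likewise for `expCol2`) would match the three value columns actually read.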
