**areyes** · 10-10-2012, 04:07 AM

Or remove the "chr" prefix from them in your alignment file, whichever is easier. But the names have to match!
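
A minimal sketch of the renaming idea in R, assuming the chromosome names have already been pulled out into character vectors. The two vectors below are made-up examples and are not taken from the thread.

```r
# Hypothetical chromosome names: one set from the annotation, one from the alignment.
annotation_chr <- c("1", "2", "X")
alignment_chr  <- c("chr1", "chr2", "chrX")

# Drop a leading "chr" so both sets use the same naming style.
alignment_fixed <- sub("^chr", "", alignment_chr)

# Check that every alignment name now appears in the annotation.
all(alignment_fixed %in% annotation_chr)
```

The anchored pattern "^chr" only touches the start of each name, so a name that merely contains "chr" further along is left alone.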

**Thomas Ryan** · 10-14-2012, 04:09 AM

Originally posted by areyes

Check the function read.HTSeqCounts from DEXSeq. This will read these files, parse the relevant information for the package and return an ExonCountSet object in your R session.

I am new to DEXSeq. I have a separate .txt file for each sample in my R folder. How do I compile these into the data frame samples in R in order to create an ExonCountSet?

**areyes** · 10-15-2012, 12:49 AM

Use the function read.table.

Check also this:

http://egret.psychol.cam.ac.uk/statistics/R/enteringdata.html

**Thomas Ryan** · 10-15-2012, 01:24 AM

Originally posted by areyes

function read.table

Check also this:

http://egret.psychol.cam.ac.uk/stati...eringdata.html

I have used deserve and the read.csv function successfully. However, I now have several tables of data. Do I merge them in R?

**areyes** · 10-16-2012, 06:59 AM

?cbind

Alejandro
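
A minimal sketch of the read.table plus cbind route, using only base R. The file names, feature IDs and counts below are made up so the example runs on its own, and it assumes every file lists the same features in the same order. For DEXSeq itself, the earlier reply points to read.HTSeqCounts, which takes the count files directly.

```r
# Write two tiny, made-up count files so the example is self-contained.
count_dir <- tempfile("counts_")
dir.create(count_dir)
writeLines(c("geneA:001\t10", "geneA:002\t0", "geneB:001\t7"),
           file.path(count_dir, "ctrl_1.txt"))
writeLines(c("geneA:001\t12", "geneA:002\t3", "geneB:001\t5"),
           file.path(count_dir, "kd_1.txt"))

files <- list.files(count_dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file into a one-column data frame with the feature ID as row name.
tables <- lapply(files, read.table, header = FALSE, sep = "\t",
                 row.names = 1, col.names = c("id", "count"))

# cbind only makes sense if all files list the same features in the same order.
same_rows <- sapply(tables, function(tab) identical(rownames(tab), rownames(tables[[1]])))
stopifnot(all(same_rows))

# Bind the columns side by side and name each column after its file.
counts <- do.call(cbind, tables)
colnames(counts) <- sub("\\.txt$", "", basename(files))
counts
```

If the row order can differ between files, merge() on the ID column would be the safer choice than cbind.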

**kben** · 10-17-2012, 02:15 AM

Hi there,

we want to do some alternative splicing analysis with DEXSeq. I'm at the data preparation step. We have quite big files (80 GB per replicate after converting the 15 GB .bam file). How many GB of RAM are needed in order to sort them (--mem-per-cpu=)? Is it 80 GB, 160 GB (hopefully not...), or is the sort done sequentially with small RAM allocations?

Code:

sort -k1,1 -k2,2n ctrl_1.sam > ctrl_1_sorted.sam

the hardware configuration is: http://csc.uni-frankfurt.de/index.php?id=60&L=2

Thanks!
Ben

Answer (CSC support)

The issue was related to the SLURM scripts. It turned out that increasing the temporary disk space allocated on the node via

Code:

#SBATCH --tmp=200000

followed by

Code:

srun sort --temporary-directory=/local/$SLURM_JOB_ID -k1,1 -k2,2n ctrl_1.sam > ctrl_1_sorted.sam

solved the problem.

**kben** · 10-19-2012, 03:02 AM

I've got a question concerning DEXSeq runs on a cluster and saving the calculation output from the node. Is it sufficient to have

Code:

save(ecs, file="ecs_ctrlvskd_.RData")

at the end of the invoked R script, in order to work with the .RData file on a slower machine subsequently?

Thanks!

Ben

The R script up to now (I'm new to R, so it's not fine tuned):

Code:

library(multicore)
library(DEXSeq)
setwd("/scratch/therapy/bkoch/dexseq/input")
annotationfile = file.path("/scratch/therapy/bkoch/dexseq/GRCh37_Ensembl_DEXSeq.gff")
samples = data.frame(condition = c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),replicate=c(1:3,1:3),row.names=c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),stringsAsFactors=TRUE,check.names = FALSE)
fullFilenames<- list.files("/scratch/therapy/bkoch/dexseq/input/",full.names=TRUE,pattern="sorted.txt")
ecs<- read.HTSeqCounts(countfiles = fullFilenames,design = samples,flattenedfile = annotationfile)
ecs<- estimateSizeFactors(ecs)
ecs<- estimateDispersions(ecs)
ecs<- fitDispersionFunction(ecs)
ecs<- estimatelog2FoldChanges(ecs)
test<- testForDEU(ecs)
res1<- DEUresultTable(test) 
save(ecs, file="ecs_ctrlvskd_.RData")

**kben** · 10-19-2012, 03:13 AM

Likely

Code:

save(test, file="test_ctrlvskd_.RData")
save(res1, file="res1_ctrlvskd_.RData")

have to be added?

**areyes** · 10-19-2012, 03:37 AM

Hi kben,

Saving the "ecs" object like that should work; "res1" is easy to get from the ecs object on a slow machine.

Are you having trouble getting the run to complete? If you have multiple cores in your machine, you could use the nCores argument in the functions estimateDispersions and testForDEU to parallelize DEXSeq across many cores. Discarding genes with lots of exons (e.g. more than 150) will also help; unfortunately, a single gene with many exons can take some time.

Alejandro
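
A minimal sketch of the save/load round trip with base R. The list below is only a stand-in for the real ExonCountSet, and the temporary path is an assumption made so the example runs anywhere. For the real object, DEXSeq would also need to be loaded on the second machine to work with it.

```r
# Stand-in for the real analysis object; any R object is saved the same way.
ecs <- list(note = "placeholder for an ExonCountSet",
            counts = matrix(1:6, nrow = 3))

rdata_file <- file.path(tempdir(), "ecs_ctrlvskd_.RData")
save(ecs, file = rdata_file)

# Simulate the other machine: the object is gone until it is loaded again.
rm(ecs)

# load() restores objects under their original names and returns those names.
restored <- load(rdata_file)
restored
```

Because load() brings the object back under the name it was saved with, the follow-up code on the slower machine can keep referring to ecs.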

**kben** · 10-19-2012, 03:58 AM

Hi Alejandro,

awesome, thanks! I will use these lines, and load library(parallel) instead of multicore (checked your nice vignette)!

Code:

ecs <- estimateDispersions(ecs, nCores=24)
ecs <- testForDEU(ecs, nCores=24)

Ben
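
Rather than hard-coding 24, the core count could be read from the job environment. A minimal sketch, assuming the script runs under SLURM with --cpus-per-task set, in which case SLURM exports SLURM_CPUS_PER_TASK. The variable name n_cores is made up for illustration.

```r
library(parallel)

# SLURM sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested;
# fall back to "1" when the variable is absent (e.g. on a laptop).
n_cores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))

# Guard against an empty or malformed value.
if (is.na(n_cores) || n_cores < 1L) n_cores <- 1L

# Never ask for more cores than the machine reports.
available <- detectCores()
if (!is.na(available)) n_cores <- min(n_cores, available)
n_cores
```

The resulting value could then be passed as nCores = n_cores to estimateDispersions and testForDEU, so the R script and the job script cannot drift apart.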

**kben** · 10-19-2012, 10:40 AM

I'm apparently running into trouble with DEXSeq & library(parallel) on the cluster. Could you please give me a hint on how to fix it?

Thanks a lot!
Ben

Error log:

...Error in function (classes, fdef, mtable) :
unable to find an inherited method for function "fData", for signature "character"
Calls: estimateDispersions ... .local -> divideWork -> rownames -> fData -> <Anonymous>
In addition: Warning message:
In parallel::mclapply(allecs, FUN = funtoapply, mc.cores = mc.cores) :
all scheduled cores encountered errors in user code
Execution halted
srun: error: node1-012: task 0: Exited with exit code 1

The R script:

Code:

library(DEXSeq)
setwd("/scratch/therapy/bkoch/dexseq/input")
annotationfile = file.path("/scratch/therapy/bkoch/dexseq/GRCh37_Ensembl_DEXSeq.gff")
samples = data.frame(condition = c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),replicate=factor(c(1:3,1:3)),row.names=c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),stringsAsFactors=TRUE,check.names = FALSE)
fullFilenames<- list.files("/scratch/therapy/bkoch/dexseq/input/",full.names=TRUE,pattern="sorted.txt")
ecs<- read.HTSeqCounts(countfiles = fullFilenames,design = samples,flattenedfile = annotationfile)
ecs<- estimateSizeFactors(ecs)
library(parallel)
ecs<- estimateDispersions(ecs, nCores=24)
ecs<- fitDispersionFunction(ecs)
ecs<- estimatelog2FoldChanges(ecs)
test<- testForDEU(ecs, nCores=24)
res1<- DEUresultTable(test) 
save(ecs, file="/scratch/therapy/bkoch/dexseq/output/ecs_ctrlvskd_.RData")

SessionInfo (if I start R after my login on the cluster):

R version 2.15.1 (2012-06-22)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] DEXSeq_1.4.0 Biobase_2.18.0 BiocGenerics_0.4.0

SLURM job script

Code:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --error=ecs_err.log
#SBATCH --output=ecs_out.log
#SBATCH --job-name=ecs
#SBATCH --mem-per-cpu=2000
#SBATCH --partition=parallel
#SBATCH --time=7-00:00:00
#
export OMP_NUM_THREADS=24
srun R --save --file=/scratch/therapy/bkoch/dexseq/dexseq_cluster.R --output=/scratch/therapy/dexseq/output

**kben** · 10-19-2012, 10:57 AM

The cores error (if I didn't mix sth up, and it's a real error) could be related to this one:

[BioC] multicore and DEXSeq

https://stat.ethz.ch/pipermail/bioconductor/2012-July/047238.html

update:

nCores=24 => failed run after 9 min
nCores=12 => failed after 10 min
nCores=4 => failed after 7 min

without library(parallel) and the nCores argument: has now been running for > 100 min.

**kben** · 10-19-2012, 12:52 PM

Probably my R script isn't correct. The run with only one core failed too, after 120 min.

Error log:

Code:

Error in FUN(c("ENSG00000000003", "ENSG00000000419", "ENSG00000000457",  :
  Underdetermined model; cannot estimate dispersions. Maybe replicates have not been properly specified.
Calls: estimateDispersions -> estimateDispersions -> .local -> lapply -> FUN
In addition: Warning messages:
1: In .local(object, ...) :
  Exons with less than 10 counts will not be tested. For more details please see the manual page of 'estimateDispersions', parameter 'minCount'
2: In .local(object, ...) :
  Genes with more than 70 testable exons will be omitted from the analysis. For more details please see the manual page of 'estimateDispersions', parameter 'maxExon'.
Execution halted
srun: error: node2-041: task 0: Exited with exit code 1

Regarding the replicate statement in the error log (we have 3 biological replicates for control and 3 for knockdown, each with > 100 million clusters): the replicates are specified in this way. Is that OK?

Code:

samples = data.frame(condition = c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),replicate=factor(c(1:3,1:3)),row.names=c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),stringsAsFactors=TRUE,check.names = FALSE)
fullFilenames<- list.files("/scratch/therapy/bkoch/dexseq/input/",full.names=TRUE,pattern="sorted.txt")
ecs<- read.HTSeqCounts(countfiles = fullFilenames,design = samples,flattenedfile = annotationfile)

Thanks!
Ben

**kben** · 10-21-2012, 05:59 AM

After changing the condition specifications, the nCores=24 R script finished within 3 hrs. So the problems resulted from the condition naming.

Successful data.frame creation:

Code:

> samples = data.frame(condition = c("ctrl","ctrl","ctrl","kd","kd","kd"),replicate=factor(c(1:3,1:3)),row.names=c("ctrl_1","ctrl_2","ctrl_3","kd_1","kd_2","kd_3"),stringsAsFactors=TRUE,check.names = FALSE)
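
A minimal sketch of a sanity check that would have flagged the earlier design before the long run. It uses only base R. The helper name has_replicates is made up for illustration, and the rule it encodes (each condition level needs more than one sample) is an assumption that matches the "Underdetermined model" message above.

```r
# The design that failed: every sample gets its own condition level.
bad <- data.frame(condition = c("ctrl1", "ctrl2", "ctrl3", "kd1", "kd2", "kd3"),
                  stringsAsFactors = TRUE)

# The design that worked: two condition levels with three replicates each.
good <- data.frame(condition = c("ctrl", "ctrl", "ctrl", "kd", "kd", "kd"),
                   row.names = c("ctrl_1", "ctrl_2", "ctrl_3", "kd_1", "kd_2", "kd_3"),
                   stringsAsFactors = TRUE)

# Hypothetical helper: TRUE only if every condition level has at least two samples.
has_replicates <- function(design) all(table(design$condition) > 1)

has_replicates(bad)   # FALSE
has_replicates(good)  # TRUE
```

With six samples spread over six condition levels there is nothing left to estimate variability from, which is why the sample labels belong in row.names and not in the condition column.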

**Thomas Ryan** · 10-21-2012, 06:54 AM

Parallel

Originally posted by Thomas Ryan

I am new to DEXSeq. I have a separate .txt file for each sample in my R folder. How do I compile these into the data frame samples in R in order to create an ExonCountSet?

I have encountered a problem with estimateDispersions using library(parallel) with 8 cores: after 14000 genes I get the message "In mccollect(children(jobs), FALSE): restarting interrupted promise evaluation".
Any suggestions as to how I should get around this? Running estimateDispersions with a single core would take quite a while.
Tom
