Or remove "chr" from them in your alignment file, whichever is easier. But the names should match!
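A minimal sketch of that renaming, assuming the alignments are in SAM format and the only mismatch is a leading "chr" on the reference sequence names. The helper name strip_chr and the file names are hypothetical, and the two-line input is made up so the snippet runs on its own.

```shell
# Hypothetical helper: strip a leading "chr" from reference names in a SAM stream.
strip_chr() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^@SQ/ {                  # header line describing one reference sequence
             for (i = 2; i <= NF; i++) sub(/^SN:chr/, "SN:", $i)
             print; next
         }
         /^@/ { print; next }      # all other header lines pass through unchanged
         {
             sub(/^chr/, "", $3)   # RNAME: reference name of this read
             sub(/^chr/, "", $7)   # RNEXT: reference name of the mate ("=" and "*" stay as they are)
             print
         }'
}

# Made-up two-line SAM file (one header line, one alignment).
workdir=$(mktemp -d)
printf '@SQ\tSN:chr1\tLN:1000\nread1\t0\tchr1\t100\t255\t4M\t=\t0\t0\tACGT\tIIII\n' > "$workdir/example.sam"
strip_chr < "$workdir/example.sam" > "$workdir/example_nochr.sam"
```

Renaming in the annotation file instead works just as well; what matters is that both files end up with the same names.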
-
Originally posted by areyes: Check the function read.HTSeqCounts from DEXSeq. This will read these files, parse the relevant information for the package, and return an ExonCountSet object in your R session.
-
Hi there,
we want to do some alternative splicing analysis with DEXSeq. I'm at the data preparation step. We have quite big files (80 GB per replicate after converting the 15 GB .bam file). How many GB of RAM are needed to sort them (--mem-per-cpu=)? Is it 80 GB, 160 GB (hopefully not...), or is the sort done in successive chunks with small RAM allocations?
Code:
sort -k1,1 -k2,2n ctrl_1.sam > ctrl_1_sorted.sam
Thanks!
Ben
Answer (CSC support)
The issue was related to the SLURM scripts. It turned out that increasing the disk space allocated to the node via
Code:
#SBATCH --tmp=200000
and pointing sort at that node-local temporary directory with
Code:
srun sort --temporary-directory=/local/$SLURM_JOB_ID -k1,1 -k2,2n ctrl_1.sam > ctrl_1_sorted.sam
solved the problem.
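On the RAM question itself: GNU sort performs an external merge sort. It fills an in-memory buffer, writes each sorted chunk to a temporary file, and merges the chunks at the end, so the memory request does not have to match the file size, but the temporary directory needs enough free space for the spilled chunks. A minimal sketch, assuming GNU coreutils sort; the 1M buffer, the file names, and LC_ALL=C (added here only to make the collation order explicit) are illustrative and not values from CSC support.

```shell
# Illustrative only: a tiny stand-in for the SAM file, sorted with the keys used in the thread.
workdir=$(mktemp -d)
printf 'readB\t16\nreadA\t99\nreadA\t4\n' > "$workdir/reads.sam"

# -T (same as --temporary-directory) sets where sorted chunks are spilled;
# -S (same as --buffer-size) caps the in-memory buffer.
mkdir "$workdir/sorttmp"
LC_ALL=C sort -T "$workdir/sorttmp" -S 1M -k1,1 -k2,2n \
    "$workdir/reads.sam" > "$workdir/reads_sorted.sam"
```

On a cluster, -T would point at the node-local scratch directory and -S at a value somewhat below the memory requested from the scheduler.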
-
I've got a question concerning DEXSeq runs on a cluster and saving the output calculated by the node. Is it sufficient to have the following?
Code:
save(ecs, file="ecs_ctrlvskd_.RData")
Thanks!
Ben
The R script up to now (I'm new to R, so it's not fine tuned):
Code:
library(multicore)
library(DEXSeq)
setwd("/scratch/therapy/bkoch/dexseq/input")
annotationfile = file.path("/scratch/therapy/bkoch/dexseq/GRCh37_Ensembl_DEXSeq.gff")
samples = data.frame(condition = c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),
                     replicate = c(1:3,1:3),
                     row.names = c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),
                     stringsAsFactors = TRUE,
                     check.names = FALSE)
fullFilenames <- list.files("/scratch/therapy/bkoch/dexseq/input/", full.names=TRUE, pattern="sorted.txt")
ecs <- read.HTSeqCounts(countfiles = fullFilenames, design = samples, flattenedfile = annotationfile)
ecs <- estimateSizeFactors(ecs)
ecs <- estimateDispersions(ecs)
ecs <- fitDispersionFunction(ecs)
ecs <- estimatelog2FoldChanges(ecs)
test <- testForDEU(ecs)
res1 <- DEUresultTable(test)
save(ecs, file="ecs_ctrlvskd_.RData")
-
Hi kben,
Saving the "ecs" object like that should work; "res1" is easy to get from the ecs object, even on a slow machine.
Are you having problems getting the run to complete? If you have multiple cores in your machine, you could use the nCores argument of the functions estimateDispersions and testForDEU to parallelize DEXSeq across many cores. Discarding genes with lots of exons (e.g. more than 150) will also help; unfortunately, a single gene with many exons can take some time.
Alejandro
-
I'm apparently running into trouble with DEXSeq & library(parallel) on the cluster. Could you please give me a hint on how to fix it?
Thanks a lot!
Ben
Error log:
...Error in function (classes, fdef, mtable) :
unable to find an inherited method for function "fData", for signature "character"
Calls: estimateDispersions ... .local -> divideWork -> rownames -> fData -> <Anonymous>
In addition: Warning message:
In parallel::mclapply(allecs, FUN = funtoapply, mc.cores = mc.cores) :
all scheduled cores encountered errors in user code
Execution halted
srun: error: node1-012: task 0: Exited with exit code 1
Code:
library(DEXSeq)
setwd("/scratch/therapy/bkoch/dexseq/input")
annotationfile = file.path("/scratch/therapy/bkoch/dexseq/GRCh37_Ensembl_DEXSeq.gff")
samples = data.frame(condition = c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),
                     replicate = factor(c(1:3,1:3)),
                     row.names = c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),
                     stringsAsFactors = TRUE,
                     check.names = FALSE)
fullFilenames <- list.files("/scratch/therapy/bkoch/dexseq/input/", full.names=TRUE, pattern="sorted.txt")
ecs <- read.HTSeqCounts(countfiles = fullFilenames, design = samples, flattenedfile = annotationfile)
ecs <- estimateSizeFactors(ecs)
library(parallel)
ecs <- estimateDispersions(ecs, nCores=24)
ecs <- fitDispersionFunction(ecs)
ecs <- estimatelog2FoldChanges(ecs)
test <- testForDEU(ecs, nCores=24)
res1 <- DEUresultTable(test)
save(ecs, file="/scratch/therapy/bkoch/dexseq/output/ecs_ctrlvskd_.RData")
R version 2.15.1 (2012-06-22)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] DEXSeq_1.4.0 Biobase_2.18.0 BiocGenerics_0.4.0
SLURM job script
Code:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --error=ecs_err.log
#SBATCH --output=ecs_out.log
#SBATCH --job-name=ecs
#SBATCH --mem-per-cpu=2000
#SBATCH --partition=parallel
#SBATCH --time=7-00:00:00
# export OMP_NUM_THREADS=24
srun R --save --file=/scratch/therapy/bkoch/dexseq/dexseq_cluster.R --output=/scratch/therapy/dexseq/output
-
The cores error (if I didn't mix something up, and it's a real error) could be related to this one:
update:
nCores=24 => failed run after 9 min
nCores=12 => failed after 10 min
nCores=4 => failed after 7 min
without library(parallel) and the nCores argument: has now been running for > 100 min.
-
Probably my R script isn't correct. The run with only one core also failed, after 120 min.
Error log:
Code:
Error in FUN(c("ENSG00000000003", "ENSG00000000419", "ENSG00000000457", :
  Underdetermined model; cannot estimate dispersions. Maybe replicates have not been properly specified.
Calls: estimateDispersions -> estimateDispersions -> .local -> lapply -> FUN
In addition: Warning messages:
1: In .local(object, ...) :
  Exons with less than 10 counts will not be tested. For more details please see the manual page of 'estimateDispersions', parameter 'minCount'
2: In .local(object, ...) :
  Genes with more than 70 testable exons will be omitted from the analysis. For more details please see the manual page of 'estimateDispersions', parameter 'maxExon'.
Execution halted
srun: error: node2-041: task 0: Exited with exit code 1
Code:
samples = data.frame(condition = c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),
                     replicate = factor(c(1:3,1:3)),
                     row.names = c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),
                     stringsAsFactors = TRUE,
                     check.names = FALSE)
fullFilenames <- list.files("/scratch/therapy/bkoch/dexseq/input/", full.names=TRUE, pattern="sorted.txt")
ecs <- read.HTSeqCounts(countfiles = fullFilenames, design = samples, flattenedfile = annotationfile)
Ben
Last edited by kben; 10-19-2012, 12:59 PM.
-
After changing the condition specifications, the nCores=24 R script finished within 3 hrs. So the problems resulted from the condition naming: every sample had been given its own condition level (ctrl1, ctrl2, ...), so no condition had any replicates.
Successful data.frame creation:
Code:
samples = data.frame(condition = c("ctrl","ctrl","ctrl","kd","kd","kd"),
                     replicate = factor(c(1:3,1:3)),
                     row.names = c("ctrl_1","ctrl_2","ctrl_3","kd_1","kd_2","kd_3"),
                     stringsAsFactors = TRUE,
                     check.names = FALSE)
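The "Underdetermined model" error earlier in the thread is what one would expect when no condition label is shared by at least two samples. A quick pre-flight check before submitting a multi-hour job can be sketched in shell, assuming a hypothetical tab-separated sample sheet with the sample name in column 1 and the condition in column 2. Neither the sheet format nor the check_replicates helper is part of DEXSeq.

```shell
# Hypothetical helper: report conditions that have fewer than two samples.
# Input on stdin: tab-separated lines of "sample<TAB>condition".
# Exit status is 0 if every condition has replicates, 1 otherwise.
check_replicates() {
    awk -F '\t' '
        { n[$2]++ }
        END {
            bad = 0
            for (c in n) if (n[c] < 2) { print "no replicates for condition: " c; bad = 1 }
            exit bad
        }'
}

# Design from the failing script: each sample is its own condition.
bad_design=$(printf 'ctrl1\tctrl1\nctrl2\tctrl2\nctrl3\tctrl3\nkd1\tkd1\nkd2\tkd2\nkd3\tkd3\n')
# Design from the working script: samples share a condition label.
good_design=$(printf 'ctrl_1\tctrl\nctrl_2\tctrl\nctrl_3\tctrl\nkd_1\tkd\nkd_2\tkd\nkd_3\tkd\n')

if printf '%s\n' "$bad_design" | check_replicates > /dev/null; then bad_ok=yes; else bad_ok=no; fi
if printf '%s\n' "$good_design" | check_replicates > /dev/null; then good_ok=yes; else good_ok=no; fi
echo "failing design passes: $bad_ok; working design passes: $good_ok"
```

The same check is easy to do inside R with table(samples$condition); the shell version is only convenient because it can sit at the top of the job script.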
-
Parallel
Originally posted by Thomas Ryan: I am new to DEXSeq. I have a separate .txt file for each sample in my R folder. How do I compile these into the data frame samples in R in order to create an exon count set?
Any suggestions as to how I should get around this? estimateDispersions with a single core would take quite a while.
Tom