Or remove the "chr" prefix from the names in your alignment file, whichever is easier. But the names should match!
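For the second option, here is a minimal sketch of stripping the prefix from a SAM stream with awk. The function name strip_chr is made up for illustration, and it assumes a standard tab-separated SAM file in which reference names appear in the @SQ header lines (SN: tag), in column 3 (RNAME) and in column 7 (RNEXT). It only removes a leading "chr"; it does not handle cases where the two naming schemes differ in other ways (for example chrM versus MT).

```shell
# Hypothetical helper: strip a leading "chr" from reference names in a SAM stream.
# Reads SAM on stdin, writes SAM on stdout.
strip_chr() {
  awk 'BEGIN { FS = OFS = "\t" }
       # @SQ header lines: rewrite the SN: tag
       /^@SQ/ { for (i = 2; i <= NF; i++) sub(/^SN:chr/, "SN:", $i); print; next }
       # other header lines pass through unchanged
       /^@/   { print; next }
       # alignment lines: column 3 is RNAME, column 7 is RNEXT
       { sub(/^chr/, "", $3); sub(/^chr/, "", $7); print }'
}
```

Usage would look like `strip_chr < aln.sam > aln_nochr.sam` (file names are placeholders). Renaming the chromosomes in the annotation instead is equally valid; what matters is that both files end up using the same names.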
-
Originally posted by areyes:
Check the function read.HTSeqCounts from DEXSeq. This will read these files, parse the relevant information for the package, and return an ExonCountSet object in your R session.
-
Hi there,
we want to do some alternative splicing analysis with DEXSeq. I'm at the data preparation step. We have quite big files (80 GB per replicate after converting the 15 GB .bam file). How many GB of RAM are needed to sort them (--mem-per-cpu=)? Is it 80 GB, 160 GB (hopefully not...), or is the sort done piece by piece with a small RAM allocation?
Code:
sort -k1,1 -k2,2n ctrl_1.sam > ctrl_1_sorted.sam
Thanks!
Ben
Answer (CSC support)
The issue was related to the SLURM scripts. It turned out that increasing the disk space allocated to the node via
Code:
#SBATCH --tmp=200000
and having sort write its temporary files there solved the problem:
Code:
srun sort --temporary-directory=/local/$SLURM_JOB_ID -k1,1 -k2,2n ctrl_1.sam > ctrl_1_sorted.sam
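On the RAM question itself: GNU sort does not need to hold the whole file in memory. It sorts chunks that fit into its buffer, writes them to the temporary directory, and merges them at the end, which is consistent with disk space being the fix above. A minimal sketch, assuming GNU coreutils sort; the function name sort_sam, its argument order, and the 1G default buffer are made up for illustration and are not values recommended by CSC support:

```shell
# Hypothetical wrapper around the sort call used in this thread.
# $1 = input SAM, $2 = output SAM,
# $3 = directory for temporary files (default: $TMPDIR, else /tmp),
# $4 = in-memory buffer size passed to -S (default: 1G, an arbitrary choice).
sort_sam() {
  in=$1
  out=$2
  tmpdir=${3:-${TMPDIR:-/tmp}}
  buf=${4:-1G}
  # -S caps the memory buffer; -T is the short form of --temporary-directory
  sort -S "$buf" -T "$tmpdir" -k1,1 -k2,2n "$in" > "$out"
}
```

Inside a SLURM job this could be called as `sort_sam ctrl_1.sam ctrl_1_sorted.sam /local/$SLURM_JOB_ID 2G`, with the buffer kept below the job's memory allocation. The temporary directory needs room for roughly the size of the input.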
-
I've got a question concerning DEXSeq runs on a cluster and saving the results calculated on the node. Is it sufficient to have the following line in the script?
Code:
save(ecs, file="ecs_ctrlvskd_.RData")
Thanks!
Ben
The R script so far (I'm new to R, so it's not fine-tuned):
Code:
library(multicore)
library(DEXSeq)
setwd("/scratch/therapy/bkoch/dexseq/input")
annotationfile = file.path("/scratch/therapy/bkoch/dexseq/GRCh37_Ensembl_DEXSeq.gff")
samples = data.frame(condition = c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),
                     replicate = c(1:3,1:3),
                     row.names = c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),
                     stringsAsFactors = TRUE,
                     check.names = FALSE)
fullFilenames <- list.files("/scratch/therapy/bkoch/dexseq/input/", full.names=TRUE, pattern="sorted.txt")
ecs <- read.HTSeqCounts(countfiles = fullFilenames, design = samples, flattenedfile = annotationfile)
ecs <- estimateSizeFactors(ecs)
ecs <- estimateDispersions(ecs)
ecs <- fitDispersionFunction(ecs)
ecs <- estimatelog2FoldChanges(ecs)
test <- testForDEU(ecs)
res1 <- DEUresultTable(test)
save(ecs, file="ecs_ctrlvskd_.RData")
-
Hi kben,
Saving the "ecs" object like that should work; "res1" is easy to get from the ecs object, even on a slow machine.
Are you having problems getting the run to complete? If you have multiple cores in your machine, you could use the nCores argument of the functions estimateDispersions and testForDEU to parallelize DEXSeq across several cores. Discarding genes with lots of exons (e.g. more than 150) will also help; unfortunately, a single gene with many exons can take some time.
Alejandro
-
I'm apparently running into trouble with DEXSeq & library(parallel) on the cluster. Could you please give me a hint on how to fix it?
Thanks a lot!
Ben
Error log:
...Error in function (classes, fdef, mtable) :
unable to find an inherited method for function "fData", for signature "character"
Calls: estimateDispersions ... .local -> divideWork -> rownames -> fData -> <Anonymous>
In addition: Warning message:
In parallel::mclapply(allecs, FUN = funtoapply, mc.cores = mc.cores) :
all scheduled cores encountered errors in user code
Execution halted
srun: error: node1-012: task 0: Exited with exit code 1
Code:
library(DEXSeq)
setwd("/scratch/therapy/bkoch/dexseq/input")
annotationfile = file.path("/scratch/therapy/bkoch/dexseq/GRCh37_Ensembl_DEXSeq.gff")
samples = data.frame(condition = c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),
                     replicate = factor(c(1:3,1:3)),
                     row.names = c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),
                     stringsAsFactors = TRUE,
                     check.names = FALSE)
fullFilenames <- list.files("/scratch/therapy/bkoch/dexseq/input/", full.names=TRUE, pattern="sorted.txt")
ecs <- read.HTSeqCounts(countfiles = fullFilenames, design = samples, flattenedfile = annotationfile)
ecs <- estimateSizeFactors(ecs)
library(parallel)
ecs <- estimateDispersions(ecs, nCores=24)
ecs <- fitDispersionFunction(ecs)
ecs <- estimatelog2FoldChanges(ecs)
test <- testForDEU(ecs, nCores=24)
res1 <- DEUresultTable(test)
save(ecs, file="/scratch/therapy/bkoch/dexseq/output/ecs_ctrlvskd_.RData")
R version 2.15.1 (2012-06-22)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] DEXSeq_1.4.0 Biobase_2.18.0 BiocGenerics_0.4.0
SLURM job script
Code:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --error=ecs_err.log
#SBATCH --output=ecs_out.log
#SBATCH --job-name=ecs
#SBATCH --mem-per-cpu=2000
#SBATCH --partition=parallel
#SBATCH --time=7-00:00:00
# export OMP_NUM_THREADS=24
srun R --save --file=/scratch/therapy/bkoch/dexseq/dexseq_cluster.R --output=/scratch/therapy/dexseq/output
-
The cores error (if I didn't mix something up, and it's a real error) could be related to this one:
Update:
nCores=24 => failed after 9 min
nCores=12 => failed after 10 min
nCores=4 => failed after 7 min
Without library(parallel) and the nCores argument: now running for more than 100 min.
-
Probably my R script isn't correct. The run with only one core failed too, after 120 min.
Error log:
Code:
Error in FUN(c("ENSG00000000003", "ENSG00000000419", "ENSG00000000457", :
  Underdetermined model; cannot estimate dispersions. Maybe replicates have not been properly specified.
Calls: estimateDispersions -> estimateDispersions -> .local -> lapply -> FUN
In addition: Warning messages:
1: In .local(object, ...) :
  Exons with less than 10 counts will not be tested. For more details please see the manual page of 'estimateDispersions', parameter 'minCount'
2: In .local(object, ...) :
  Genes with more than 70 testable exons will be omitted from the analysis. For more details please see the manual page of 'estimateDispersions', parameter 'maxExon'.
Execution halted
srun: error: node2-041: task 0: Exited with exit code 1
Code:
samples = data.frame(condition = c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),
                     replicate = factor(c(1:3,1:3)),
                     row.names = c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),
                     stringsAsFactors = TRUE,
                     check.names = FALSE)
fullFilenames <- list.files("/scratch/therapy/bkoch/dexseq/input/", full.names=TRUE, pattern="sorted.txt")
ecs <- read.HTSeqCounts(countfiles = fullFilenames, design = samples, flattenedfile = annotationfile)
Ben
Last edited by kben; 10-19-2012, 12:59 PM.
-
After changing the condition specifications, the nCores=24 R script finished within 3 hrs. So the problems resulted from the condition names.
Successful data.frame creation:
Code:
samples = data.frame(condition = c("ctrl","ctrl","ctrl","kd","kd","kd"),
                     replicate = factor(c(1:3,1:3)),
                     row.names = c("ctrl_1","ctrl_2","ctrl_3","kd_1","kd_2","kd_3"),
                     stringsAsFactors = TRUE,
                     check.names = FALSE)
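This fits the "Underdetermined model" message from the failed run: with six distinct condition labels (ctrl1 ... kd3), every condition contains exactly one sample, so there are no replicates to estimate dispersions from. A cheap sanity check before submitting a long job could look like the sketch below. It is only an illustration done outside of R: the file design.tsv (one line per sample, tab-separated sample name and condition) and the function name check_replicates are assumptions, not part of DEXSeq.

```shell
# Hypothetical pre-flight check on a design table.
# $1 = tab-separated file with two columns: sample name, condition.
# Prints every condition that has fewer than two samples and
# returns exit status 1 if any such condition exists, 0 otherwise.
check_replicates() {
  awk -F '\t' '{ n[$2]++ }
       END {
         bad = 0
         for (c in n) if (n[c] < 2) { print "no replicates for condition: " c; bad = 1 }
         exit bad
       }' "$1"
}
```

With the first design every line would be flagged; with the corrected one (three "ctrl" and three "kd" samples) the check passes silently.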
-
Parallel
Originally posted by Thomas Ryan:
I am new to DEXSeq. I have a set of separate .txt files, one for each sample, in my R folder. How do I compile these into the data frame samples in R in order to create an exon count set?
Any suggestions as to how I should get around this? Running estimateDispersions with a single core would take quite a while.
Tom