Figured it out: the rod file index was corrupted. I downloaded a new version of GATK along with the resource bundle (http://www.broadinstitute.org/gsa/wi...esource_bundle), which includes a dbSNP VCF, and it works much better.
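For anyone who hits the same EOFException, a quick sanity check on the index file before launching a long run might look like the sketch below. It assumes the index sits next to the data file with an `.idx` suffix (as in the log from this thread); the function name and the three checks are hypothetical illustrations, not part of GATK, and passing them does not prove the index is valid.

```python
import os


def index_looks_stale(data_path, index_suffix=".idx"):
    """Return a short reason if the index beside data_path looks suspect, else None.

    Hypothetical helper: it only checks for a missing, empty, or
    older-than-data index. It cannot detect every kind of corruption.
    """
    index_path = data_path + index_suffix
    if not os.path.exists(index_path):
        return "missing"
    if os.path.getsize(index_path) == 0:
        return "empty"
    # An index written before the data file was last modified is suspicious.
    if os.path.getmtime(index_path) < os.path.getmtime(data_path):
        return "older than data file"
    return None


# Example call (path taken from the log in this thread):
# index_looks_stale("/scratch/indapa/dbsnp_129_b37.rod")
```

If the function returns a reason, replacing the file and its index with a fresh copy, as described in this post, is the fix that worked here.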
-
GATK CountCovariates running very slow
Hi,
I tried posting this question on the GetSatisfaction GATK forum but kept getting an invalid request error in Firefox, so I thought I would give SeqAnswers a try (this is my first post here).
I am trying to recalibrate quality scores with GATK CountCovariates and it is running extremely slowly:
java -Xmx64000m -jar GenomeAnalysisTK.jar -R $REF_BIN/$REF \
  --DBSNP $DBSNP_BIN/$DBSNP -l INFO -T CountCovariates -I my.bam \
  --max_reads_at_locus 20000 \
  -cov ReadGroupCovariate -cov QualityScoreCovariate \
  -cov CycleCovariate -cov DinucCovariate \
  -recalFile $CSV > $CSV.stdout 2> $NODE_DIR/$OUTPUT.stderr
Initially GATK gives an EOF exception when reading a *.rod.idx file:
INFO 08:51:15,032 TribbleRMDTrackBuilder - Loading Tribble index from disk for file /scratch/indapa/dbsnp_129_b37.rod
ERROR 08:51:19,710 LinearIndex - Error reading index file: /scratch/indapa/dbsnp_129_b37.rod.idx
java.io.EOFException
But it then proceeds to the CovariateCounterWalker and starts recording the number of sites traversed (the BAM file I want to recalibrate has ~150M reads and is 11 GB in size):
INFO 08:59:30,757 CovariateCounterWalker - The covariates being used here:
INFO 08:59:30,758 CovariateCounterWalker - ReadGroupCovariate
INFO 08:59:30,758 CovariateCounterWalker - QualityScoreCovariate
INFO 08:59:30,758 CovariateCounterWalker - CycleCovariate
INFO 08:59:30,759 CovariateCounterWalker - DinucCovariate
INFO 09:00:25,452 TraversalEngine - [PROGRESS] Traversed to 1:10001, processing 1 sites in 545.65 secs (545645000.00 secs per 1M sites)
It has been traversing human chromosome 1 for more than 2 days. I was initially getting an out-of-memory exception, so I allocated much more memory to the Java heap than I had in the past. I'm not sure why this is taking so much longer than previous BAM files of similar size that I've recalibrated with GATK. Has anyone experienced similar behavior with CountCovariates?
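A stall like this can be spotted from the first [PROGRESS] line rather than after two days. The sketch below parses the progress line quoted above and computes sites per second. The regular expression is written against that single log line, so it may not match other GATK versions, and the `min_rate` threshold of 100 sites per second is an arbitrary assumption for illustration only.

```python
import re

# Pattern assumed from the one [PROGRESS] line shown in this post.
PROGRESS_RE = re.compile(
    r"Traversed to (?P<locus>\S+), processing (?P<sites>[\d,]+) sites "
    r"in (?P<secs>[\d.]+) secs"
)


def sites_per_second(line):
    """Return the traversal rate for a progress line, or None if it does not match."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    sites = int(match.group("sites").replace(",", ""))
    secs = float(match.group("secs"))
    return sites / secs if secs > 0 else None


def is_stalled(line, min_rate=100.0):
    """True if the line is a progress line reporting fewer than min_rate sites/sec."""
    rate = sites_per_second(line)
    return rate is not None and rate < min_rate


line = ("INFO 09:00:25,452 TraversalEngine - [PROGRESS] Traversed to 1:10001, "
        "processing 1 sites in 545.65 secs (545645000.00 secs per 1M sites)")
print(is_stalled(line))
```

Run against the stderr file from the command above, a check like this would flag the job within the first few minutes.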