  • Picard: EstimateLibraryComplexity -> OutOfMemoryError

    I want to run EstimateLibraryComplexity.jar on a 9.8 GB BAM file, but I always get an OutOfMemoryError. I have already tried increasing -Xmx (up to 60 GB) and still get the error. Does anybody have an idea how to run EstimateLibraryComplexity on larger BAM files?

    Here is my command and the error message:

    Code:
    $ java -Xmx10g -jar EstimateLibraryComplexity.jar INPUT=file.bam OUTPUT=file.libraryComplexity
    
    [Wed Jun 04 21:43:08 CEST 2014] picard.sam.EstimateLibraryComplexity INPUT=[file.bam] OUTPUT=file.libraryComplexity MIN_IDENTICAL_BASES=5 MAX_DIFF_RATE=0.03 MIN_MEAN_QUALITY=20 MAX_GROUP_RATIO=500 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
    [Wed Jun 04 21:43:08 CEST 2014] Executing as me@work on Linux 3.6.2-1.fc16.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_07-b10; Picard version: 1.114(444810c1de1433d9eca8130be63ccc7fd70a9499_1400593393) JdkDeflater
    INFO    2014-06-04 21:43:08     EstimateLibraryComplexity       Will store 15494157 read pairs in memory before sorting.
    INFO    2014-06-04 21:43:13     EstimateLibraryComplexity       Read 1,000,000 records.  Elapsed time: 00:00:05s.  Time for last 1,000,000:    5s.  Last read position: chr10:38,239,480
    
    ....
    
    INFO    2014-06-04 21:53:21     EstimateLibraryComplexity       Read 30,000,000 records.  Elapsed time: 00:10:13s.  Time for last 1,000,000:  183s.  Last read position: chr15:34,522,127
    
    [Wed Jun 04 22:54:26 CEST 2014] picard.sam.EstimateLibraryComplexity done. Elapsed time: 71.30 minutes.
    Runtime.totalMemory()=5801312256
    To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
            at java.util.Arrays.copyOfRange(Arrays.java:2694)
            at java.lang.String.<init>(String.java:203)
            at java.lang.String.substring(String.java:1913)
            at htsjdk.samtools.util.StringUtil.split(StringUtil.java:89)
            at picard.sam.AbstractDuplicateFindingAlgorithm.addLocationInformation(AbstractDuplicateFindingAlgorithm.java:71)
            at picard.sam.EstimateLibraryComplexity.doWork(EstimateLibraryComplexity.java:256)
            at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
            at picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:124)
            at picard.sam.EstimateLibraryComplexity.main(EstimateLibraryComplexity.java:217)

    And here is the Java version:

    Code:
    $ java -showversion
    java version "1.7.0_07"
    Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
    Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
    I also posted this question at Biostars!

  • #2
    Try setting -Xms as well. If that does not help, you need a bigger machine.
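
    For example, combining both flags (the heap sizes here are just illustrative; -Xms sets the initial heap size and -Xmx the maximum):

    Code:
    $ java -Xms10g -Xmx60g -jar EstimateLibraryComplexity.jar INPUT=file.bam OUTPUT=file.libraryComplexity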



    • #3
      Well, of course I could give it more memory (up to 2 TB). But since I don't think the huge memory consumption is a feature, I assume it is either a user error or a bug.



      • #4
        A sorted BAM file can achieve extremely high compression, and Java is not very memory-efficient, particularly when working with Strings, which appear to be used here. So I would just run it with all available memory (setting the -Xmx flag to around 85% of physical RAM). But in your case it looks like the program may have actually completed:
        [Wed Jun 04 22:54:26 CEST 2014] picard.sam.EstimateLibraryComplexity done.
        Elapsed time: 71.30 minutes
        ...and then crashed, possibly while generating some kind of output, which sounds like a bug in the program.

        If it still does not work when you give it more RAM, I have a program that estimates library complexity which you could try; it is invoked by the shell script "bbcountunique.sh", available on my BBMap website.

        bbcountunique.sh -Xmx100g in=reads.fq out=results.txt

        It's very memory-efficient, as it does not store Strings, just numeric kmers. And it does not use mapping information, just the raw sequence. So it's designed for fastq or fasta input, but it still works on sam input and should work on a bam file if samtools is installed.

        The output is a histogram of the percentage of reads that are unique, reported every 25,000 reads (you can adjust that number with the 'interval' flag). Uniqueness is determined by whether kmers have been seen before, using the read's first kmer and a random kmer; k is 20 by default. So you can plot the histogram to observe the library's complexity; we run this on all of our data.

        For paired data, though, it's best to use fastq or fasta input, because then you also get information about unique pairs rather than just unique reads.
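
        For example, to report every 50,000 reads with a kmer length of 25 (values purely illustrative, using the 'interval' and 'k' flags described above):

        Code:
        bbcountunique.sh -Xmx100g in=reads.fq out=results.txt interval=50000 k=25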
