I tried to run pindel on a set of BAM files (average size 35 GB) against the 13M-14M bp region of chromosome 1. It works with up to 28 BAM files; beyond that it runs out of memory (32 GB of RAM on the node). There are ways to limit the memory used for the reference chromosome, by restricting the region (-c) and setting the window size (-w), but I wonder how memory is used when dealing with multiple BAM files. Judging from the output logs, pindel seems to process all samples in parallel, which would explain why the results for all inputs are merged into single files. It obviously can't load all the BAMs at once (not even a single BAM would fit in memory), so it must be reading them segment by segment. But then why does it always reach a point where it runs out of memory?
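For concreteness, here is a minimal sketch of the kind of invocation I mean, written as a small Python helper that assembles the command line. The file names (`ref.fa`, `bams.cfg`, the output prefix) are hypothetical placeholders, and I am assuming the chromosome is named `1` in the reference and that a plain `chrom:start-end` string is accepted by -c. It only builds and prints the command; it does not run pindel.

```python
import shlex


def build_pindel_command(reference, bam_config, output_prefix,
                         chrom, start, end, window_mb):
    """Assemble a pindel command restricted to one region.

    -f  reference FASTA
    -i  config file listing the BAMs (path, insert size, sample label)
    -o  output prefix
    -c  chromosome or region to analyse
    -w  window size in millions of bases (smaller bins use less RAM)
    """
    region = f"{chrom}:{start}-{end}"
    return [
        "pindel",
        "-f", reference,
        "-i", bam_config,
        "-o", output_prefix,
        "-c", region,
        "-w", str(window_mb),
    ]


# Hypothetical paths; the region matches the one described above.
cmd = build_pindel_command(
    reference="ref.fa",
    bam_config="bams.cfg",
    output_prefix="out/chr1_13M_14M",
    chrom="1",
    start=13_000_000,
    end=14_000_000,
    window_mb=1,
)
print(shlex.join(cmd))
```

Lowering `window_mb` only shrinks the slice of the reference processed at a time; whether it also bounds the per-sample read buffers is exactly what I am unsure about.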
I also tried shorter chromosomes, such as 7 and 21, as a whole
with the same 38 samples. Both completed within a day without running
into memory issues. Why is it that pindel works for 38 BAMs against the whole
chromosome 7 (159 Mbp) but not on a fragment of
chromosome 1 (10 Mbp)?