Hi all, I was given a set of BAM files (100 GB each) that I would like to realign with bwa. The problem is that I need to do this as fast as possible.
At the moment I start from chunked FASTQ files and submit a single bwa alignment job to each node of my cluster, which gives me good parallelism and speed.
When starting from BAM files I have two options:
1- convert BAM to FASTQ -> split the FASTQ into chunks -> align as I do now
2- feed the BAM files directly to bwa
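For option 1, the "split into chunks" step can be sketched as below. This is a minimal, hypothetical sketch, not an existing tool: it assumes unwrapped 4-line FASTQ records and that the R1 and R2 files list mates in the same order, so cutting both files at the same record count keeps each pair in the same chunk. The function names are made up for illustration, and writing each chunk out to its own pair of files for a bwa job is left out.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple

# (header, sequence, '+' line, qualities)
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group a stream of FASTQ lines into 4-line records (assumes no wrapped sequences)."""
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in itertools.islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def chunk_pairs(r1: Iterable[str], r2: Iterable[str],
                reads_per_chunk: int) -> Iterator[List[Tuple[FastqRecord, FastqRecord]]]:
    """Yield chunks of up to reads_per_chunk mate pairs taken from two mate files."""
    pairs = itertools.zip_longest(read_fastq(r1), read_fastq(r2))
    while True:
        chunk = list(itertools.islice(pairs, reads_per_chunk))
        if not chunk:
            return
        # zip_longest pads with None when one file runs out before the other
        if any(a is None or b is None for a, b in chunk):
            raise ValueError("mate files have different numbers of reads")
        yield chunk


def make_fastq(names: List[str], mate: int) -> List[str]:
    """Build toy FASTQ lines for the demo below."""
    return [line for n in names
            for line in (f"@{n}/{mate}\n", "ACGT\n", "+\n", "IIII\n")]


r1_lines = make_fastq(["a", "b", "c"], 1)
r2_lines = make_fastq(["a", "b", "c"], 2)
chunks = list(chunk_pairs(r1_lines, r2_lines, 2))
print([len(c) for c in chunks])
```

With real data, `r1` and `r2` would be open file handles rather than lists, so the files are streamed and never loaded into memory at once.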
If I go for (2) I cannot really parallelize the whole process unless I split the BAM files into chunks that keep both mates of each fragment together. The only way to do this, I guess, is to sort my BAM files by read name and then split them. I have no idea how long that sort would take or how much disk space the newly sorted files would need.
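Once a BAM is name-sorted, the split itself only has to avoid cutting between records that share a read name. A minimal sketch of that rule, assuming all records with the same name are adjacent: `split_by_name` is a hypothetical helper, and the records are generic objects with a caller-supplied name function (with pysam, for example, that could be `lambda read: read.query_name`). Secondary or supplementary records that share the name simply stay in the same group.

```python
import itertools
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def split_by_name(records: Iterable[T], name_of: Callable[[T], str],
                  max_records: int) -> Iterator[List[T]]:
    """Yield chunks of up to max_records records without splitting a read-name group.

    Assumes records sharing a name are adjacent (e.g. a name-sorted BAM).
    A single group larger than max_records is emitted as its own chunk.
    """
    chunk: List[T] = []
    for _name, group in itertools.groupby(records, key=name_of):
        group_list = list(group)
        # Start a new chunk if adding this whole group would exceed the limit
        if chunk and len(chunk) + len(group_list) > max_records:
            yield chunk
            chunk = []
        chunk.extend(group_list)
    if chunk:
        yield chunk


# Toy name-sorted input: (read name, mate) tuples standing in for BAM records
records = [("r1", 1), ("r1", 2), ("r2", 1), ("r2", 2), ("r3", 1), ("r3", 2)]
chunks2 = list(split_by_name(records, lambda r: r[0], 4))
print([len(c) for c in chunks2])
```

Each chunk could then be written to its own BAM and aligned on a separate node, just like the FASTQ chunks.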
If I go for (1) I can use Picard SamToFastq, but it takes ~25 s per 100k reads. Each of my BAM files contains 130M reads, so at that rate converting the whole set would take more than a week.
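A quick back-of-the-envelope check of the SamToFastq rate quoted above. It gives roughly 9 hours per 130M-read file, so the week-long figure presumably refers to converting the whole set (about 19 files at this rate), not to a single file. The number of files is not stated in the post; the last line only shows how many files would fill a week.

```python
reads_per_file = 130_000_000
seconds_per_100k_reads = 25  # Picard SamToFastq rate quoted above

seconds_per_file = reads_per_file / 100_000 * seconds_per_100k_reads
hours_per_file = seconds_per_file / 3600
files_per_week = 7 * 24 / hours_per_file  # how many files one week of conversion covers

print(f"{seconds_per_file:.0f} s per file, about {hours_per_file:.1f} h")
print(f"one week covers about {files_per_week:.1f} files")
```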
Would anybody care to share their two cents on this?
thanks
d