Hi Guys,
I have around 100 sorted .bam files (~150 MB each) generated by TopHat.
I need to merge all of these .bam files with SAMtools.
I am trying to make this more efficient on a Hadoop cloud platform.
There are two choices:
1) Merge all files with a single SAMtools merge command (on a single machine)
2) Merge the files in pairs at the first level, then merge the resulting files in pairs at the second level, and so on (similar in concept to merge sort)
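To make the second choice concrete, here is a minimal Python sketch of how the pairwise levels could be planned. The helper names (plan_pairwise_merge, samtools_merge_cmd, run_plan) and the intermediate file naming are my own placeholders, not part of SAMtools or Hadoop. The only SAMtools usage assumed is the basic form "samtools merge out.bam in1.bam in2.bam". Jobs within one level do not depend on each other, so they are the part that could be distributed across nodes.

```python
import subprocess
from typing import List, Tuple

# A merge job is (output path, list of input paths).
MergeJob = Tuple[str, List[str]]


def plan_pairwise_merge(bams: List[str], prefix: str = "merge") -> List[List[MergeJob]]:
    """Group pairwise merge jobs by level; jobs in one level are independent."""
    levels: List[List[MergeJob]] = []
    current = list(bams)
    level = 0
    while len(current) > 1:
        level += 1
        jobs: List[MergeJob] = []
        next_inputs: List[str] = []
        for i in range(0, len(current), 2):
            pair = current[i:i + 2]
            if len(pair) == 1:
                # Odd file out: carry it up to the next level unchanged.
                next_inputs.append(pair[0])
            else:
                # Hypothetical naming scheme for intermediate files.
                out = f"{prefix}.L{level}.{i // 2}.bam"
                jobs.append((out, pair))
                next_inputs.append(out)
        levels.append(jobs)
        current = next_inputs
    return levels


def samtools_merge_cmd(out: str, inputs: List[str]) -> List[str]:
    """Build the command line: samtools merge <out.bam> <in1.bam> <in2.bam> ..."""
    return ["samtools", "merge", out, *inputs]


def run_plan(levels: List[List[MergeJob]]) -> None:
    """Run the plan serially on one machine (a cluster would dispatch each level in parallel)."""
    for jobs in levels:
        for out, inputs in jobs:
            subprocess.run(samtools_merge_cmd(out, inputs), check=True)


if __name__ == "__main__":
    # Placeholder file names standing in for the ~100 TopHat outputs.
    plan = plan_pairwise_merge([f"sample{i}.bam" for i in range(100)])
    for depth, jobs in enumerate(plan, start=1):
        print(f"level {depth}: {len(jobs)} merge jobs")
```

For 100 inputs this plan has 7 levels and 99 merge jobs in total. Each level reads and rewrites its inputs, so most of the data gets decompressed and recompressed several times, whereas the first choice does that in a single pass. Whether the parallelism within each level makes up for the extra I/O is exactly what I am unsure about.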
My questions are:
Would the second choice run faster?
Has anyone tried to further improve the merge efficiency of SAMtools?
Thanks.