Any tips on the quickest way to get TopHat to finish aligning a very large dataset (~1.2 billion 75 bp single-end HiSeq reads)? I considered splitting it into 10 parts, but then realized this risks treating identical reads in the ten subsets as unique in the final file. I want to keep any read that maps to up to 10 locations, with up to three mismatches and indels of up to 3 bp:
Code:
-p 16 -g 10 --read-mismatches 3 --read-gap-length 3 --read-edit-dist 3
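As an illustration of how the options above fit together, here is a small Python helper that assembles them into a TopHat 2 argument list (for example, to pass to `subprocess.run`). `build_tophat_args`, `genome_index` and `reads.fq` are hypothetical placeholders, not taken from the post. The helper also checks a constraint that, as I read the TopHat 2 manual, applies here: `--read-edit-dist` should be at least as large as both `--read-mismatches` and `--read-gap-length`.

```python
# A minimal sketch: build the TopHat 2 command from the options in the post.
# build_tophat_args, index_base and reads_path are hypothetical names.
from typing import List


def build_tophat_args(index_base: str, reads_path: str,
                      threads: int = 16, max_multihits: int = 10,
                      mismatches: int = 3, gap_length: int = 3,
                      edit_dist: int = 3) -> List[str]:
    # Assumed constraint: the edit-distance cap must not be smaller than
    # the mismatch cap or the gap-length cap.
    if edit_dist < mismatches or edit_dist < gap_length:
        raise ValueError(
            "--read-edit-dist must be >= --read-mismatches and --read-gap-length")
    return [
        "tophat",
        "-p", str(threads),                       # worker threads
        "-g", str(max_multihits),                 # keep reads with <= N hits
        "--read-mismatches", str(mismatches),
        "--read-gap-length", str(gap_length),
        "--read-edit-dist", str(edit_dist),
        index_base,                               # Bowtie index prefix (placeholder)
        reads_path,                               # single-end FASTQ (placeholder)
    ]


if __name__ == "__main__":
    print(" ".join(build_tophat_args("genome_index", "reads.fq")))
```

Since `--read-edit-dist` caps the combined edits per alignment, my understanding is that with a value of 3 a read could carry three mismatches or a 3 bp gap, but not both in the same alignment; worth confirming against the manual for the TopHat version in use.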
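The splitting idea mentioned in the question can be sketched as follows, assuming an uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines). Records are dealt round-robin into `n_chunks` files so each chunk gets roughly the same number of reads. `split_fastq`, `iter_records` and the `<prefix>.partNN.fq` naming are hypothetical; this only covers the split itself, not whether per-chunk TopHat results can be merged safely, which is the open question in the post.

```python
# A minimal sketch: split a 4-line-per-record FASTQ into n_chunks files.
# split_fastq, iter_records and the output naming are hypothetical.
import itertools
from typing import Iterator, List, TextIO, Tuple


def iter_records(handle: TextIO) -> Iterator[Tuple[str, ...]]:
    """Yield FASTQ records as 4-tuples of lines (header, seq, '+', quals)."""
    while True:
        record = tuple(itertools.islice(handle, 4))
        if not record:
            return  # clean end of file
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield record


def split_fastq(in_path: str, out_prefix: str, n_chunks: int = 10) -> List[str]:
    """Deal records from in_path round-robin into n_chunks files; return paths."""
    paths = [f"{out_prefix}.part{i:02d}.fq" for i in range(n_chunks)]
    outs = [open(p, "w") for p in paths]
    try:
        with open(in_path) as src:
            for i, record in enumerate(iter_records(src)):
                outs[i % n_chunks].writelines(record)
    finally:
        for out in outs:
            out.close()
    return paths
```

For example, `split_fastq("reads.fq", "reads", 10)` would write `reads.part00.fq` through `reads.part09.fq`, each of which could then be aligned as a separate TopHat job.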
Thanks,
Shurjo