Lately I have been experiencing very slow mapping speeds with Bowtie 2 against a genome containing many ‘Ns’, and I was wondering if anyone has experienced the same or knows of a solution.
I have generated some mouse strain genomes in which known SNP positions relative to the Black6 reference genome are replaced with Ns. Running standard 50 bp paired-end alignments on 10 cores took more than 10 days to complete for ~200M sequence pairs, which surely cannot be the intended behaviour. I have since tested a few things, such as switching between versions 2.2.0 and 2.2.1, but that’s not the issue. Reducing the --score-min parameter didn’t speed it up noticeably either. I then took 1M test reads and aligned them to the following 3 genomes:
1) Genome containing 18M Ns, time: ~2h
2) Genome containing 4M Ns, time: 30 mins
3) Black6 reference genome, time: 2 mins
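To put the three timings on a common scale, here is a minimal Python sketch that converts them into fold-slowdowns relative to the Black6 run. The numbers are the approximate wall-clock times listed above; the names `timings_min` and `slowdown` are made up for this illustration.

```python
# Approximate wall-clock times for 1M read pairs, in minutes, as listed above.
timings_min = {
    "18M Ns": 120,           # ~2 h
    "4M Ns": 30,
    "Black6 reference": 2,
}


def slowdown(minutes: float, baseline_minutes: float) -> float:
    """Return how many times slower a run is than the baseline run."""
    return minutes / baseline_minutes


baseline = timings_min["Black6 reference"]
for genome, minutes in timings_min.items():
    print(f"{genome}: {minutes} min, {slowdown(minutes, baseline):.0f}x")
# 18M Ns: 120 min, 60x
# 4M Ns: 30 min, 15x
# Black6 reference: 2 min, 1x
```

So the slowdown is roughly 15-fold for 4M Ns and 60-fold for 18M Ns on this small test set.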
I have noticed that the 3rd index file increased in size from 5858 bytes for Black6, to 156842630 bytes for the N-strain. Is this the index file describing the position of Ns?
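For anyone who wants to check the N content of their own genome, the following is a rough Python sketch that counts both the total number of Ns and the number of separate runs of Ns in FASTA-formatted lines. Reporting both is my own choice, on the assumption that either could be what matters here; I don't know which one, if any, Bowtie 2 is sensitive to. The function name `count_ns` and the toy sequences are invented for the example.

```python
import re
from typing import Iterable, Tuple

N_RUN = re.compile(r"N+")


def count_ns(fasta_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total N bases, number of separate N runs) for FASTA lines."""
    total_n = 0
    n_runs = 0
    prev_ended_with_n = False  # lets a run of Ns continue across line breaks
    for line in fasta_lines:
        seq = line.strip().upper()
        if not seq:
            continue
        if seq.startswith(">"):
            # New sequence record: a run never spans two records.
            prev_ended_with_n = False
            continue
        total_n += seq.count("N")
        runs_here = sum(1 for _ in N_RUN.finditer(seq))
        if prev_ended_with_n and seq.startswith("N"):
            runs_here -= 1  # first run on this line continues the previous one
        n_runs += runs_here
        prev_ended_with_n = seq.endswith("N")
    return total_n, n_runs


# Toy input (not real data): two records, one run wrapping over a line break.
example = [">chr1", "ACGN", "NNAC", "N", ">chr2", "NACGTNA"]
print(count_ns(example))  # (6, 4)
```

For a real genome, the lines of an open FASTA file handle can be passed in place of the toy list, since the function only iterates over lines.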
Do I just have to accept that Ns in the genome slow Bowtie 2 down >50-fold or is there any known cure for this? I'd be grateful for any pointers. Cheers, Felix