Hi,
I am trying to use bwa to align ~800 million 100bp PE reads from a whole genome dataset generated on the Illumina platform. These are "anomaly" reads: reads that another aligner left unmapped, or placed as singletons or discordant pairs.
I have the reads in read1.fastq and read2.fastq and called "bwa aln" the standard way:
bwa aln hg19.fa read1.fastq > 1.sai
bwa aln hg19.fa read2.fastq > 2.sai
The problem is that, when I checked the progress, it was processing only about 250K reads every ~10 minutes. This is much, much slower than I expected: at this speed it will take 20+ days to get the .sai files.
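As a sanity check on the 20+ day figure, the arithmetic can be sketched in shell. This assumes the ~800 million reads are the total across read1.fastq and read2.fastq, that the two `bwa aln` runs go one after the other, and that the observed rate of ~250K reads per ~10 minutes holds for the whole run.

```shell
# Back-of-the-envelope runtime estimate (assumed inputs, not measured totals):
# ~800 million reads in total (read1 + read2), ~250K reads per ~10 minutes.
total_reads=800000000
reads_per_chunk=250000
minutes_per_chunk=10

chunks=$(( total_reads / reads_per_chunk ))   # number of 250K-read chunks
minutes=$(( chunks * minutes_per_chunk ))     # total wall-clock minutes

# awk handles the floating-point conversion to hours and days
awk -v m="$minutes" 'BEGIN { printf "%d minutes = %.1f hours = %.1f days\n", m, m/60, m/1440 }'
```

At that rate the two runs together come to roughly 22 days, consistent with the estimate above.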
I have used BWA before on a whole exome dataset and it seemed to be much faster. So I don't know whether this is due to the nature of the reads I started with (the fact that the majority of them either failed to align with another aligner or aligned discordantly), or whether I have made some simple mistake.
Any comments or suggestions will be highly appreciated. Thanks!