I've been trying to use FLASH to merge read 1 and read 2 pairs that should overlap, from a recent MiSeq 2x251 run, but it seems to be missing obvious overlaps.
My data is a pool of four amplicons of different lengths (roughly 320, 340, 420 and 430 bp), so I should be getting overlaps of ~70 to 180 bp, depending on the amplicon.
When I run FLASH, however, it only combines about 30% of the reads. If I look in the files of reads it doesn't combine, I can see pairs whose overlap should be easy to detect, e.g. pairs that share 200 bases of 100% identical sequence.
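The overlap range follows from simple arithmetic: when both reads are full length, read 1 and read 2 share 2 × read length − amplicon length bases. A minimal sketch, assuming untrimmed 251-base reads and the approximate amplicon lengths above:

```python
# Expected overlap for a read pair covering a whole amplicon:
# overlap = 2 * read_length - amplicon_length
READ_LEN = 251  # MiSeq 2x251, assuming reads are not trimmed
AMPLICON_LENGTHS = [320, 340, 420, 430]  # approximate lengths from the post


def expected_overlap(amplicon_len: int, read_len: int = READ_LEN) -> int:
    """Bases shared by read 1 and read 2 when both reads are full length."""
    return 2 * read_len - amplicon_len


for length in AMPLICON_LENGTHS:
    print(f"{length} bp amplicon -> {expected_overlap(length)} bp overlap")
```

This gives 182, 162, 82 and 72 bp, matching the ~70 to 180 bp range. Quality trimming before merging would shorten the reads and shrink these overlaps accordingly.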
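To put a number like "about 30%" on a run, one option is to count records in the merged and unmerged output files. A minimal sketch, assuming FLASH's default output prefix `out` and the file names `out.extendedFrags.fastq` and `out.notCombined_1.fastq`, uncompressed FASTQ, and exactly four lines per record; `flash_combined_fraction` is a hypothetical helper, not part of FLASH:

```python
from pathlib import Path


def count_fastq_records(lines) -> int:
    """Count FASTQ records from an iterable of lines (assumes 4 lines per record)."""
    return sum(1 for _ in lines) // 4


def combined_fraction(n_combined: int, n_not_combined: int) -> float:
    """Fraction of input pairs that were merged; 0.0 if there were no pairs."""
    total = n_combined + n_not_combined
    return n_combined / total if total else 0.0


def flash_combined_fraction(prefix: str = "out", directory: str = ".") -> float:
    """Hypothetical helper: read FLASH output files, assuming default names."""
    d = Path(directory)
    with open(d / f"{prefix}.extendedFrags.fastq") as fh:
        n_combined = count_fastq_records(fh)
    # Each unmerged pair has one record in notCombined_1 (and one in notCombined_2).
    with open(d / f"{prefix}.notCombined_1.fastq") as fh:
        n_not_combined = count_fastq_records(fh)
    return combined_fraction(n_combined, n_not_combined)
```

Calling `flash_combined_fraction()` in the directory FLASH wrote to returns a value between 0 and 1.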
Here's the command I'm using:
flash -m 30 -M 220 -r 251 -f 376 -s 56 file1.fq file2.fq
Here -m 30 -M 220 sets a minimum overlap of 30 bases and a maximum overlap of 220 bases (I'm being generous with the size range, as my amplicons have a variable-length section in the middle). -r is the read length, -f the average fragment length and -s its standard deviation.
If I BLAST some of the read pairs for which no overlap was found, I can easily find ones with huge overlaps (and also some pairs that were combined but shouldn't have been!).
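The -f 376 -s 56 values sit close to the mean and sample standard deviation of the four nominal amplicon lengths. A minimal sketch of that calculation, assuming this is roughly how the values were chosen (a single mean and SD only loosely describe a pool of four distinct lengths):

```python
import statistics

AMPLICON_LENGTHS = [320, 340, 420, 430]  # approximate, from the post

mean_len = statistics.mean(AMPLICON_LENGTHS)  # candidate value for -f
sd_len = statistics.stdev(AMPLICON_LENGTHS)   # sample SD, candidate value for -s

print(f"-f {mean_len:.0f} -s {sd_len:.0f}")
```

The mean is 377.5 bp and the sample SD is about 55.6 bp.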
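As a quicker alternative to BLAST for spot-checking individual pairs, the overlap can be searched for directly. A simplified sketch, not FLASH's actual implementation: it reverse-complements read 2, tries every overlap length from longest to `min_overlap`, and keeps the overlap with the lowest mismatch density (mismatches / overlap length). The 0.25 density cutoff is an assumed threshold, quality scores are ignored, and read-through (fragments shorter than the read length) is not handled:

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; only A, C, G, T and N are accepted."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def best_overlap(read1: str, read2: str, min_overlap: int = 30,
                 max_mismatch_density: float = 0.25):
    """Return (overlap_length, mismatches) for the best overlap, or None.

    The 3' end of read1 is compared with the 5' end of reverse_complement(read2).
    Lengths are tried longest first; a later (shorter) overlap replaces the
    current best only if its mismatch density is strictly lower.
    """
    rc2 = reverse_complement(read2)
    best = None
    for length in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        tail = read1.upper()[-length:]
        head = rc2[:length]
        mismatches = sum(a != b for a, b in zip(tail, head))
        density = mismatches / length
        if density <= max_mismatch_density and (
                best is None or density < best[1] / best[0]):
            best = (length, mismatches)
    return best
```

For a pair from one of the 320 bp amplicons with full 251-base reads, a clean result would be `(182, 0)`.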
Any ideas what I might be doing wrong?