I have 300 bp paired-end Illumina reads generated on the MiSeq using Illumina's V3V4 16S protocol. The amplicon size is 460 bp.
As the first step in my analysis, I'm using FLASH to merge these reads. I'm using the following command line:
FLASH --min-overlap=20 --max-overlap=140 --read-len=300 --fragment-len=460 --fragment-len-stddev=1 --output-directory=MERGED --output-prefix=MERGED 612A-plate-1-H04_S88_L001_R1_001.fastq 612A-plate-1-H04_S88_L001_R2_001.fastq
After FLASH completes, it gives the following warning:
[FLASH] WARNING: An unexpectedly high proportion of combined pairs (62.47%) overlapped by more than 140 bp, the --max-overlap (-M) parameter. Consider increasing this parameter. (As-is, FLASH is penalizing overlaps longer than 140 bp when considering them for possible combining!)
Since the theoretical max overlap should be 140 bp (2 x 300 bp reads minus the 460 bp amplicon), that's what I set the max-overlap parameter to. How is it possible that so many reads overlap by significantly more than 140 bp? After a few iterations, I found that I have to set 'max-overlap' to 159 to eliminate this warning.
Just trying to understand how this parameter actually works. Maybe my amplicon is a little smaller than expected?
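For reference, here is the arithmetic I'm relying on, as a minimal Python sketch. The function names are my own (hypothetical, not part of FLASH), and it assumes both reads are the full 300 bp with no trimming and that the two reads together span the whole fragment. It only shows what fragment length a given overlap would imply; it is not how FLASH itself computes anything.

```python
def expected_overlap(read_len: int, fragment_len: int) -> int:
    """Overlap between two full-length reads covering one fragment.

    Assumes untrimmed reads and read_len <= fragment_len <= 2 * read_len.
    """
    return 2 * read_len - fragment_len


def implied_fragment_len(read_len: int, overlap: int) -> int:
    """Fragment length implied by an observed overlap (same assumptions)."""
    return 2 * read_len - overlap


if __name__ == "__main__":
    # Nominal V3V4 amplicon from the protocol: 460 bp with 300 bp reads.
    print("expected overlap:", expected_overlap(300, 460))
    # The overlap at which the warning went away for me.
    print("fragment implied by 159 bp overlap:", implied_fragment_len(300, 159))
```

Under those assumptions, an overlap of 159 bp would correspond to a fragment of about 441 bp, which is what makes me suspect the amplicon is shorter than the nominal 460 bp.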
EDIT: I just realized that I'm using both the 'read-len'/'fragment-len'/'fragment-len-stddev' parameters together with 'max-overlap' above, so the first three are ignored. If I use them without 'max-overlap', the calculated max-overlap is 152. I used 'max-overlap' to determine that 159 eliminates the warning.