Hi All,
I have been sequencing PCR-amplified amplicon libraries on the MiSeq (2x150). I think the last roadblock I'm running into is the sequencing quality/error rate of these libraries.
I should add that these are low-complexity libraries. Read 1 begins with a built-in 12-nt UMI, and there should be diversity after that. On read 2, the first ~19 nucleotides are the same across all amplicons, and there should be diversity afterwards.
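To make the low-complexity problem concrete, here is a minimal sketch that computes per-cycle base composition (the same idea as FastQC's "Per base sequence content" plot) and flags cycles where one base dominates. It assumes a simulated read 2 with a 19-nt constant prefix followed by random sequence. The constant sequence, the read counts and the 0.8 threshold are all hypothetical, not taken from the actual libraries.

```python
import random
from collections import Counter

BASES = "ACGT"


def per_cycle_composition(reads):
    """For each cycle, return the fraction of reads carrying each base."""
    length = min(len(r) for r in reads)
    comp = []
    for i in range(length):
        counts = Counter(r[i] for r in reads)
        total = sum(counts.values())
        comp.append({b: counts.get(b, 0) / total for b in BASES})
    return comp


def low_diversity_cycles(comp, max_fraction=0.8):
    """1-based cycles where a single base exceeds max_fraction (threshold is arbitrary)."""
    return [i + 1 for i, c in enumerate(comp) if max(c.values()) > max_fraction]


def simulate_read2(n_reads, constant, insert_len, seed=0):
    """Hypothetical read 2: constant prefix followed by uniformly random bases."""
    rng = random.Random(seed)
    return [
        constant + "".join(rng.choice(BASES) for _ in range(insert_len))
        for _ in range(n_reads)
    ]


if __name__ == "__main__":
    constant = "GACTGGAGTTCAGACGTGT"  # hypothetical 19-nt constant region
    reads = simulate_read2(1000, constant, 31)
    print(low_diversity_cycles(per_cycle_composition(reads)))
```

Every cycle inside the constant region comes out as 100% one base, which is exactly what the instrument sees on those cycles when little else is loaded alongside the library.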
Knowing that the libraries have low diversity, I've been increasing the PhiX spike-in (most recently up to 35%).
The run initially looks good: the Q-scores look fine and the per-sequence quality is high (FastQC plots for read 1 attached; read 2 is worse, but not significantly). Overall %≥Q30 is about 89%.
The issue is that when the PhiX spike-in reads are aligned back to the PhiX genome, the reported error rate is 8%. I don't understand why it is so high. I've been told that the alignment-based error rate is the more believable number.
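One way to see how far apart the two numbers are: a Phred score Q implies an error probability of 10^(-Q/10), so if the quality scores were well calibrated, the error rate predicted from them and the mismatch rate measured on PhiX should roughly agree. The sketch below compares the two. The quality mix (89% of bases at Q35, the rest at Q20) and the mismatch counts are assumptions chosen only to mirror the numbers in the post, not real run data.

```python
def phred_to_error(q):
    """Error probability implied by a Phred quality score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def expected_error_rate(quals):
    """Mean error probability predicted by a list of per-base Phred scores."""
    return sum(phred_to_error(q) for q in quals) / len(quals)


def observed_error_rate(mismatches, aligned_bases):
    """Mismatch rate measured by aligning reads to a known reference such as PhiX."""
    return mismatches / aligned_bases


if __name__ == "__main__":
    # Hypothetical quality mix: ~89% of bases at Q35, ~11% at Q20.
    quals = [35] * 89 + [20] * 11
    predicted = expected_error_rate(quals)
    # Hypothetical alignment counts giving an 8% mismatch rate.
    observed = observed_error_rate(8_000, 100_000)
    print(f"predicted from Q-scores: {predicted:.3%}")
    print(f"observed on PhiX:        {observed:.3%}")
    print(f"ratio: {observed / predicted:.0f}x")
```

Under these assumptions the Q-scores predict an error rate well under 1%, so an observed 8% would mean the quality scores on this run are far more optimistic than the actual base calls.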
The other thing I noticed is that the per base sequence content plot doesn't look great either (attached), which suggests that even with a 35% spike-in there still isn't enough diversity in the loaded sample.
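A back-of-the-envelope model of why 35% may not be enough: at a cycle where every library cluster shows the same base, the dominant base fraction is (1 - s) + s * p, where s is the PhiX fraction of clusters and p is the fraction of PhiX clusters showing that base. The sketch assumes p = 0.25 (an idealization; PhiX is not perfectly base-balanced) and that s is the share of clusters, which can differ from the share by loading concentration.

```python
def dominant_base_fraction(spike_in, library_fraction=1.0, phix_fraction_per_base=0.25):
    """Fraction of clusters showing the dominant base at one cycle.

    library_fraction: share of library clusters with that base (1.0 = fully constant cycle).
    phix_fraction_per_base: assumed share of PhiX clusters with that base (idealized 0.25).
    """
    return (1 - spike_in) * library_fraction + spike_in * phix_fraction_per_base


def spike_in_needed(target, phix_fraction_per_base=0.25):
    """Smallest spike-in fraction that lowers a fully constant cycle to `target`.

    Solves (1 - s) + s * p <= target for s.
    """
    return (1 - target) / (1 - phix_fraction_per_base)


if __name__ == "__main__":
    print(f"35% PhiX -> dominant base at {dominant_base_fraction(0.35):.1%} of clusters")
    print(f"PhiX needed for dominant base <= 50%: {spike_in_needed(0.5):.0%}")
```

In this simplified model a 35% spike-in still leaves roughly three quarters of clusters reading the same base on every constant cycle, and getting that down to half would take about two thirds PhiX, at a large cost in library reads.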
Would spiking in more PhiX help? I haven't been using staggered sequencing primers; how would those work?
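The usual idea behind staggered (phased) primers is to pool primers that carry 0 to k extra bases in front of the constant region, so that different sub-populations reach the same constant base on different cycles. The sketch below compares the maximum single-base fraction per cycle across the 19-nt constant region with and without an 8-primer stagger. The constant region and the spacer sequences are hypothetical, and it assumes equal representation of each spacer in the pool.

```python
import random
from collections import Counter

BASES = "ACGT"
CONSTANT = "GACTGGAGTTCAGACGTGT"  # hypothetical 19-nt constant region
# Hypothetical spacers of length 0..7, one per primer in the staggered pool.
SPACERS = ["", "T", "GC", "ATG", "CATC", "TGCAT", "GTCAGA", "ACGTTCA"]


def build_reads(constant, spacers, n_per_spacer, tail_len, seed=0):
    """Simulate reads as spacer + constant region + random insert sequence."""
    rng = random.Random(seed)
    reads = []
    for spacer in spacers:
        for _ in range(n_per_spacer):
            tail = "".join(rng.choice(BASES) for _ in range(tail_len))
            reads.append(spacer + constant + tail)
    return reads


def max_base_fraction_per_cycle(reads, n_cycles):
    """For each of the first n_cycles, the fraction of reads showing the most common base."""
    out = []
    for i in range(n_cycles):
        counts = Counter(r[i] for r in reads)
        out.append(max(counts.values()) / len(reads))
    return out


if __name__ == "__main__":
    plain = build_reads(CONSTANT, [""], 800, 40)
    staggered = build_reads(CONSTANT, SPACERS, 100, 40)
    for name, reads in (("no stagger", plain), ("staggered", staggered)):
        fracs = max_base_fraction_per_cycle(reads, len(CONSTANT))
        print(name, [round(f, 2) for f in fracs])
```

Without the stagger every constant cycle is 100% one base; with it, the dominant base drops well below that on most cycles. The trade-off is that the longest spacer eats up to k cycles of usable read length.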