**Jessica_L** · 12-09-2015, 02:09 PM

I don't think increasing PhiX above 35% would do that much for you-- according to Illumina, 10% should be sufficient. Is the 35% spike in at least producing an alignment % of approximately the same value?

The Q30 number looks good, but what do the other run stats look like? What's the cluster density, what's the PF rate, how does the FWHM graph look?

The error rate for PhiX, in my experience, usually does not go quite so high when all the other metrics are normal. I have some amplicon data (more base pair diversity than yours so not directly comparable) but my error rates are all still under 1%, even when the % base composition deviates from expected.

Have you been in contact with Illumina tech support at all?

**SunPenguin** · 12-09-2015, 04:22 PM

Hi, thank you for your reply. I'll try to get more information when I have access to the SAV file tomorrow.

I've been talking to the sequencing core here, and quite frankly they also are not quite sure. I remember them telling me that the cluster density is at around 900K/mm2, which is usually okay, though they thought it may be a bit high for low diversity libraries.

I included the AA trace example of the libraries I was sequencing. The major product is at around 600 bp, though there is a minor product at around 450 bp.

The rest of the fastqc is also here. The only other thing that I really didn't like about the sequencing run is how many repeated sequences there are. That's not too surprising, since this is one of the older libraries that probably went through too many PCR cycles. I don't really think that would interfere with PhiX sequencing quality though.

We actually originally aimed for 40% spike-in, but the alignment ended up with 35%.

Attached Files

**Jessica_L** · 12-10-2015, 08:18 AM

It looks like the cluster density is fine, and your PhiX alignment isn't really that far off your target. Plus your sequencing core is probably watching out for anything hardware related that might occur during a run, so the issue probably isn't any of that.

Looking at your (additional) data and comparing it against my amplicon runs, I'm seeing two things:

1) my base pair composition resembles yours when I'm sequencing through a common sequence that exists on literally all of my reads. I'm attaching a screenshot of my base pair composition graph so you can see what I mean. The common region is in the first ~20bp of the run. Once I get through that common region, my base pair composition more closely resembles a more diverse sample.

2) the kmer content graph you posted only has spikes in the first five positions. I don't think I've ever seen that, regardless of the library type. I'm used to seeing spikes across the board. I'm also attaching a screen cap of my kmer content graph so you can see. Apologies for the cropping, but I was trying to keep the image under the 146.5kb limit for the forum attachments.

Anyway, these two things together are leading me to focus more on the low diversity of your sample. The number of amplicons you're sequencing must be very low (also supported by the amount of repeated sequence you're seeing?). It would make sense to me that if you have more amplicon library than PhiX, and the library is extremely low diversity, the error rates on the PhiX sequence might be inflated quite a lot. It's possible that the amplicon reads are interfering with the estimates of phasing and dye crosstalk, and that this is pushing up error rates even if the data isn't necessarily problematic. Have you tried using your amplicon sequence for any downstream analyses? I'd be interested to know if what you're seeing is an artifact or if the data actually has more mismatches in it.
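For anyone who wants to check the low-diversity idea directly on their reads rather than from a FastQC screenshot, here is a minimal Python sketch that computes per-cycle base composition and flags cycles dominated by a single base. The function names, the toy reads, and the 0.8 cutoff are all assumptions made for illustration; they are not Illumina or FastQC thresholds.

```python
from collections import Counter

def base_composition(reads):
    """Return one dict per cycle giving the fraction of A, C, G and T.

    `reads` is a list of read sequences (strings). Reads shorter than a
    given cycle are simply skipped for that cycle.
    """
    n_cycles = max(len(r) for r in reads)
    comp = []
    for i in range(n_cycles):
        counts = Counter(r[i] for r in reads if len(r) > i)
        total = sum(counts.values())
        comp.append({b: counts.get(b, 0) / total for b in "ACGT"})
    return comp

def low_diversity_cycles(comp, max_fraction=0.8):
    """0-based cycles where one base exceeds max_fraction (hypothetical cutoff)."""
    return [i for i, c in enumerate(comp) if max(c.values()) > max_fraction]

# Toy reads: a shared 4-base prefix followed by a diverse region.
reads = ["ACGTAC", "ACGTTG", "ACGTGA", "ACGTCT"]
comp = base_composition(reads)
flagged = low_diversity_cycles(comp)
print(flagged)
```

With these toy reads the shared prefix (cycles 0 to 3) is flagged and the diverse tail is not, which mirrors the "common region, then diverse region" pattern described in point 1.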

An interesting troubleshooting experiment would be to load a lot of PhiX (upwards of 50 or 60%) with your library and see if that corrects the issue.

Also, as you mentioned these were older libraries and possibly had been overamplified in PCR, are you seeing similar results for runs with newer libraries where the number of PCR cycles has been adjusted down? I expect that might change the amount of overrepresented and/or duplicated sequence, if nothing else.

Attached Files

**bunce** · 12-10-2015, 01:24 PM

Hi SunPenguin. I agree with Jessica that it is likely a low complexity issue, but I would add that it might be exacerbated by densities above 900K. We have run a lot of low complexity amplicon libraries and some perform better than others (even with a healthy PhiX spike-in). There is a definite 'cliff edge' with these libraries, and they can impact PF% and sequence quality. A couple of things to try: 1) For amplicons we try to aim for a V2 density of ~700K/mm2. 2) Think about including a single-source sample in your sample set (e.g. E. coli DNA if doing bacterial 16S); this will give you an error rate for the amplicon you are generating.

Is the library clean of dimer etc? - I ask because the Q-score drop off is quite abrupt after 100bp. If the library has a lot of short artefacts it might not be helping the situation either.

**SunPenguin** · 12-10-2015, 03:35 PM

Hi all,

Thank you guys for helping. I got my hands on the run file, and you guys were right; there were some funky metrics with the run.

The PF% was only 74%, which is a lot lower than I had thought. The exact density is 923 K/mm2.

The PhiX error rate also shoots up after 100 bp (though by cycle 100 the error rate is already at 3%). The FWHM is slightly large I think, with the C and T signals drifting from 3.2 to 3.5 throughout the PE run, and A and G from about 2.8 to 3.

I did clean the library with 0.7X SPRI, but as you can see on the AA trace I uploaded above, there are some 400 bp products. I agree that it's weird that the library seems to have a lot of short artifacts (I can see that in the repeated sequences in FastQC as well, which show some adapter sequences).

I have salvaged some data for downstream analysis. I definitely was able to recover some sequences that match back to nblast, but yes the number of reads that I ended up salvaging was low compared to what came off of the sequencer, and the diversity was very low (overall maybe <100,000 unique species).
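A quick way to put a number on "how many unique sequences survived" is to count them directly. The sketch below is only an illustration with made-up reads and a hypothetical function name; it treats two reads as duplicates only if they are identical strings, which is stricter than what FastQC or a proper deduplication tool does.

```python
from collections import Counter

def duplication_summary(seqs):
    """Summarise exact-match duplication in a list of read sequences."""
    counts = Counter(seqs)
    total = len(seqs)
    unique = len(counts)
    rate = 1 - unique / total if total else 0.0
    return {
        "total": total,
        "unique": unique,
        "duplication_rate": rate,       # fraction of reads that are repeats
        "top": counts.most_common(3),   # most over-represented sequences
    }

# Toy input: five reads, three distinct sequences.
summary = duplication_summary(["AAA", "AAA", "CCC", "GGG", "AAA"])
print(summary["unique"], summary["top"][0])
```

Looking at the `top` entries is also a cheap way to spot adapter or primer-dimer sequences among the most repeated reads.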

I'm in the process of sequencing our new and improved library, which should be much more diverse, with fewer PCR cycles and steps. I'm also looking to pool several dissimilar libraries together in addition to PhiX. I'm looking through this data set to make sure I don't make the same sequencing mistake again...

**thermophile** · 12-11-2015, 07:23 AM

I run lots of amplicons (that's my main thing) and do at least 10% PhiX + 10% Nextera genomes or, if I don't have any genomes, 20% PhiX. Same as bunce, I aim for ~700k clusters. Your PF is technically out of spec, so Illumina will likely replace your kit, but they'll also tell you that the problem is your library.

**SunPenguin** · 12-14-2015, 05:28 PM

I spoke with our sequencing core today again, and found that the reason for the high cluster density is apparently QC related. They had loaded the lane originally aiming for 600-700k/mm2, but it ended up being 900K/mm2.

The QC was done by qPCR, so I'm really not quite sure what happened there... especially considering that the bioA/AA trace looks very clean.
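To make the size of that overshoot concrete, here is a rough sketch. It assumes cluster density scales about linearly with loading concentration, which is only an approximation, and the 10 pM starting concentration is a made-up placeholder since the actual loading concentration was not given in this thread. The function name is hypothetical too.

```python
def rescaled_loading_conc(current_conc_pm, observed_density, target_density):
    """Scale a loading concentration toward a target cluster density.

    Assumes density is roughly proportional to loading concentration
    (an approximation, not a guarantee). Densities are in K/mm2.
    """
    if observed_density <= 0:
        raise ValueError("observed_density must be positive")
    return current_conc_pm * target_density / observed_density

# Observed 923 K/mm2 against the middle of the 600-700 K/mm2 target range.
overshoot = 923 / 650 - 1

# Hypothetical 10 pM load, rescaled toward ~700 K/mm2.
new_conc = rescaled_loading_conc(10.0, observed_density=923, target_density=700)
print(f"overshoot: {overshoot:.0%}, suggested load: {new_conc:.2f} pM")
```

An overshoot of this size is the kind of discrepancy that is worth tracing back to the quantification step before the next run.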

**cosmarium** · 08-08-2016, 10:30 PM

Hi All,

I have a similar question, and I think this thread is the best fit for it (compared to the other posts I have checked).

I submitted some libraries for sequencing using the MiSeq 600-cycle V3 kit, PE.
My sample has low diversity.
The length of my construct is 322 bp.
The core thought that doing 250 cycles would be better; my reads are 251 bp long and of the adapter read-through type.
The core used a 15% PhiX spike-in (which I agreed with).
I got about 17M reads and the quality is relatively good, with roughly 89% over Q30.
I saw in the SAV file that the error rate is 1%.
There are important differences between the files of the two reads (number of mismatches, total number of sequences).

I have mapped some sequences to my known reference using Geneious to get a visual idea of the quality. In the attached figures you can see:
1) Lots of darker blue spots, which are mismatches.
2) There are lots of errors in the adapter region.

I am puzzled about whether:
1) it would be OK to remove the sequences with lots of errors in the adapters and keep the others for downstream analysis, or
2) this really indicates that the data set is not to be trusted for SNP calling. Some reads will have errors in the sequence but not in the adapter; how can I tell? For long tracts of errors it is relatively easy, but for SNPs?
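One common way to approach the "error or SNP?" question is to look at the mismatch frequency per reference position across many reads: random sequencing errors should sit near the background error rate everywhere, while a real variant should show up at one position in a much larger fraction of reads. Below is a minimal sketch of that idea. It assumes gapless reads already aligned to start at reference position 0, uses the 1% error rate from the SAV file as background, and the 5-fold cutoff and function names are arbitrary choices for illustration, not a validated variant caller.

```python
def mismatch_frequencies(reference, aligned_reads):
    """Fraction of reads differing from the reference at each position.

    Simplifying assumptions: reads are gapless and aligned so that
    read[i] corresponds to reference[i]; 'N' calls are ignored.
    """
    freqs = []
    for i, ref_base in enumerate(reference):
        covering = [r[i] for r in aligned_reads if len(r) > i and r[i] != "N"]
        if not covering:
            freqs.append(0.0)
            continue
        mismatches = sum(1 for b in covering if b != ref_base)
        freqs.append(mismatches / len(covering))
    return freqs

def candidate_variants(freqs, background=0.01, fold=5):
    """Positions whose mismatch frequency exceeds fold x background."""
    return [i for i, f in enumerate(freqs) if f > background * fold]

# Toy data: position 2 differs in half the reads, position 6 in one read.
reference = "ACGTACGT"
reads = ["ACGTACGT"] * 19 + ["ACTTACGT"] * 20 + ["ACGTACAT"]
freqs = mismatch_frequencies(reference, reads)
candidates = candidate_variants(freqs)
print(candidates)
```

In the toy data only position 2 passes the cutoff; the single-read mismatch at position 6 stays below it and is treated as noise. Real data would also need indel-aware alignment and per-base quality filtering before trusting positions like this.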

----
EDIT:
Looking at these alignments in more detail, I noticed that in the primer binding region, the last part of the alignment (from nucleotide 153 on, 17 nt, not included in the reference), there don't seem to be that many errors. I think the errors there are the ones created during PCR for library preparation. The ones in the adapter are still a black box to me: why are there more there?
----

A bit more detail if you need it:
The reference is a synthetic DNA of 187 bp. I transcribe it and subject the transcripts to cycles of evolution under different treatments. The sequence data are PCR products obtained from the evolved transcripts. If you want to know more, you can look up the Continuous in vitro Evolution technique.

Please, let me know if you need more detail.

C

https://drive.google.com/open?id=0B8ehzcvYiIZWVGpYbTdGakZXZUU

https://drive.google.com/open?id=0B8ehzcvYiIZWdTU5NEpPUHFkVWM


MiSeq Amplified Amplicons sequencing: good Qscore, bad error rate
