Hi,
My name is Ann, and I'm an old Sanger sequencer getting used to the NGS world. I used to do radioactive (S-35 and P-33) sequencing, ~3-5 days hands-on for 750 bp, so the switch to capillaries was incredible then, and the switch to millions of reads makes my younger self green with envy. I've always been a mix of molecular biologist and super-user: I build my own Illumina libraries but send them off to a sequencing core, and I can work in the Galaxy web-based interface and explain what I want to a programmer, but I cannot program myself. (I tried. I took a series of bioinformatics courses including C++ and Perl, but I'm so slow at programming that I could calculate an e-value by hand faster than I could code it.)
Enough about me. I have several questions.
1) I'm curious why Phred values don't seem to be used by de novo NGS assemblers, or have I just not read the documentation correctly? They seem to be used only to evaluate whether a MiSeq or HiSeq run was good, and not to mask or trim individual bases that aren't good. It seems weird to me to throw out an entire read if only a few bases are bad.
2) What is the average %>Q30 for MiSeq and HiSeq runs? What are the normal ranges?
I realize that longer reads have declining Q30s. I've done 2 MiSeq runs as quality controls before HiSeq runs. On the first, at 2 x 150 with 96 combinations of dual index adapters, the core told me I had Read 1 at 96% >Q30 and Read 4 (Read 2 sequencing primer 2) at 92% >Q30, with ~12 million paired-end reads PF. On my second run, with 1 index, I had 2 x 250 at about the same number of reads, with ~94% and ~90% respectively. They said those values were high; I'm just trying to get a feel for how high. The Sequencing Analysis Viewer shows that most of the cycles are well above Q35, with the 2 index reads (8 cycles each, Reads 2 and 3) in the middle dropping to Q30.
Thanks in advance,
Ann