I wonder if someone with more intimate knowledge of the Solexa pipeline could shed some light on the different varieties of quality scores it produces and how they relate to one another. Just to be clear, I'm not asking about the difference between Solexa and Phred scores or about the conversion to ASCII. From my limited knowledge, the pipeline appears to produce at least two types of Q-scores: intensity-based (found in the .prb files from Bustard) and alignment-based (found in the fastq files from Gerald). There also seems to be some kind of quality calibration going on (using a "precalculated calibration table"?).
To give some context, I am working with paired-end reads from a bacterial genome using the v1.3 pipeline. I am finding the fastq quality scores are much lower than those from the .prb files (almost entirely Q22 compared to Q40). I'm wondering which scores better represent the quality and why Q22 would be so over-represented in the fastq.
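To compare the two sources on equal terms, one option is to reduce each .prb record to a single score per cycle. The sketch below assumes (not confirmed here) that each line of a Bustard *_prb.txt file holds one cluster with four whitespace-separated integer scores (A, C, G, T) per cycle, and takes the highest of the four as the score for the called base. The function name `called_base_scores` is hypothetical.

```python
def called_base_scores(prb_line: str) -> list[int]:
    """Return the highest of the four per-base scores at each cycle.

    Assumption: the line is one cluster with four integer scores
    (A, C, G, T) per cycle, separated by tabs or spaces.
    """
    values = [int(v) for v in prb_line.split()]
    if len(values) % 4 != 0:
        raise ValueError("expected four scores per cycle")
    # Group the flat list into (A, C, G, T) chunks, one chunk per cycle,
    # and keep the best-supported base's score.
    return [max(values[i:i + 4]) for i in range(0, len(values), 4)]


# Three made-up cycles, for illustration only.
example = "40 -40 -40 -40\t-40 -40 40 -40\t-40 12 -12 -40"
print(called_base_scores(example))  # [40, 40, 12]
```

Note that these are Solexa-scaled values, so they are not directly comparable to Phred-scaled fastq qualities at the low end of the range.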
Thanks!
BTW, here is a snippet of my fastq file in case my interpretation is wrong:
@Paired_run:7:1:305:1931/1
GAAATAGATGAAGATTTAATTATTGCTCCTAAAT
+Paired_run:7:1:305:1931/1
VVVVVVVVVVVVVVVVVVVVVVVVUVVVVVUUUU
@Paired_run:7:1:315:1920/1
GACTAAACTGTAGCAATGGTTTAAATGATGATCT
+Paired_run:7:1:315:1920/1
VVVVVVVVVVVVVVVVVVVVVVVVVVVVVUUUUU
@Paired_run:7:1:341:1932/1
GCTAATGATGTTCTTGATAATTTAAACAAAATTG
+Paired_run:7:1:341:1932/1
VVVVVVVVVVVVVVVUVVVVVVVVVVVVVVUUUS
@Paired_run:7:1:302:1939/1
GAAATAGATGAAGATTTAATTATTGCTCCTAAAT
+Paired_run:7:1:302:1939/1
VVVVVVVVVVVVVVVVVVVVVVVVUVVVVVUUUU
@Paired_run:7:1:212:1540/1
GTTAGAATTAATCAAATTGTATGGATGTGTGTAG
+Paired_run:7:1:212:1540/1
VUVVVVVVVVVVVVVVVVVUVVUUVVRVSVRUUS
@Paired_run:7:1:173:757/1
GTAGACGTATCAGGAGTTTCTAAAGGTAAGGGAT
+Paired_run:7:1:173:757/1
VVVVVVVVVVVUVVVVVVVVVVVVVUVVVVUUUU
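For reference, this is how I read the quality lines above. It is a minimal sketch that assumes the Phred+64 offset used by the v1.3 pipeline's fastq output, so 'V' (ASCII 86) decodes to Q22, 'U' to Q21, 'S' to Q19 and 'R' to Q18. The helper name `decode_qualities` is my own.

```python
from collections import Counter

# Assumption: Illumina pipeline v1.3 fastq qualities are Phred+64.
OFFSET = 64


def decode_qualities(qual: str, offset: int = OFFSET) -> list[int]:
    """Convert an ASCII-encoded quality string to integer Q-scores."""
    return [ord(ch) - offset for ch in qual]


# Quality lines copied from the snippet above.
quality_lines = [
    "VVVVVVVVVVVVVVVVVVVVVVVVUVVVVVUUUU",
    "VVVVVVVVVVVVVVVVVVVVVVVVVVVVVUUUUU",
    "VVVVVVVVVVVVVVVUVVVVVVVVVVVVVVUUUS",
    "VVVVVVVVVVVVVVVVVVVVVVVVUVVVVVUUUU",
    "VUVVVVVVVVVVVVVVVVVUVVUUVVRVSVRUUS",
    "VVVVVVVVVVVUVVVVVVVVVVVVVUVVVVUUUU",
]

# Tally how often each Q-score occurs across all reads.
histogram = Counter()
for line in quality_lines:
    histogram.update(decode_qualities(line))

for q, n in sorted(histogram.items()):
    print(f"Q{q}: {n}")
```

Running this on the whole file, rather than six reads, is a quick way to check how dominant Q22 really is.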