Hi there,
I'm having a look at some FASTQ files generated from the Illumina GA pipeline (I think version 1.3). The data is for a series of paired-end RNA-Seq runs, containing >100,000,000 reads in total.
I'm just trying to get a feel for the information at the moment, and one of the first things I've noticed is that the quality scores are not what I expect.
As I understand it (from the Wikipedia article on FASTQ created by Torst), version 1.3+ of the GA pipeline encodes Phred quality scores from 0 to 62 as ASCII characters 64 to 126.
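For anyone checking the same thing, here is a minimal Python sketch of that mapping. It assumes plain Phred+64 encoding as described above (Phred = ASCII code minus 64), and `decode_phred64` is just a hypothetical helper name, not part of any Illumina tool:

```python
# Minimal sketch: decode a Phred+64 quality string (GA pipeline 1.3+ style).
# Assumption: valid characters are ASCII 64-126, i.e. Phred 0-62.
PHRED64_OFFSET = 64
MAX_PHRED = 62


def decode_phred64(quality: str) -> list[int]:
    """Convert each Phred+64 quality character to its integer Phred score."""
    scores = []
    for ch in quality:
        q = ord(ch) - PHRED64_OFFSET
        if not 0 <= q <= MAX_PHRED:
            # Anything below '@' (ASCII 64) or above '~' (ASCII 126) is not Phred+64
            raise ValueError(f"{ch!r} (ASCII {ord(ch)}) is outside the Phred+64 range")
        scores.append(q)
    return scores


if __name__ == "__main__":
    print(decode_phred64("Bb"))  # → [2, 34]  ('B' is ASCII 66, 'b' is ASCII 98)
```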
Our files use only the characters 66-98 ('B' to 'b'), which implies that all our bases have Phred qualities in the range 2 to 34 (inclusive). Also, no base has the quality character 67 ('C'), implying that no base has a Phred quality of 3, even though hundreds of thousands of bases have qualities of 2 and of 4-34.
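A tally like the one described above can be reproduced with a rough Python sketch along these lines. It assumes unwrapped 4-line FASTQ records (header, sequence, '+', quality) and Phred+64 encoding; `quality_histogram` and `missing_scores` are hypothetical helper names for illustration only:

```python
from collections import Counter
from typing import Iterable

PHRED64_OFFSET = 64  # assumption: Phred+64 encoding


def quality_histogram(lines: Iterable[str]) -> Counter:
    """Count Phred scores over the quality line of each 4-line FASTQ record."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 3:  # the 4th line of every record is the quality string
            for ch in line.rstrip("\r\n"):
                counts[ord(ch) - PHRED64_OFFSET] += 1
    return counts


def missing_scores(counts: Counter) -> list[int]:
    """Phred scores never observed between the lowest and highest observed score."""
    if not counts:
        return []
    lo, hi = min(counts), max(counts)
    # Counter returns 0 for absent keys without inserting them
    return [q for q in range(lo, hi + 1) if counts[q] == 0]


if __name__ == "__main__":
    demo = ["@read1\n", "ACG\n", "+\n", "BDE\n"]  # made-up single record
    print(missing_scores(quality_histogram(demo)))  # → [3]
```

On a real file you would pass an open file handle instead of the in-memory list, e.g. `quality_histogram(open(path))`, where `path` is whatever FASTQ file you want to scan.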
I'd appreciate it if someone could help answer the following:
(1) Does it seem reasonable that our qualities are being capped at 34? I notice a previous post has comments from maubp/kmcarr pointing out that the maximum scores might be capped at 34 by a particular version of Bustard (http://seqanswers.com/forums/showthread.php?t=4679). Perhaps this is what's happening?
(2) Is it normal to have a non-zero lower bound for observed quality scores (in our case, 2)?
(3) Is there an obvious reason why none of our bases has a quality of 3, even though every other quality in the range 2 to 34 is highly represented?
I'm very new to this area, so I'm not sure what additional information would be helpful here. I should be able to get additional details on request.
Thanks for your time!