  • Illumina/Solexa quality values

    Hi everyone,

    I have some Illumina GA fastq files with base quality values that don't span the full range that I expect.

    The quality values for each of five lanes have the following ranges:
    lane 1: 2 to 27
    lane 2: -1 to 26
    lane 3: 1 to 24
    lane 4: 1 to 27
    lane 5: 0 to 30
    with the majority of bases in all lanes having quality values 22 or 23.

    I got the values above by subtracting the offset 64=='@' from the ascii values of the chars presented in the fastq files.
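The decoding step described above can be sketched in a few lines of Python. This is only an illustration: the function names `decode_quals` and `score_range` and the example strings are made up here, and the offset of 64 is the one stated in the post.

```python
def decode_quals(qual_string, offset=64):
    """Convert one FASTQ quality string to a list of integer scores.

    offset=64 is the '@'-based offset described in the post (assumption:
    the file really uses that offset).
    """
    return [ord(ch) - offset for ch in qual_string]


def score_range(qual_strings, offset=64):
    """Return (min, max) score over an iterable of quality strings."""
    scores = [q for s in qual_strings for q in decode_quals(s, offset)]
    return min(scores), max(scores)


if __name__ == "__main__":
    # Hypothetical quality strings, not taken from the poster's files.
    example = ["VVWW?Y", "@AB^"]
    print(score_range(example))
```

Running the same two functions over the fourth line of every record in a lane's fastq file would give per-lane ranges like the ones listed above.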

    These ranges don't seem to be consistent with anything I've seen elsewhere. For example, with Solexa quality values I think the range should go from -5 to 40, and for Phred quality values 0 to 40.
    [ Side note: I am not certain whether my files contain Solexa or Phred-based quality values. I see that the quality value output in GERALD fastq files has changed since Illumina pipeline 1.3 (http://seqanswers.com/forums/showthread.php?t=1110). Since lane 2 contains some -1's, I assume my quality values are Solexa ]
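For reference, the two scales differ in how the score relates to the error probability p: Phred uses -10*log10(p), Solexa uses -10*log10(p/(1-p)). A rough sketch of the conversion and of the "negative values imply Solexa" reasoning follows. The function names are hypothetical, and the check is only a heuristic: a file with no negative values could still be on either scale.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa-scaled score to the equivalent Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def looks_like_solexa(scores):
    """Heuristic: Phred scores cannot be negative, Solexa scores can.

    Returns True only when a negative score is present; False means
    'cannot tell', not 'definitely Phred'.
    """
    return min(scores) < 0


if __name__ == "__main__":
    for q in (-5, -1, 0, 10, 30, 40):
        print(q, round(solexa_to_phred(q), 2))
    print(looks_like_solexa([-1, 22, 23, 26]))
```

The two scales are nearly identical above about Q10 and only diverge noticeably for low-quality bases.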

    Anyone have any ideas about what could be happening here? Why don't I see any bases with qualities higher than 30?

    Thanks!
    Dan
    ________
    Last edited by d17; 01-19-2011, 01:56 AM.

  • #2
    Originally posted by d17
    Anyone have any ideas about what could be happening here? Why don't I see any bases with qualities higher than 30?
    Perhaps a problem with the instrument itself? Have you previously had high quality runs, and if so has anything changed with your hardware or software?
    @1
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    +
    """"""""""""""""""""""""""""""""""""


    • #3
      Originally posted by TylerBackman
      Perhaps a problem with the instrument itself? Have you previously had high quality runs, and if so has anything changed with your hardware or software?
      Hmm, we have had high quality runs in the past (i.e. quality values from -5 to 40, most bases called as 40). I'll definitely have to check into whether anything has changed with the machine's hardware or software (it's actually not our machine, and these files are a couple of months old now, so that may be hard to track down). I wonder if anyone else has come across quality values that look remotely like these?
      Last edited by d17; 01-19-2011, 01:56 AM.


      • #4
        Hi Dan,

        I was just about to post on the very same problem. Most of my quality scores are "V"s, which converts to Q22 on the Illumina scale (ASCII 86 minus 64), if I have that correct (new to this). I'd be interested to know if you find an explanation.

        Thanks,

        Dion


        • #5
          Originally posted by d17
          Anyone have any ideas about what could be happening here? Why don't I see any bases with qualities higher than 30?
          For Solexa, the estimated probability of a base call error for Q30 is about 0.001, i.e. correct with 99.9% probability. This is actually not too bad.
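To make the arithmetic concrete, here is a small sketch of the score-to-error-probability mapping on both scales. The function names are made up for illustration; the formulas are the standard definitions of the Phred and Solexa scales.

```python
def phred_error_prob(q):
    """Error probability for a Phred score: Q = -10*log10(p)."""
    return 10 ** (-q / 10)


def solexa_error_prob(q):
    """Error probability for a Solexa score: Q = -10*log10(p / (1 - p))."""
    return 1 / (1 + 10 ** (q / 10))


if __name__ == "__main__":
    for q in (-1, 0, 22, 30, 40):
        print(q, phred_error_prob(q) if q >= 0 else None, solexa_error_prob(q))
```

At Q30 the two scales agree to within about 0.1% (0.001 versus 1/1001). On the Solexa scale a score of -1 is still meaningful: it corresponds to an error probability of roughly 0.56.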

          In our runs, we get similar quality ranges to what you list, although it is rare to get values below 0 - in fact bases called as "N" usually have Q=0 ... which doesn't make much sense to me. Yes, this was GAPipeline 1.0.

          As I said, the quality isn't that bad. The reason you aren't seeing higher is almost certainly due to the prep and/or instrument, e.g. if you generate too many clusters on the flowcell (high density) you just won't get high confidence in base calls. It's a touchy tradeoff between density/yield and quality/ability to discern clusters.


          • #6
            Torst, thanks for your input:

            Originally posted by Torst
            For Solexa, the estimated probability of a base call error for Q30 is about 0.001, i.e. correct with 99.9% probability. This is actually not too bad.
            Yes, you're absolutely right ... but I would be happier if we had Q40's that were correct with 99.99% probability!

            Originally posted by Torst
            In our runs, we get similar quality ranges to what you list, although it is rare to get values below 0 - in fact bases called as "N" usually have Q=0 ... which doesn't make much sense to me. Yes, this was GAPipeline 1.0.
            One strange thing we have is that bases called as "N" don't always have the same quality value: in the five lanes I posted about the quality values of "N" bases range from -1 to +3. Of course the -1 doesn't make any sense whatsoever, but at least the others are consistent with the base having a low probability of being correct.
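A check like the one described above (which quality values are attached to "N" calls) could be done roughly as follows. This is a sketch only: `n_base_scores` is a hypothetical helper, the records are invented, and the offset of 64 is assumed from earlier in the thread.

```python
from collections import Counter


def n_base_scores(records, offset=64):
    """Tally decoded quality scores for bases called as 'N'.

    records: iterable of (sequence, quality_string) pairs of equal length.
    offset:  ASCII offset of the quality encoding (assumed 64 here).
    """
    counts = Counter()
    for seq, qual in records:
        for base, ch in zip(seq, qual):
            if base == "N":
                counts[ord(ch) - offset] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical records, not taken from the lanes discussed here.
    records = [("ACGN", "VVW?"), ("NNAC", "@CVV")]
    print(sorted(n_base_scores(records).items()))
```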

            Originally posted by Torst
            The reason you aren't seeing higher is almost certainly due to the prep and/or instrument.
            Does anyone know how much variation in the prep is stochastic? (i.e. Is there a definite problem that I need to hunt down here, or did we just get unlucky compared with previous runs that had higher quality values?)
            ________
            Last edited by d17; 01-19-2011, 01:56 AM.


            • #7
              Is your image analysis done with IPAR, or with the Illumina pipeline? The first time we used our IPAR unit, it needed "calibration" and produced reads with very low quality scores. Re-running the image analysis with Firecrest gave higher quality reads.
              @1
              NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
              +
              """"""""""""""""""""""""""""""""""""


              • #8
                I'm seeing the exact same thing. I'm seeing quality values from -1 (? or ASCII 63) to 25 (Y or ASCII 89), with most of the calls being 23 (W or ASCII 87). Tyler, how was your IPAR unit "recalibrated" exactly?


                • #9
                  Originally posted by sjackman
                  I'm seeing the exact same thing. I'm seeing quality values from -1 (? or ASCII 63) to 25 (Y or ASCII 89), with most of the calls being 23 (W or ASCII 87). Tyler, how was your IPAR unit "recalibrated" exactly?
                  The scores were only incorrect for the first run with the IPAR unit, and were then correct for all subsequent runs.
                  @1
                  NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
                  +
                  """"""""""""""""""""""""""""""""""""
