
  • #16
    Originally posted by jkbonfield
    Do they still emit varying quality values for N bases? That always confused me. Most were 4 I think, but we'd occasionally see N with quality all the way up to 10. I can only assume they change bases to N at some stage, but don't do anything with the Q value. It seemed broken at the time anyway, but maybe it's a bit saner now.
    I agree with your experience. I never understood what an "N" of quality "10" actually meant; Q10 means the probability of error is 10%, so does that mean there is a 90% chance it really is an "N"? :-)

    I just wrote a quick Perl script to check which qualities are assigned to N bases by a recent Pipeline 1.6 run, using the first 2M reads of a random FASTQ file from the run (QVALUE => FREQUENCY):

    '2' => 281517,
    '4' => 51799,
    '5' => 72,
    '6' => 7,
    '7' => 57,
    '8' => 62,
    '9' => 80,
    '10' => 23,
    '11' => 22,
    '12' => 18,
    '13' => 3,
    '14' => 5,
    '15' => 1

    As you can see, most are Q2, which is "B" and marks the 'rejected section' of the read, so they can be ignored. Most of the true Ns are Q4 ("D"), as they were in your experience; however, there is still a smattering of Ns with qualities all the way up to Q15!
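A tally like the one above could be reproduced roughly as follows. This is a minimal Python sketch, not the Perl script from the post; the function name `n_quality_histogram` and the in-memory sample are hypothetical, and it assumes 4-line FASTQ records with the offset-64 encoding used by Pipeline 1.6.

```python
from collections import Counter
from itertools import islice

PHRED_OFFSET = 64  # assumption: offset-64 Illumina encoding, as discussed in this thread


def n_quality_histogram(lines, max_reads=2_000_000, offset=PHRED_OFFSET):
    """Count the quality values assigned to N bases in 4-line FASTQ records."""
    counts = Counter()
    # Group the line stream into (header, sequence, plus, quality) tuples.
    records = zip(*[iter(lines)] * 4)
    for _header, seq, _plus, qual in islice(records, max_reads):
        seq = seq.rstrip("\r\n")
        qual = qual.rstrip("\r\n")  # strip the newline before decoding
        for base, qchar in zip(seq, qual):
            if base == "N":
                counts[ord(qchar) - offset] += 1
    return counts


# Hypothetical two-N record; a real run would pass an open file handle instead.
sample = ["@r1\n", "NACGN\n", "+r1\n", "BhhhD\n"]
for q, n in sorted(n_quality_histogram(sample).items()):
    print(f"'{q}' => {n}")
```

Passing an open file object as `lines` works the same way, since the function only iterates over lines.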

    *sigh*



    • #17
      Quality score of -54(10)

      Before mapping and before subtracting 64, I checked the distribution of quality scores for my reads (PIPELINE 1.6). I noticed what everyone mentioned here (quality scores starting at 66 - 64 = 2).

      However, I also noticed thousands of quality scores of 10 - 64 = -54. I thought negative quality scores were "phased out" according to the Wiki? What are these? More importantly, do they say anything about run quality? In every lane of my paired-end run, the second end has more -54 quality bases than the first; what does that mean?

      Second question: do any of the current mapping programs (Bowtie, BWA, BFAST, SOAP, etc.) automatically clip "B" quality bases at the ends of reads? I am guessing that the -54 scores are converted to zero.
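Whether a given mapper clips these bases is not settled in this thread, but the clipping can be done as a pre-processing step. The following is a minimal sketch, assuming offset-64 qualities where a trailing run of "B" marks the rejected part of the read; `clip_trailing_b` is a hypothetical helper, not part of any mapper.

```python
def clip_trailing_b(seq, qual, marker="B"):
    """Drop the trailing run of `marker` qualities and the bases under it."""
    # rstrip removes only the run at the very end; isolated markers
    # elsewhere in the read are left untouched.
    keep = len(qual.rstrip(marker))
    return seq[:keep], qual[:keep]


print(clip_trailing_b("ACGTNN", "hhBhBB"))
```

A read whose quality string is entirely "B" comes back empty, so callers would probably want to discard such reads.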

      Cheers,
      Juan



      • #18
        Solexa's negative quality scores only went down to -5, so something else is going on.

        Could you post a couple of reads with these funny quality scores? Wrap it in [ code ] and [ /code ] tags for display in the forum.



        • #19
          Originally posted by jkbonfield
          Do they still emit varying quality values for N bases?

          That always confused me. Most were 4 I think, but we'd occasionally see N with quality all the way up to 10. I can only assume they change bases to N at some stage, but don't do anything with the Q value. It seemed broken at the time anyway, but maybe it's a bit saner now.

          If I understand what they have done here: they take low-scoring bases and convert them to N (rather than calling the base with the highest signal and giving it a low score)? When you align these reads, are the Ns counted as errors or ignored?



          • #20
            Originally posted by maubp
            Solexa's negative quality scores only went down to -5, so something else is going on.

            I figured it out. 10 is the ASCII code for newline, so this was a bug in the code, not a bizarre quality score.
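The arithmetic behind that diagnosis is easy to reproduce. The sketch below is a hypothetical reconstruction of this kind of bug, not the poster's actual code: decoding a quality line without stripping the line ending turns the newline (ASCII 10) into 10 - 64 = -54.

```python
OFFSET = 64  # offset-64 encoding, as subtracted in the post above


def qualities_buggy(line):
    # Bug: the trailing newline is decoded as if it were a quality character.
    return [ord(c) - OFFSET for c in line]


def qualities_fixed(line):
    # Strip the line ending before decoding.
    return [ord(c) - OFFSET for c in line.rstrip("\r\n")]


line = "Bh\n"  # hypothetical quality line as read from a file
print(qualities_buggy(line))
print(qualities_fixed(line))
```

One spurious -54 per read would also add up to thousands of such values over a whole file.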



            • #21
              Isolated B's

              Hi, hopefully I can revive this old thread for a little while. I just got a big dataset from a HiSeq 2000 machine at Berkeley. I'm not sure which version of the Illumina pipeline was used, but I do see single "B" qualities in some reads, e.g.:


              @HS1_0077:5:1101:1205:2082#0/1
              NCCCCAAAGCATGATGTTTCCACCCCCATGCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCAGACCGATATCGTATGCCGTCTTCCGC
              +HS1_0077:5:1101:1205:2082#0/1
              BTSTTV[VYYc_ac_cccccccc[YUYV_cBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
              @HS1_0077:5:1101:1231:2094#0/1
              NTGTGGTATATATGCATGTAGTTACTTGGCCANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCTCTCCACGATCTCCACACACACCCTCT
              +HS1_0077:5:1101:1231:2094#0/1
              BUPNUSSUUUcccac_c_ccccc_ccc_caBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB


              SamH
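To see where "B" qualities fall in reads like the two above, one option is to list every run of "B" with its start position and length. This is a minimal sketch; `b_runs` is a hypothetical helper and the short quality string in the example is made up, not taken from the dataset.

```python
import re


def b_runs(qual, marker="B"):
    """Return (start, length) for every run of `marker` in a quality string."""
    pattern = re.escape(marker) + "+"
    return [(m.start(), len(m.group())) for m in re.finditer(pattern, qual)]


# Shortened, made-up quality string: one isolated B, then a trailing block.
print(b_runs("BTSTTV[VYYcBBBB"))
```

A run of length 1 that does not reach the end of the string is an isolated "B"; a run that ends at the last position is the trailing block.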



              • #22
                As discussed earlier in this thread, you can have lone B qualities (PHRED 2) in the quality string, and a trailing block of B markers as well. The second example here specifically covers this:



                • #23
                  Hi:

                  I have a question about the so-called Sanger format where Q can be in [0;93] represented by [!-~]. Since this is the set of all visible ASCII characters, then it looks like there is no symbol reserved for a missing value. Does it mean that specifying missing values for the quality of individual base pairs is impossible? If yes, why?



                  • #24
                    Originally posted by NikTuzov
                    Hi:

                    I have a question about the so-called Sanger format where Q can be in [0;93] represented by [!-~]. Since this is the set of all visible ASCII characters, then it looks like there is no symbol reserved for a missing value. Does it mean that specifying missing values for the quality of individual base pairs is impossible? If yes, why?
                    Yes, although in this case you could use PHRED quality 0 (and I recall some tools may use the upper bound 93 as a special value).

                    Why? None of the early technologies needed a missing value quality score.
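The range in the question can be checked directly: with an offset of 33, qualities 0 to 93 map onto the characters "!" through "~", which uses up every printable non-space ASCII character. This is a minimal sketch with hypothetical helper names.

```python
SANGER_OFFSET = 33  # Sanger FASTQ: quality Q is stored as chr(Q + 33)


def encode_sanger(q):
    """Encode one PHRED quality (0..93) as a Sanger FASTQ character."""
    if not 0 <= q <= 93:
        raise ValueError("Sanger FASTQ qualities must be in the range 0..93")
    return chr(q + SANGER_OFFSET)


def decode_sanger(ch):
    """Decode one Sanger FASTQ quality character to its PHRED quality."""
    return ord(ch) - SANGER_OFFSET


alphabet = "".join(encode_sanger(q) for q in range(94))
print(alphabet[0], alphabet[-1], len(alphabet))
```

Because all 94 characters already carry a numeric meaning, a missing value has to borrow one of them by convention, such as the PHRED 0 suggested above.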



                    • #25
                      Originally posted by maubp
                      Yes, although in this case you could use PHRED quality 0 (and I recall some tools may use the upper bound 93 as a special value).
                      Thanks. I take it the encoding of missing values depends on the sequencer and it cannot be changed downstream. Therefore, I don't see how some downstream tool can use 93 as a missing value when it is not seen as such by the sequencing instrument.
