  • another solexa/phred question

    I am trying to identify what quality format my reads are in, and I can't seem to find a clear answer online. Thanks for the help:

    @HWI-EAS432:1:1:4:99#0/1
    TCTTATCAGTTTAATATCTGATACGTCATCTATTTGAGTACTATATATTAAATGGATTTT
    +HWI-EAS432:1:1:4:99#0/1
    B>;?A8-(:6=A873=A@/:AA18<7(<6:.6=.036&-=<4:2242:=5/<,&3/?16
    @HWI-EAS432:1:1:4:866#0/1
    GGTTTCGCTAGATAGTAGGTAGGGACAGTGGGAATCTCGTTCATCCATTCATGCGCGTCA
    +HWI-EAS432:1:1:4:866#0/1
    83?CCB@ABB@AC?91?9A6>?:9@B?1>7/-<>98;<2<=;B@B6BB?>);.7(7+5BB
    @HWI-EAS432:1:1:4:844#0/1
    TGCTACCCCTCTATTCTGCCATGGTTAGACCACACCTAGAGTATTGTGTCCAATTCTGGG
    +HWI-EAS432:1:1:4:844#0/1
    @1ABBBBACBBBCBAA=/@BBBA1:A@/@BBBB@<@B@4946B@8:>99A3<=??A%%%%

  • #2
    This FASTQ file uses standard Sanger quality encoding: take the ASCII value of each character in the quality string and subtract 33. The 'highest' character you have is 'C' (ASCII 67) and the lowest is '%' (ASCII 37). These translate to Q scores of 34 and 4, which is an expected range for Phred scores.

    The quickest way to distinguish Sanger Q-score encoding (ASCII-33) from Illumina (Solexa) Q-score encoding (ASCII-64) is to look for numerals [0-9] in the quality string. The numerals have ASCII values from 48 to 57, so subtracting 64 from them would give scores of -16 to -7, which is nonsensical. If there are numerals in your quality string, then the Q-score encoding is Sanger.
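The decoding rule and the numeral test above can be sketched in a few lines of Python. The names `decode` and `looks_like_sanger` are hypothetical helpers, not from any library. Note that the test only works in one direction: a Sanger file whose bases are all high quality may contain no numerals at all.

```python
SANGER_OFFSET = 33  # Sanger / Phred+33 encoding

def decode(quality, offset=SANGER_OFFSET):
    """Convert each quality character to a Q score: ASCII value minus offset."""
    return [ord(ch) - offset for ch in quality]

def looks_like_sanger(quality_lines):
    """Numeral test: '0'-'9' (ASCII 48-57) would decode to -16..-7 under an
    offset of 64, below the Solexa minimum of -5, so they imply Sanger."""
    return any(ch in "0123456789" for line in quality_lines for ch in line)

# Quality strings from the three reads in the first post
reads = [
    "B>;?A8-(:6=A873=A@/:AA18<7(<6:.6=.036&-=<4:2242:=5/<,&3/?16",
    "83?CCB@ABB@AC?91?9A6>?:9@B?1>7/-<>98;<2<=;B@B6BB?>);.7(7+5BB",
    "@1ABBBBACBBBCBAA=/@BBBA1:A@/@BBBB@<@B@4946B@8:>99A3<=??A%%%%",
]
scores = [q for line in reads for q in decode(line)]
print(looks_like_sanger(reads))   # True
print(min(scores), max(scores))   # 4 34
```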
    Last edited by kmcarr; 12-02-2009, 02:23 PM.



    • #3
      Got it -- thanks for explaining it so clearly.



      • #4
        Last edited by crinfante; 12-02-2009, 02:05 PM. Reason: duplicate please delete



        • #5
          Hello,

          I have a related issue: I don't know which FASTQ format my reads are in.

          @XXX010005.1 BI:080722_SL-XBE_0007_FC3061LAAXX:6:1:1319:692 length=51
          ACGATGTGACGTACGCGTATGCTCGTATACACACGCATGACGAGCGACGAT
          +XXX010005.1 BI:080722_SL-XBE_0007_FC3061LAAXX:6:1:1319:692 length=51
          IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII@I
          @XXX10005.2 BI:080722_SL-XBE_0007_FC3061LAAXX:6:1:395:487 length=51
          TTTTTCGTGTCGGCGGCCCGTCGCCTCTCCACCCCACCACACCCCCCACCC



          • #6
            That would be Illumina/Solexa FASTQ.
            But since I can't see the version of the pipeline, it's not possible to tell whether this is the new linear FASTQ metric or the older log-score FASTQ metric.
            The difference is small, so it shouldn't matter.

            Short question:
            Are all your reads of quality "IIIIII"?
            That strikes me as funny, and mayhaps erroneous.
            Best,
            -Jonathan
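The two older scales differ in definition: Solexa scores are log-odds, Q = -10*log10(p/(1-p)), while Phred scores are Q = -10*log10(p), where p is the error probability. A minimal sketch of the usual conversion between them (the function name is hypothetical) suggests the difference really is small at high quality but grows at the low end of the scale:

```python
import math

def solexa_to_phred(q_solexa):
    """Convert a Solexa log-odds score to a Phred score.

    Solexa: Q = -10*log10(p / (1 - p)); Phred: Q = -10*log10(p).
    Eliminating the error probability p gives
    Q_phred = 10*log10(10**(Q_solexa/10) + 1).
    """
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)

for q in (-5, 0, 10, 20, 40):
    print(q, round(solexa_to_phred(q), 2))
# -5 1.19
# 0 3.01
# 10 10.41
# 20 20.04
# 40 40.0
```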



            • #7
              That's a somewhat old Solexa FASTQ format ("old" being about a year old in this field!). The quality values encoded by the characters ranged from -5 to 40, with ! being 0 and I being 40.
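For reference, the printable range of each scale follows directly from its ASCII offset. In this small sketch (the helper name is hypothetical), Q 0 to 40 at offset 33 spans '!' to 'I', while Solexa's -5 to 40 at offset 64 spans ';' to 'h'. The 'I' in the reads above therefore decodes to 40 under offset 33 but to 9 under offset 64.

```python
def encoding_range(offset, q_min, q_max):
    """Return the lowest and highest quality characters for a range of scores."""
    return chr(offset + q_min), chr(offset + q_max)

print(encoding_range(33, 0, 40))   # ('!', 'I')  Sanger, ASCII-33
print(encoding_range(64, -5, 40))  # (';', 'h')  Solexa, ASCII-64

# The same character decodes to very different scores under the two offsets
print(ord("I") - 33, ord("I") - 64)  # 40 9
```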

              FASTQ format looks like this:

              @read name
              sequence
              +read name again (or just + and nothing, to save file space)
              quality score for each letter in above read
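A reader for the four-line layout just described might look like the following. It is a sketch that assumes each record occupies exactly four lines (no wrapped sequence or quality lines), and `read_fastq` is a hypothetical name. Checking that the sequence and quality lengths match catches a common symptom of a corrupted file.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual

# The '+' line may repeat the read name or be left bare
example = "@read1\nACGT\n+read1\nIIII\n@read2\nGGCA\n+\nII5#\n"
for name, seq, qual in read_fastq(io.StringIO(example)):
    print(name, seq, qual)
```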



              • #8
                Many thanks Jonathan and swbarnes2.

                Yes, frankly, most of my reads have quality values of IIII!

                So I'm trying to use VAAL to assemble several bacterial genomes against a reference and detect SNPs. VAAL requires that I convert these FASTQ files into .fasta and .qual files.

                Do you know an easy way of doing that, given that I'm not an expert in bioinformatics?!

                I tried this one:


                But it keeps giving me errors.

                Thanks a lot!



                • #9
                  Originally posted by MoBi
                  I tried this one:


                  But it keeps giving me errors.

                  Thanks a lot!
                  I'm pretty sure, from the format names etc., that this website uses Biopython internally to do the conversion. However, it looks like the website has a bug in which quotes (which can occur in FASTQ quality strings) are "escaped" with extra backslash characters. As a result, the data given to Biopython is corrupted, and the conversion fails.

                  You would be better off using Biopython directly (especially for large files; it would be silly to try to upload/convert/download anything bigger than a few megabytes).
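With Biopython installed, each conversion is a single `SeqIO.convert` call, for example `SeqIO.convert("reads.fastq", "fastq", "reads.qual", "qual")` and likewise with `"fasta"` as the output format; for Solexa or older Illumina files the input format names are `"fastq-solexa"` and `"fastq-illumina"`. To show what that conversion involves, here is a dependency-free sketch assuming four-line records. `fastq_to_fasta_qual` is a hypothetical helper, and with `offset=64` it writes raw Solexa values rather than converting them to Phred.

```python
import io

def fastq_to_fasta_qual(fastq, fasta_out, qual_out, offset=33):
    """Write FASTA and QUAL records from a four-line FASTQ stream.

    offset=33 assumes Sanger encoding; offset=64 decodes Illumina 1.3+ style
    files but leaves old Solexa scores unconverted. Returns the record count.
    """
    count = 0
    # Group the input lines four at a time: header, sequence, '+', quality
    for header, seq, _plus, qual in zip(*[iter(fastq)] * 4):
        name = header.rstrip("\r\n")[1:]
        scores = " ".join(str(ord(ch) - offset) for ch in qual.rstrip("\r\n"))
        fasta_out.write(f">{name}\n{seq.rstrip(chr(13) + chr(10))}\n")
        qual_out.write(f">{name}\n{scores}\n")
        count += 1
    return count

fasta, qual = io.StringIO(), io.StringIO()
n = fastq_to_fasta_qual(io.StringIO("@r1\nACGT\n+\nII5#\n"), fasta, qual)
print(n)                     # 1
print(repr(qual.getvalue())) # '>r1\n40 40 20 2\n'
```

Streaming the records in groups of four keeps memory use flat, which matters for the multi-gigabyte files mentioned above.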
