  • Layla
    Member
    • Sep 2008
    • 58

    BOWTIE input

    Does bowtie need to have reads in the standard sanger format or can it accept the default file created from the 1.4 illumina pipeline in which the quals are not standard sanger?

    Cheers

    L
  • dawe
    Senior Member
    • Apr 2009
    • 258

    #2
    If you specify the '--phred64-quals' or '--solexa1.3-quals' option, you can use those Illumina reads without conversion.

    d

    • Layla
      Member
      • Sep 2008
      • 58

      #3
      Just seen it!

      Thanks

      L

      • tujchl
        Member
        • Sep 2009
        • 74

        #4
        I tried data directly from Solexa without the '--phred64-quals' or '--solexa1.3-quals' option, and the output looks fine.

        • dawe
          Senior Member
          • Apr 2009
          • 258

          #5
          Originally posted by tujchl
          I tried data directly from Solexa without the '--phred64-quals' or '--solexa1.3-quals' option, and the output looks fine.
          Of course you can, but in that case the base qualities are probably being interpreted the wrong way... I guess low-quality bases are overestimated by a ~1000x factor...

          • tujchl
            Member
            • Sep 2009
            • 74

            #6
            hi dawe
            thank you for replying. I just have two more questions:
            1. What do you mean by "overestimated by a ~1000x factor"? Could you please explain in detail?
            2. I just tested bowtie, and my feeling is that bowtie does NOT use qualities while running, so quality control could be done before bowtie.
            Thank you in advance

            • dawe
              Senior Member
              • Apr 2009
              • 258

              #7
              Originally posted by tujchl
              hi dawe
              thank you for replying. I just have two more questions:
              1. What do you mean by "overestimated by a ~1000x factor"? Could you please explain in detail?
              Phred+33 and Phred+64 scores differ by an offset of 31 in ASCII code. Since the score is -10*log10(p) (plus the offset), a difference of 30 corresponds to a 1000x difference in probability (a difference of 31 is closer to 1260x). The worst Illumina score is "@" (ASCII 64), which means (and correct me if I'm wrong) Q = 0, i.e. p = 1. Read in a Sanger framework, ASCII 64 is Q = 31, i.e. p ~ 0.0008, which is roughly 1000x smaller.
              For qualities in the "mid-range" the difference is not relevant.
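The arithmetic above can be checked with a small Python sketch. It decodes the same character under both offsets; the function name `error_probability` is only an illustration and is not part of bowtie.

```python
def error_probability(char: str, offset: int) -> float:
    """Decode one FASTQ quality character into an error probability."""
    q = ord(char) - offset      # Phred score Q
    return 10 ** (-q / 10)      # p = 10^(-Q/10)


# The same character "@" (ASCII 64) under the two conventions.
p_illumina = error_probability("@", 64)   # Q = 0
p_sanger = error_probability("@", 33)     # Q = 31
ratio = p_illumina / p_sanger

print(f"Phred+64: p = {p_illumina:.4f}")
print(f"Phred+33: p = {p_sanger:.6f}")
print(f"ratio     = {ratio:.0f}x")
```

The ratio comes out slightly above 1000 because the offsets differ by 31 rather than exactly 30.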

              Originally posted by tujchl
              2. I just tested bowtie, and my feeling is that bowtie does NOT use qualities while running, so quality control could be done before bowtie.
              That's probably because you have a lot of good-quality reads. AFAIK bowtie uses qualities (otherwise I wonder why Ben included the phred33/phred64 options at all).

              • Layla
                Member
                • Sep 2008
                • 58

                #8
                From looking into Bowtie's defaults, --phred33-quals is "on", so bowtie assumes you are providing reads in the standard Sanger format (Phred+33). If you are providing data with quality scores in Phred+64, you should specify --phred64-quals, which is "off" by default. --solexa1.3-quals is a good option; it assumes you are providing unconverted data from the Solexa GA pipeline 1.3 or later.

                Alternatively you could use maq to convert the reads from phred64 to phred33 and simply put this through bowtie using bowtie's defaults!
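If you only need the re-encoding step, a minimal Python sketch of the conversion might look like this. It assumes true Phred+64 values (pipeline 1.3 or later), not the older Solexa log-odds scale, and the function name `phred64_to_phred33` is hypothetical rather than anything provided by maq or bowtie.

```python
def phred64_to_phred33(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (Sanger)."""
    converted = []
    for char in quality:
        q = ord(char) - 64                  # recover the Phred score
        if q < 0:
            raise ValueError(f"{char!r} is below the Phred+64 range")
        converted.append(chr(q + 33))       # write it back with the Sanger offset
    return "".join(converted)


# 'h' is Q40, '@' is Q0 and 'B' is Q2 in Phred+64.
print(phred64_to_phred33("hhhh@B"))
```

Only the quality line of each FASTQ record should be passed through such a function; the header and sequence lines stay unchanged.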

                Hope this helps

                L

                p.s A slight digression - I cannot unzip the hg18 version of the pre-built index h_sapiens_asm.ebwt.zip. I tried both part 1, part 2 and the entire genome, but I get an error saying
                End-of-central-directory signature not found. Either this file is not
                a zipfile, or it constitutes one disk of a multi-part archive.

                Any ideas?

                • tujchl
                  Member
                  • Sep 2009
                  • 74

                  #9
                  thank you dawe,
                  two more questions:
                  1. According to what you say, can I assume that bowtie does use the qualities and filters out reads that do not pass?
                  2. Where can I get the ASCII codes for phred64 and phred33?

                  And thanks to Layla for the suggestions and for posting this thread.
                  I built my human genome index myself. Since I don't have a very powerful computer, I built the index chr by chr and run bowtie chr by chr...

                  • dawe
                    Senior Member
                    • Apr 2009
                    • 258

                    #10
                    Originally posted by Layla
                    p.s A slight digression - I cannot unzip the hg18 version of the pre-built index h_sapiens_asm.ebwt.zip. I tried both part 1, part 2 and the entire genome, but I get an error saying
                    End-of-central-directory signature not found. Either this file is not
                    a zipfile, or it constitutes one disk of a multi-part archive.

                    Any ideas?
                    Try to index your own genome. I'm downloading the ebwt right now, but it will take longer than indexing (at least here...).
                    BTW, you should ask the bowtie webmaster for the md5sum of the zip files.

                    • dawe
                      Senior Member
                      • Apr 2009
                      • 258

                      #11
                      Originally posted by tujchl
                      1. According to what you say, can I assume that bowtie does use the qualities and filters out reads that do not pass?
                      You should ask bowtie developers, but AFAIK bowtie doesn't apply quality filters *before* the alignment. Base quality is used at alignment time to score mismatches.
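As a rough illustration of why the offset matters at alignment time, here is a toy Python model. It assumes a Maq-like policy of the kind the bowtie manual describes for its `-e/--maqerr` option: sum the qualities at mismatched positions, with each quality rounded to the nearest 10 and capped at 30, and compare the sum with a limit of 70. The function names and the example read are hypothetical, and this is not bowtie's actual code.

```python
MAX_MISMATCH_SUM = 70   # assumed limit, modeled on the documented -e default


def rounded_quality(char: str, offset: int) -> int:
    """Phred score rounded to the nearest 10 and capped at 30 (assumed policy)."""
    q = ord(char) - offset
    return min(30, (q + 5) // 10 * 10)


def mismatch_penalty(quals: str, mismatch_positions: list, offset: int) -> int:
    """Sum of rounded qualities at the mismatched read positions."""
    return sum(rounded_quality(quals[i], offset) for i in mismatch_positions)


# Hypothetical Phred+64 read: three low-quality bases ('B' = Q2) that mismatch.
quals = "hhhhhBBB"
mismatches = [5, 6, 7]

correct = mismatch_penalty(quals, mismatches, 64)   # decoded as Phred+64
misread = mismatch_penalty(quals, mismatches, 33)   # misread as Phred+33

print(correct, correct <= MAX_MISMATCH_SUM)
print(misread, misread <= MAX_MISMATCH_SUM)
```

In this toy case the three mismatches cost 0 when the offset is right and 90 when the read is misinterpreted as Phred+33, so the same alignment would pass in one case and exceed the assumed limit in the other.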

                      Originally posted by tujchl
                      2. where can I get the ASCII code of phred64 and phred33?
                      Code:
                      man ascii
                      look at the decimal set.

                      • svl
                        Member
                        • Sep 2009
                        • 43

                        #12
                        perl script comparison table

                        Originally posted by tujchl
                        2. where can I get the ASCII code of phred64 and phred33?
                        If you run the perl code below, you'll see a table with a comparison.


                        Code:
                        #!/usr/bin/env perl
                        ################################################
                        # Prints a table with: Phred score, ASCII code of the Phred+33
                        # character, ASCII code of the Phred+64 character, both
                        # characters, and the error probability p = 10^(-Q/10).
                        ################################################
                        use strict;
                        use warnings;

                        my $step = 2;    # print every second Phred score

                        printf "%6s  %6s  %6s  %6s  %6s  %10s\n", 'phred', 'dec33', 'dec64', 'chr33', 'chr64', 'p';

                        for (my $phred = 0; $phred <= 62; $phred += $step) {
                           printf "%6d  %6d  %6d  %6s  %6s  %10f\n",
                              $phred, $phred + 33, $phred + 64, chr($phred + 33), chr($phred + 64), phred2p($phred);
                        }

                        # Convert a Phred score to an error probability.
                        sub phred2p {
                           my ($q) = @_;
                           return 10 ** (-$q / 10.0);
                        }

                        • tujchl
                          Member
                          • Sep 2009
                          • 74

                          #13
                          Thank you all, I learned a lot from you.
                          Two more questions:
                          1. When I use data directly from Solexa as bowtie input, should I specify "--phred64" or "--solexa1.3" or both?
                          2. When I use the "--concise" option to save disk space, the output looks like this:
                          1-:<0,2852852,1>
                          It shows 0 instead of my ref_index name! (I built my ref_index chr by chr and run bowtie chr by chr as well.) Could you please tell me how to get my ref_index name?
                          (The ref_index name, such as "chr1", comes back if I run bowtie without --concise.)

                          • dawe
                            Senior Member
                            • Apr 2009
                            • 258

                            #14
                            Originally posted by tujchl
                            Thank you all, I learned a lot from you.
                            Two more questions:
                            1. When I use data directly from Solexa as bowtie input, should I specify "--phred64" or "--solexa1.3" or both?
                            As stated in the bowtie help

                            Code:
                            --phred64-quals    input quals are Phred+64 (same as --solexa1.3-quals)
                            They are synonyms.
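If you are not sure which encoding a file uses in the first place, one common heuristic is to look at the lowest quality character that occurs. The Python sketch below is only a guess-maker under that assumption; the function name `guess_offset` and the threshold of 59 (the lowest character possible under the old Solexa +64 scale) are choices made for this example, not anything bowtie does for you.

```python
def guess_offset(quality_lines):
    """Guess the quality offset from the lowest character seen (heuristic)."""
    lowest = min(ord(c) for line in quality_lines for c in line)
    if lowest < 59:
        return 33   # characters this low cannot occur with a +64 offset
    return 64       # consistent with +64, though high-quality +33 data also fits


print(guess_offset(["IIII#!", "IIIIII"]))   # contains '!' and '#'
print(guess_offset(["hhhhB@", "hhhhhh"]))   # nothing below '@'
```

A result of 64 is weaker evidence than a result of 33, so it is worth checking against what you know about the pipeline version that produced the reads.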

                            Originally posted by tujchl
                            2. When I use the "--concise" option to save disk space, the output looks like this:
                            1-:<0,2852852,1>
                            Sorry, I can't help. To save space and still get useful information from my results, I keep everything in BAM format (directly from bowtie output).
