Header Leaderboard Ad

Collapse

BOWTIE input

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BOWTIE input

    Does bowtie need to have reads in the standard sanger format or can it accept the default file created from the 1.4 illumina pipeline in which the quals are not standard sanger?

    Cheers

    L

  • #2
    If you specify '--phred64-quals' or '--solexa1.3-quals' option you can use thos illumina reads without conversion

    d

    Comment


    • #3
      Just seen it!

      Thanks

      L

      Comment


      • #4
        I try data directlly from solexa without '--phred64-quals' or '--solexa1.3-quals' option. and the output looks well.

        Comment


        • #5
          Originally posted by tujchl View Post
          I try data directlly from solexa without '--phred64-quals' or '--solexa1.3-quals' option. and the output looks well.
          Of course you can, but in that case you're probably estimating base qualities in a wrong way... I guess low quality bases are overestimated by a ~1000x factor...

          Comment


          • #6
            hi dawe
            thank you for you replying, I just have two more questions
            1. what do you mean by "overestimated by a ~1000x factor", could you please explain in detail?
            2. I just test bowtie and it`s my feeling that bowtie do NOT use quality while running. so the quality control could been done before bowtie.
            Thank you in advance

            Comment


            • #7
              Originally posted by tujchl View Post
              hi dawe
              thank you for you replying, I just have two more questions
              1. what do you mean by "overestimated by a ~1000x factor", could you please explain in detail?
              phred-33 and phred-64 scores are different by a 31 offset in ASCII code. As this code is -10log10(p) (plus the offset), a difference in 30 is a difference in 1000x on probability values. The worst illumina score is "@" which means (and correct me if I'm wrong) p = 1. In a Sanger framework 64 is p~0.001 which is 1000x smaller.
              For qualities in the "mid-range" the difference is not relevant.

              Originally posted by tujchl View Post
              2. I just test bowtie and it`s my feeling that bowtie do NOT use quality while running. so the quality control could been done before bowtie.
              That's probably because you have lot of good quality reads, AFAIK bowtie uses qualities (I wonder why Ben included the phred33/phred64 option after all).

              Comment


              • #8
                From looking into Bowtie's defaults --phred33 -quals is "on" and hence assumes you are providing reads in the standard sanger format (phred33). If you are providing data with quality scores in phred64 you should specify --phred64 -quals which is "off" by default. --solexa1.3 -quals is a good option which assumes you are providing unconverted data from the solexa GA 1.3 pipeline or later.

                Alternatively you could use maq to convert the reads from phred64 to phred33 and simply put this through bowtie using bowtie's defaults!

                Hope this helps

                L

                p.s A slight digression - I cannot unzip the hg18 version of the pre-built index h_sapiens_asm.ebwt.zip. I tried both part 1, part 2 and the entire genome, but I get an error saying
                End-of-central-directory signature not found. Either this file is not
                a zipfile, or it constitutes one disk of a multi-part archive.

                Any ideas?

                Comment


                • #9
                  thank you dawe:
                  tow more questions:
                  1. accordding to your words, Can I consider that bowtie indeed ues the quality and filter some reads that can not pass?
                  2. where can I get the ASCII code of phred64 and phred33?

                  and thank Layla for your suggestions and poster this thread
                  I build my human genome index by myself for I don`t have so powerful computer that I build index chr by chr and run chr by chr ........

                  Comment


                  • #10
                    Originally posted by Layla View Post
                    p.s A slight digression - I cannot unzip the hg18 version of the pre-built index h_sapiens_asm.ebwt.zip. I tried both part 1, part 2 and the entire genome, but I get an error saying
                    End-of-central-directory signature not found. Either this file is not
                    a zipfile, or it constitutes one disk of a multi-part archive.

                    Any ideas?
                    Try to index your own genome. I'm dowloading the ebwt right now but it will take more than indexing (at least here...).
                    BTW, you should ask bowtie webmaster the md5sum for the zip files.

                    Comment


                    • #11
                      Originally posted by tujchl View Post
                      1. accordding to your words, Can I consider that bowtie indeed ues the quality and filter some reads that can not pass?
                      You should ask bowtie developers, but AFAIK bowtie doesn't apply quality filters *before* the alignment. Base quality is used at alignment time to score mismatches.

                      Originally posted by tujchl View Post
                      2. where can I get the ASCII code of phred64 and phred33?
                      Code:
                      man ascii
                      look at the decimal set.

                      Comment


                      • #12
                        perl script comparison table

                        Originally posted by tujchl View Post
                        2. where can I get the ASCII code of phred64 and phred33?
                        If you run the perl code below, you'll see a table with a comparison.


                        Code:
                        #!/usr/bin/env perl
                        ################################################
                        # prints a table with phred, ASCII, phred+33, phred+64, p
                        ################################################
                        use strict;
                        use warnings;
                        
                        my @phreds = (0..62);
                        my $step = 2;
                        
                        printf "%6s  %6s  %6s  %6s  %10s\n", 'phred', 'ASCII', 'Ill33', 'Ill64', 'p'; 
                        
                        for (my $i = 0; $i < @phreds; $i+=$step ){
                           my $phred = $phreds[$i];
                           printf "%6d  %6d  %6s  %6s  %10f\n", $phred, $phred+64, chr($phred+33), chr($phred+64), phred2p($phred);
                        }
                        
                        sub phred2p{
                           return 10 ** (-(shift) / 10.0 );
                        }

                        Comment


                        • #13
                          Thank all of you, I learned lots from you.
                          and two more questions:
                          1. when I used data directly from solexa as bowtie input, should I specify "--phred64" or "--solexa1.3" or both?
                          2. when I used option "--concise" to save my disk space and the output is like this
                          1-:<0,2852852,1>
                          and there is 0 other than my ref_index name !!! (I build my ref_index chr by chr and run bowtie chr by chr as well), could you please tell me how to get my ref_index name?
                          (ref_index name such as "chr1" wiil be back if I run bowtie without --concise ).

                          Comment


                          • #14
                            Originally posted by tujchl View Post
                            Thank all of you, I learned lots from you.
                            and two more questions:
                            1. when I used data directly from solexa as bowtie input, should I specify "--phred64" or "--solexa1.3" or both?
                            As stated in the bowtie help

                            Code:
                            --phred64-quals    input quals are Phred+64 (same as --solexa1.3-quals)
                            They are synonyms.

                            Originally posted by tujchl View Post
                            2. when I used option "--concise" to save my disk space and the output is like this
                            1-:<0,2852852,1>
                            Sorry, I can't help. To save space and get valuable information from my results I keep all in BAM format (directly from bowtie output).

                            Comment

                            Working...
                            X