Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to limite the reported read length?

    Hello everybody. Not sure how to address my question. I mean, bowtie will skip reads that are less than 4 characters. How can I make it skip reads less than 5 or 6 or more? Can I set that?

    Thanks very much!

    Comment


    • Originally posted by Ben Langmead View Post
      Hi ramouz,

      I guess I'm not too familiar with how other tools set these fields. Are they typically set by the user? I.e., should I add a command-line option where the user can specify values for ID, SM, etc.?

      Thanks,
      Ben
      Hi Ben
      I'm trying to run some structural variation programs such BreakDancer, Pindel, ... and seems that problem is due to missing info in the header. I guess that there's no other choice than specifing the header manually so would be a good idea to add that option.
      I guess effort should focus on a standard format to avoid compatibility problem.
      Thanks Ben.
      Research Scientist - Bioinformatics
      Sidra Medical and Research Center

      Comment


      • bowtie missed read mapped by blat

        I am confused about this read being missed by bowtie. Did I miss something here

        Here is the blat result on reference sequence
        HTML Code:
        BLASTN 2.2.11 [blat]
        
        Reference:  Kent, WJ. (2002) BLAT - The BLAST-like alignment tool
        
        Query= read
                 (75 letters)
        
        Database: Zv7_scaffolds.fa 
                   7494 sequences; 1,440,319,812 total letters
        
        Searching.done
                                                                         Score    E
        Sequences producing significant alignments:                      (bits) Value
        
        Zv7_scaffold910                                                       132   5e-31
        Zv7_scaffold910                                                       128   9e-30
        ...
        
        >Zv7_scaffold910 
                  Length = 7751464
        
         Score = 132 bits (341), Expect = 5e-31
         Identities = 72/75 (96%)
         Strand = Minus / Plus
        
        Query: 75      agtctgcttttccatataaaactgagaagaagagactgcagccttgaacaaacttgggaa 16
                       ||||||||||||||||||||||||||||||||||||||||||||||| |||||||| |||
        Sbjct: 5660145 agtctgcttttccatataaaactgagaagaagagactgcagccttgatcaaacttgcgaa 5660204
        
        Query: 15      gtcttaacttacacg 1
                       |||| ||||||||||
        Sbjct: 5660205 gtctgaacttacacg 5660219


        While this is what bowtie reports

        HTML Code:
        $ /home/m049157/build/bowtie-0.10.0/bowtie --best -n 3 -p 4 -t zv7scaffold -c CGTGTAAGTTAAGACT
        TCCCAAGTTTGTTCAAGGCTGCAGTCTCTTCTTCTCAGTTTTATATGGAAAAGCAGACT
        Time loading forward index: 00:00:16
        Time loading mirror index: 00:00:16
        Seeded quality full-index search: 00:00:01
        No results
        Time searching: 00:00:33
        Overall time: 00:00:33
        --
        bioinfosm

        Comment


        • <bump>

          any comments?
          --
          bioinfosm

          Comment


          • Originally posted by bioinfosm View Post
            <bump>

            any comments?
            Looks like the -e ceiling is exceeded. When input is provided via -c, qualities are assumed to be 'I' (which, rounded, becomes Phred 30). So try -e 90 or higher.

            Ben

            Comment


            • not the optimum mapping of a read

              Thanks Ben, setting it that way worked.

              But why don't I get the optimum alignment, that is seen using blat! Even using the -a option gives sub-optimal hits..

              Here is the real read with Q values, that ended up unmapped
              @HWI-E4:1:87:1633:1127#0/1
              CGTGTAAGTTAAGACTTCCCAAGTTTGTTCAAGGCTGCAGTCTCTTCTTCTCAGTTTTATATGGAAAAGCAGACT
              +
              B427?@?=@@>07?@ABBBB@?<>AB=4@B?<+/B42@82;A?A<4.,@:6<)9:<8(0998(-/;A=3%%%%%%

              HTML Code:
              $  /home/m049157/build/bowtie-0.10.0/bowtie --best 
              -p 4 -t -e 90 -n 3 zv7scaffold w ww
              Time loading forward index: 00:00:01
              Time loading mirror index: 00:00:01
              Seeded quality full-index search: 00:00:00
              Reported 1 alignments to 1 output stream(s)
              Time searching: 00:00:02
              Overall time: 00:00:02
              
              $ cat ww
              HWI-E4:1:87:1633:1127#0/1       +       Zv7_scaffold1183        139472  CGTGTAAGTTAAGACTTCCCAAGTTTGTTCAAGGCTGCAGTCTCTTCTTCTCAGTTTTATATGGAAAAGCAGACT     B427?@?=@@>07?@ABBBB@?<>AB=4@B?<+/B42@82;A?A<4.,@:6<)9:<8(0998(-/;A=3%%%%%%     0      18:G>C,27:A>T,57:G>T,68:C>G,69:A>C,70:G>A,71:A>G,72:C>A,74:G>T
              
              
              -- removing -n 3
              $  /home/m049157/build/bowtie-0.10.0/bowtie --best -p 4 -t -e 90 zv7scaffold w ww
              $ cat ww
              HWI-E4:1:87:1633:1127#0/1       -       Zv7_scaffold724 463363  AGTCTGCTTTTCCATATAAAACTGAGAAGAAGAGACTGCAGCCTTGAACAAACTTGGGAAGTCTTAACTTACACG     %%%%%%3=A;/-(8990(8<:9)<6:@,.4<A?A;28@24B/+<?B@4=BA><?@BBBBA@?70>@@=?@?724B     0       18:C>G,27:T>A,57:C>A,68:G>C,69:T>G,70:A>T,71:T>C,72:G>T,74:C>A
              
              
              -- using -a to report all aln
              $  /home/m049157/build/bowtie-0.10.0/bowtie --best -p 4 -t -e 90 -a zv7scaffold w ww
              $ cat ww
              HWI-E4:1:87:1633:1127#0/1       -       Zv7_scaffold724 463363  AGTCTGCTTTTCCATATAAAACTGAGAAGAAGAGACTGCAGCCTTGAACAAACTTGGGAAGTCTTAACTTACACG     %%%%%%3=A;/-(8990(8<:9)<6:@,.4<A?A;28@24B/+<?B@4=BA><?@BBBBA@?70>@@=?@?724B     0       18:C>G,27:T>A,57:C>A,68:G>C,69:T>G,70:A>T,71:T>C,72:G>T,74:C>A
              HWI-E4:1:87:1633:1127#0/1       +       Zv7_scaffold1183        139472  CGTGTAAGTTAAGACTTCCCAAGTTTGTTCAAGGCTGCAGTCTCTTCTTCTCAGTTTTATATGGAAAAGCAGACT     B427?@?=@@>07?@ABBBB@?<>AB=4@B?<+/B42@82;A?A<4.,@:6<)9:<8(0998(-/;A=3%%%%%%     0      18:G>C,27:A>T,57:G>T,68:C>G,69:A>C,70:G>A,71:A>G,72:C>A,74:G>T
              HWI-E4:1:87:1633:1127#0/1       -       Zv7_scaffold2650        169222  AGTCTGCTTTTCCATATAAAACTGAGAAGAAGAGACTGCAGCCTTGAACAAACTTGGGAAGTCTTAACTTACACG     %%%%%%3=A;/-(8990(8<:9)<6:@,.4<A?A;28@24B/+<?B@4=BA><?@BBBBA@?70>@@=?@?724B     0      18:C>G,27:T>A,57:C>A,68:G>C,69:T>G,70:C>T,71:T>C,72:G>T,73:A>G,74:T>A
              
              Thanks,
              Last edited by bioinfosm; 11-30-2009, 01:14 PM.
              --
              bioinfosm

              Comment


              • When I limited my reference sequence to the blat hit region, I got the hit with 3 mis-matches, however, not before I increased the -e option to -e 80. Why would I not get this hit previously, when I used -a -e 90 to report all hits?

                And why do I have to do -n 3, when the seed length by default is 28, and there are no more than 2 mis-matches in 28bp?

                HTML Code:
                $ /home/m049157/build/bowtie-0.10.0/bowtie --best -p 4 -t -n 3 -e 80 -a www w ww
                Time loading forward index: 00:00:00
                Time loading mirror index: 00:00:00
                Seeded quality full-index search: 00:00:00
                Reported 1 alignments to 1 output stream(s)
                Time searching: 00:00:00
                Overall time: 00:00:00
                $ cat ww
                HWI-E4:1:87:1633:1127#0/1       -       Zv7_scaffold910 5660144 AGTCTGCTTTTCCATATAAAACTGAGAAGAAGAGACTGCAGCCTTGAACAAACTTGGGAAGTCTTAACTTACACG     %%%%%%3=A;/-(8990(8<:9)<6:@,.4<A?A;28@24B/+<?B@4=BA><?@BBBBA@?70>@@=?@?724B       0       10:G>T,18:C>G,27:T>A
                --
                bioinfosm

                Comment


                • Originally posted by bioinfosm View Post
                  When I limited my reference sequence to the blat hit region, I got the hit with 3 mis-matches, however, not before I increased the -e option to -e 80. Why would I not get this hit previously, when I used -a -e 90 to report all hits?

                  And why do I have to do -n 3, when the seed length by default is 28, and there are no more than 2 mis-matches in 28bp?

                  HTML Code:
                  $ /home/m049157/build/bowtie-0.10.0/bowtie --best -p 4 -t -n 3 -e 80 -a www w ww
                  Time loading forward index: 00:00:00
                  Time loading mirror index: 00:00:00
                  Seeded quality full-index search: 00:00:00
                  Reported 1 alignments to 1 output stream(s)
                  Time searching: 00:00:00
                  Overall time: 00:00:00
                  $ cat ww
                  HWI-E4:1:87:1633:1127#0/1       -       Zv7_scaffold910 5660144 AGTCTGCTTTTCCATATAAAACTGAGAAGAAGAGACTGCAGCCTTGAACAAACTTGGGAAGTCTTAACTTACACG     %%%%%%3=A;/-(8990(8<:9)<6:@,.4<A?A;28@24B/+<?B@4=BA><?@BBBBA@?70>@@=?@?724B       0       10:G>T,18:C>G,27:T>A
                  Hi bioinfosm,

                  Try using the --maxbts or -y options to increase the amount of searching effort put in by Bowtie. Note that -n 2 and -n 3 modes are not fully fully sensitive by default to avoid excessive backtracking (see manual section on Maq-like alignment).

                  That alignment does have 3 mismatches in the seed (at 0-based offsets 10, 18 and 27 from the 5' end).

                  Hope that helps,
                  Ben

                  Comment


                  • Hi,

                    I am confused by the bowtie options again. I used the options "-a --best --strata", but got a result as below:

                    Code:
                    Read1 16      chr1    7947971 255     50M     *       0       0       ATTAAGGTCACCGTTGCAGGCCTGGCTGGAAAAGACCCAGTACAGTGTAG      IIIIIIIIIIIIIIIIIIII
                    IIIIIIIIIIIIIIIIIIIIIIIIIIIIII  XA:i:0  MD:Z:50 NM:i:0
                    Read1 16      chr12   48275260        255     50M     *       0       0       ATTAAGGTCACCGTTGCAGGCCTGGCTGGAAAAGACCCAGTACAGTGTAG      IIIIIIIIIIII
                    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  XA:i:0  MD:Z:12A7T29    NM:i:2
                    The result shows that there are two hits for this read: one hits to chr1 (where the sequence from) perfectly, and the other hits to chr12 with 2 mismatches. However, my expectation is to make bowtie only report the best hit (namely the hit to chr1) by using the options "-a --best --strata". Why I get this weird result?
                    Thanks in advance.
                    --
                    Xi
                    Xi Wang

                    Comment


                    • I think that is to do with the seed length. For your seed length, are both reads equally good hits!

                      Originally posted by Xi Wang View Post
                      Hi,

                      I am confused by the bowtie options again. I used the options "-a --best --strata", but got a result as below:

                      Code:
                      Read1 16      chr1    7947971 255     50M     *       0       0       ATTAAGGTCACCGTTGCAGGCCTGGCTGGAAAAGACCCAGTACAGTGTAG      IIIIIIIIIIIIIIIIIIII
                      IIIIIIIIIIIIIIIIIIIIIIIIIIIIII  XA:i:0  MD:Z:50 NM:i:0
                      Read1 16      chr12   48275260        255     50M     *       0       0       ATTAAGGTCACCGTTGCAGGCCTGGCTGGAAAAGACCCAGTACAGTGTAG      IIIIIIIIIIII
                      IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  XA:i:0  MD:Z:12A7T29    NM:i:2
                      The result shows that there are two hits for this read: one hits to chr1 (where the sequence from) perfectly, and the other hits to chr12 with 2 mismatches. However, my expectation is to make bowtie only report the best hit (namely the hit to chr1) by using the options "-a --best --strata". Why I get this weird result?
                      Thanks in advance.
                      --
                      Xi
                      --
                      bioinfosm

                      Comment


                      • Originally posted by Xi Wang View Post
                        Code:
                        Read1 16      chr1    7947971 255     50M     *       0       0       ATTAAGGTCACCGTTGCAGGCCTGGCTGGAAAAGACCCAGTACAGTGTAG      IIIIIIIIIIIIIIIIIIII
                        IIIIIIIIIIIIIIIIIIIIIIIIIIIIII  XA:i:0  MD:Z:50 NM:i:0
                        Read1 16      chr12   48275260        255     50M     *       0       0       ATTAAGGTCACCGTTGCAGGCCTGGCTGGAAAAGACCCAGTACAGTGTAG      IIIIIIIIIIII
                        IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  XA:i:0  MD:Z:12A7T29    NM:i:2
                        Hi Xi,

                        In -n mode, the "stratum" referred to by --strata is the number of mismatches in the seed. The seed length is set with -l. In your case, the seed doesn't extend to those mismatches.

                        Thanks,
                        Ben

                        Comment


                        • Hi Ben Langmead.
                          Can the resulted .SAM file in "base space" by mapping "color space" reads be used for SNP calling (samtools) or other tools that can be used for dealing with Solexa data? Thanks!

                          Comment


                          • Yes, it should be usable by tools (like samtools) that call SNPs from .sam files.

                            Thanks
                            Ben

                            Comment


                            • Hi Ben:
                              I tried SAMtools on "base space" SAM files generated by aligning "color space" reads with Bowtie. Why I always get "A" in the 3rd column of the pileup file? It seems some kind of errors exists.

                              chr1 1185 g A 0 0 60 1 . !
                              chr1 1190 c A 0 0 60 1 . !
                              chr1 1191 t A 0 0 60 1 . !
                              chr1 1222 c A 0 0 60 2 .. !!
                              chr1 1231 t A 0 0 60 2 .. !!
                              chr1 1232 t A 0 0 60 2 .. !!
                              chr1 1509 c A 0 0 60 1 . !
                              chr1 1511 t A 0 0 60 1 . !
                              chr1 1512 t A 0 0 60 1 .$ !
                              chr1 1850 G A 0 0 60 3 ... !!!
                              chr1 2134 C A 0 0 60 1 . !

                              Is there any problem when calling SNPs from "base space" SAM file producted by aligning "color space" reads? Or the coverage is too low? I just want to try if bowtie -C and samtools works for calling SNPs in color space reads (I converted csfasta and qual files into csfastq file with "solid2fastq" in bfast).

                              some content in SAM file:

                              4_1246_1108 67 chr16 20648174 255 48M = 20650124 1998 CCTCTGGGTTTGTAGATTTGCCACTCTTAAGAGGCAAGGATTGACAGG OOQSQKOJJQECHAE?-=D82.893
                              '!/3!!0=2!!9F?)!!!=?'00 XA:i:0 MD:Z:48 NM:i:0 CM:i:8
                              4_1246_1108 131 chr16 20650125 255 48M = 20648173 -2000 AGTAAGTGGTCATCTATAAAGCAAAGACTGCCTGTGAAATAAATGGGA KEFJQTVSTVUSGIOSE2GJ!!@OO
                              SRNPJI@/ATF("/4:;@ACHG+ XA:i:1 MD:Z:48 NM:i:0 CM:i:3
                              4_1253_1656 179 chr17 66720558 255 48M = 66723289 2779 GACATGCTAAGGAAAGAGTGAAAATGGAGTCATATTAAAATGTTAAGT !&!!!!!:@N"'WSNNKHIMKORHA
                              DMQTTPOULFJMXUUZRRYXYIG XA:i:0 MD:Z:48 NM:i:0 CM:i:7
                              4_1253_1656 115 chr17 66723290 255 48M = 66720557 -2781 TAAAGAAATCTCCAGGCCCAAATGGTTTTACTTGTCAATTCTACCAAA !!!!/8NRJCGPULHBBPI@BLVND
                              AO[QNDK\NLTWVUVNNPNIGTL XA:i:0 MD:Z:48 NM:i:0 CM:i:3
                              4_1254_1557 67 chr1 40359166 255 48M = 40361009 1891 TACTGGACAACACAGTTCTAGTATGTAAGCTTTGAGAGAGCAGGGATT K??CGR>;JFHNL>@OA@GCF94<B
                              OF::;84=@C!!NML4/I;9&(A XA:i:0 MD:Z:48 NM:i:0 CM:i:3
                              4_1254_1557 131 chr1 40361010 255 48M = 40359165 -1893 CCTTTTTCTTGAATAATCTATTTCTTAGTATGTCTTAATTTACTAATA YTVXX[Y\^^^VJPZYMN[YRNLPV
                              NCKUWZUJLSD?;>IIA:FM!!! XA:i:0 MD:Z:48 NM:i:0 CM:i:2

                              all "48M" alignment? Mismatches should be reported Since I used "-C -q -n 2 -l 25 --snpfrac 0.001" to do the bowtie mapping. Can you help me identify my problem? Thanks a lot!
                              Last edited by xuying; 01-12-2010, 03:04 AM.

                              Comment


                              • Hi xuying,

                                Thank you for the detailed report:

                                Originally posted by xuying View Post
                                I tried SAMtools on "base space" SAM files generated by aligning "color space" reads with Bowtie. Why I always get "A" in the 3rd column of the pileup file? It seems some kind of errors exists.

                                chr1 1185 g A 0 0 60 1 . !
                                chr1 1190 c A 0 0 60 1 . !
                                chr1 1191 t A 0 0 60 1 . !
                                chr1 1222 c A 0 0 60 2 .. !!
                                chr1 1231 t A 0 0 60 2 .. !!
                                chr1 1232 t A 0 0 60 2 .. !!
                                chr1 1509 c A 0 0 60 1 . !
                                chr1 1511 t A 0 0 60 1 . !
                                chr1 1512 t A 0 0 60 1 .$ !
                                chr1 1850 G A 0 0 60 3 ... !!!
                                chr1 2134 C A 0 0 60 1 . !

                                Is there any problem when calling SNPs from "base space" SAM file producted by aligning "color space" reads? Or the coverage is too low? I just want to try if bowtie -C and samtools works for calling SNPs in color space reads (I converted csfasta and qual files into csfastq file with "solid2fastq" in bfast).
                                If I understand correctly, that is very low coverage (~2), and the qualities of are also low. Can you send me the fastq file you're using? Also, are you using 0.12.1? Note that versions < 0.12.1 had an issue whereby Bowtie would fail to trim the first color from csfasta reads.

                                Originally posted by xuying View Post
                                some content in SAM file:

                                4_1246_1108 67 chr16 20648174 255 48M = 20650124 1998 CCTCTGGGTTTGTAGATTTGCCACTCTTAAGAGGCAAGGATTGACAGG OOQSQKOJJQECHAE?-=D82.893
                                '!/3!!0=2!!9F?)!!!=?'00 XA:i:0 MD:Z:48 NM:i:0 CM:i:8
                                4_1246_1108 131 chr16 20650125 255 48M = 20648173 -2000 AGTAAGTGGTCATCTATAAAGCAAAGACTGCCTGTGAAATAAATGGGA KEFJQTVSTVUSGIOSE2GJ!!@OO
                                SRNPJI@/ATF("/4:;@ACHG+ XA:i:1 MD:Z:48 NM:i:0 CM:i:3
                                4_1253_1656 179 chr17 66720558 255 48M = 66723289 2779 GACATGCTAAGGAAAGAGTGAAAATGGAGTCATATTAAAATGTTAAGT !&!!!!!:@N"'WSNNKHIMKORHA
                                DMQTTPOULFJMXUUZRRYXYIG XA:i:0 MD:Z:48 NM:i:0 CM:i:7
                                4_1253_1656 115 chr17 66723290 255 48M = 66720557 -2781 TAAAGAAATCTCCAGGCCCAAATGGTTTTACTTGTCAATTCTACCAAA !!!!/8NRJCGPULHBBPI@BLVND
                                AO[QNDK\NLTWVUVNNPNIGTL XA:i:0 MD:Z:48 NM:i:0 CM:i:3
                                4_1254_1557 67 chr1 40359166 255 48M = 40361009 1891 TACTGGACAACACAGTTCTAGTATGTAAGCTTTGAGAGAGCAGGGATT K??CGR>;JFHNL>@OA@GCF94<B
                                OF::;84=@C!!NML4/I;9&(A XA:i:0 MD:Z:48 NM:i:0 CM:i:3
                                4_1254_1557 131 chr1 40361010 255 48M = 40359165 -1893 CCTTTTTCTTGAATAATCTATTTCTTAGTATGTCTTAATTTACTAATA YTVXX[Y\^^^VJPZYMN[YRNLPV
                                NCKUWZUJLSD?;>IIA:FM!!! XA:i:0 MD:Z:48 NM:i:0 CM:i:2

                                all "48M" alignment? Mismatches should be reported Since I used "-C -q -n 2 -l 25 --snpfrac 0.001" to do the bowtie mapping. Can you help me identify my problem? Thanks a lot!
                                In CIGAR, "M" means "either match or mismatch". (See SAM paper). So that output is correct correct.

                                Thanks,
                                Ben

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Choosing Between NGS and qPCR
                                  by seqadmin



                                  Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...
                                  10-18-2024, 07:11 AM
                                • seqadmin
                                  Non-Coding RNA Research and Technologies
                                  by seqadmin




                                  Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

                                  Nobel Prize for MicroRNA Discovery
                                  This week,...
                                  10-07-2024, 08:07 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, Yesterday, 05:31 AM
                                0 responses
                                10 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-24-2024, 06:58 AM
                                0 responses
                                20 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-23-2024, 08:43 AM
                                0 responses
                                48 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-17-2024, 07:29 AM
                                0 responses
                                58 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X