
  • hanshart
    replied
    Originally posted by Rocketknight
    ... it's definitely possible to write a string-searcher with mismatching in Python (though I give no guarantees about running time). I'm willing to help if you're stuck, it sounds like an interesting problem.
    It's possible to use fqgrep for approximate sequence searching.


  • Rocketknight
    replied
    You're going to get a huge number of matches if you search a large genome with those parameters (by my back-of-the-envelope calculations, a 7bp string with two allowed mismatches will hit by chance more than 0.1% of the time in a statistically average genome). In other words, for a 1 Gb genome, you should be seeing over one million matches for each 7-mer on average. Does Bowtie really report all of those matches?

    Edit: If it doesn't, all isn't lost - it's definitely possible to write a string-searcher with mismatching in Python (though I give no guarantees about running time). I'm willing to help if you're stuck, it sounds like an interesting problem.

    Extra edit: Whoops, mistake in my calculations. You should expect a random hit rate as high as about 0.45%. For the mouse genome (~3 Gb) you should expect to see around 13-14 million hits per 7-mer by chance.
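
One way to sanity-check this kind of back-of-the-envelope estimate is to count how many 7-mers lie within two substitutions of a given query and divide by the number of possible 7-mers. The sketch below assumes uniform, independent base frequencies and a single strand, which real genomes only approximate; the function names are made up for illustration. Under those assumptions the count is 211 out of 4^7 = 16384, roughly 1.3% per position, which is higher than the figures quoted in this post but leads to the same conclusion: tens of millions of chance hits per 7-mer in a ~3 Gb genome.

```python
from math import comb


def neighbourhood_size(k: int, max_mismatches: int, alphabet: int = 4) -> int:
    """Count the length-k strings within max_mismatches substitutions of a fixed k-mer."""
    # Choose d positions to change, each to one of (alphabet - 1) other letters.
    return sum(comb(k, d) * (alphabet - 1) ** d for d in range(max_mismatches + 1))


def chance_hit_rate(k: int, max_mismatches: int, alphabet: int = 4) -> float:
    """Probability that a random position matches, assuming uniform i.i.d. bases."""
    return neighbourhood_size(k, max_mismatches, alphabet) / alphabet ** k


if __name__ == "__main__":
    rate = chance_hit_rate(7, 2)
    genome_positions = 3e9  # assumed round figure for a ~3 Gb genome
    print(f"chance hit rate per position: {rate:.3%}")
    print(f"expected chance hits per 7-mer: {rate * genome_positions:,.0f}")
```

Searching both strands would roughly double the expected count for a non-palindromic motif.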
    Last edited by Rocketknight; 04-05-2012, 03:28 AM.


  • yuelics
    replied
    Originally posted by Rocketknight
    Because a short sequence like 7 bases would map all over the place, it's very unlikely that any read aligner will handle it properly. The algorithms they use are mostly designed to handle sequences no shorter than the shortest reads that come from Illumina sequencers (32bp I think).

    The good news is that since you're looking for a relatively small number of specific 7-base sequences without gaps or mismatches, a simple string search should be able to do it for you. A Python or Perl script could just loop over every line in the reference genome and print out any location where it finds one of the matching strings. If you have no idea how to code one, let me know and I'll write you one when I have a few spare minutes.
    Hi Rocketknight,

    Thanks a lot for your reply. I actually managed to get Bowtie working on the short 7-mers with a few additional options. The tricky part of writing a script to do it is that the match does not need to be exact (i.e. up to 2 mismatches anywhere in the 7-mer are allowed).
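
For reference, a search that tolerates mismatches can be sketched in a few lines of Python by sliding a window along the sequence and counting differing positions (Hamming distance). This is only a minimal illustration, not what Bowtie or fqgrep do internally: the function name `hamming_hits` is made up, substitutions are the only allowed difference (no gaps), only the forward strand is scanned, and the plain loop will be slow on a whole genome.

```python
from typing import Iterator, Tuple


def hamming_hits(sequence: str, motif: str,
                 max_mismatches: int = 2) -> Iterator[Tuple[int, str, int]]:
    """Yield (0-based position, matched window, mismatch count) for every
    window of `sequence` that differs from `motif` in at most
    `max_mismatches` positions."""
    seq = sequence.upper()
    motif = motif.upper()
    k = len(motif)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = 0
        for a, b in zip(window, motif):
            if a != b:  # any differing character, including N, counts as a mismatch
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences; give up on this window early
        else:
            # Loop finished without break: window is within the mismatch budget.
            yield i, window, mismatches


if __name__ == "__main__":
    for pos, window, n in hamming_hits("ACGTACGTTT", "ACGTACG", max_mismatches=2):
        print(pos, window, n)
```

With two mismatches allowed in seven bases, expect a very large number of hits on any real genome, so it is worth writing results to a file rather than holding them in memory.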


  • Rocketknight
    replied
    Because a short sequence like 7 bases would map all over the place, it's very unlikely that any read aligner will handle it properly. The algorithms they use are mostly designed to handle sequences no shorter than the shortest reads that come from Illumina sequencers (32bp I think).

    The good news is that since you're looking for a relatively small number of specific 7-base sequences without gaps or mismatches, a simple string search should be able to do it for you. A Python or Perl script could just loop over every line in the reference genome and print out any location where it finds one of the matching strings. If you have no idea how to code one, let me know and I'll write you one when I have a few spare minutes.
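
A minimal sketch of the line-by-line search described above, assuming the reference is a plain FASTA file and all motifs have the same length. One detail a naive per-line loop misses is a match that spans a line break, so this version carries the last k-1 bases over to the next line. The function name `find_exact` and the toy input are illustrative only, and only the forward strand is searched.

```python
import io
from typing import Iterable, Iterator, Tuple


def find_exact(fasta_lines: Iterable[str],
               motifs: Iterable[str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (sequence name, 0-based position, motif) for each exact match."""
    motif_set = {m.upper() for m in motifs}
    lengths = {len(m) for m in motif_set}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes all motifs have the same length")
    k = lengths.pop()

    name, tail, offset = "", "", 0  # offset = coordinate of the first base in `chunk`
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New sequence record: reset the carried-over bases and the coordinate.
            name = line[1:].split(None, 1)[0] if len(line) > 1 else ""
            tail, offset = "", 0
            continue
        chunk = tail + line.upper()
        for i in range(len(chunk) - k + 1):
            window = chunk[i:i + k]
            if window in motif_set:
                yield name, offset + i, window
        # Keep the last k-1 bases so matches spanning a line break are found.
        keep = chunk[-(k - 1):] if k > 1 else ""
        offset += len(chunk) - len(keep)
        tail = keep


if __name__ == "__main__":
    toy = io.StringIO(">chr1 toy example\nACGTAC\nGTTTAA\n")
    for name, pos, motif in find_exact(toy, ["ACGTACG", "CGTTTAA"]):
        print(name, pos, motif)
```

For a real genome the same generator can be fed an open file handle instead of the in-memory toy input.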


  • NicoBxl
    replied
    Same question for BWA.


  • yuelics
    started a topic

    minimal read length accepted by Bowtie

    Hi all,

    I wonder if anyone knows the minimum read length accepted by Bowtie. Basically, I have a set of short motif sequences (7-mers) and want to see where they map to the mouse reference genome. I tried Bowtie, but it does not seem to work because of the short read length (7 bp).

    Any suggestions will be very much appreciated!

    Thanks,
    Yue
