
**NicoBxl** · 04-02-2012, 02:26 AM

Same question for BWA.

**Rocketknight** · 04-02-2012, 03:39 AM

A sequence as short as 7 bases will map all over the place, so it's very unlikely that any read aligner will handle it properly. Their algorithms are mostly designed for sequences at least as long as the shortest reads Illumina sequencers produce (32 bp, I think).
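To put a rough number on "all over the place": in a random genome with equal base frequencies, a fixed k-mer is expected to occur about (G − k + 1) / 4^k times per strand, where G is the genome length. The sketch below evaluates that for a few read lengths. The ~3 Gb genome size and the equal-frequency model are illustrative assumptions; real genomes (repeats, GC bias) will deviate from it.

```python
def expected_exact_hits(genome_length: int, k: int, strands: int = 1) -> float:
    """Expected chance occurrences of one fixed k-mer, assuming a random
    genome in which A, C, G and T each occur with probability 1/4."""
    windows = genome_length - k + 1  # number of k-length windows per strand
    return strands * windows / 4 ** k


if __name__ == "__main__":
    genome = 3_000_000_000  # hypothetical ~3 Gb genome, for illustration only
    for k in (7, 15, 20, 32):
        hits = expected_exact_hits(genome, k)
        print(f"k={k:2d}: ~{hits:.3g} exact chance hits per strand")
```

For k = 7 this gives roughly 180,000 exact occurrences per strand, while for 32 bp the expected chance count is effectively zero, which is why longer reads can usually be placed uniquely.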

The good news is that since you're looking for a relatively small number of specific 7-base sequences without gaps or mismatches, a simple string search should do the job. A Python or Perl script could read each sequence in the reference genome (joining the wrapped FASTA lines first, so that matches spanning a line break aren't missed) and print every position where it finds one of the target strings. If you have no idea how to code one, let me know and I'll write one for you when I have a few spare minutes.
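A minimal sketch of that kind of exact search, assuming the reference is a plain-text FASTA file. The script name, the tab-separated output layout and the optional reverse-strand search are illustrative choices, not anything specified in the thread.

```python
import sys
from typing import Iterable, Iterator, List, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record name, sequence) one record at a time. Wrapped sequence
    lines are joined so a motif spanning a line break is still found."""
    name, parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            fields = line[1:].split()
            name, parts = (fields[0] if fields else ""), []
        elif line and name is not None:
            parts.append(line.upper())
    if name is not None:
        yield name, "".join(parts)


def find_all(sequence: str, motif: str) -> Iterator[int]:
    """Yield every 0-based start of motif in sequence, including overlapping ones."""
    start = sequence.find(motif)
    while start != -1:
        yield start
        start = sequence.find(motif, start + 1)


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def main(fasta_path: str, motifs: List[str]) -> None:
    motifs = [m.upper() for m in motifs]
    with open(fasta_path) as handle:
        for chrom, seq in iter_fasta(handle):
            for motif in motifs:
                # positions are printed as 1-based starts on the forward strand
                for pos in find_all(seq, motif):
                    print(f"{chrom}\t{pos + 1}\t+\t{motif}")
                rc = reverse_complement(motif)
                if rc != motif:  # palindromic motifs were already reported on +
                    for pos in find_all(seq, rc):
                        print(f"{chrom}\t{pos + 1}\t-\t{motif}")


if __name__ == "__main__" and len(sys.argv) > 2:
    main(sys.argv[1], sys.argv[2:])
```

Usage would look like `python find_kmers.py genome.fa ACGTACG TTAGGCA` (hypothetical file name and motifs). Only one FASTA record is held in memory at a time, so peak memory is set by the largest chromosome.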

**yuelics** · 04-04-2012, 04:30 AM

Hi Rocketknight,

Thanks a lot for your reply. I actually managed to get Bowtie working on the short 7-mer with a few additional options. The tricky part of writing a script for this is that the alignment doesn't need to be exact (i.e. up to 2 mismatches anywhere in the 7-mer are allowed).
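One way to keep the exact-search idea while allowing mismatches is to expand each 7-mer into every sequence within two substitutions of it, then look up each 7-bp window of the genome in that set. This is a sketch of that neighborhood-expansion technique, assuming substitutions only (no gaps) and forward strand only; the function names are hypothetical and this is not a description of what Bowtie does.

```python
from itertools import combinations, product
from typing import List, Set

BASES = "ACGT"


def hamming_neighborhood(kmer: str, max_mismatches: int) -> Set[str]:
    """Every sequence of len(kmer) that differs from kmer at no more than
    max_mismatches positions (substitutions only, no insertions/deletions)."""
    kmer = kmer.upper()
    neighborhood = {kmer}
    for n in range(1, max_mismatches + 1):
        # choose which n positions to change...
        for positions in combinations(range(len(kmer)), n):
            # ...and, at each one, a base that differs from the original
            choices = [[b for b in BASES if b != kmer[i]] for i in positions]
            for replacement in product(*choices):
                variant = list(kmer)
                for i, base in zip(positions, replacement):
                    variant[i] = base
                neighborhood.add("".join(variant))
    return neighborhood


def neighborhood_scan(sequence: str, kmer: str, max_mismatches: int = 2) -> List[int]:
    """0-based starts of every window of sequence within max_mismatches of kmer."""
    targets = hamming_neighborhood(kmer, max_mismatches)
    k = len(kmer)
    sequence = sequence.upper()
    return [i for i in range(len(sequence) - k + 1) if sequence[i:i + k] in targets]
```

For a 7-mer with two mismatches the set has 1 + 7·3 + 21·9 = 211 members, so each genome window costs a single set lookup. The set grows quickly with longer motifs or more allowed mismatches, which is the main limit of this approach.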

**Rocketknight** · 04-04-2012, 05:57 AM

You're going to get a huge number of matches if you search a large genome with those parameters. By my back-of-the-envelope calculations, a 7-bp string with two allowed mismatches will match a random position more than 0.1% of the time in a genome with random, evenly distributed bases. In other words, for a 1 Gb genome you should see over one million matches per 7-mer on average. Does Bowtie really report all of those matches?

Edit: If it doesn't, not all is lost. It's definitely possible to write a string searcher that allows mismatches in Python (though I give no guarantees about running time). I'm willing to help if you're stuck; it sounds like an interesting problem.
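A minimal sketch of such a searcher: slide the motif along the sequence, count mismatches in each window, and abandon a window as soon as it exceeds the limit. It assumes ungapped matches on the forward strand only, and the function name is hypothetical.

```python
from typing import Iterator, Tuple


def mismatch_hits(sequence: str, motif: str, max_mismatches: int = 2) -> Iterator[Tuple[int, int]]:
    """Yield (0-based start, mismatch count) for every window of sequence
    that differs from motif by at most max_mismatches substitutions."""
    sequence, motif = sequence.upper(), motif.upper()
    k = len(motif)
    for start in range(len(sequence) - k + 1):
        mismatches = 0
        for offset in range(k):
            if sequence[start + offset] != motif[offset]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # window is already over the limit; skip the rest
        else:
            # the inner loop finished without break, so the window qualifies
            yield start, mismatches
```

The inner loop runs up to k times per genome position per motif, so in pure Python this will be slow on a mammalian genome, as the post warns.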

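Extra edit: Whoops, I made a mistake in my calculations. Assuming all four bases are equally frequent, a random 7-bp window lies within two mismatches of a given 7-mer with probability (1 + 7×3 + 21×9)/4^7 = 211/16384, or about 1.3%. For the mouse genome (~2.7 Gb), that comes to roughly 35 million chance hits per 7-mer on one strand, and about twice that if both strands are searched.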
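The chance-hit rate can be computed directly by counting the k-mers within a given number of substitutions of a fixed k-mer. This sketch assumes independent, equally frequent bases and ignores gaps; the ~2.7 Gb genome size is an approximate figure used only for illustration, and real genomes with repeats and composition bias will differ.

```python
from math import comb


def chance_hit_probability(k: int, max_mismatches: int) -> float:
    """Probability that one random k-bp window lies within max_mismatches
    substitutions of a fixed k-mer, assuming independent, equally frequent bases."""
    # comb(k, m) ways to pick the mismatched positions, 3 wrong bases at each
    matching = sum(comb(k, m) * 3 ** m for m in range(max_mismatches + 1))
    return matching / 4 ** k


p = chance_hit_probability(7, 2)  # 211 / 16384
genome_size = 2_700_000_000  # approximate mouse genome size (assumption)
print(f"per-window probability: {p:.4%}")
print(f"expected chance hits per strand: {p * (genome_size - 6):,.0f}")
```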

**hanshart** · 03-22-2013, 02:15 AM

Originally posted by Rocketknight

... it's definitely possible to write a string searcher that allows mismatches in Python (though I give no guarantees about running time). I'm willing to help if you're stuck; it sounds like an interesting problem.

It's also possible to use fqgrep for approximate sequence searches.

minimal read length accepted by Bowtie
