I am testing different aligners for mapping single-end reads of length 15bp back to a genome. For SHRiMP, what would be an appropriate parameter setting? The parameter system seems much more complicated than that of other aligners such as Bowtie and BWA, and I did not find a parameter for setting the seed length.
15bp single-end reads are too short: mathematically, it is impossible to find the location where they originate from in the human genome. This has nothing to do with the choice of the mapper. Here is why.
Given a fixed 15bp read, and assuming the reference is a uniform random string (for math purposes), a perfect (15 matches) random hit will occur at any one location with probability 4^-15. However, the length of the human genome is about 3*4^15. Hence, you expect 3 perfect random hits per read. You have no mathematical chance of distinguishing between those and the location where the read really originates from.
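To see how quickly this changes with read length, here is a minimal Python sketch of the same estimate. It assumes the uniform random reference and the 3*4^15 genome length used above; the function names are made up for illustration and do not come from any mapper.

```python
# Genome length approximation used in the post: about 3 * 4^15 positions.
GENOME_LENGTH = 3 * 4**15


def expected_exact_hits(read_length, genome_length=GENOME_LENGTH):
    """Expected number of chance perfect matches for one read,
    assuming every reference base is uniform random and independent."""
    return genome_length / 4**read_length


def shortest_length_below_one_hit(genome_length=GENOME_LENGTH):
    """Smallest read length whose expected number of chance
    perfect matches drops below 1 under the same assumption."""
    length = 1
    while expected_exact_hits(length, genome_length) >= 1:
        length += 1
    return length


if __name__ == "__main__":
    for n in (15, 16, 20, 25):
        print(n, expected_exact_hits(n))
    print("first length with < 1 expected chance hit:",
          shortest_length_below_one_hit())
```

This only counts perfect chance matches on one strand of an idealized random genome; real genomes are repetitive, so the true number of ambiguous locations is typically higher.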
Moreover, the set of random hits grows a lot the moment you allow any polymorphisms. E.g., suppose you allow for a single SNP. The probability that the read matches a random string with exactly one mismatch is 15*(3/4)*(1/4)^14 = 45*4^-15: there are 15 choices for the mismatched position, that position differs with probability 3/4, and the other 14 positions match with probability (1/4)^14. Since there are 3*4^15 locations in the genome, you expect 3*45 = 135 random hits at a distance of 1 SNP from your read.
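The same count can be generalized to any number of mismatches. The sketch below is an illustration, not part of any mapper: it assumes a uniform random reference, so each position matches with probability 1/4 and mismatches with probability 3/4, and it counts all three alternative bases at each mismatched position. The function name is hypothetical.

```python
from math import comb

# Genome length approximation used in the post: about 3 * 4^15 positions.
GENOME_LENGTH = 3 * 4**15


def expected_random_hits(read_length, mismatches, genome_length=GENOME_LENGTH):
    """Expected number of chance hits with exactly `mismatches` substitutions,
    assuming a uniform random reference (binomial model, no indels)."""
    # Number of distinct strings at exactly this Hamming distance from the read:
    # choose the mismatched positions, then one of 3 other bases at each.
    strings_at_distance = comb(read_length, mismatches) * 3**mismatches
    # Each of the 4^L possible strings is equally likely at any location.
    return genome_length * strings_at_distance / 4**read_length


if __name__ == "__main__":
    for k in range(4):
        print(k, "mismatches:", expected_random_hits(15, k))
```

Running it for 0 to 3 mismatches shows the candidate list for a 15bp read growing by more than an order of magnitude with each extra mismatch allowed.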
With the above in mind, you cannot map 15bp single-end reads back to the human genome and hope to find where they originate from (on average). The best you can hope for is a list of possible locations, but as explained above, the list will be quite large the moment you allow as little as 1 SNP.
Mappers based on spaced seeds (SHRiMP, BFAST) beat mappers based on exact string matching/Burrows-Wheeler Transform (BWA, Bowtie) in sensitivity when dealing with highly polymorphic reads (or very noisy data). E.g., a read of length 50bp with 10 mismatches will be mapped much more reliably by the former than by the latter. However, in this situation you still have 40 matching positions to go on, and a match of that size is unlikely to arise by chance in the human genome (a probability on the order of 4^-40 per location, against about 3*4^15 locations).
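A toy Python sketch of why spaced seeds help with such reads. Everything here is an assumption made for illustration: the seed patterns are invented (they are not SHRiMP or BFAST defaults), the read is compared only against its true location without gaps, and the mismatches are placed every fifth base, which is a deliberately favorable case for the spaced seed.

```python
def seed_hits(read, ref, seed):
    """Offsets at which read and ref agree on every '1' position of the seed.
    '0' positions in the seed are "don't care" and may mismatch."""
    care = [i for i, c in enumerate(seed) if c == "1"]
    return [off for off in range(len(read) - len(seed) + 1)
            if all(read[off + i] == ref[off + i] for i in care)]


# A 50bp reference window and a read derived from it with 10 substitutions,
# one at every fifth base (indices 4, 9, ..., 49).
SUBSTITUTE = {"A": "C", "C": "G", "G": "T", "T": "A"}
ref = "ACGTTGCATG" * 5
read = "".join(SUBSTITUTE[b] if i % 5 == 4 else b for i, b in enumerate(ref))

contiguous_seed = "11111111"   # 8 consecutive matching bases required
spaced_seed = "111101111"      # also 8 required matches, one gap in the middle

if __name__ == "__main__":
    print("mismatches:", sum(a != b for a, b in zip(read, ref)))
    print("contiguous seed hits:", seed_hits(read, ref, contiguous_seed))
    print("spaced seed hits:", seed_hits(read, ref, spaced_seed))
```

Both seeds require the same number of matching bases, but the longest exact run in this read is only 4 bases, so the contiguous seed never fires while the spaced seed does. With mismatches placed at random the advantage is smaller, and real mappers typically combine several seed patterns.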
In conclusion, in my opinion you don't need highly sensitive mappers (such as SHRiMP) to deal with your data.
-- Matei David