Nope, I calculate the score as matches - mismatches - gapbases.
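For readers who want the formula spelled out, here is a minimal sketch. The function and argument names are made up for illustration; only the arithmetic comes from the post.

```python
def alignment_score(matches: int, mismatches: int, gap_bases: int) -> int:
    """Score as described in the post: matches - mismatches - gapbases."""
    return matches - mismatches - gap_bases


# A read aligned with 95 matches, 3 mismatches and 2 gap bases.
print(alignment_score(95, 3, 2))  # 90
```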
-
Also, if I can venture a suggestion: when I implemented the Blat MapQ value using S1 - S2, I noticed a big effect on MinMapQ, the MapQ threshold needed to achieve 99% specificity. With longer reads you need a lower threshold, and as the error rate decreases you need a lower threshold as well. I've been wondering whether, instead of using S1 to normalize the value, you could normalize by some combination of the error rate and the read length. I think you can better approximate MapQ by combining these two components rather than trying to summarize them in S1.
Last edited by aleferna; 08-18-2010, 07:20 AM.
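To make the suggestion concrete, here is a rough sketch of the two normalizations being compared. Only S1 - S2 and the idea of dividing by S1 come from the post. The scale factor, the function names, and the "expected score" L * (1 - 2e) are assumptions for illustration; the latter follows from the matches - mismatches - gapbases scoring only if every sequencing error shows up as a mismatch.

```python
def mapq_s1_normalized(s1: float, s2: float, scale: float = 100.0) -> float:
    """Gap between best (S1) and second-best (S2) score, divided by S1.

    `scale` is a hypothetical constant, not a value taken from any aligner.
    """
    if s1 <= 0:
        return 0.0
    return scale * max(s1 - s2, 0.0) / s1


def expected_score(read_length: int, error_rate: float) -> float:
    """Assumed score of a correct alignment: each error turns a match (+1)
    into a mismatch (-1), so the expectation is L * (1 - 2e)."""
    return read_length * (1.0 - 2.0 * error_rate)


def mapq_length_error_normalized(s1: float, s2: float, read_length: int,
                                 error_rate: float, scale: float = 100.0) -> float:
    """Same score gap, but normalized by read length and error rate
    instead of by S1 (one possible reading of the suggestion)."""
    denom = expected_score(read_length, error_rate)
    if denom <= 0:
        return 0.0
    return scale * max(s1 - s2, 0.0) / denom


# 100 bp read, 5% error rate, best hit scores 80, second-best 60.
print(mapq_s1_normalized(80, 60))
print(mapq_length_error_normalized(80, 60, read_length=100, error_rate=0.05))
```

With the second form, two reads with the same score gap get the same value whenever their length and error rate match, regardless of how well the best hit happened to score.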
-
Here's the behavior of a simple Blat MapQ value (see attached files).
-
I am gradually recalling how I chose the parameters for blat. My focus was more on reads of >=500bp, and for these reads blat -fastMap is similar to default blat in accuracy but tens of times faster. However, for the shorter reads you are focusing on, default blat is much more accurate than blat -fastMap (though still much slower). Your table would largely agree with mine for default blat.
Actually, for 454 I would highly recommend ssaha2. Ssaha2 was designed from day one for mapping sequencing data and calling SNPs, and it has been thoroughly validated. Blat, although one of the best tools for mapping ESTs, was not initially intended for SNP finding and has not been heavily evaluated for it. From what I have heard, blat does not refine the final alignment, which may leave gaps positioned suboptimally and cause problems for indel finding. The default blat mode is also much slower and less accurate than ssaha2. In my view, it is a common mistake to overlook the superiority of ssaha2 for longer reads. The 1000 Genomes Project chooses every program for a reason.
Last edited by lh3; 08-18-2010, 07:58 AM.
-
@Heng Li
Well, I don't care too much about SNPs; what I work with is actually closer to ChIP-seq technology. All I need to know is the position, not the alignment. I need to work with both 454 and HiSeq data and compare them, so I prefer BWA because it seems able to handle both. Does ssaha2 handle high-throughput data?
-
Originally posted by query:
What is the best tool available to map 454 reads to a reference genome? What method does gs reference Mapper (the analysis tool that comes with 454) use, and does it do a decent job of mapping and identifying variants?
-
@Adamo
Here is the script that I've been using. DISCLAIMER: I made this for my own data and it has not been tested on regular sequence data, so please read the code and make sure you understand what the script does before using it. It is tuned to join BWASW (Z = 100) SAM files with ALN (N = 4) SAM files.
Also, it's a Python script, but the system wouldn't upload it with the extension .py (see attached files).
-
Originally posted by aleferna:
@Adamo
Here is the script that I've been using. DISCLAIMER: I made this for my own data and it has not been tested on regular sequence data, so please read the code and make sure you understand what the script does before using it. It is tuned to join BWASW (Z = 100) SAM files with ALN (N = 4) SAM files.
Also, it's a Python script, but the system wouldn't upload it with the extension .py.

Thanks a lot, I'm going to see what's in it now.
-
Instead of using Z=100 on the whole data set, it might be a better (meaning faster) idea to first align the data set with Z=1 (default value) and then realign the ones that do not satisfy your alignment criteria with a higher value for Z. This should speed up the process if you assume that a high number of the reads will map to the reference.
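The selection step of this two-pass idea could look roughly like the sketch below. It only picks out which reads from the first (Z=1) pass should be sent to the second, higher-Z pass; running the aligner itself is left out. The MAPQ threshold is a hypothetical stand-in for "your alignment criteria", and the function name is made up. The column positions and the 0x4 unmapped flag are standard SAM.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def reads_to_realign(sam_lines, min_mapq=10):
    """Return names of first-pass reads that are unmapped or have MAPQ
    below `min_mapq` (a hypothetical criterion; adjust to your own)."""
    names = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, mapq = fields[0], int(fields[1]), int(fields[4])
        if flag & UNMAPPED or mapq < min_mapq:
            names.append(name)
    return names


first_pass = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t37\t50M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t16\tchr1\t300\t3\t50M\t*\t0\t0\t*\t*",
]
print(reads_to_realign(first_pass))  # ['r2', 'r3']
```

Only the reads returned here would need to be extracted and realigned with the larger Z, which is where the time saving would come from if most reads map well on the first pass.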
-
Originally posted by aleferna:
The first time I ran BWA with the long aligner, I didn't realize that there was a short/long option, and since I have both in my library I was very disappointed with BWA. I started testing algorithm after algorithm and finally reviewed BWA again. This time I made a small script that just joins two SAM files, one from the short aligner and one from the long aligner. It chooses the alignment from the short aligner if it cannot find one in the long aligner; this was the winning combination.
I've mentioned this chart in another thread, but here you can see that BWA is the only one that can cover the full range of read sizes in 454 datasets (or in 100bp Solexa data after you remove the paired-end adapters!).
Moreover, I know using Z=100 seems a bit of overkill, but with 454 data and a decent computer BWA takes just a few minutes, and I did measure Z=1, 10, 25, 50, 100, 250 and even 500. Z=100 seems to be the peak; beyond it I cannot squeeze any more specificity out of the algorithm, but you do see a change from Z=10 to Z=100.
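The joining rule described above (take the long-aligner hit, fall back to the short aligner when there is none) can be sketched as follows. This is not the attached script, just an illustration of the rule. It assumes that "cannot find it" means the read is either absent from the long-aligner SAM or flagged unmapped there, keeps only the first record per read name, and ignores header lines.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def mapped_by_name(sam_lines):
    """Map read name -> first mapped SAM record; headers and unmapped reads are skipped."""
    records = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        record = line.rstrip("\n")
        fields = record.split("\t")
        if int(fields[1]) & UNMAPPED:
            continue
        records.setdefault(fields[0], record)
    return records


def join_alignments(long_sam, short_sam):
    """Prefer the long-aligner record; use the short-aligner record only
    when the long aligner has no mapped record for that read."""
    merged = mapped_by_name(long_sam)
    for name, record in mapped_by_name(short_sam).items():
        merged.setdefault(name, record)
    return merged


long_sam = [
    "r1\t0\tchr1\t100\t40\t200M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
short_sam = [
    "r1\t0\tchr1\t150\t20\t40M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t500\t30\t40M\t*\t0\t0\t*\t*",
    "r3\t16\tchr1\t900\t25\t35M\t*\t0\t0\t*\t*",
]
merged = join_alignments(long_sam, short_sam)
print(sorted(merged))                 # ['r1', 'r2', 'r3']
print(merged["r1"].split("\t")[3])    # 100 (long-aligner position kept)
```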
-
Originally posted by lh3:
Ssaha2 is designed for high-throughput sequencing. As I said, it is usually faster than blat, although less easy to use, I would say.

My RNA-seq data is from a species whose genome has not been sequenced, but the zebrafish genome may be suitable because these samples are fish that are close relatives of zebrafish. The goal is to analyze SNPs and recombination in hybrids and their parents. Does anyone have any ideas?
I really appreciate your help!
Last edited by boyzoe; 08-23-2010, 07:35 AM.
-
Originally posted by boyzoe:
Actually, I couldn't install it on Ubuntu. After extraction, I could see the files (read me, ssaha2, ssaha2build, ssaha snp). However, when I typed the command into the terminal, it told me the command could not be found. This has bothered me for a week.

./ssaha
(assuming the file is in the current directory, indicated by the dot in Unix). If you tried this:
ssaha
it would look for an installed copy of ssaha on the system path, but it would not try the current directory. At least, that is how recent versions of Ubuntu are configured.
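The same lookup rule can be demonstrated from Python, whose shutil.which searches a list of directories much as the shell searches the system path. This is only a sketch of the behavior on a Unix-like system: the tool name is a made-up stand-in (not the real ssaha binary), and the directories are temporary ones created for the demonstration.

```python
import os
import shutil
import stat
import tempfile

TOOL_NAME = "ssaha_demo_tool"  # hypothetical stand-in for the extracted binary

with tempfile.TemporaryDirectory() as workdir:
    # Directory that plays the role of "where the archive was extracted".
    extracted = os.path.join(workdir, "extracted")
    # Directory that plays the role of the system path (tool is not in it).
    on_path = os.path.join(workdir, "bin")
    os.mkdir(extracted)
    os.mkdir(on_path)

    tool = os.path.join(extracted, TOOL_NAME)
    with open(tool, "w") as handle:
        handle.write("#!/bin/sh\necho hello\n")
    os.chmod(tool, os.stat(tool).st_mode | stat.S_IXUSR)

    # Bare name: only the search path is consulted, so nothing is found.
    found_on_path = shutil.which(TOOL_NAME, path=on_path)
    # Adding the extraction directory to the search path finds it.
    found_in_dir = shutil.which(TOOL_NAME, path=extracted)

print(found_on_path)             # None
print(found_in_dir is not None)  # True
```

Typing ./name in the shell sidesteps the search entirely by giving an explicit path, which is why it works even when the directory is not on the system path.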
-
Originally posted by robs:
Looking at your chart, you actually get better sensitivity for longer reads with low error rates using the default settings instead of using Z=100. Any idea what causes a higher Z-best value to result in lower sensitivity?

You mean like 200bp at 0% error, where Z=100 gives 97.29% and the default gives 97.30%?