Yes actually the point of the study is to determine how they behave with different read lengths. Most mappers are made for say 50bp, but what about 55 or 45 or 60...? The problem is that my type of reads ( from 4C / Hi-C / 3C ) don't have nice single size reads. That's why I'm testing each aligner to see where it is the strongest and where it is the weakest. My hope is that I can combine 2 or more programs to have a complete solution.
My question for BFast is because I thought that a longer read would generate more CALS and therefore, all things being equal, would be easier to map ... which is what happens in Blat... anyway I will try to mess around with BFast a bit more
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
I like the presentation (heatmap table). Could you try varying the "-K" and "-M" parameters? Alternatively, you could design indexes with greater key size (more ones in the mask). There is a lot of flexibility, though I haven't thought about longer read lengths. The only criticism I have is that short-read aligners are designed for short-reads, and you are using them in a non-standard way; just like BLAT would not be used for 35bp error-prone reads.
Leave a comment:
-
Here's a table with some comparisons I'm doing. The bmr column correlates to the number of mismatches, a read with bmr 1% in 50bp will typically have 2 mismatches.
http://www.nada.kth.se/~afer/benchmark.jpegLast edited by aleferna; 08-01-2010, 10:27 AM.
Leave a comment:
-
I'm just optimizing for specificity right now, not worrying too much on speed. I'm using the 10 indexes that you mention in the manual and no options at all. I'm comparing how different algorithms work at different read lengths / mismatches. It is similar to the study you did in the BFast paper but with 25 to 500 bp read lengths.
Leave a comment:
-
Originally posted by aleferna View PostI just finished the analysis of BFast and the results are very strange. I get really good performance at 50 and 75 bp but this degrades (significantly) with 150, 200 and 500bp reads. Is there anything that you need to adjust in bfast when you have bigger reads? In the case of blat you get better specificity and sensitivity as reads get longer, I thought BFast would out perform blat but it doesn't?
Leave a comment:
-
I just finished the analysis of BFast and the results are very strange. I get really good performance at 50 and 75 bp but this degrades (significantly) with 150, 200 and 500bp reads. Is there anything that you need to adjust in bfast when you have bigger reads? In the case of blat you get better specificity and sensitivity as reads get longer, I thought BFast would out perform blat but it doesn't?
Leave a comment:
-
Originally posted by aleferna View PostThis is very odd, I reran the localalign using -t 24 and its been running for 2 days now where as with -t 16 it only takes a few hours? Has anybody else seen this problem?
Also why does it say endReadNum: 2147483647 when there are only 3 million reads?
If not specified, the start/end read #s default to 1 and infinity (in this case (2^32)-1) respectively. Use the "-p" option to see the program parameters.
Leave a comment:
-
bfast localalign takes longer with 24 threads than with 16???
This is very odd, I reran the localalign using -t 24 and its been running for 2 days now where as with -t 16 it only takes a few hours? Has anybody else seen this problem?
Also why does it say endReadNum: 2147483647 when there are only 3 million reads?
Leave a comment:
-
Sensitivity / Specificity study
Hi Bioinfosm,
Sure I hope to have the results ready soon, I've been struggling with MAQ but I finally realize that it needs reads to be exactly the same size. Since I'm simulating the reads they usually vary 2 or 3 bases in length, that was giving me really bad maq sensitivity but now I have it working.
I will post my results, but I'm working on a very weird dataset, don't think many people has these types of problems. I'm focusing on errors due to high mutation rates not on sequencing errors. We work with cancer stem cell lines that have abnormal mutation rates and therefore the MapQ value breaks down very often. To make things worst all the reads are chimeric (its a 4C experiment) and therefore they are really tricky to map. Basically my thesis is how to combine maq, blat, bfast , bwa aln and bwa bwasw to get > 99% sensitivity with > 99.5% specificity. So far it has been impossible to achieve this level using with a single algorithm so I decided to apply each algorithm where it has the best results.
Hope I can share some of this soon
Leave a comment:
-
Originally posted by epigen View PostHi aleferna and Nils,
your thread already answered most of the questions I would also have asked Nils. But I still have two:
1. To reduce non-parallelizable I/O, would it be possible to replace the large temp files that bfast match produces by keeping the info in the memory?
2. Could I pipe the indexes from gunzip and would that make loading them faster?
And something for the wish list: Why do the bfast programs not output any information when their input comes from standard input? It would be nice to have the info in case the pipeline crashes at some point to know why.
Leave a comment:
-
I would go for SSE2 first before considering CUDA. As Nils said, it would be good for someone to take on this as a research project, but in the near future, CUDA would not deliver a performance boost significant enough to make it practically attractive and cost-effective. When you look into details, CUDA is not that decent as it looks to be. hmmerGPU, mummerGPU and swGPU are all far from the theoretical speed due to unconquerable technical difficulties.Last edited by lh3; 07-20-2010, 05:31 PM.
Leave a comment:
-
aleferna,
am interested in the "sensitivity/specificity study between aligners..." do you have any updates, resources, blog or paper to point?
thanks!
Leave a comment:
-
Hi aleferna and Nils,
your thread already answered most of the questions I would also have asked Nils. But I still have two:
1. To reduce non-parallelizable I/O, would it be possible to replace the large temp files that bfast match produces by keeping the info in the memory?
2. Could I pipe the indexes from gunzip and would that make loading them faster?
And something for the wish list: Why do the bfast programs not output any information when their input comes from standard input? It would be nice to have the info in case the pipeline crashes at some point to know why.
BFAST for CUDA sounds like a really good idea. Parallel merge sort would be great too because the merging step is the most time-consuming. Unfortunately I'm not a good programmer so I can't offer my help with opimizing the code. But I always stumble across bugs so I'd at least make a good beta tester.
I'd also like to take the opportunity to thank you all for your support!
Barbara
Leave a comment:
-
Originally posted by aleferna View PostWow, thanks for the instant reply, I love SeqAnswers where else can you talk to the man himself, cool!
2. Didn't quite understand your response regarding the sensitivity on running BFast on GPU's. I see a trend of new aligners being made to run in a computer cloud, but I think that it will take longer to upload the data to the cloud than to process it locally using GPU architecture such as the NVidia CUDA.
3. I always get 255 for the MapQ value am I doing something wrong? What is a typical value for the --avgMismatchQuality in the post process?
Will check the source code, Thanks!!
Leave a comment:
-
Wow, thanks for the instant reply, I love SeqAnswers where else can you talk to the man himself, cool!
1. The 2^N issue maybe I'm mistaken at some point I ran one of the processes with the number of threads = 24 and it started working, I came back to check on the process some ours later and it said that the number of threads must be a power of 2, I't might have been the index creation.
2. Didn't quite understand your response regarding the sensitivity on running BFast on GPU's. I see a trend of new aligners being made to run in a computer cloud, but I think that it will take longer to upload the data to the cloud than to process it locally using GPU architecture such as the NVidia CUDA.
3. I always get 255 for the MapQ value am I doing something wrong? What is a typical value for the --avgMismatchQuality in the post process?
Will check the source code, Thanks!!
Leave a comment:
Latest Articles
Collapse
-
by seqadmin
The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...-
Channel: Articles
04-22-2024, 07:01 AM -
-
by seqadmin
Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...-
Channel: Articles
04-04-2024, 04:25 PM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 04-25-2024, 11:49 AM
|
0 responses
19 views
0 likes
|
Last Post
by seqadmin
04-25-2024, 11:49 AM
|
||
Started by seqadmin, 04-24-2024, 08:47 AM
|
0 responses
20 views
0 likes
|
Last Post
by seqadmin
04-24-2024, 08:47 AM
|
||
Started by seqadmin, 04-11-2024, 12:08 PM
|
0 responses
62 views
0 likes
|
Last Post
by seqadmin
04-11-2024, 12:08 PM
|
||
Started by seqadmin, 04-10-2024, 10:19 PM
|
0 responses
61 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 10:19 PM
|
Leave a comment: