
  • aleferna
    replied
    Yes, actually the point of the study is to determine how they behave at different read lengths. Most mappers are tuned for, say, 50bp, but what about 55, or 45, or 60? The problem is that my type of reads (from 4C / Hi-C / 3C) don't come in a single, uniform size. That's why I'm testing each aligner to see where it is strongest and where it is weakest. My hope is that I can combine two or more programs into a complete solution.
    My question about BFAST is because I thought that a longer read would generate more CALs and therefore, all things being equal, would be easier to map, which is what happens in BLAT. Anyway, I will try to mess around with BFAST a bit more.
    Last edited by aleferna; 08-01-2010, 10:47 AM.
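    The "combine two or more programs" idea above can be sketched very simply: run a primary aligner, then fill in reads it failed to map from a second aligner's output. This is a minimal illustration; the read IDs and the dict-of-hits representation are hypothetical, not any tool's actual output format.

```python
# Combine per-read results from two mappers: keep the primary aligner's
# hit when it has one, and fall back to the second aligner otherwise.
# Hits are modeled as (chromosome, position) tuples keyed by read ID.

def combine(primary_hits, fallback_hits):
    """Prefer the primary aligner; use the fallback for reads it missed."""
    combined = dict(primary_hits)
    for read_id, hit in fallback_hits.items():
        if read_id not in combined:
            combined[read_id] = hit
    return combined

bwa_hits = {"read1": ("chr1", 1000), "read2": ("chr2", 500)}
blat_hits = {"read2": ("chr2", 500), "read3": ("chr7", 42)}
print(combine(bwa_hits, blat_hits))  # read3 comes from the fallback aligner
```

    In practice the interesting part is deciding which aligner is "primary" for a given read length, which is exactly what the benchmark is meant to establish.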



  • nilshomer
    replied
    I like the presentation (heatmap table). Could you try varying the "-K" and "-M" parameters? Alternatively, you could design indexes with a greater key size (more ones in the mask). There is a lot of flexibility, though I haven't thought much about longer read lengths. The only criticism I have is that short-read aligners are designed for short reads, and you are using them in a non-standard way, just as BLAT would not be used for 35bp error-prone reads.
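    To make the "more ones in the mask" suggestion concrete: a BFAST index mask is a string of 1s and 0s, and the key size is the number of 1s (the sampled positions). A small sketch, with a made-up example mask rather than any recommended BFAST layout:

```python
# A spaced-seed index mask samples the positions marked with '1'.
# The key size is simply the count of 1s; a larger key size is more
# specific but less tolerant of mismatches inside the seed.

def key_size(mask):
    return mask.count("1")

def sampled_positions(mask):
    return [i for i, bit in enumerate(mask) if bit == "1"]

mask = "1111011101"          # hypothetical example, not a BFAST default
print(key_size(mask))        # 8 sampled positions
print(sampled_positions(mask))
```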



  • aleferna
    replied
    Here's a table with some comparisons I'm doing. The bmr column corresponds to the number of mismatches; a read with a bmr of 1% at 50bp will typically have 2 mismatches.

    http://www.nada.kth.se/~afer/benchmark.jpeg
    Last edited by aleferna; 08-01-2010, 10:27 AM.



  • aleferna
    replied
    I'm just optimizing for specificity right now, not worrying too much about speed. I'm using the 10 indexes that you mention in the manual and no options at all. I'm comparing how different algorithms perform at different read lengths / mismatch rates. It is similar to the study you did in the BFAST paper, but with 25 to 500bp read lengths.



  • nilshomer
    replied
    Originally posted by aleferna View Post
    I just finished the analysis of BFAST and the results are very strange. I get really good performance at 50 and 75bp, but this degrades significantly with 150, 200 and 500bp reads. Is there anything that you need to adjust in BFAST when you have longer reads? In the case of BLAT you get better specificity and sensitivity as reads get longer; I thought BFAST would outperform BLAT, but it doesn't.
    What performance metrics are you using (running time, accuracy, sensitivity)? I haven't tried BFAST with longer reads (>200bp), so some thought would be needed on how to make it work for long reads. Remember there are short-read and long-read aligners. Have you tried the BWA-SW module? It performs very well on longer reads.
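    Since the benchmark uses simulated reads, the true origin of each read is known and accuracy metrics can be computed directly. A minimal sketch of the evaluation (the alignment tuple format and the 5bp tolerance are assumptions for illustration): a mapping counts as correct when it lands near the simulated position, sensitivity is correct mappings over all simulated reads, and precision (often reported as specificity in these comparisons) is correct mappings over all reads the aligner placed.

```python
# Evaluate an aligner's calls against known simulated positions.
# truth and calls map read IDs to (chromosome, position) tuples.

def evaluate(truth, calls, tol=5):
    correct = 0
    mapped = 0
    for read_id, (chrom, pos) in calls.items():
        mapped += 1
        true_chrom, true_pos = truth[read_id]
        if chrom == true_chrom and abs(pos - true_pos) <= tol:
            correct += 1
    sensitivity = correct / len(truth)                 # correct / all simulated reads
    precision = correct / mapped if mapped else 0.0    # correct / reads placed
    return sensitivity, precision

truth = {"r1": ("chr1", 100), "r2": ("chr1", 900), "r3": ("chr2", 50)}
calls = {"r1": ("chr1", 102), "r2": ("chr5", 10)}      # r2 mismapped, r3 unmapped
print(evaluate(truth, calls))
```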



  • aleferna
    replied
    I just finished the analysis of BFAST and the results are very strange. I get really good performance at 50 and 75bp, but this degrades significantly with 150, 200 and 500bp reads. Is there anything that you need to adjust in BFAST when you have longer reads? In the case of BLAT you get better specificity and sensitivity as reads get longer; I thought BFAST would outperform BLAT, but it doesn't.



  • nilshomer
    replied
    Originally posted by aleferna View Post
    This is very odd. I reran localalign using -t 24 and it's been running for 2 days now, whereas with -t 16 it only took a few hours. Has anybody else seen this problem?

    Also why does it say endReadNum: 2147483647 when there are only 3 million reads?
    The threading option is "-n", not "-t". Threading is not perfectly scalable, and the actual speed-up can depend on many factors (an OS & architecture course offers a good introduction).

    If not specified, the start/end read numbers default to 1 and infinity (in this case (2^31)-1, the largest 32-bit signed integer), respectively. Use the "-p" option to see the program parameters.
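    The printed endReadNum is simply that sentinel, as a quick check shows:

```python
# "endReadNum: 2147483647" is the largest 32-bit signed integer,
# meaning "process reads to the end of the file", not an actual count.
INT32_MAX = 2**31 - 1
print(INT32_MAX)  # 2147483647, matching the value in the log
```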



  • aleferna
    replied
    bfast localalign takes longer with 24 threads than with 16???

    This is very odd. I reran localalign using -t 24 and it's been running for 2 days now, whereas with -t 16 it only took a few hours. Has anybody else seen this problem?

    Also why does it say endReadNum: 2147483647 when there are only 3 million reads?



  • aleferna
    replied
    Sensitivity / Specificity study

    Hi Bioinfosm,

    Sure, I hope to have the results ready soon. I've been struggling with MAQ, but I finally realized that it needs reads to be exactly the same size. Since I'm simulating the reads, they usually vary by 2 or 3 bases in length; that was giving me really bad MAQ sensitivity, but now I have it working.

    I will post my results, but I'm working on a very weird dataset; I don't think many people have these types of problems. I'm focusing on errors due to high mutation rates, not on sequencing errors. We work with cancer stem cell lines that have abnormal mutation rates, and therefore the MapQ value breaks down very often. To make things worse, all the reads are chimeric (it's a 4C experiment) and therefore they are really tricky to map. Basically my thesis is how to combine maq, blat, bfast, bwa aln and bwa bwasw to get > 99% sensitivity with > 99.5% specificity. So far it has been impossible to achieve this level with a single algorithm, so I decided to apply each algorithm where it has the best results.

    Hope I can share some of this soon.
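    One simple workaround for the fixed-length requirement mentioned above: trim every simulated read, and its quality string, down to the length of the shortest read before feeding them to the aligner. This sketch models FASTQ records as plain tuples rather than using a real parser:

```python
# Trim variable-length simulated reads to a uniform length so that
# aligners requiring identical read sizes (as MAQ does here) accept them.

def trim_to_uniform(reads):
    """reads: list of (name, sequence, quality) tuples; returns trimmed copies."""
    min_len = min(len(seq) for _, seq, _ in reads)
    return [(name, seq[:min_len], qual[:min_len]) for name, seq, qual in reads]

reads = [("r1", "ACGTACGTAC", "IIIIIIIIII"),
         ("r2", "ACGTACGT",   "IIIIIIII"),
         ("r3", "ACGTACGTA",  "IIIIIIIII")]
print(trim_to_uniform(reads))  # every read trimmed to 8bp
```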



  • nilshomer
    replied
    Originally posted by epigen View Post
    Hi aleferna and Nils,

    Your thread already answered most of the questions I would have asked Nils. But I still have two:
    1. To reduce non-parallelizable I/O, would it be possible to replace the large temp files that bfast match produces by keeping the info in the memory?
    Yes, if enough memory is available. Storing on disk is a consequence of not having enough RAM (1TB of RAM would solve a lot of this).

    2. Could I pipe the indexes from gunzip and would that make loading them faster?
    Probably not, since the underlying system calls are using zlib (gzip). My suggestion would be to get a faster disk.

    And something for the wish list: Why do the bfast programs not output any information when their input comes from standard input? It would be nice to have the info in case the pipeline crashes at some point to know why.
    They do! Each command initially prints its program parameters. See the "readsFileName:" line in "bfast match", for example; it will show either the file name or STDIN.



  • lh3
    replied
    I would go for SSE2 first before considering CUDA. As Nils said, it would be good for someone to take this on as a research project, but in the near future CUDA will not deliver a performance boost significant enough to make it practically attractive and cost-effective. When you look into the details, CUDA is not as good as it appears: hmmerGPU, mummerGPU and swGPU are all far from their theoretical speeds due to technical difficulties that are hard to overcome.
    Last edited by lh3; 07-20-2010, 05:31 PM.



  • bioinfosm
    replied
    aleferna,

    I am interested in the sensitivity/specificity study between aligners. Do you have any updates, resources, a blog, or a paper to point to?

    thanks!



  • epigen
    replied
    Hi aleferna and Nils,

    Your thread already answered most of the questions I would have asked Nils. But I still have two:
    1. To reduce non-parallelizable I/O, would it be possible to replace the large temp files that bfast match produces by keeping the info in the memory?
    2. Could I pipe the indexes from gunzip and would that make loading them faster?

    And something for the wish list: Why do the bfast programs not output any information when their input comes from standard input? It would be nice to have the info in case the pipeline crashes at some point to know why.

    BFAST for CUDA sounds like a really good idea. A parallel merge sort would be great too, because the merging step is the most time-consuming. Unfortunately I'm not a good programmer, so I can't offer my help with optimizing the code. But I always stumble across bugs, so I'd at least make a good beta tester.

    I'd also like to take the opportunity to thank you all for your support!

    Barbara



  • nilshomer
    replied
    Originally posted by aleferna View Post
    Wow, thanks for the instant reply. I love SeqAnswers; where else can you talk to the man himself? Cool!
    Without users, a developer is nothing.

    2. I didn't quite understand your response regarding the sensitivity of running BFAST on GPUs. I see a trend of new aligners being made to run in the cloud, but I think it will take longer to upload the data to the cloud than to process it locally on a GPU architecture such as NVIDIA's CUDA.
    Implementation is important, and GPU vs. cloud vs. FPGA vs. a solution customized to the problem are all important things to consider. I don't weigh in on this topic for good reason: I need more data to form an opinion.

    3. I always get 255 for the MapQ value; am I doing something wrong? What is a typical value for --avgMismatchQuality in the post-process step?

    Will check the source code, Thanks!!
    A 255 is returned if there is no second-best hit, which happens when the read is uniquely mapped; see the BFAST source code for the details.
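    The intuition behind that 255 can be sketched as follows. This is a simplified illustration, not BFAST's actual formula: mapping quality is derived from the gap between the best and second-best alignment scores, and when no second-best hit exists the aligner emits the sentinel value 255.

```python
# Toy mapping-quality calculation: the score gap between the best and
# second-best hits drives the quality; a missing second-best hit means
# the read mapped uniquely and the sentinel 255 is reported.

def mapq(best_score, second_best_score=None, scale=0.5):
    if second_best_score is None:
        return 255  # uniquely mapped: no second-best hit to compare against
    return max(0, int(scale * (best_score - second_best_score)))

print(mapq(100))      # unique hit -> 255
print(mapq(100, 80))  # close second-best hit -> small, finite quality
```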



  • aleferna
    replied
    Wow, thanks for the instant reply. I love SeqAnswers; where else can you talk to the man himself? Cool!

    1. About the 2^N issue: maybe I'm mistaken. At some point I ran one of the processes with the number of threads set to 24 and it started working; I came back to check on the process some hours later and it said that the number of threads must be a power of 2. It might have been the index creation.

    2. I didn't quite understand your response regarding the sensitivity of running BFAST on GPUs. I see a trend of new aligners being made to run in the cloud, but I think it will take longer to upload the data to the cloud than to process it locally on a GPU architecture such as NVIDIA's CUDA.

    3. I always get 255 for the MapQ value; am I doing something wrong? What is a typical value for --avgMismatchQuality in the post-process step?

    Will check the source code, Thanks!!
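    On the power-of-2 thread count mentioned in point 1 above: if a tool enforces that constraint, the standard check is a bit-twiddling one-liner, shown here as a small sketch.

```python
# n is a power of two exactly when it is positive and has a single bit
# set, i.e. n & (n - 1) clears the lowest set bit and leaves zero.

def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

for n in (16, 24, 32):
    print(n, is_power_of_two(n))  # 24 threads would be rejected
```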

