  • BFAST using GPUs

    Dear Sir,

    We are computer engineering students. We have read the BFAST paper:

    Background: The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, result in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25–100 base range, in the presence of errors and true biological variation.

    Methodology: We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels.

    Conclusions: We compare BFAST to a selection of large-scale alignment tools - BLAT, MAQ, SHRiMP, and SOAP - in terms of both speed and accuracy, using simulated and real-world datasets. We show BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at http://bfast.sourceforge.net.

    We are trying to implement it on GPUs (CUDA), but we are having trouble with the indexing step.
    Could you explain, or point us to sources with more information on, how BFAST creates a reference genome and how indexing is done?

  • #2
    Originally posted by nikhil.stephen
    Dear Sir,

    We are computer engineering students. We have read the BFAST paper.

    We are trying to implement it on GPUs (CUDA), but we are having trouble with the indexing step.
    Could you explain, or point us to sources with more information on, how BFAST creates a reference genome and how indexing is done?

    The best place to look would be the source code found at http://bfast.sourceforge.net, especially "RGIndex.{c,h}" and "RGBinary.{c,h}".
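
For readers following along, the idea described in the quoted abstract (index the reference genome, then use the index to map each read to candidate alignment locations, with several independent indexes for robustness) can be illustrated with a toy sketch. This is not BFAST's actual data structure and is not taken from "RGIndex.{c,h}"; the function names, the '1'/'0' mask strings, and the dictionary-based lookup are all assumptions made purely for illustration.

```python
from collections import defaultdict


def masked_key(seq, start, mask):
    """Return the bases of seq under the '1' positions of mask, starting at start."""
    return "".join(seq[start + i] for i, m in enumerate(mask) if m == "1")


def build_index(reference, mask):
    """Map each masked key to the list of reference positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - len(mask) + 1):
        index[masked_key(reference, pos, mask)].append(pos)
    return index


def candidate_locations(read, indexes):
    """Collect candidate start positions of the read on the reference.

    Every (mask, index) pair is queried independently, so a mismatch that
    breaks the keys of one mask can still be rescued by another.
    """
    candidates = set()
    for mask, index in indexes:
        for offset in range(len(read) - len(mask) + 1):
            key = masked_key(read, offset, mask)
            for pos in index.get(key, ()):
                start = pos - offset  # where the read would begin on the reference
                if start >= 0:
                    candidates.add(start)
    return sorted(candidates)


# Hypothetical toy data: a short reference and two illustrative masks.
reference = "ACGTTGCAAGCTTAGGCTAACGT"
masks = ["1111", "110011"]
indexes = [(m, build_index(reference, m)) for m in masks]

print(candidate_locations("AAGCTTAG", indexes))  # exact copy of reference[7:15]
print(candidate_locations("AAGCATAG", indexes))  # same read with one mismatch
```

Each candidate would then be handed to the local alignment step for verification. A real implementation for a whole genome would need a compact encoding (for example, 2 bits per base) and a sorted or hashed layout rather than Python dictionaries.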


    • #3
      This will be interesting. Good luck!


      • #4
        Difficulty in understanding the code

        @nilshomer
        We are 4th-year students doing our final-year project. We tried to understand the code in "RGIndex.{c,h}" and "RGBinary.{c,h}", but it is almost incomprehensible to us.
        Could you email us the algorithm in detail so that we can try coding it on our own? We are confident about the coding, since we have written a fully functional GPU-parallelized Smith-Waterman implementation ourselves.

        Sorry for the trouble, and thank you for your time.


        • #5
          Originally posted by nikhil.stephen
          @nilshomer
          We are 4th-year students doing our final-year project. We tried to understand the code in "RGIndex.{c,h}" and "RGBinary.{c,h}", but it is almost incomprehensible to us.
          Could you email us the algorithm in detail so that we can try coding it on our own? We are confident about the coding, since we have written a fully functional GPU-parallelized Smith-Waterman implementation ourselves.

          Sorry for the trouble, and thank you for your time.

          This will be beyond my ability to help.

          Nils


          • #6
            Originally posted by nikhil.stephen
            Dear Sir,

            We are computer engineering students. We have read the BFAST paper.

            We are trying to implement it on GPUs (CUDA), but we are having trouble with the indexing step.
            Could you explain, or point us to sources with more information on, how BFAST creates a reference genome and how indexing is done?

            Hi, interesting work. I work in an R&D lab on various high-performance computing applications, and I would like to see if we can collaborate on this effort. Please contact me if you are interested ([email protected]).


            • #7
              Why not try to vectorize the core parts (i.e., the Smith-Waterman step)? The extra money spent on GPUs could instead be used to buy multi-core processors (not to mention the rack space saved). A vectorized implementation would have a greater impact on users than GPU support. I think SHRiMP has vectorized code embedded. This is on my wishlist for BFAST, ahead of GPU support.

              My 2 cents.
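
As a rough illustration of why the Smith-Waterman step lends itself to both vectorization and GPU parallelization, here is a minimal sketch that fills the scoring matrix one anti-diagonal at a time. It is not BFAST's or SHRiMP's implementation; the function name, the linear gap penalty, and the scoring values are assumptions chosen for the example, and plain Python loops stand in for SIMD lanes or GPU threads.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    # Walk anti-diagonals d = i + j. Every cell on one anti-diagonal depends
    # only on cells from the two previous anti-diagonals, so all cells of the
    # inner loop are independent and could be computed in parallel.
    for d in range(2, rows + cols - 1):
        for i in range(max(1, d - cols + 1), min(rows - 1, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # match or mismatch
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("ACGT", "ACGT"))
print(smith_waterman_score("ACGT", "ACT"))
```

The same dependency pattern applies whether the independent cells are mapped to SIMD lanes on a CPU or to threads on a GPU, which is why the two approaches are often discussed together.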


              • #8
                Originally posted by nilshomer
                Why not try to vectorize the core parts (i.e., the Smith-Waterman step)? The extra money spent on GPUs could instead be used to buy multi-core processors (not to mention the rack space saved). A vectorized implementation would have a greater impact on users than GPU support. I think SHRiMP has vectorized code embedded. This is on my wishlist for BFAST, ahead of GPU support.

                My 2 cents.

                Makes sense. I am looking at using OpenCL rather than CUDA, which would still allow it to take the path you mentioned.
