  • BFAST using GPUs

    Dear Sir,

    We are computer engineering students. We have read the BFAST paper:
    Background: The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, result in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25–100 base range, in the presence of errors and true biological variation.

    Methodology: We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels.

    Conclusions: We compare BFAST to a selection of large-scale alignment tools - BLAT, MAQ, SHRiMP, and SOAP - in terms of both speed and accuracy, using simulated and real-world datasets. We show BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at http://bfast.sourceforge.net.


    We are trying to implement it on GPUs (CUDA), but we are having some trouble with the indexing part.
    Could you explain or provide sources for more information on how BFAST creates a reference genome and how indexing is done?

  • #2
    Originally posted by nikhil.stephen
    Dear Sir,

    We are computer engineering students. We have read the BFAST paper.


    We are trying to implement it on GPUs (CUDA), but we are having some trouble with the indexing part.
    Could you explain or provide sources for more information on how BFAST creates a reference genome and how indexing is done?
    The best place to look would be the source code found at http://bfast.sourceforge.net, especially "RGIndex.{c,h}" and "RGBinary.{c,h}".
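
For orientation before reading those files, the general idea of a seed index over a reference can be sketched as below. This is a minimal sketch under stated assumptions, not BFAST's actual data layout: the `Seed` struct, the function names, the contiguous k-mer seeds with k at most 16, and the sort-then-binary-search lookup are all hypothetical choices made for illustration. It packs each k-mer of the reference into 2 bits per base, sorts the (key, position) pairs, and finds candidate locations for a read seed by binary search.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is NOT the layout used in RGIndex.{c,h}. */
typedef struct {
    uint32_t key; /* k-mer packed at 2 bits per base */
    uint32_t pos; /* 0-based start position in the reference */
} Seed;

static int base_code(char b) {
    switch (b) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: return -1; /* N or any other ambiguity code */
    }
}

/* Pack k bases (1 <= k <= 16) into *out. Returns 0, or -1 on an ambiguous base. */
static int pack_kmer(const char *s, int k, uint32_t *out) {
    uint32_t key = 0;
    assert(k >= 1 && k <= 16);
    for (int i = 0; i < k; i++) {
        int c = base_code(s[i]);
        if (c < 0) return -1;
        key = (key << 2) | (uint32_t)c;
    }
    *out = key;
    return 0;
}

/* Order by key first, then by position, so equal keys form one sorted run. */
static int cmp_seed(const void *a, const void *b) {
    const Seed *x = a, *y = b;
    if (x->key != y->key) return x->key < y->key ? -1 : 1;
    if (x->pos != y->pos) return x->pos < y->pos ? -1 : 1;
    return 0;
}

/* Build a sorted index of all unambiguous k-mers in ref.
 * Returns the number of entries; the caller frees *out. */
static size_t build_index(const char *ref, int k, Seed **out) {
    size_t n = strlen(ref), m = 0;
    *out = NULL;
    if (k < 1 || k > 16 || n < (size_t)k) return 0;
    Seed *idx = malloc((n - (size_t)k + 1) * sizeof *idx);
    if (!idx) return 0;
    for (size_t i = 0; i + (size_t)k <= n; i++) {
        uint32_t key;
        if (pack_kmer(ref + i, k, &key) == 0) {
            idx[m].key = key;
            idx[m].pos = (uint32_t)i;
            m++;
        }
    }
    qsort(idx, m, sizeof *idx, cmp_seed);
    *out = idx;
    return m;
}

/* Binary search for key. Returns the number of hits; *first is the index
 * of the first hit, so hits occupy idx[*first .. *first + count - 1]. */
static size_t lookup(const Seed *idx, size_t m, uint32_t key, size_t *first) {
    size_t lo = 0, hi = m;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (idx[mid].key < key) lo = mid + 1; else hi = mid;
    }
    size_t end = lo;
    while (end < m && idx[end].key == key) end++;
    *first = lo;
    return end - lo;
}
```

To use it, call `build_index` once on the reference, then for each read pack a seed with `pack_kmer` and pass the key to `lookup`; every returned position is a candidate location for the local alignment step. On a GPU, each read's lookup is independent of the others, so the read-only sorted array is the part that would be shared across threads.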



    • #3
      This will be interesting. Good luck!



      • #4
        Difficulty in understanding the code

        @nilshomer
        We are 4th-year students doing our final-year project. We tried to understand the code in "RGIndex.{c,h}" and "RGBinary.{c,h}", but it is almost incomprehensible to us.
        Could you email us the algorithm in detail so that we can try coding it on our own? We're confident about the coding since we have a fully functional GPU-parallelized Smith-Waterman implementation that we wrote ourselves.

        Sorry for the trouble, and thank you for your time.



        • #5
          Originally posted by nikhil.stephen
          @nilshomer
          We are 4th-year students doing our final-year project. We tried to understand the code in "RGIndex.{c,h}" and "RGBinary.{c,h}", but it is almost incomprehensible to us.
          Could you email us the algorithm in detail so that we can try coding it on our own? We're confident about the coding since we have a fully functional GPU-parallelized Smith-Waterman implementation that we wrote ourselves.

          Sorry for the trouble, and thank you for your time.
          This will be beyond my ability to help.

          Nils



          • #6
            Originally posted by nikhil.stephen
            Dear Sir,

            We are computer engineering students. We have read the BFAST paper.


            We are trying to implement it on GPUs (CUDA), but we are having some trouble with the indexing part.
            Could you explain or provide sources for more information on how BFAST creates a reference genome and how indexing is done?
            Hi, interesting work. I work in an R&D lab on various high-performance computing applications and would like to see if we can collaborate on this effort. Please contact me if you are interested ([email protected]).



            • #7
              Why not try to vectorize the core parts (i.e., the Smith-Waterman step)? The extra money spent on GPUs could instead be used to buy multi-core processors (not to mention the rack space saved). A vectorized implementation would have a greater impact on users than GPU support. I think SHRiMP has vectorized code embedded. This is on my wishlist for BFAST ahead of GPU support.

              My 2 cents.
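
For reference, the Smith-Waterman recurrence under discussion can be sketched in scalar form as below. This is a minimal sketch under stated assumptions, not BFAST's implementation: the function name `sw_score`, the linear gap penalty, and the caller-supplied match, mismatch, and gap values are hypothetical choices, and only the best score is computed (no traceback). Each cell depends on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another, which is the parallelism that SIMD and GPU versions exploit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int max2(int a, int b) { return a > b ? a : b; }

/* Best local alignment score of a against b.
 * match is added for equal bases, mismatch for unequal ones, and gap for
 * each inserted or deleted base (mismatch and gap are normally negative).
 * Uses two rolling rows, so memory is O(strlen(b)).
 * Returns -1 if allocation fails. */
int sw_score(const char *a, const char *b, int match, int mismatch, int gap) {
    assert(a != NULL && b != NULL);
    size_t n = strlen(a), m = strlen(b);
    int *prev = calloc(m + 1, sizeof *prev); /* row i-1, starts as all zeros */
    int *curr = calloc(m + 1, sizeof *curr); /* row i */
    int best = 0;
    if (!prev || !curr) {
        free(prev);
        free(curr);
        return -1;
    }
    for (size_t i = 1; i <= n; i++) {
        curr[0] = 0;
        for (size_t j = 1; j <= m; j++) {
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            int up = prev[j] + gap;       /* gap in b */
            int left = curr[j - 1] + gap; /* gap in a */
            /* Clamping at zero is what makes the alignment local. */
            int h = max2(0, max2(diag, max2(up, left)));
            curr[j] = h;
            if (h > best) best = h;
        }
        int *tmp = prev; /* swap rows for the next iteration */
        prev = curr;
        curr = tmp;
    }
    free(prev);
    free(curr);
    return best;
}
```

A call such as `sw_score(read, window, 2, -1, -2)` would score one read against one candidate reference window; the scoring values here are placeholders, not BFAST defaults.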



              • #8
                Originally posted by nilshomer
                Why not try to vectorize the core parts (i.e., the Smith-Waterman step)? The extra money spent on GPUs could instead be used to buy multi-core processors (not to mention the rack space saved). A vectorized implementation would have a greater impact on users than GPU support. I think SHRiMP has vectorized code embedded. This is on my wishlist for BFAST ahead of GPU support.

                My 2 cents.
                Makes sense. I am looking at using OpenCL rather than CUDA, which would still allow the code to take the path you mentioned.
