
  • Contrail - a Hadoop-based de novo sequence assembler

    Hello all,

    Most bioinformatics researchers get stuck on the question of how to buy a computer with enough RAM to process their NGS data, because RAM is very expensive. It is not easy to get managers to approve a $100K computer when they think everything can be done with a $1K laptop and Microsoft Excel.

    Internet companies like Google developed algorithms that process terabytes and petabytes of data very rapidly to return search results to users. They run on clusters of commodity computers with inexpensive disks (hard drives are cheap) and use an approach called MapReduce. A MapReduce framework is freely available as part of Hadoop, which is distributed by the Apache Software Foundation.
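
    As a rough illustration of the MapReduce idea described above, here is a minimal sketch in plain Python that counts k-mers across a set of reads. It does not use Hadoop's API and is not how contrail is implemented; the function names (map_read, shuffle, reduce_counts, count_kmers) and the toy reads are assumptions made up for this example. The point is only that each read can be mapped independently and each key can be reduced independently, which is what lets a real framework spread the work over many cheap machines.

```python
from collections import defaultdict


def map_read(read, k):
    """Map step: emit a (k-mer, 1) pair for every k-length window in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key.

    A real MapReduce framework does this between map and reduce;
    here it is simulated in memory (an assumption of this sketch).
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(kmer, values):
    """Reduce step: sum all the 1s emitted for a single k-mer."""
    return kmer, sum(values)


def count_kmers(reads, k):
    """Run map, shuffle, and reduce in sequence on a list of reads."""
    pairs = (pair for read in reads for pair in map_read(read, k))
    grouped = shuffle(pairs)
    return dict(reduce_counts(kmer, values) for kmer, values in grouped.items())


if __name__ == "__main__":
    # Toy reads, invented for illustration only.
    reads = ["ACGTAC", "CGTACG"]
    for kmer, count in sorted(count_kmers(reads, 3).items()):
        print(kmer, count)
```

    Because map_read looks at one read at a time and reduce_counts looks at one k-mer at a time, neither step needs the whole data set in RAM at once; in this sketch only the simulated shuffle holds everything in memory, and a framework like Hadoop would do that part on disk across the cluster.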

    A few months back, I came across a genome assembly program called 'contrail' that uses Hadoop to assemble large quantities of NGS data, and it is scalable. When I talk to bioinformaticians about trying Hadoop instead of buying a large, expensive, RAM-heavy machine, I usually hit a wall, because words like Hadoop and MapReduce are foreign to them. So today I wrote a post explaining how to set up and run contrail on your own machine using Hadoop. It is written so that even if you have never used Hadoop, you can follow the steps mechanically and assemble the reads in the test library in a short time on your own Windows or Unix box. I hope that once researchers see that the Hadoop approach is easy and scalable for large data sets, they will develop their own programs and the whole community will benefit.

    This post discusses how to use the contrail assembler -



    This post discusses how to set up and run Hadoop for a simple sequence analysis example -




    Please note that I am not associated with the researchers who wrote contrail, and I have never spoken to them or met them. It is the only example of a de Bruijn assembler that I found, so I decided to try it out.
    http://homolog.us
