  • Rupinder
    Junior Member
    • May 2009
    • 1

    Very Short Read aligner

    Hi All,

    I am in the process of building my own short read aligner for a lab working on cancer-genome sequencing, and I have the following questions:

    1. For our problem, we have read lengths ranging from 18 to 22 bp.

    I am aware of several aligners currently available, and I was wondering whether there are any issues with read lengths in this range. In all the aligner papers I have gone through, the read lengths have usually been 30 bp and upwards.

    It would be really helpful if other members who have worked with similar read lengths could advise me.
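
    One way to get a feel for why read length matters is a back-of-the-envelope estimate of how often a read would match the reference purely by chance. The sketch below assumes a genome of independent, uniformly distributed bases (real genomes are repetitive, so it underestimates ambiguity) and a roughly human-sized reference of 3 Gbp; the function name and both of those assumptions are illustrative and do not come from any particular aligner. Under this model an 18 bp read has roughly 0.09 expected chance hits with exact matching, but on the order of a hundred once two mismatches are allowed.

```python
from math import comb


def expected_chance_hits(read_len, genome_size, max_mismatches=0):
    """Expected number of chance placements of one read in a random genome.

    Assumes independent, uniformly distributed bases (a simplification)
    and counts both strands.
    """
    # Number of distinct sequences within the allowed Hamming distance.
    neighbours = sum(comb(read_len, i) * 3**i for i in range(max_mismatches + 1))
    # Candidate start positions on the forward and reverse strands.
    positions = 2 * (genome_size - read_len + 1)
    return neighbours * positions / 4**read_len


if __name__ == "__main__":
    genome_size = 3_000_000_000  # assumed, roughly human-sized
    for read_len in (18, 22, 30):
        for mismatches in (0, 2):
            hits = expected_chance_hits(read_len, genome_size, mismatches)
            print(f"{read_len} bp, <= {mismatches} mismatches: {hits:.3g}")
```

    The practical reading is that 18-22 bp reads can be aligned, but the mismatch budget has to be kept small and multi-mapping reads need explicit handling.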

    2. Also, are there any known species-specific statistical heuristics for short read alignment? To be more specific: for a non-mammalian sequence source, for example, I would like to use a different set of parameters than for a mammalian source.
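
    As a rough illustration of why base composition might matter here, the sketch below estimates the probability that a read matches a random genome position exactly, using the genome's own base frequencies instead of a uniform 0.25 per base. This is only one possible heuristic under an independent-bases model; the function names are hypothetical and it is not how any existing aligner is known to score reads. With a hypothetical AT-rich composition (A = T = 0.4, C = G = 0.1), an 18 bp read made only of A and T is several thousand times more likely to match by chance than the uniform model predicts.

```python
from collections import Counter
from math import prod


def base_frequencies(sequence):
    """Return the relative frequency of A, C, G and T in a sequence."""
    counts = Counter(b for b in sequence.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: counts[b] / total for b in "ACGT"}


def chance_match_probability(read, freqs):
    """Probability that one random position matches the read exactly.

    Assumes independent bases drawn from `freqs` and a read made of
    A/C/G/T only (ambiguity codes such as N are not handled).
    """
    return prod(freqs[b] for b in read.upper())


if __name__ == "__main__":
    uniform = {b: 0.25 for b in "ACGT"}
    at_rich = {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1}  # hypothetical genome
    read = "ATATTTAATATATTAAAT"  # 18 bp, A/T only
    ratio = chance_match_probability(read, at_rich) / chance_match_probability(read, uniform)
    print(f"chance match is {ratio:.0f}x more likely in the AT-rich model")
```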

    3. As I mentioned, I am in the process of developing my own aligner, so any advice or suggestions from members who have embarked on the same would be very valuable. I am still in the brainstorming phase and trying to scope the range of problems this short read aligner would address. For now, the aligner should be able to:
    a. align reads of length 18-22 bp to a reference genome
    b. perform ungapped alignment
    c. apply a species-specific statistical scoring heuristic (if it makes sense to use one in the first place)
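
    Requirements (a) and (b) can be sketched with a plain seed-and-verify scheme. The code below is a minimal illustration, not a description of how MAQ, Bowtie or any other existing tool works: it builds an in-memory k-mer index (which would not scale to a mammalian genome as written), cuts each read into non-overlapping seeds, and verifies candidates by Hamming distance. All function names and the default mismatch budget are assumptions made for the example.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_index(reference, k):
    """Map every k-mer in the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def align_ungapped(read, reference, index, k, max_mismatches=2):
    """Return sorted (position, strand, mismatches) hits, with no gaps allowed.

    Pigeonhole seeding: a read with at most m mismatches, cut into m + 1
    non-overlapping seeds, must contain at least one seed that matches exactly.
    Positions are leftmost coordinates on the forward strand of the reference.
    """
    if len(read) < k * (max_mismatches + 1):
        raise ValueError("read too short for this k and mismatch budget")
    hits = set()
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        for s in range(max_mismatches + 1):
            offset = s * k
            # Look up exact occurrences of this seed, then verify the full read.
            for seed_pos in index.get(query[offset:offset + k], ()):
                start = seed_pos - offset
                if start < 0 or start + len(query) > len(reference):
                    continue
                mm = hamming(query, reference[start:start + len(query)])
                if mm <= max_mismatches:
                    hits.add((start, strand, mm))
    return sorted(hits)


if __name__ == "__main__":
    reference = "TTGACCATGCAGTCAAGCTTAGGCATCGATCCGTAAGCTGACT"  # toy reference
    index = build_index(reference, 6)
    read = reference[5:23]  # an 18 bp read taken from the reference itself
    print(align_ungapped(read, reference, index, 6))
```

    With 18 bp reads and two allowed mismatches, the seeds are only 6 bp long, so most of the work goes into verifying spurious candidates; that trade-off is the main design pressure for reads this short.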

    I am looking forward to any advice or suggestions, or perhaps even a general discussion on the process of developing a short read aligner from scratch.

    thank you

    regards
    Rupinder
  • Torst
    Senior Member
    • Apr 2008
    • 275

    #2
    Rupinder,

    Originally posted by Rupinder
    I am looking forward to any advice or suggestions, or perhaps even a general discussion on the process of developing a short read aligner from scratch.
    My advice is not to write (yet another) short read aligner, unless you are doing it to "learn" or as part of a student project.

    There are so many already: MAQ, SHRiMP, SOAP, Bowtie, ELAND, Novocraft, etc. They all do essentially the same thing, most of them are very efficient indeed, and you are unlikely to better them. I'm sure you could find appropriate parameters to suit your data, and 20-24 bp reads are not a problem. The reason most of them are described with 30+ bp reads is that most Illumina data is around that length, but SOLiD 3 data is often 24 bp and the aligners work well. SHRiMP and others have post-processing scripts to help correct for bias in sequences with non-uniform statistics.

    I think you would be better off using the existing tools with appropriate settings, rather than writing your own, and getting on with the science further down the line.

    (But if you have the time and want to learn more about alignment, optimization, programming, etc., go for it! Especially if you want to write a GPU-enabled version.)

    --Torsten
