  • GS FLX data analysis software manual

    Hello,

    Could anybody explain in a little more detail the following overlap detection parameters available in the GS De Novo Assembler application (gsAssembler)?

    Seed step – The number of bases between seed generation locations used in the exact k-mer matching part of the overlap detection
    Seed length – The number of bases used for each seed in the exact k-mer matching part of the overlap detection (i.e. the “k” value of the k-mer matching)
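    To make these two definitions concrete, here is a minimal sketch of seed-based exact k-mer matching in Python. It is not gsAssembler's actual implementation: the function names are hypothetical, and it assumes that seeds are sampled from one read every `seed_step` bases while the other read is indexed at every offset. The defaults (seed length 16, seed step 12) are the values Roche support quoted for gsAssembler.

```python
from collections import defaultdict


def seeds(read, seed_length, seed_step):
    """Yield (offset, k-mer) pairs: one seed of `seed_length` bases every `seed_step` bases."""
    for start in range(0, len(read) - seed_length + 1, seed_step):
        yield start, read[start:start + seed_length]


def build_kmer_index(read, seed_length):
    """Map every k-mer of `read` (at every offset, not just seed offsets) to its start positions."""
    index = defaultdict(list)
    for start in range(len(read) - seed_length + 1):
        index[read[start:start + seed_length]].append(start)
    return index


def seed_hits(query, target, seed_length=16, seed_step=12):
    """Return (query_offset, target_offset) pairs where a sampled query seed matches the target exactly.

    Hypothetical helper: hits like these would be candidate overlaps to verify by alignment.
    """
    index = build_kmer_index(target, seed_length)
    hits = []
    for q_start, kmer in seeds(query, seed_length, seed_step):
        for t_start in index.get(kmer, []):
            hits.append((q_start, t_start))
    return hits
```

    In this sketch a larger `seed_step` samples fewer seeds per read, so lookups are cheaper but a short shared region may contain no sampled seed at all; a larger `seed_length` makes each exact hit more specific but requires a longer error-free stretch shared by both reads.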

    Thank you very much!

  • #2
    I'm not an expert on assembly, but I'll try to help.

    When doing an overlap analysis, you want to know some parameters about how good your overlap is. Is it nice and uniform, or does it have parts represented by only 2 or 3 seeds and parts covered by 100 seeds? But that's only coverage, a bit too much of a simplification of the assembly quality: 30x coverage with 500-mers is not the same as 30x coverage with 8-mers. That is where these 2 parameters come in.

    Seed step is the distance between the start of one overlapping segment and the next. Say you find sequence #1 (a 12-mer, for example) starting at base 1 and sequence #2 (also a 12-mer) starting at base 6; then your seed step would be 5. The distribution of seed steps gives you an idea of how uniformly that part of your assembly is represented by actual reads. Ideally you would have a new read start at each new base for the best alignment quality.

    Seed length is the k-mer length you are using. If your assembly consisted of uniform reads, all of the same length, your seed length wouldn't vary across your assembly. But reporting the seed length gives you an idea of the quality of the reads used in that part of your assembly. For instance, if one part of your assembly is made up of seeds that are much shorter than those in another part that is just as well covered, you can say that the assembly quality is better at the site with the longer seeds. That's because longer reads are usually of higher quality; otherwise they would have been trimmed.

    But the power of these parameters, I think, is in their combination. Having large seed steps is okay as long as your k-mer length is also large. If your k-mers are small, you want small seed steps; otherwise the total alignment quality is lower.

    I hope my rambling was useful.
    Cheers



    • #3
      Very useful.
      Thank you!



      • #4
        Sorry, I'm still confused about these parameters.

        Here is some information I got by asking the same question to Dr. Michael Stiens, Manager Customer Support Genome Sequencing, Roche Diagnostics GmbH...

        Seed Step: It is the number of bases after which the next seed begins on the same read. Each seed is 16 bp in length (default) and the seed step is 12 bp, so consecutive seeds on a read overlap by 4 bp.
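        The arithmetic behind these default values can be sketched in a few lines of Python. The helper names are hypothetical, and `min_guaranteed_match` assumes the other read is indexed at every offset (as in a typical seed-and-extend scheme), which may not be exactly what gsAssembler does.

```python
def seed_starts(read_length, seed_length=16, seed_step=12):
    """Offsets at which seeds begin on a read (hypothetical helper)."""
    return list(range(0, read_length - seed_length + 1, seed_step))


def seed_overlap(seed_length=16, seed_step=12):
    """Bases shared by consecutive seeds; 0 means step >= length (no overlap, possibly gaps)."""
    return max(seed_length - seed_step, 0)


def min_guaranteed_match(seed_length=16, seed_step=12):
    """Shortest exact shared stretch of a read that always contains one whole sampled seed.

    The next seed start is at most seed_step - 1 bases into the stretch,
    and that seed then needs seed_length bases to fit.
    """
    return seed_step + seed_length - 1


print(seed_starts(100))        # seed offsets on a 100-base read
print(seed_overlap())          # 16 - 12
print(min_guaranteed_match())  # 12 + 16 - 1
```

        Under these assumptions, a 100-base read yields seeds at offsets 0, 12, ..., 84, consecutive seeds share 16 - 12 = 4 bases, and any exact match of at least 27 bases between two reads is guaranteed to be hit by at least one seed.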

        One question would be: does the seed step parameter define an upper or a lower limit? While I found dePhi's answer very interesting (I had never thought about the different mapping qualities with respect to seed length before), I don't see how it relates to the parameters used. For example, it talks about a "distribution of seed step", so is the parameter the upper limit of that distribution?

        Cheers,
        Homepage: Dan Bolser
        MetaBase the database of biological databases.

