
  • dan
    replied
    Sorry, I'm still confused about these parameters.

    Here is some information I got by asking the same question to Dr. Michael Stiens, Manager Customer Support Genome Sequencing, Roche Diagnostics GmbH...

    Seed step: the number of bases after which the next seed begins on the same read. Each seed is 16 bp in length (default) and the seed step is 12 bp, so there is an overlap of 4 bp between consecutive seeds on a read.
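The arithmetic in that description can be checked with a minimal sketch. The function name `seed_positions` is hypothetical, and the sketch assumes that seeds start at the first base of the read and that no partial seed is generated at the end; neither assumption is confirmed for gsAssembler.

```python
def seed_positions(read_length, seed_length=16, seed_step=12):
    """Return (start, end) pairs for each seed on a read.

    Coordinates are 0-based with an exclusive end. Assumption: the first
    seed starts at base 0 and only full-length seeds are kept.
    """
    last_start = read_length - seed_length
    return [(s, s + seed_length) for s in range(0, last_start + 1, seed_step)]


seeds = seed_positions(52)
# Overlap between consecutive seeds = seed length - seed step.
overlap = seeds[0][1] - seeds[1][0]

print(seeds)    # [(0, 16), (12, 28), (24, 40), (36, 52)]
print(overlap)  # 4
```

With the defaults quoted above (16 bp seeds, 12 bp step), a 52 bp read yields four seeds, each sharing 4 bp with its neighbour.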

    One question would be: does the seed step parameter define an upper or a lower limit? While I found dePhi's answer very interesting (I had never thought about the different mapping qualities w.r.t. seed length before), I don't see how it relates to the parameters used. That answer talks about a "distribution of seed step", so is the parameter the upper limit of that distribution?

    Cheers,



  • drgoettel
    replied
    Very useful.
    Thank you!



  • dePhi
    replied
    I'm not an expert on assembly, but I'll try to help.

    When doing an overlap analysis you want to know some parameters about how good your overlap is. Is it nice and uniform, or does it have parts represented by only 2 or 3 seeds and parts covered by 100 seeds? But that's only coverage, which is a bit too much of a simplification of assembly quality: 30x coverage with 500-mers is not the same as 30x coverage with 8-mers. That is where these 2 parameters come in.

    Seed step is the distance between the start of one overlapping segment and the start of the next. Say you find sequence #1 (a 12-mer, for example) starting at base number 1 and sequence #2 (also a 12-mer) starting at base number 6; then your seed step would be 5. The distribution of seed steps gives you an idea of how uniformly that part of your assembly is represented by actual reads. Ideally you would have a new read start at each new base for the best alignment quality.
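To make the 12-mer example above concrete, here is a small sketch of how a "distribution of seed steps" could be tallied from start positions. This follows the reading given in this post only; the helper name `seed_steps` and the longer list of start positions are made up for illustration.

```python
from collections import Counter


def seed_steps(starts):
    """Distances between consecutive start positions, after sorting."""
    ordered = sorted(starts)
    return [b - a for a, b in zip(ordered, ordered[1:])]


# The example from the post: segments starting at bases 1 and 6.
print(seed_steps([1, 6]))  # [5]

# Hypothetical start positions, to show a non-uniform distribution.
step_distribution = Counter(seed_steps([1, 6, 11, 13, 25]))
print(step_distribution[5], step_distribution[2], step_distribution[12])  # 2 1 1
```

A narrow distribution centred on a small step would correspond to the "nice and uniform" case; a long tail of large steps would flag thinly represented stretches.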

    Seed length is the k-mer length you are using. If your assembly consisted of uniform reads, all of the same length, your seed length wouldn't vary across your assembly. But reporting the seed length gives you an idea of the quality of the reads used in that part of your assembly. For instance, if one part of your assembly is made up of seeds that are much smaller than those in another part that is just as well covered, you can say that the quality of your assembly is better at the site with the larger seed length. That's because read quality is usually better in longer reads, or else they would have been trimmed.

    But the power of these parameters, I think, is in their combination. Having large seed steps is okay as long as your k-mer length is also large. If your k-mers are small you want small seed steps; otherwise the total alignment quality is lower.

    I hope my rambling was useful.
    Cheers



  • drgoettel
    started a topic GS FLX data analysis software manual

    GS FLX data analysis software manual

    Hello,

    could anybody explain in a little more detail the following overlap detection parameters available in the GS De Novo Assembler application (gsAssembler)?

    Seed step – The number of bases between seed generation locations used in the exact k-mer matching part of the overlap detection
    Seed length – The number of bases used for each seed in the exact k-mer matching part of the overlap detection (i.e. the “k” value of the k-mer matching)
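One way to picture the two definitions above is a generic seed-and-index scheme: index every k-mer of one read, then look up seeds taken every `seed_step` bases from another read. This is only an illustrative sketch with hypothetical names (`seed_hits`, toy sequences, tiny k); it is not the gsAssembler implementation, whose internals are not described in the manual text quoted here.

```python
def seed_hits(query, target, seed_length=16, seed_step=12):
    """Return (query_pos, target_pos) pairs where a query seed matches exactly.

    Assumption: every k-mer of the target is indexed, while the query is
    sampled only every `seed_step` bases.
    """
    index = {}
    for j in range(len(target) - seed_length + 1):
        index.setdefault(target[j:j + seed_length], []).append(j)

    hits = []
    for i in range(0, len(query) - seed_length + 1, seed_step):
        for j in index.get(query[i:i + seed_length], []):
            hits.append((i, j))
    return hits


# Toy example with a short seed so the result is easy to follow by hand.
hits = seed_hits("ACGTACGGTCA", "TTACGTACGGTCAGG", seed_length=4, seed_step=3)
print(hits)  # [(0, 2), (3, 1), (3, 5), (6, 8)]
```

In this toy case three of the four hits share the same offset (target position minus query position equals 2), which is the kind of consistent signal an overlap detector would follow up with a full alignment. A larger seed step means fewer lookups but fewer chances to hit; a longer seed means fewer spurious hits but less tolerance for errors inside the seed.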

    Thank you very much!
