  • Roche's gsMapper

    Hello

    Has anyone here ever changed the parameters used by gsMapper when mapping read data to a reference genome? If so, can anyone elaborate on what "minimum overlap length" and "alignment identity score" mean? (The definitions in the manual are far too brief.)

    Cheers

    Layla

  • #2
    I have not modified the default settings when running gsMapper.

    The gsMapper algorithm is similar to other assembly software (e.g. phrap) in that it uses the concept of "overlap" between reads to obtain contigs.

    The difference is that 454's gsMapper works entirely in raw flow space. Therefore, I believe the scores and lengths are expressed in flow space as well.

    For example, the default minimum overlap length is 40 according to the manual. I believe 40 means 40 flows, not 40 bases; 40 flows correspond to roughly 16 to 20 bp.

    You can play with the value, but I doubt you will see any real difference in the results.


    • #3
      I don't think this is true. I think it's 40 bases, not 40 flows. IIRC (not that it's in the manual), flow space is only used in calling the consensus *after* mapping the reads (in sequence space).

      I could be wrong. It's a shame it's not easy to find these things out.

      Also, I think these settings should have a big effect on the result. 'Seed size' is a trade-off between sensitivity and running time. The bigger the seed size, the shorter the running time, but the more 'nearly perfect' hits you will miss. The smaller the seed size, the higher the sensitivity, but at some point the specificity drops dramatically, so many false matches have to be inspected at later stages of the mapping.
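To make the seed-size trade-off concrete, here is a minimal Python sketch of exact k-mer seeding. It is not gsMapper's actual algorithm (which is not documented in this thread); the function names `kmer_index` and `seed_hits` and the toy sequences are made up for illustration. It only shows that a read with a single mismatch keeps some exact seeds when the seed is short and loses all of them when the seed is long.

```python
def kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index: dict[str, list[int]] = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def seed_hits(read: str, reference: str, k: int) -> set[int]:
    """Return candidate reference offsets supported by at least one exact k-mer seed.

    An offset d means "read position 0 would align to reference position d".
    """
    index = kmer_index(reference, k)
    offsets: set[int] = set()
    for j in range(len(read) - k + 1):
        for pos in index.get(read[j:j + k], []):
            offsets.add(pos - j)
    return offsets


# Toy data (hypothetical): a 20-base read taken from offset 5, with one mismatch.
reference = "ACGTTGCAAGCTTAGGCTAACGGATCCATG"
read = list(reference[5:25])
read[10] = "T"  # the reference has "G" here
read = "".join(read)

for k in (4, 8, 12):
    print(k, sorted(seed_hits(read, reference, k)))
```

With the 12-base seed every k-mer of the read spans the mismatch, so no seed is found and the read would never reach the alignment stage. With the shorter seeds the true offset (5) is found, but shorter seeds are also the ones that can produce spurious candidate offsets on a real genome, which is the extra work at later stages mentioned above.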
      Last edited by dan; 09-13-2009, 11:44 PM. Reason: Responding to the second point too.
      Homepage: Dan Bolser
      MetaBase the database of biological databases.


      • #4
        We used some different values for "minimum length" and "minimum identity": -ml 90% -mi 96% to get more reliable variation detection in areas with lower coverage.


        • #5
          Maybe silly but I simply did a BLAT analysis of the reads (which is really fast) to a reference genome which allowed me to simply choose any cut-off I like (length as well as sensitivity %homology). But probably this also depends on the specific requirements.....
          My 2 cents.
          Alex
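As an illustration of applying length and identity cut-offs to BLAT results, here is a minimal Python sketch that filters hits in BLAT's tab-separated PSL output. The names `parse_psl` and `filter_hits`, the threshold values, and the identity formula (matching bases divided by aligned bases, ignoring gaps) are assumptions made for this example, not anything BLAT or gsMapper defines.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str
    target: str
    aligned_fraction: float  # fraction of the read covered by the alignment
    identity: float          # matching bases / aligned bases (gaps ignored)


def parse_psl(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per PSL alignment line, skipping header lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 21 or not fields[0].isdigit():
            continue  # header, separator, or malformed line
        matches, mismatches, rep_matches = (int(x) for x in fields[:3])
        q_name, q_size = fields[9], int(fields[10])
        q_start, q_end = int(fields[11]), int(fields[12])
        t_name = fields[13]
        aligned = matches + mismatches + rep_matches
        if aligned == 0 or q_size == 0:
            continue
        yield Hit(
            query=q_name,
            target=t_name,
            aligned_fraction=(q_end - q_start) / q_size,
            identity=(matches + rep_matches) / aligned,
        )


def filter_hits(hits: Iterable[Hit], min_fraction: float, min_identity: float) -> list[Hit]:
    """Keep hits that meet both cut-offs (the cut-offs are the user's choice)."""
    return [
        h for h in hits
        if h.aligned_fraction >= min_fraction and h.identity >= min_identity
    ]
```

Usage would look like `filter_hits(parse_psl(open("reads.psl")), 0.90, 0.96)`, where `reads.psl` is a hypothetical BLAT output file and the two thresholds are placeholders to tune for the data at hand.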


          • #6
            Originally posted by AlexB
            Maybe silly but I simply did a BLAT analysis of the reads (which is really fast) to a reference genome which allowed me to simply choose any cut-off I like (length as well as sensitivity %homology). But probably this also depends on the specific requirements.....
            My 2 cents.
            Alex
            Alex, given the homopolymer issue, do you have a standard way of dealing with all the small indels that BLAT might be returning? I believe gsMapper has some built-in filters to take care of some of those false positives.
            --
            bioinfosm


            • #7
              I have to admit that we never looked at it in that much detail, so I can't comment. Since we were relatively new to the technology at the time, we compared the gsMapper results with those returned by BLAT, and using certain homology/length cut-offs we more or less reproduced the results. This was with a 2 Mb genome, though... Can you be more precise about what exactly you mean? I will keep an eye on it.
