  • Setting the parameter “map_len” in SOAPdenovo assembly software

    Hi,

    I'm using SOAPdenovo to assemble a large genome (Illumina GAIIx data; average insert size 200 bp; paired-end reads, 2 x 150 bp).

    From the software manual, I found there is a parameter called "map_len".

    In the example from the software manual, the default is map_len=32 when the maximal read length is 50 and the average insert size is 200. In my case, the PE read length is 150, so should I increase the map_len value?

    Does anyone know how to set up a suitable map_len value?

    PS: An example.config from the software manual:
    "
    #maximal read length
    max_rd_len=50
    [LIB]
    #average insert size
    avg_ins=200
    ...
    ...
    #minimum aligned length to contigs for a reliable read location (default 32)
    map_len=32
    ...

    [LIB]
    avg_ins=2000
    ...
    ...
    #minimum aligned length to contigs for a reliable read location
    #(default 35 for large insert size)
    map_len=35
    ... "

  • #2
    Originally posted by Godevil
    In the example from the software manual, the default is map_len=32 when the maximal read length is 50 and the average insert size is 200. In my case, the PE read length is 150, so should I increase the map_len value?
    For good-quality PE libraries, you can probably go a bit higher: a longer minimum alignment should help place reads uniquely without losing too many reads to sequencing errors. The 'map' step will tell you how many reads aligned, so you can experiment to maximise that. Then again, link counts are usually not a problem in PE libraries.

    For MP libraries, you probably don't want to go too big, because you lose reads once you pass the splicing point, which is more or less randomly located.

    BTW, do you really have paired 150bp reads of a 200bp fragment? If so, you're likely to have a lot of adapter in there. You might also want to consider 'pre-flattening' the read pairs into a single longer read, and assembling as SE reads.
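
To put rough numbers on the point above: with 2 x 150 bp reads from a 200 bp fragment, the two mates would be expected to overlap by about 100 bp, and any fragment shorter than the read length would run into adapter. Below is a minimal Python sketch of that arithmetic, plus a naive version of 'pre-flattening' a pair into one longer read. The function names are hypothetical, the merge only accepts an exact-match overlap (real read mergers tolerate mismatches and use quality scores), and it is not the method of any particular tool.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def expected_overlap(read_len, fragment_len):
    """Number of fragment bases covered by both mates (0 if they never meet)."""
    return min(read_len, fragment_len, max(0, 2 * read_len - fragment_len))


def adapter_readthrough(read_len, fragment_len):
    """Number of bases at the 3' end of each read that run past the fragment."""
    return max(0, read_len - fragment_len)


def merge_pair(read1, read2, min_overlap=10):
    """Merge a forward read and its reverse mate into one sequence.

    Assumption: the overlap must match exactly. Returns None when no
    overlap of at least min_overlap bases is found.
    """
    mate = reverse_complement(read2)
    # Try the longest possible overlap first, then shorter ones.
    for olen in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-olen:] == mate[:olen]:
            return read1 + mate[olen:]
    return None


if __name__ == "__main__":
    print(expected_overlap(150, 200))     # mates of a 200 bp fragment
    print(adapter_readthrough(150, 120))  # a fragment shorter than the read
```

If most pairs merge cleanly, the merged reads can be given to the assembler as single-end reads, as suggested above; pairs that do not merge would still need to be handled as ordinary PE reads.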


    • #3
      Originally posted by tonybolger
      BTW, do you really have paired 150bp reads of a 200bp fragment? If so, you're likely to have a lot of adapter in there.
      Thank you very much! I did indeed find that about 10% of my reads contain adapter sequence at their 3' ends. I am trying to use cutadapt to trim those adapter sequences.

      However, I don't know how to detect and trim adapter/primer dimers in my reads. Could you give me some advice on this? Can cutadapt do it?


      • #4
        The right way to set the "map_len" value in SOAPdenovo!

        Here is an official answer from a technician at the Beijing Genomics Institute (BGI).

        I want to share it with everyone here.

        "Just leave the "map_len" option alone for the initial assembly.
        Once the initial assembly has succeeded, try increasing "map_len" to get a better scaffolding result (or, sometimes, a worse one), bearing in mind that:
        1. The "map_len" option has no effect on libraries whose reads are longer than 100.
        2. It is not wise to set it above 50.
        3. Increase it 1 at a time. The optimal result is usually reached after an increase of 2 or 3; don't increase it too much, especially for genomes with higher heterozygosity.
        "
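
The advice quoted above amounts to a small search procedure, which could be sketched in Python roughly as follows. The function names and default arguments are hypothetical; the thresholds (reads longer than 100, ceiling of 50, steps of 1 up to about 3) are taken from the quoted answer, and the script only prints candidate `map_len=` lines rather than running the assembler.

```python
def worth_tuning(max_read_len, read_len_limit=100):
    """Per the quoted advice, map_len has no effect once reads are longer than 100."""
    return max_read_len <= read_len_limit


def map_len_candidates(default=32, max_increase=3, ceiling=50):
    """Values to try one at a time, starting from the default.

    Assumptions from the quoted advice: step by 1, stop after an
    increase of about 3, and never go above 50.
    """
    return [v for v in range(default, default + max_increase + 1) if v <= ceiling]


if __name__ == "__main__":
    # The original poster's library has 150 bp reads.
    if not worth_tuning(150):
        print("reads longer than 100: leave map_len at its default")
    # A short-insert library with 50 bp reads, as in the manual's example.
    for value in map_len_candidates(default=32):
        print(f"map_len={value}")
```

Each candidate value would go into the relevant [LIB] section of a copy of the config file, and the resulting scaffolds would be compared against the initial assembly, since the quoted answer warns that a higher value can also make things worse.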


        • #5
          Originally posted by Godevil
          Thank you very much! I did indeed find that about 10% of my reads contain adapter sequence at their 3' ends. I am trying to use cutadapt to trim those adapter sequences.

          However, I don't know how to detect and trim adapter/primer dimers in my reads. Could you give me some advice on this? Can cutadapt do it?
          In my experience, you get two problems with adapters. Either the start of the read is adapter followed by junk (I guess this is caused by two adapters sticking together), or you get a short correct fragment but the end of the read is the 'other' adapter, reverse-complemented. Given your read length and fragment size, I'd expect the latter to be common; 10% isn't actually bad.

          I've developed my own tool to 'pre-process' reads: it trims adapters, filters by quality using various criteria, and handles reads that become 'unpaired'. It's probably not release-ready yet, but if you're brave, I can send you a copy and instructions. I developed it because I was getting tired of the overhead of running 3-4 different trimming tools over 200 GB datasets.

          On the other hand, I'm sure cutadapt will do the job.
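
The two adapter cases described above can be illustrated with a small Python sketch. This is not how cutadapt or the tool mentioned above works: it uses exact matching only, the function names are hypothetical, and the adapter string is an assumption (the commonly cited shared prefix of Illumina adapters), so substitute the sequence for your own library prep.

```python
# Assumption: placeholder adapter prefix; replace with your kit's sequence.
ADAPTER = "AGATCGGAAGAGC"


def is_adapter_dimer(read, adapter=ADAPTER, prefix_len=12):
    """Case 1: the read starts with adapter (then junk), so discard it."""
    return read.startswith(adapter[:prefix_len])


def trim_3prime_adapter(read, adapter=ADAPTER, min_match=5):
    """Case 2: a short genuine fragment followed by adapter at the 3' end.

    Removes a full adapter occurrence and everything after it, or a
    partial adapter prefix of at least min_match bases at the read end.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Look for the longest adapter prefix that the read ends with.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_match - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read


if __name__ == "__main__":
    print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTT"))
    print(trim_3prime_adapter("ACGTACGTAGATCGG"))
    print(is_adapter_dimer("AGATCGGAAGAGCACACGT"))
```

Exact matching will miss adapters that contain sequencing errors, which is one reason a dedicated trimmer is the better choice on real data. After trimming, a read that becomes very short is usually dropped, and its mate then has to be handled as an unpaired read.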
