Trimming adapter sequences - is it really necessary?

  • Trimming adapter sequences - is it really necessary?

    Removal of adapter sequences, in a process called read trimming or clipping, is one of the first steps in analyzing NGS data. With more than 30 published adapter trimming tools, there is no shortage of choice. Yet there is a debate over whether this step really is as important as the number of tools suggests, or whether this time-consuming step can be skipped for many NGS applications.

    Interesting discussion on

    ecSeq Bioinformatics is Europe’s leading provider of hands-on bioinformatics workshops and professional data analysis in the field of Next-Generation Sequencing (NGS).

  • #2
    Saving time by not pre-trimming before mapping/assembling means that time is going to be added back into the pipeline downstream. In the case of assembly, it will likely be added back with interest.

    Not trimming before assembly means a graph-based assembler will face greater complexity, take longer to run and require more memory. When it finally prunes low-traffic edges and dead ends in the graph, it is effectively trimming at that point, which is a far more complicated way to reach the same result as pre-trim -> assembly, so I doubt there's any time saving.

    Given that an assembler and a trimmer are both I/O bound when reading in data, and that both typically stream their input, they can be piped together. The worst bottleneck (disk I/O) can be reduced from R->W->R->W to just R->W. Because trimming is computationally simple (we're not compute bound on trimming), injecting a battle-tested pre-trim step between the disk and the assembler input adds minimal time to the overall process.
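To illustrate the streaming point, here is a minimal sketch of a trimmer that reads FASTQ records one at a time and writes them straight back out, so it can sit in a pipe without an intermediate file. It is not one of the published tools: the function names are made up for illustration, the adapter constant is just an example value, and only exact adapter matches are handled (a real trimmer also tolerates miscalls and partial adapters).

```python
from typing import Iterator, TextIO, Tuple

# Assumption: example adapter prefix for illustration; use your library's adapter.
ADAPTER = "AGATCGGAAGAGC"


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) one record at a time (4-line FASTQ)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def trim_exact(seq: str, qual: str, adapter: str = ADAPTER) -> Tuple[str, str]:
    """Cut the read at the first exact occurrence of the adapter, if any."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def stream_trim(src: TextIO, dst: TextIO, adapter: str = ADAPTER) -> int:
    """Trim every record from src and write it to dst; return the record count.

    Only one record is held in memory at a time, so memory use stays constant
    regardless of input size.
    """
    count = 0
    for header, seq, qual in read_fastq(src):
        seq, qual = trim_exact(seq, qual, adapter)
        dst.write(f"{header}\n{seq}\n+\n{qual}\n")
        count += 1
    return count
```

Calling `stream_trim(sys.stdin, sys.stdout)` from a small script would let it sit between the decompressor and any assembler that accepts reads on standard input, which is the R->W arrangement described above.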

    Not trimming before mapping has some merit, because a trailing adapter will rarely throw off the alignment and, post-alignment, it's even easier to confidently spot an adapter even if it has some miscalls in it. It does add complexity for paired-end (PE) mapping though, particularly if the reads cross over. An unsophisticated mapper will drop data because of this.

    However, a trimming step still has to be done, just after mapping instead of before, so where's the time saving? A post-trim has to take the alignment into consideration (otherwise it's no different from a pre-trim). This means added memory and time unless the mapper has integrated trimming.

    If depth is very high (e.g., organelle sequencing), one can always leave the adapters in, not bother post-trimming, and set stricter filters for variant calling. Once more, that adds downstream complexity and noisy data that a pre-trim could have avoided.

    Overall, I don't think the case against pre-trimming is particularly strong.
    Last edited by JulianTF; 09-20-2016, 01:02 AM.


    • #3
      Physical removal of adapter/primer sequences

      Slightly off topic, but definitely related, I've been doing a little hunting on the physical removal of the remnants of adapters or, more likely, the primer remnants of a library generated by targeted amplification. I'm only aware of the Thermo (Ion Torrent) AmpliSeq method of removing this remnant, with the (still mysterious?) FuPa reagent partially digesting the primer remnants away, such that when the A and P1 adapters are ligated, the known primer sequences are not interrogated. Are there other methods that achieve the same thing that I am unaware of? Searching variants of "primer remnant removal" brings back endless hits on how to do this in silico, but I am interested in other methods that achieve it in vitro.
      Perhaps this is only an issue for Ion Torrent, due to the length of time required to flow individual dNTPs, and the fact that the highest quality sequence early in the run is otherwise sacrificed to the generation of sequences that are already known and are uninformative?


      • #4
        Just try some of the removal tools and see if the adapter sequences are indeed there... I did it multiple times with reads from Proton and always found that there is not much to trim, likely just random occurrences of sequences identical to part of the adapter. Such a result was expected - the adapters are removed during basecalling. See:
        If the sequenced part of the adapter is shorter than 6 bp it will be ignored.
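The 6 bp cutoff can be pictured as a minimum-overlap rule. The sketch below is only an illustration of that idea, not the basecaller's actual implementation: the function names and the adapter string are arbitrary placeholders, and matching is exact. A partial adapter at the 3' end is removed only if at least `min_overlap` bases of it were sequenced, since shorter matches occur too often by chance.

```python
def trim_3prime(seq: str, adapter: str, min_overlap: int = 6) -> str:
    """Remove a 3' adapter, ignoring partial matches shorter than min_overlap."""
    # Case 1: the whole adapter is present somewhere in the read.
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Case 2: the read ends partway through the adapter. Look for the longest
    # read suffix that equals an adapter prefix of at least min_overlap bases.
    max_k = min(len(seq), len(adapter) - 1)
    for k in range(max_k, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq  # nothing long enough to trust, leave the read alone


def chance_match_probability(k: int) -> float:
    """Probability that k random bases equal a given k-mer (uniform A/C/G/T)."""
    return 0.25 ** k


# Placeholder adapter, chosen arbitrarily for the example.
EXAMPLE_ADAPTER = "CTGAGTCGGAGACACG"

seven = trim_3prime("TTTTTTTT" + EXAMPLE_ADAPTER[:7], EXAMPLE_ADAPTER)  # trimmed
five = trim_3prime("TTTTTTTT" + EXAMPLE_ADAPTER[:5], EXAMPLE_ADAPTER)   # kept as-is
```

Under the uniform-base assumption, a 5-base tail matches the start of the adapter by chance in about 1 of 1024 reads and a 6-base tail in about 1 of 4096, which fits the observation above that the few hits found in already-clipped Proton reads are probably random.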