Different assemblers respond to errors and trimming in different ways. For example, the authors of the SGA paper recommend not trimming reads. ALLPATHS-LG does not trim reads either, if I remember correctly. Other assemblers, such as SPAdes, may be less sensitive to trimming because they trim reads by default. I have also read somewhere (I could be wrong) that the SOAPdenovo developers recommend not correcting reads if you have enough RAM, whereas SGA, ALLPATHS-LG and others always include error correction as a necessary step. At the end of the day, which trimming/error correction approach to use depends on the assembler. If it were me, I would just use the tools/pipelines recommended by the developers. If I had time, I would combine different strategies/correctors and compare what I get. The results are probably data dependent as well.
K-mer based error correctors typically use short k-mers. I think that is fine. With shorter k-mers, we more often collapse segmental duplications/repeats, so we cannot correct errors that occur right at the sites differentiating the repeat copies. However, only a small fraction of errors are uncorrectable because of repeats. Errors of this kind that can be resolved with long k-mers are usually handled well by the assembler itself. I would not worry too much about the k-mer length in error correction, unless it is too short.
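To make the repeat argument concrete, here is a toy sketch of the simplest k-mer spectrum idea: count all k-mers across the reads and flag "weak" k-mers (seen fewer than a threshold number of times) as likely errors. This is only an illustration, not the algorithm of any particular corrector (SGA, SPAdes and others are far more sophisticated). The sequences, the threshold `min_count=3`, the k values 4 and 9, and the helper names are all hypothetical choices for the example. The two repeat copies differ at a single site; one read carries an error exactly at that site, another has an ordinary error in a unique flank.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of seq, in order of start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Count every k-mer across all reads (a toy 'k-mer spectrum')."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def weak_kmer_starts(read, counts, k, min_count):
    """Start positions of k-mers in read seen fewer than min_count times.

    A k-mer overlapping a sequencing error is usually rare, so weak
    k-mers mark positions a corrector would try to fix.
    """
    return [i for i, km in enumerate(kmers(read, k)) if counts[km] < min_count]


# Hypothetical toy genome: two copies of a repeat (GATCTAG vs GATATAG)
# that differ at one site, each copy with its own unique flanks.
copy1 = "CCCC" + "GATCTAG" + "CCCC"
copy2 = "GGGG" + "GATATAG" + "GGGG"

# Error exactly at the repeat-differentiating site: copy1's C became A.
err_at_site = "CCCC" + "GATATAG" + "CCCC"
# Ordinary error in a unique flank of copy1.
err_in_flank = "CCAC" + "GATCTAG" + "CCCC"

# 10x coverage of each copy plus the two erroneous reads.
reads = [copy1] * 10 + [copy2] * 10 + [err_at_site, err_in_flank]

results = {}
for k in (4, 9):
    counts = count_kmers(reads, k)
    results[k] = (
        weak_kmer_starts(err_at_site, counts, k, min_count=3),
        weak_kmer_starts(err_in_flank, counts, k, min_count=3),
    )
    print(k, *results[k])
```

With k=4, every 4-mer covering the site error lies inside the repeat and also occurs in the other copy, so the erroneous read shows no weak k-mers and the error is invisible. With k=9, every 9-mer covering the site also spans a unique flank, so all of them are weak. The ordinary flank error is flagged at both k values, which matches the point that only errors at repeat-differentiating sites are lost with short k-mers.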