    Hello everybody,

    I can see that my 1 million 454 reads are full of insertions/deletions in homopolymer regions, and of course this disrupts my ORFs in later stages (annotation step). Fortunately, I have ca. 60 million Illumina reads that I can use to correct these errors. Both datasets are cDNA.

    Does anyone have a preferred method for correcting short indels caused by sequencing errors? If tractable, I would prefer to correct my reads rather than the draft assembly obtained from those reads. I was going to use ssaha (or segemehl, qpalma, other?) to map the Illumina reads onto my 454 reads, and then extract a consensus by parsing the pileup file generated with samtools.
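    A minimal sketch of the pileup-parsing step, in Python, assuming the standard samtools pileup text layout (tab-separated: sequence name, 1-based position, reference base, depth, read bases, qualities). The function names and the `min_depth` / `min_frac` thresholds are illustrative assumptions, not values from any published tool.

```python
from collections import Counter


def count_indels(bases):
    """Count indel events in the read-bases column of one pileup line.

    Insertions look like '+2AT', deletions like '-1A'. Keys in the
    returned Counter are normalised to e.g. '+AT' or '-A' (upper case).
    """
    counts = Counter()
    i, n = 0, len(bases)
    while i < n:
        c = bases[i]
        if c == "^":
            # Read-start marker; the next character is a mapping quality.
            i += 2
        elif c in "+-":
            j = i + 1
            while j < n and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j])
            counts[c + bases[j:j + length].upper()] += 1
            i = j + length
        else:
            i += 1
    return counts


def majority_indel(line, min_depth=5, min_frac=0.6):
    """Return (name, pos, indel) if one indel dominates the column, else None.

    min_depth and min_frac are arbitrary illustrative thresholds.
    """
    name, pos, _ref, depth, bases = line.rstrip("\n").split("\t")[:5]
    depth = int(depth)
    if depth < min_depth:
        return None
    counts = count_indels(bases)
    if not counts:
        return None
    indel, support = counts.most_common(1)[0]
    if support / depth >= min_frac:
        return name, int(pos), indel
    return None
```

    In the pileup format an indel is reported on the line of the base just before it, so a call at position p means "insert after base p" or "delete the bases following p".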

    I have seen a few more advanced published methods (e.g. iCORN), but I do not know whether any of them performs better. Do you?

    Thanks a lot!


    I don't know if any programs will correct your reads!

    See here for another effort. I have got it to work but am still interpreting the results.



      I did this once. My approach was to map the reads to the contigs (BWA -e5), realign the indels using GATK, and polish the contigs based on the indel report. I repeated this several times, as some corrections made it possible to correct other neighbouring regions.

      I don't have the polishing script any more, but I recall it being fairly simple. It should be possible to polish the reads instead, with some Perl regex magic.
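      The original script is not available, so the following is only a rough sketch of what such a polishing step might look like, written in Python rather than Perl. It assumes a list of indel calls of the form (1-based position, '+BASES' or '-BASES'), where the indel sits immediately after the given position; `apply_indels` is a hypothetical helper name. Overlapping calls are not handled.

```python
def apply_indels(seq, calls):
    """Apply indel corrections to a contig or read sequence.

    calls: iterable of (pos, indel), pos 1-based, indel like '+A' or '-AT',
    meaning "insert/delete these bases right after base pos".
    Edits are applied from right to left so that earlier coordinates
    stay valid while the sequence changes length.
    """
    for pos, indel in sorted(calls, reverse=True):
        kind, bases = indel[0], indel[1:]
        if kind == "+":
            seq = seq[:pos] + bases + seq[pos:]
        else:
            seq = seq[:pos] + seq[pos + len(bases):]
    return seq


# Example: restore a missing T at the end of a homopolymer run.
polished = apply_indels("ACGTTTTGA", [(7, "+T")])
```

      After each round the short reads would be remapped to the polished sequences and the indel calls regenerated, since one correction can make neighbouring regions alignable.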


        Have you seen this paper?