    I have been given a 454 data set that was assembled using newbler. It covers a repetitive region from a relative of grape, with lots of transposons.
    The current assembly has misassemblies due to the repeats, and I'd like to take the original reads, clean them up and reassemble them.

    There are about 40,000 reads from 454 and maybe 2,000 reads from Sanger sequencing of the two BACs that cover this region. I might be able to use paired-end information for the Sanger reads, but the sequencing facility tells me there is no paired-end information to take advantage of in the 454 data.

    I was thinking about running RepeatMasker with the known vectors and a plant repeat database, and then assembling the result with CAP3 or PCAP.

    (1) Does anyone know which of the publicly available assembly engines works best on 454 data?
    (2) If you recommend using CAP3, which parameter settings would you modify from the default and what values would you use?
    (3) Are there any other sequence cleaning utilities you'd recommend?
    (4) When using RepeatMasker, is the cross_match engine better, or would you use RMblast?


    In your case I would suggest using phrap or Celera, and raising the minmatch value to 25-40. The standard phredPhrap will need some serious tweaking, especially on the SFF import side.
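
    To compare a few minmatch values side by side, the phrap command lines can be generated with a short script. This is only a sketch: the helper name and the file name reads.fasta are made up for illustration, it assumes phrap is on your PATH and that the reads have already been converted to FASTA, and it only builds the commands without running them. Note that phrap writes its output next to the input file, so use a separate copy of the input for each run.

```python
def phrap_command(fasta_path, minmatch):
    """Build a phrap command line with a raised -minmatch value.

    Hypothetical helper: only constructs the argument list, does not run phrap.
    -new_ace asks phrap for an ACE file alongside the contigs.
    """
    if minmatch < 1:
        raise ValueError("minmatch must be a positive integer")
    return ["phrap", fasta_path, "-minmatch", str(minmatch), "-new_ace"]


if __name__ == "__main__":
    # Try a few values from the suggested 25-40 range.
    for value in (25, 30, 40):
        print(" ".join(phrap_command("reads.fasta", value)))
```

    Each printed line can be run by hand or passed to subprocess.run once you are happy with the settings.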

    Also build a BLAST database from a draft assembly (to identify repeat borders), extract the repeat sequences from the draft FASTA or ACE file, and then use them as "vector" sequence if you want to mask them out.
    I don't recommend a newbler assembly for this BLAST database: newbler usually clips off repeat borders at the consensus level.
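
    The masking step can be sketched roughly as follows. It assumes you have BLASTed the reads against the extracted repeats and saved the hits in tabular format (-outfmt 6 in BLAST+, where columns 7 and 8 are the query start and end). The function names are made up for illustration, and masking with "X" is just one common convention; check what your assembler expects.

```python
def parse_blast_tab(lines):
    """Collect (qstart, qend) intervals per read from BLAST tabular lines."""
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        read_id = fields[0]
        hits.setdefault(read_id, []).append((int(fields[6]), int(fields[7])))
    return hits


def mask_intervals(seq, intervals, mask_char="X"):
    """Replace each 1-based, inclusive interval of seq with mask_char."""
    chars = list(seq)
    for start, end in intervals:
        # Defensive: normalise the order and clamp to the read length.
        lo, hi = min(start, end), max(start, end)
        lo, hi = max(lo, 1), min(hi, len(chars))
        for i in range(lo - 1, hi):
            chars[i] = mask_char
    return "".join(chars)


if __name__ == "__main__":
    # Toy example with one invented hit covering bases 3-5 of read1.
    blast_lines = ["read1\trepeat1\t98.0\t3\t0\t0\t3\t5\t100\t102\t1e-10\t50\n"]
    reads = {"read1": "ACGTACGTAC", "read2": "GGGGCCCC"}
    hits = parse_blast_tab(blast_lines)
    for read_id, seq in reads.items():
        print(read_id, mask_intervals(seq, hits.get(read_id, [])))
```

    Reads without any hit pass through unchanged, so the same loop can be applied to the whole read set.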

    Let me know if you need any more help.
