Dear all,

I have PacBio reads from a pool of BACs (eventually 100X coverage, assuming an average BAC insert size of 120 Kb), and I am having trouble finding a decent way to mask the vector sequences. The vector is 8 Kb long, and since a considerable proportion of my reads are longer than 8 Kb, many reads will have the vector sequence in the middle and the sequences I am interested in at the ends...

My idea is to use SSAHA2, as recommended in the MIRA manual, and then assemble with MIRA (all this after correcting the reads with the PacBioToCA pipeline, which I got running without any problems). However, how would MIRA use a read with a sequence masked in the middle? Would it treat the two ends as independent reads (the effect I want)? Or would it join the two ends (the effect I don't want)?
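If it turns out MIRA would join the two ends, one workaround I am considering is to split each read at the masked region myself before assembly, so each flank goes in as a separate read. This is only a rough Python sketch, not how MIRA handles masking internally. It assumes the vector bases have already been replaced by `X` (adjust `mask_char` if your masking step uses something else); the function name, the `_partN` suffix and the 500-base minimum fragment length are all my own hypothetical choices.

```python
import re


def split_masked_read(name, seq, mask_char="X", min_len=500):
    """Split a read at runs of masked bases.

    Returns a list of (new_name, fragment) pairs. Each unmasked stretch of at
    least min_len bases becomes its own read, so the two flanks of a read with
    vector in the middle are never joined. Shorter stretches are discarded.
    """
    # Split on one or more consecutive mask characters.
    pieces = re.split(re.escape(mask_char) + "+", seq)
    kept = [p for p in pieces if len(p) >= min_len]
    # Hypothetical naming scheme: original name plus a _partN suffix.
    return [(f"{name}_part{i + 1}", p) for i, p in enumerate(kept)]


if __name__ == "__main__":
    # Toy read: 600 bp insert, 8 Kb masked vector, 700 bp insert.
    read = "A" * 600 + "X" * 8000 + "C" * 700
    for new_name, frag in split_masked_read("read1", read):
        print(new_name, len(frag))
```

On the toy read this should print `read1_part1 600` and `read1_part2 700`. One thing to watch: splitting loses the information that both flanks came from the same molecule, which is what I want here, since the two ends may come from different BAC positions relative to the vector.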

I have tried assembling without masking the vector, but since all the BACs share the same vector, I am running into problems and getting too many contigs (including chimeras...)

Thanks a lot!

