ZORRO is a hybrid sequencing technology assembler. It takes two sets of pre-assembled contigs and merges them into a more contiguous and consistent assembly. We have already tested Zorro with Illumina (Solexa) and 454 data from several organisms with genome sizes ranging from 3 Mb to 100 Mb. The main characteristic of Zorro is the treatment it applies to the contigs before and after assembly to avoid errors.
The ZORRO project is maintained by Gustavo Lacerda, Ramon Vidal and Marcelo Carazzole, and it was first used in this yeast assembly: Genome structure of a Saccharomyces cerevisiae strain widely used in bioethanol production
ZORRO needs better documentation and has not yet undergone enough testing. If you want to discuss the pipeline, you can join the mailing list: zorro-google group
ZORRO PIPELINE
Zorro is based on the minimus2 pipeline (AMOS package) and uses MUMmer,
AMOS and Bowtie internally. Zorro takes as input two FASTA files of
contigs (each representing the contigs of a whole-genome assembly) and
one FASTA file containing a subset of the reads used for assembly
(10X coverage is enough; more will slow down the pipeline and
consume more resources).
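Any tool can produce the 10X read subset. The Python sketch below shows one way to do it. It assumes you know the approximate genome size and that the reads are in a single FASTA file. It shuffles the reads with a fixed seed and keeps them until their total length reaches about 10 times the genome size. `read_fasta` and `subsample_to_coverage` are hypothetical helpers and are not part of Zorro.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Parse FASTA text line by line into (header, sequence) pairs."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subsample_to_coverage(reads: List[Record], genome_size: int,
                          target_cov: float = 10.0, seed: int = 0) -> List[Record]:
    """Randomly pick reads until their total length reaches
    target_cov * genome_size (hypothetical helper).

    Reads are returned in their original order. If the input holds less
    than the target coverage, every read is kept.
    """
    budget = target_cov * genome_size
    order = list(range(len(reads)))
    random.Random(seed).shuffle(order)  # fixed seed -> reproducible subset
    chosen, used = [], 0
    for i in order:
        if used >= budget:
            break
        chosen.append(i)
        used += len(reads[i][1])
    return [reads[i] for i in sorted(chosen)]
```

As a usage example, with a placeholder file name and genome size, `subsample_to_coverage(list(read_fasta(open("reads.fasta"))), 12_000_000)` returns about 120 Mb of reads, which can then be written back out as FASTA. The seed is fixed so that repeated runs choose the same subset.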
Zorro's initial phase detects inconsistencies in the assemblies and splits
the contigs where they occur. Next, Zorro counts k-mers (default k=22)
in the reads and uses the k-mer count table to detect and mask repeats
in both assembly1 and assembly2. After repeat masking, Zorro uses nucmer
to detect overlaps between assembly1 and assembly2 (overlaps between
contigs from the same assembly are not allowed). All overlaps found in
this phase are expected to be between unique regions, because repeats
are masked. The overlaps are used to lay out the merged contigs and to
generate their consensus with AMOS tools. The merged contigs are built
from the unmasked contigs, so the final merged assembly should include
the repeat regions.
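The Python sketch below illustrates the k-mer counting and repeat-masking idea. It is not Zorro's implementation. Only the default k=22 comes from the description above. The repeat rule is an assumption: any contig base covered by a k-mer seen more than `max_count` times in the reads is masked with 'N', and a k-mer and its reverse complement are counted together. The names `count_kmers`, `mask_repeats` and `max_count` are hypothetical. In practice the threshold would presumably be set relative to the typical k-mer count of the reads, but that choice is also an assumption.

```python
from collections import Counter
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement,
    so both strands of the same sequence share one table entry."""
    return min(kmer, revcomp(kmer))


def count_kmers(reads: Iterable[str], k: int = 22) -> Counter:
    """Count canonical k-mers over all reads, skipping k-mers with N."""
    counts: Counter = Counter()
    for seq in reads:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[canonical(kmer)] += 1
    return counts


def mask_repeats(contig: str, counts: Counter, k: int = 22,
                 max_count: int = 30) -> str:
    """Replace with 'N' every base covered by a k-mer whose read count
    exceeds max_count (hypothetical repeat criterion)."""
    upper = contig.upper()
    masked = list(contig)  # unmasked bases keep their original case
    for i in range(len(upper) - k + 1):
        kmer = upper[i:i + k]
        if "N" in kmer:
            continue
        if counts.get(canonical(kmer), 0) > max_count:
            masked[i:i + k] = "N" * k
    return "".join(masked)
```

The masked copies would only be used to find overlaps. This matches the description above: the merged contigs are still built from the original, unmasked sequences, so the repeats come back in the final assembly.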
A second, less stringent round of assembly tries to merge contigs that
were not included in the first Zorro phase. All contigs are written
to <prefix>.ZORRO.fasta. We recommend using SSPACE to scaffold the
ZORRO contigs.
Zorro Website: www.lge.ibi.unicamp.br/zorro