I'm working in Consed with a hybrid Sanger/454 assembly that I generated using gsAssembler. I'm pretty familiar with Consed and would like to use it to join contigs and analyze SNPs, as I have done with Sanger-only assemblies. However, I'm running into some problems:
(1) When I try to view a read trace, Consed calls "sff2scf" on Sanger reads as if they were 454 reads, instead of reading the existing scf file. This fails because there is no sff file for Sanger reads. It becomes a problem when, for example, I want to extend or change the consensus sequence, since Consed requires this to be done from the trace window.
The chromat_file is renamed following the Newbler convention of adding suffixes to reads based on the location of their mate pairs. For example, for a Sanger read named "ABCD.g1" the relevant lines in the ace file look like this:
DS CHROMAT_FILE: ABCD.g1.548-1.fm12429.pr12429 PHD_FILE: ABCD.g1.548-1.fm12429.pr12429.phd.1 TIME: Thu Jul 27 12:33:48 2000 CHEM: unknown DYE: unknown TEMPLATE: ABCD DIRECTION: rev.
Perhaps changing the CHROMAT_FILE entry in the ace file back to the original read name would help, but not if Consed always calls "sff2scf".
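To make the idea concrete, here is a minimal Python sketch of the edit I have in mind for the DS lines. It assumes that the Newbler suffix is simply appended to the original read name after a ".", and that I have a list of the original Sanger read names; the function name restore_chromat_name is my own, not part of Consed or Newbler. I have not confirmed that this changes which program Consed calls for the trace.

```python
import re

# Matches a DS line and captures the CHROMAT_FILE value separately from
# the rest of the line (PHD_FILE, TIME, etc.), which is left untouched.
DS_RE = re.compile(r"^(DS CHROMAT_FILE: )(\S+)(.*)$")


def restore_chromat_name(ds_line, original_names):
    """Return ds_line with CHROMAT_FILE reset to the original read name.

    Assumption: Newbler's renamed read is '<original name>.<suffix>'.
    Lines that are not DS lines, or whose read is not in original_names,
    are returned unchanged. Pass the line without its trailing newline.
    """
    match = DS_RE.match(ds_line)
    if not match:
        return ds_line
    chromat = match.group(2)
    # Keep only original names that are the whole value or a '.'-delimited prefix.
    candidates = [
        name for name in original_names
        if chromat == name or chromat.startswith(name + ".")
    ]
    if not candidates:
        return ds_line
    # Prefer the longest match so 'ABCD.g1' wins over a shorter name like 'ABCD'.
    best = max(candidates, key=len)
    return match.group(1) + best + match.group(3)


def restore_ace_text(ace_text, original_names):
    """Apply restore_chromat_name to every line of an ace file's text."""
    return "\n".join(
        restore_chromat_name(line, original_names)
        for line in ace_text.split("\n")
    )
```

For the example above, calling restore_chromat_name with original_names = ["ABCD.g1"] would rewrite only the CHROMAT_FILE field to ABCD.g1 and leave the PHD_FILE field as Newbler wrote it.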
(2) Lengthy "unaligned" regions are present at the start and end of contigs. To my eye, at least some of these regions look quite well aligned, and they frequently contain sequence that overlaps with other contigs, which is what I need in order to join contigs manually with the "Compare Contigs" command. Why does Newbler consider these regions "unaligned"? And how can I use them to join contigs, since Consed won't allow unaligned regions to be used in the "Compare Contigs" window?
Anyone have some insight into these issues, or tools for hybrid Sanger/454 assemblies in general?