Hi,
We have a nearly complete eukaryotic genome assembled into several thousand scaffolds (total length is ~350 Mb). There are ~8000 gaps (runs of Ns) in those scaffolds. Recently we received de novo assembled contigs from a different strain of the same species. Once the contigs have been compared against the scaffolds (BLAST/BLAT etc.), what program/script would be most suitable:
1) To find all the contigs overlapping the gap regions of the scaffolds, i.e. the overhangs as shown below:
scaff1: ACGTACGTCCCGCCCGCGCNNNNNNNNNNNNNNN
contig: ACGTACGTCCCGCCCGCGCTACGCCGCGT
2) For a good-quality overlap (e.g. a 50 bp overlap with at most 2 mismatches), to extend the scaffold with the sequence from the matched contig.
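To make the two steps concrete, here is a minimal Python sketch of the operation I have in mind. It is only an illustration, not an existing tool: the function names (find_gaps, extend_into_gap) and the parameters (overlap, max_mismatch) are made up for this example. It only looks at the left flank of a gap on the forward strand, it uses a naive mismatch-counting scan instead of BLAST/BLAT hits, and it assumes the filled sequence should not exceed the number of Ns, which may be wrong if gap sizes are only estimates.

```python
import re


def find_gaps(seq):
    """Return (start, end) half-open coordinates of every run of N in seq."""
    return [(m.start(), m.end()) for m in re.finditer(r"N+", seq.upper())]


def count_mismatches(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def extend_into_gap(scaffold, contig, gap, overlap=50, max_mismatch=2):
    """Fill the left side of one gap with the overhang of a matching contig.

    The `overlap` bases immediately left of the gap are searched for in the
    contig, allowing up to `max_mismatch` mismatches. If a match is found,
    the contig bases that follow it replace the Ns (at most the gap length).
    The scaffold is returned unchanged when no acceptable overlap exists.
    """
    start, end = gap
    if start < overlap:
        return scaffold  # not enough flanking sequence to anchor on
    flank = scaffold[start - overlap:start]
    for i in range(len(contig) - overlap + 1):
        if count_mismatches(flank, contig[i:i + overlap]) <= max_mismatch:
            overhang = contig[i + overlap:][:end - start]
            if not overhang:
                continue  # contig ends at the gap edge, nothing to add
            return scaffold[:start] + overhang + scaffold[start + len(overhang):]
    return scaffold


# Toy example from the post (the flank here is only 19 bp, so overlap=19).
scaff1 = "ACGTACGTCCCGCCCGCGC" + "N" * 15
contig = "ACGTACGTCCCGCCCGCGCTACGCCGCGT"
gap = find_gaps(scaff1)[0]
extended = extend_into_gap(scaff1, contig, gap, overlap=19, max_mismatch=2)
print(extended)
```

On real data the overlap search would come from the BLAST/BLAT alignments rather than this scan, and the right flank and reverse-complement hits would need the same treatment.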
Thanks.