I am aligning bacterial draft genomes to a reference genome using Mugsy v1.2.3, and I'm using the branch of Biopython that supports MAF files to parse the output and convert it to other alignment formats (FASTA, NEXUS).
I've run into a couple of problems:
1) In some of the blocks, the reference sequence is aligned on the negative strand. I think I've solved this by identifying those blocks and using Biopython's reverse-complement feature on every sequence record in the block.
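A minimal sketch of that flip, assuming each block is held as a list of rows carrying the fields of a MAF "s" line (the `MafRow` container and the function names are hypothetical, not Biopython's API). Reverse-complementing only the sequences leaves the start/strand coordinates stale, so the sketch also moves each start to the opposite strand as srcSize - start - size:

```python
from dataclasses import dataclass, replace

# Complement table for gapped, IUPAC-coded DNA; gaps ("-") are left unchanged.
_COMPLEMENT = str.maketrans("ACGTRYKMBVDHNacgtrykmbvdhn-",
                            "TGCAYRMKVBHDNtgcayrmkvbhdn-")


@dataclass(frozen=True)
class MafRow:
    """One 's' line of a MAF block (hypothetical container for illustration)."""
    src: str
    start: int     # 0-based start, counted on the strand given by `strand`
    size: int      # number of non-gap bases in `text`
    strand: str    # "+" or "-"
    src_size: int  # full length of the source sequence
    text: str      # aligned (gapped) sequence


def reverse_complement_row(row: MafRow) -> MafRow:
    # A "-" start counts from the end of the source sequence, so the same
    # bases start at src_size - start - size on the opposite strand.
    return replace(
        row,
        start=row.src_size - row.start - row.size,
        strand="-" if row.strand == "+" else "+",
        text=row.text.translate(_COMPLEMENT)[::-1],
    )


def orient_block(block: list[MafRow], ref_src: str) -> list[MafRow]:
    """Flip every row of the block if the reference row is on the '-' strand."""
    ref = next(r for r in block if r.src == ref_src)
    if ref.strand == "+":
        return block
    return [reverse_complement_row(r) for r in block]


block = [
    MafRow("ref.chr", 2, 4, "-", 10, "ACG-T"),
    MafRow("draft.c1", 0, 5, "+", 8, "ACGGT"),
]
flipped = orient_block(block, "ref.chr")
```

Flipping every row (not just the reference) keeps the columns of the block aligned with each other.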
2) Some of the blocks appear to overlap each other in terms of their positions on the reference sequence. This becomes a problem when I try to create a MAF index to help with converting to other formats.
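One thing worth checking before treating blocks as overlapping: per the MAF format description, the start of an "s" line on the "-" strand is counted from the end of the reverse-complemented source sequence, so it has to be converted with the srcSize field before it can be compared with "+" coordinates. A minimal stdlib-only sketch of that conversion plus a sweep-line overlap check (the function names are hypothetical; it assumes positive sizes and half-open intervals):

```python
def forward_interval(start: int, size: int, strand: str, src_size: int) -> tuple[int, int]:
    """Half-open [begin, end) interval on the '+' strand for a MAF 's' line."""
    if strand == "-":
        start = src_size - start - size
    return start, start + size


def find_overlaps(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return index pairs (i, j), i < j, of intervals that share at least one base."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i])
    overlaps = []
    active: list[int] = []  # already-seen intervals that may still overlap
    for i in order:
        begin, _ = intervals[i]
        # Drop intervals that end at or before this one begins (half-open).
        active = [j for j in active if intervals[j][1] > begin]
        # Everything left began earlier and ends after `begin`: an overlap.
        overlaps.extend((min(i, j), max(i, j)) for j in active)
        active.append(i)
    return overlaps
```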
Is there any way to tell Mugsy to use a particular sequence as the reference? Does anyone have tools they would recommend for dealing with the overlapping blocks?
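If real overlaps remain after converting to forward coordinates, one simple workaround (a heuristic substituted here, not a Mugsy option) is to keep the longest blocks first and drop any block whose reference interval overlaps one already kept. A minimal sketch, assuming forward-strand, half-open intervals; the function name is hypothetical:

```python
import bisect


def greedy_non_overlapping(intervals: list[tuple[int, int]]) -> list[int]:
    """Indices of a non-overlapping subset, chosen longest-first (greedy heuristic)."""
    # Longest first; ties broken by start position.
    order = sorted(range(len(intervals)),
                   key=lambda i: (intervals[i][0] - intervals[i][1], intervals[i][0]))
    begins: list[int] = []  # starts of kept intervals, kept sorted
    ends: list[int] = []    # matching ends, parallel to `begins`
    kept: list[int] = []
    for i in order:
        b, e = intervals[i]
        pos = bisect.bisect_right(begins, b)
        if pos > 0 and ends[pos - 1] > b:
            continue  # overlaps the kept interval starting at or before b
        if pos < len(begins) and begins[pos] < e:
            continue  # overlaps the kept interval starting after b
        begins.insert(pos, b)
        ends.insert(pos, e)
        kept.append(i)
    return sorted(kept)
```

This discards the shorter blocks entirely; clipping only the overlapping columns would keep more sequence, but it means editing the alignment text of every row in the affected blocks.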
Thanks!
Update: here is an example of the overlap. These are the reference-sequence coordinates for the first few alignment blocks (sequence name, start position, length, strand):
MC2155.NC_008596 0 5508 +
MC2155.NC_008596 5508 5 +
MC2155.NC_008596 5513 54 +
MC2155.NC_008596 5567 1 +
MC2155.NC_008596 5568 431 +
MC2155.NC_008596 5999 61 +
MC2155.NC_008596 6060 96 +
MC2155.NC_008596 6156 335 +
MC2155.NC_008596 6491 1 +
MC2155.NC_008596 6492 3835 +
MC2155.NC_008596 10327 31344 +
MC2155.NC_008596 46654 4517 -
MC2155.NC_008596 50516 3967 +
MC2155.NC_008596 52436 49969 -
MC2155.NC_008596 54483 4 +
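In this listing, every apparent overlap involves a "-" row (46654/4517 and 52436/49969), while the "+" rows tile the reference without overlapping. A quick pairwise check on the raw coordinates as pasted (helper names are hypothetical; the listing has no srcSize, so the "-" rows cannot be converted here, and this check cannot say whether they still overlap after conversion):

```python
LISTING = """\
MC2155.NC_008596 0 5508 +
MC2155.NC_008596 5508 5 +
MC2155.NC_008596 5513 54 +
MC2155.NC_008596 5567 1 +
MC2155.NC_008596 5568 431 +
MC2155.NC_008596 5999 61 +
MC2155.NC_008596 6060 96 +
MC2155.NC_008596 6156 335 +
MC2155.NC_008596 6491 1 +
MC2155.NC_008596 6492 3835 +
MC2155.NC_008596 10327 31344 +
MC2155.NC_008596 46654 4517 -
MC2155.NC_008596 50516 3967 +
MC2155.NC_008596 52436 49969 -
MC2155.NC_008596 54483 4 +
"""


def parse_listing(text: str) -> list[tuple[str, int, int, str]]:
    """Parse 'name start size strand' lines into tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, start, size, strand = line.split()
        rows.append((name, int(start), int(size), strand))
    return rows


def raw_overlaps(rows: list[tuple[str, int, int, str]]) -> list[tuple[int, int]]:
    """Index pairs whose raw [start, start + size) intervals share a base."""
    hits = []
    for i in range(len(rows)):
        _, s1, n1, _ = rows[i]
        for j in range(i + 1, len(rows)):
            _, s2, n2, _ = rows[j]
            if s1 < s2 + n2 and s2 < s1 + n1:
                hits.append((i, j))
    return hits


rows = parse_listing(LISTING)
for i, j in raw_overlaps(rows):
    print(rows[i], rows[j])
```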