Greetings!
I'm working on an assembly from a diploid organism and am trying to reduce it to a haploid assembly as far as possible. When I map my Illumina reads back to my contigs using BBMap, I get a very distinct bimodal coverage distribution.
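A minimal sketch of locating the two coverage peaks. It assumes per-contig mean coverage values have already been pulled out of the mapping output (the exact BBMap output format is not shown here). The function names and the bin width are hypothetical, and contigs are counted rather than weighted by length.

```python
from collections import Counter


def coverage_histogram(coverages, bin_width=1.0):
    """Count contigs per coverage bin; keys are integer bin indices."""
    return Counter(int(c // bin_width) for c in coverages)


def two_peaks(coverages, bin_width=1.0):
    """Return the centres of the two tallest local maxima, lowest first."""
    hist = coverage_histogram(coverages, bin_width)
    maxima = []
    for b, count in hist.items():
        # A bin is a local maximum if it is at least as tall as both neighbours.
        if count >= hist.get(b - 1, 0) and count >= hist.get(b + 1, 0):
            maxima.append((count, b))
    top = sorted(maxima, reverse=True)[:2]
    return sorted((b + 0.5) * bin_width for _, b in top)
```

For example, `two_peaks([14, 15, 15, 16, 15, 29, 30, 30, 31, 30, 30])` gives `[15.5, 30.5]`. Flat-topped peaks can yield two adjacent bins, so a wider bin or a log scale may be needed on real data.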
As you can see from the image, there are two distinct coverage peaks. I assume the higher-coverage peak represents contigs where both haplotypes collapsed into a single haploid sequence, whereas the lower-coverage peak represents contigs where the two haplotypes were assembled separately, so that each copy receives about half the reads.
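Under that interpretation, contigs can be sorted into candidate haplotigs and collapsed contigs by which peak their coverage is closer to. This is a minimal sketch, assuming the two peak positions are already known; the midpoint cutoff and the 0.4 to 0.6 ratio check are hypothetical choices, not an established rule.

```python
def classify_contigs(contig_cov, low_peak, high_peak):
    """Split contigs into candidate haplotigs and collapsed contigs.

    contig_cov maps contig name -> mean coverage. Contigs below the
    midpoint between the two peaks are treated as candidate separately
    assembled haplotypes (expected at roughly half the collapsed depth).
    """
    ratio = low_peak / high_peak
    if not 0.4 <= ratio <= 0.6:
        # Hypothetical sanity check: the haplotig interpretation expects ~0.5.
        raise ValueError(f"peak ratio {ratio:.2f} is not close to 0.5")
    cutoff = (low_peak + high_peak) / 2
    haplotigs = sorted(n for n, cov in contig_cov.items() if cov < cutoff)
    collapsed = sorted(n for n, cov in contig_cov.items() if cov >= cutoff)
    return haplotigs, collapsed
```

Only the low-coverage group needs to be searched for redundant partners; contigs in the high-coverage group are already expected to be haploid.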
If that is the case, how would you go about identifying which contigs are duplicates and can be merged? I've tried an all-vs-all BLAST after repeat masking, but because it is a local alignment it only returns short matching fragments, and I don't know how to extract whole-contig information from those hits.
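One way to get whole-contig information from local hits is to pool all HSPs for each query/subject pair and measure what fraction of the query they cover together. A minimal sketch, assuming BLAST was run with `-outfmt "6 std qlen slen"` (12 standard columns followed by query and subject length); the identity and coverage thresholds are hypothetical and would need tuning.

```python
from collections import defaultdict


def merged_length(intervals):
    """Total length covered by 1-based inclusive intervals, overlaps counted once."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def redundant_pairs(blast_lines, min_identity=90.0, min_query_cov=0.8):
    """Return (query, subject, fraction) where the shorter query is mostly
    covered by high-identity HSPs against a longer subject."""
    hits = defaultdict(list)
    qlens = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        qseqid, sseqid, pident = f[0], f[1], float(f[2])
        qstart, qend = int(f[6]), int(f[7])
        qlen, slen = int(f[12]), int(f[13])
        # Skip self-hits, weak hits, and pairs where the query is the longer contig.
        if qseqid == sseqid or pident < min_identity or qlen > slen:
            continue
        hits[(qseqid, sseqid)].append((min(qstart, qend), max(qstart, qend)))
        qlens[qseqid] = qlen
    results = []
    for (q, s), intervals in sorted(hits.items()):
        frac = merged_length(intervals) / qlens[q]
        if frac >= min_query_cov:
            results.append((q, s, frac))
    return results
```

Restricting to `qlen <= slen` means the shorter contig of each pair is the one flagged as a candidate for removal, while the longer one is kept.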
How do you deal with this issue?
Cheers,
-Jason