I have a collection of Illumina HiSeq 2000 reads that should span a specific coding region of a viral genome. The region these reads cover is 2625 bp long. What I want to do is generate a consensus sequence for that region from all my reads.
The only tool I've tried so far is IDBA_UD. I downsampled to ~100x and ran it, but the total length of the assembled contigs was much larger than the region I know these reads should span. I also tried using all the data, but that result was even further off.
I have excessive coverage (~77000x), but the reads come from a quasispecies population and contain some variation. What would be the best tool to use to generate a consensus?
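For reference, the downsampling arithmetic behind the numbers above can be sketched roughly as follows. This is only a minimal illustration, not the exact procedure used: the function names are hypothetical, the 100 bp read length is an assumption (the read length is not stated here), and for paired-end data the subsampling would need to be done per pair rather than per read.

```python
import math
import random

def downsample_fraction(current_cov, target_cov):
    """Fraction of reads to keep so expected coverage drops to target_cov."""
    if current_cov <= 0:
        raise ValueError("current coverage must be positive")
    return min(1.0, target_cov / current_cov)

def reads_for_coverage(target_cov, region_len, read_len):
    """Number of reads needed, using coverage = reads * read_len / region_len."""
    return math.ceil(target_cov * region_len / read_len)

def subsample(read_ids, fraction, seed=0):
    """Keep each read independently with probability `fraction` (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [r for r in read_ids if rng.random() < fraction]

# Figures from the post: ~77000x coverage, 100x target, 2625 bp region.
frac = downsample_fraction(77000, 100)
# ASSUMPTION: 100 bp reads; substitute the real read length.
n_reads = reads_for_coverage(100, 2625, 100)
print(f"keep a fraction of about {frac:.5f} of the reads (roughly {n_reads} reads)")
```

Because each read is kept independently, the resulting coverage is only approximately the target and will vary a little between seeds.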