Hey everybody,
I'm taking a bioinformatics course as an undergrad and having a little trouble building an assembler.
So, the project is to build a whole-genome assembly from a set of thousands of reads. So far I've written the Smith-Waterman code in Python, and I'm now trying to start on the assembly itself.
The basic way I can see to start is to go through all of the reads and build k-mer hashes. So, I did that and now I'm kind of stuck.
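To make "k-mer hashes" concrete, here is a simplified sketch of the kind of structure I mean, not my actual project code. The function name `build_kmer_index`, the toy reads, and the choice of `k=4` are placeholders for illustration only.

```python
from collections import defaultdict

def build_kmer_index(reads, k):
    """Map each k-mer to the set of (read_id, offset) positions where it occurs."""
    index = defaultdict(set)
    for read_id, read in enumerate(reads):
        # Slide a window of length k across the read.
        for offset in range(len(read) - k + 1):
            index[read[offset:offset + k]].add((read_id, offset))
    return index

# Toy reads; real reads would be much longer and k would be larger.
reads = ["ACGTAC", "GTACGG"]
index = build_kmer_index(reads, k=4)
```

Keeping the offset along with the read ID means the hash also records where in each read the k-mer sits, which seems like it should be useful later on.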
I understand the basic idea is now to go through the k-mer hash and find which reads share the same k-mers. But I have no idea how to use this information to decide when to run a Smith-Waterman alignment. Also, where do I store these Smith-Waterman alignments so I can use them later to build the whole genome?
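The lookup step I'm describing could be sketched roughly like this: for every pair of reads, count how many distinct k-mers they have in common. This is only an illustration under my own assumptions; `shared_kmer_counts` and the toy reads are made-up names and data, and it stops short of deciding which pairs actually deserve an alignment.

```python
from collections import Counter, defaultdict
from itertools import combinations

def shared_kmer_counts(reads, k):
    """Count, for each pair of reads, how many distinct k-mers they share."""
    kmer_to_reads = defaultdict(set)
    for read_id, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            kmer_to_reads[read[i:i + k]].add(read_id)

    counts = Counter()
    for read_ids in kmer_to_reads.values():
        # Every pair of reads containing this k-mer gets one shared hit.
        for a, b in combinations(sorted(read_ids), 2):
            counts[(a, b)] += 1
    return counts

# Toy reads: the first two overlap on "GTAC", the third shares nothing.
reads = ["ACGTAC", "GTACGG", "TTTTTT"]
pair_counts = shared_kmer_counts(reads, k=4)
```

The result is keyed by `(read_a, read_b)` with `read_a < read_b`, so each pair shows up once.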
I would appreciate any direction; this is killing me.
Thanks again!