Hello everybody!
I'm starting to work in this field and one of the first things I tried was a comparison between different short-read aligners (like bfast, bowtie, ...).
The basic idea is to estimate how many reads each program can map, given a set of reads with a known number of errors/variants.
To generate the reads I used the utilities provided by the bfast package and, using the human reference genome, I generated one million reads for each combination of: read length (50, 76 and 100 bp), pairing (paired/unpaired), number of SNPs (0 to 5) and number of errors (0 to 5).
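For reference, this is roughly the parameter sweep I mean, as a minimal Python sketch. Note that `simulate_reads` and its flags are placeholders standing in for the actual bfast read-simulation utility, and the reference path is made up:

```python
import itertools
import subprocess

READ_LENGTHS = [50, 76, 100]
PAIRINGS = ["paired", "unpaired"]
SNP_COUNTS = range(6)      # 0..5 SNPs introduced per read
ERROR_COUNTS = range(6)    # 0..5 sequencing errors per read
N_READS = 1_000_000
REFERENCE = "hg_ref.fa"    # placeholder path to the human reference genome

# One simulated read set per parameter combination.
for length, pairing, snps, errors in itertools.product(
        READ_LENGTHS, PAIRINGS, SNP_COUNTS, ERROR_COUNTS):
    out = f"reads_len{length}_{pairing}_snp{snps}_err{errors}.fastq"
    # "simulate_reads" stands in for the bfast utility; substitute the
    # real command and flags for your setup.
    subprocess.run([
        "simulate_reads",
        "-r", REFERENCE,
        "-n", str(N_READS),
        "-l", str(length),
        "--pairing", pairing,
        "--snps", str(snps),
        "--errors", str(errors),
        "-o", out,
    ], check=True)
```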
Then I fed these inputs to the different aligners to see how many of the sequences each one could map.
After each run, I counted the number of mapped reads (and the time each program took to get the work done).
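Concretely, the timing/counting step looks something like the sketch below. The aligner command is whatever you pass in, and the mapped-read count just checks SAM flag bit 0x4 on the output (skipping secondary alignments); this is my assumption of a reasonable way to count, not necessarily how any particular aligner reports it:

```python
import subprocess
import time

def run_and_count(aligner_cmd, sam_out):
    """Run one aligner invocation; return (mapped, total, seconds)."""
    start = time.perf_counter()
    subprocess.run(aligner_cmd, check=True)
    elapsed = time.perf_counter() - start

    mapped = total = 0
    with open(sam_out) as sam:
        for line in sam:
            if line.startswith("@"):      # skip SAM header lines
                continue
            flag = int(line.split("\t")[1])
            if flag & 0x100:              # ignore secondary alignments
                continue
            total += 1
            if not flag & 0x4:            # bit 0x4 set means unmapped
                mapped += 1
    return mapped, total, elapsed

# Example (hypothetical command line):
# mapped, total, secs = run_and_count(
#     ["bowtie", "hg_index", "reads_len50_paired_snp0_err0.fastq",
#      "-S", "out.sam"], "out.sam")
```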
Do you think this "experiment" is meaningful? Am I missing something? Is the test flawed in some way?
Please let me know your opinions! Thanks a lot!