Hello everyone,
I am not good at writing optimal scripts, so I would appreciate someone's help. Either Perl or Python is fine.
I have contigs generated by assemblers in the following format:
NODE_2361_length_509_cov_1.43018_ID_745236
This format is common to Velvet and SPAdes. How can I extract all contig names whose coverage value is at or above a user-specified threshold? For example, I would like to run perl selectAboveCoverage.pl 1.0 and get every contig with a coverage of 1.0 or higher.
I know I first have to split the file on >, so that within each record index 0 is the contig name and index 1 is the sequence. I then need to split the contig name on _, which puts the coverage value at index 5. After that I can somehow sort by that value, pick all contigs at or above argv[0], and write them to a new file, argv[1].
I know how to do this from experience with Tcl, but scripts in that language take too long to execute.
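To make the steps above concrete, here is a minimal Python sketch of what I have in mind. It is only an outline under some assumptions: the function names (coverage_of, select_names) and the third argument for the input FASTA path are my own inventions, and it looks for the "cov" token in the name instead of relying on index 5, in case some headers have a different number of fields.

```python
import sys


def coverage_of(header):
    """Return the coverage encoded in a name such as
    NODE_2361_length_509_cov_1.43018_ID_745236, or None if it is absent."""
    fields = header.lstrip(">").strip().split("_")
    # Find the 'cov' token instead of hard-coding index 5 (assumption:
    # the coverage value is always the field right after 'cov').
    if "cov" in fields:
        idx = fields.index("cov") + 1
        if idx < len(fields):
            try:
                return float(fields[idx])
            except ValueError:
                return None
    return None


def select_names(lines, min_cov):
    """Yield contig names whose coverage is at or above min_cov."""
    for line in lines:
        if line.startswith(">"):
            name = line[1:].strip()
            cov = coverage_of(name)
            if cov is not None and cov >= min_cov:
                yield name


def main(argv):
    # Assumed argument order: threshold, output file, input FASTA file.
    min_cov, out_path, fasta_path = float(argv[0]), argv[1], argv[2]
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for name in select_names(fasta, min_cov):
            out.write(name + "\n")


if __name__ == "__main__":
    if len(sys.argv) == 4:
        main(sys.argv[1:])
    else:
        print("usage: select_above_coverage.py MIN_COV OUT_FILE CONTIGS_FASTA",
              file=sys.stderr)
```

Something like python select_above_coverage.py 1.0 names.txt contigs.fasta would be the intended call (the script and file names are placeholders). Sorting does not seem necessary for the filter itself, since each header can be tested on its own while the file is read line by line.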
Any help will be appreciated, thank you!