I developed my own tool for QC and adapter/artifact/MID removal, and I offer cleanup work as a commercial service. I think I can say that I have a good overview, based on more than 1700 454-based datasets from around the world. I have collected many artifacts from those datasets and learned what has to be removed to get better assemblies. Along the way I found several funny mistakes that happen from time to time in some labs, some software-driven errors and, notably, bad designs of certain lab protocols.
As for publicly available tools, and your question about what you could use for your work: none. They just don't do the right thing, at all. Nobody told those programmers what to look for, and because I know what they are missing, I can only say that they never tested their software properly.
![Frown](https://www.seqanswers.com/core/images/smilies/frown.png)
Forget about QC based on PHRED values. It is practically useless. CAP3 is the usual trick for squashing reads into some consecutive sequence once you realize you cannot get the reads merged together. It is good enough to get contigs averaging 400nt for the purpose of your paper. Interestingly, reviewers let such papers get published even though the raw read length was, for example, 310nt on average (FLXti).
![Wink](https://www.seqanswers.com/core/images/smilies/wink.png)
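To put numbers on the contig-versus-read comparison, here is a minimal sketch that computes the mean sequence length of two FASTA inputs and their ratio. The function names and the toy records are my own, made up for illustration, and the lengths simply mirror the 310nt and 400nt figures mentioned above; nothing here comes from a real dataset or from any particular tool.

```python
from statistics import mean


def fasta_lengths(lines):
    """Yield the length of each record in an iterable of FASTA lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def mean_length(lines):
    """Mean record length for an iterable of FASTA lines."""
    return mean(fasta_lengths(lines))


# Hypothetical toy input: two "reads" of 300 and 320 nt, one "contig" of 400 nt.
reads = [">r1", "ACGT" * 75, ">r2", "ACGT" * 80]
contigs = [">c1", "ACGT" * 100]

read_mean = mean_length(reads)
contig_mean = mean_length(contigs)
ratio = contig_mean / read_mean

print(f"mean read length:   {read_mean:.0f} nt")
print(f"mean contig length: {contig_mean:.0f} nt")
print(f"contig/read ratio:  {ratio:.2f}")
```

For real data you would pass open file handles instead of lists, e.g. `mean_length(open("reads.fasta"))`. A ratio only slightly above 1 would suggest that the "contigs" are barely longer than the raw reads they were built from.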
I see several shrimp datasets in NCBI SRA, including one infected with some virus. Is that the dataset you are talking about here?
![Wink](https://www.seqanswers.com/core/images/smilies/wink.png)