Right now that primary data is processed with closed-source, proprietary tools provided by the manufacturer. That's really unfortunate because the data is being used to draw scientific conclusions. It's difficult to trust your data and understand the artifacts in it if the data analysis algorithms are not open to peer review. Not only that, but it means you can't easily change things and try out new methods.
Until recently I was working at the Sanger Institute, where, to address this, we have been developing Swift, a primary data analysis package for next-gen sequence data. At the moment our tools are aimed at Illumina data, but it should be possible to adapt them for processing SOLiD images as well.
I've recently left Sanger to pursue a career in next-next-gen sequencing at Oxford Nanopore Technologies. I'm going to continue developing Swift, as will my colleagues at Sanger (particularly Tom Skelly, who's put a lot of work into Swift).
While Swift is fully functional, it could do with more validation and testing. However, we've decided that we'd like to make it available to the wider community in the hope of gaining support and ideally attracting more developers.
Right now, the post-image-analysis corrections (basecalling) in Swift work well and generally produce error rates lower than the Illumina pipeline. This part is probably ready for production usage, so feel free to try it out and let us know what you find.
The native image analysis works but is more of a work in progress. We'd like people to try it out too and tell us what happens.
Swift is available under LGPL3 at: http://swiftng.sourceforge.net
You'll need to check it out of the Subversion repository to run it, but it should be reasonably straightforward. Please email me if you have any trouble.
I'm very interested in getting any feedback, positive or negative. You can either post here or contact me directly: new at sgenomics dot org.