Software packages for next gen sequence analysis


  • Swift: Primary Data Analysis for the Illumina Solexa Sequencing Platform

    Open source primary data analysis pipeline. Image analysis/basecalling. Paper here:

    http://bioinformatics.oxfordjournals...bstract/btp383

    Download it here: http://swiftng.sourceforge.net

    Abstract:

    Motivation: Primary data analysis methods are of critical importance in second generation DNA sequencing. Improved methods have the potential to increase yield and reduce the error rates. [Openly documented analysis tools enable the user to understand the primary data, this is important for the optimisation and validity of their scientific work.]

    Results: In this paper we describe Swift, a new tool for performing primary data analysis on the Illumina Solexa Sequencing Platform. Swift is the first tool, outside of the vendor's own software, which completes the full analysis process, from raw images through to base-calls. As such it provides an alternative to, and independent validation of, the vendor-supplied tool. Our results show that Swift is able to increase yield by 13.8%, at a comparable error rate.

    Availability and Implementation: Swift is implemented in C++ and supported under Linux. It is supplied under an open source license (LGPL3), allowing researchers to build upon the platform. Swift is available from http://swiftng.sourceforge.net.



    • Originally posted by dan View Post
      Sounds like a job for Semantic MediaWiki - We can register a free SMW site at referata.com and start playing around with ideas.

      SMW is basically a wiki database system.
      Originally posted by sci_guy View Post
      Sounds good
      Hey guys...I can't believe I missed these comments until now.

      I'm working on this as we speak...I have SMW and MediaWiki running on my private mirror of this site, however I'm hindered by the fact that I'm not savvy enough with SMW to implement it well.

      If you guys (and anyone else...) are up for helping to populate it, etc., it's pretty straightforward to activate an SMW-ified MediaWiki installation here.

      The main thing I think we need in order to move forward is a data structure for these software packages. Maybe I'll just put it up tonight and let you guys loose.



      • Hi ECO,

        I've been thinking about this a lot since it got suggested. One of the things that worries me is the proliferation of sites and locations where such a list should (or could) be maintained. I didn't realise until I looked that the NBIC list IS a wiki (1), and therefore, taking the well maintained content from that site and putting it into 'yet another wiki' seems like a potential waste of time (duplication of effort, splitting of an existing user base, etc., etc.).

        However, I think it *could* be useful to apply SMW, because of its powerful query and browsing features. Now, because I have a connection to Bioinformatics.Org, I'd like to try to implement the list there, but it should somehow be tied to both NBIC *and* Wikipedia, as well as other sites. Also, I don't want to just stomp all over your efforts to set up an SMW install here. So basically, my first hurdle was where, not what. I still haven't resolved that issue.

        Looking to the next step, deciding a data structure is important to do early on. SMW emulates a database in the wiki, but it isn't trivial to modify the structure of the wiki-DB once it has been set up. My plan was to simply follow what was being done at the NBIC (i.e. get their table of data into an equivalent wiki-DB table).

        For learning, first, I'd recommend you read the guidelines here:

        http://www.mediawiki.org/wiki/Extension:Semantic_Forms


        I'll see if I can get a dummy system up and running on Bioinformatics.Org (if only for demonstration purposes). Actually my guess is that the NBIC will install SMW in short order once we demonstrate what can be done with it (the user base there makes it a good choice for a permanent location for such a list...). So the question is, how to integrate our effort?
        Homepage: Dan Bolser
        MetaBase the database of biological databases.



        • The problem with the NBIC site is that there is no organization. I think a well-organized wiki is really important for the community. On the NBIC page you see viewers, assemblers for long and short reads, and reference assemblers all in random order.

          I think the most important thing is to provide not only links to the literature or to the websites where tools can be downloaded, but also personal experiences.

          To give an example: every list cites ALLPATHS as a good de novo assembler, but it seems impossible to use. The same goes for SHORTY: it can be compiled, but due to the lack of documentation it is not possible to use it...



          • Well, I got something up and running as quickly as I could here:

            http://www.bioinformatics.org/wiki/S...ta/Sandbox_ngs


            Note, the category (or 'class') I used for the pages is "Sandbox ngs", because this is still 'work in development' (i.e. it's in the 'sandbox').

            Basically I dumped the table from the NBIC wiki:

            https://wiki.nbic.nl/index.php/High_...ncing#Software


            into the Semantic MediaWiki at Bioinformatics.Org using the 'Data Transfer' extension and the 'Semantic Forms' extension (and the page above is generated by the Semantic Drilldown extension).


            Try clicking on a page to see the data, currently crudely formatted in a wiki table for the package in question. For example:

            http://www.bioinformatics.org/wiki/MUMmer


            Notice the new 'edit with form' tab that Semantic Forms provides. For example:

            http://www.bioinformatics.org/w/inde...ction=formedit


            Finally, as a very brief example of the querying capabilities of SMW, I have recreated the table seen on the NBIC page here:

            http://www.bioinformatics.org/wiki/Sandbox/Software


            Wherever the data ends up, I hope the above can be useful for demonstrating the features of SMW. Clearly we do need a better data model for the software, and integrating user experiences, feedback and rating needs to be carefully done. We don't want to duplicate the function of this forum, for example.

            HTH,
            Dan.
            Last edited by dan; 07-20-2009, 02:49 AM. Reason: fixed the very first link



            • Awesome work!!!!

              Has anyone heard of or used the AYB algorithm on the Illumina GA? If you do use it, could you give some comparisons with other algorithms?

              I can't seem to find any information about AYB on the web. If you know of any URL, please pass it on; I truly appreciate the help.



              • Originally posted by dan View Post
                Note, the category (or 'class') I used for the pages is "Sandbox ngs", because this is still 'work in development' (i.e. it's in the 'sandbox').
                Indeed, that looks like a great way to structure things pretty well ... and with a minimum amount of work.

                Quick question: is it possible to add multiple tags? I'm just asking because many packages offer different functionalities (e.g. mapping to reference, de novo assembly, SNP detection) which is difficult to catch in only one tag.

                Some categories are a bit vague at the moment, e.g., the difference between "Assembly" and "Assembly (de-novo)" is eluding me, but this is something which could be quickly resolved.

                Regards,
                B.



                • Originally posted by BaCh View Post
                  Indeed, that looks like a great way to structure things pretty well ... and with a minimum amount of work.

                  Quick question: is it possible to add multiple tags? I'm just asking because many packages offer different functionalities (e.g. mapping to reference, de novo assembly, SNP detection) which is difficult to catch in only one tag.

                  Some categories are a bit vague at the moment, e.g., the difference between "Assembly" and "Assembly (de-novo)" is eluding me, but this is something which could be quickly resolved.

                  Regards,
                  B.
                  Right... these are two important issues. Simple answer, yes, it is technically easy to allow multiple tags. Complex answer, how do we actually want to model the software? i.e. what tables, fields and values would you use to build a database of packages?

                  Second point, I just copied the data from the table at the NBIC:

                  https://wiki.nbic.nl/index.php/High_...ncing#Software


                  (let me know if you see any differences ;-)

                  We can rationalize this much better using the autocompletion and controlled vocabularies that the SMW system supports.

                  Dan.



                  • anyone recommended jbrowse already?
                    The JBrowse genome browser
                    http://jbrowse.org/



                    • Tag-Seq Analysis Tool

                      Thanks for the nice post, sci_guy! It's very helpful.
                      Unfortunately, there is no software package for Tag-Seq (Tag-Profiling/Illumina) data listed or available.

                      Does anybody know whether specific packages exist and, if so, which one is best to use? Or is it possible to use standard programs for RNA-Seq (like ERANGE)?
                      I've read about an in-house Perl script. Does anybody have an equivalent? Is it possible to test it?

                      Thanks a lot.



                      • Originally posted by dan View Post
                        Complex answer, how do we actually want to model the software? i.e. what tables, fields and values would you use to build a database of packages?
                        I try to look at that problem from the perspective of a user:
                        I have X1 reads of type T1, which assembler can do de-novo assembly with that type? Or mapping assembly and if yes, does it output differences to the reference sequence?
                        If I then throw in X2 reads of type T2, which assemblers can use both read types for a hybrid assembly strategy?

                        For that, the programs should get tags which denote their capabilities, like "denovo Solexa", "denovo 454", "mapping Solexa", "SNP calling", "hybrid assembly", etc., so that users can iteratively filter down to what they need.

                        Other aspects would be harder to tag, e.g. up to which data quantity a certain program should be used and when another program would be more appropriate.
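
To make the tagging idea concrete, here is a minimal sketch in Python of capability tags with iterative filtering. It is only an illustration of the idea, not how SMW or any wiki stores things: the `Package` class, the `filter_by_tags` helper and the three package names are all hypothetical, and only the tag strings come from the list above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Package:
    """One software package and the capability tags attached to it (hypothetical model)."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


def filter_by_tags(packages, required):
    """Keep only the packages that carry every tag in `required`."""
    required = frozenset(required)
    return [p for p in packages if required <= p.tags]


# Made-up entries purely for illustration; they do not describe real tools.
CATALOGUE = [
    Package("AssemblerA", frozenset({"denovo Solexa", "denovo 454", "hybrid assembly"})),
    Package("MapperB", frozenset({"mapping Solexa", "SNP calling"})),
    Package("AssemblerC", frozenset({"denovo Solexa"})),
]

# Iterative filtering: start with de novo Solexa, then narrow to hybrid assembly.
step1 = filter_by_tags(CATALOGUE, {"denovo Solexa"})
step2 = filter_by_tags(step1, {"hybrid assembly"})

print([p.name for p in step1])
print([p.name for p in step2])
```

Because each package holds a set of tags rather than a single category, a tool that does both mapping and SNP calling shows up under either filter, and every extra tag the user picks narrows the previous result.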

                        Just some random thoughts,
                        B.



                        • GenomeQuest

                          Dear all,

                          Just a brief message to point you also to GenomeQuest's new beta offering in NGS sequence data management and analysis. (COMMERCIAL ALERT! Please note that I am self-promoting as I am the VP of Software for the company.) However, I think the offering is pretty exciting and may be valuable for researchers and bioinformaticists. For you to decide...

                          You can try it for free with no obligation: www.genomequest.com. It runs in your web browser.

                          Currently we support RNA-Seq, variant calling, rapid annotation (metagenomics), and long read assembly, as well as high-throughput mapping. ChIP-Seq, micro RNA, other assembly tools, and much more will be rolled out shortly. All of the world's reference data is kept up to date inside the system so you get up-to-the-minute accuracy. And you can upload and use your own reference data if you prefer. Of course, there are APIs for bioinformaticists to extend and integrate the system, and we're about to release a huge update to the APIs that makes it stronger still. Again, much, much more to come in the coming weeks and months. I'll let it speak for itself.

                          Most importantly, we're looking for your detailed help to make it better. We have a linkedin group for feature requests at http://www.linkedin.com/groups?gid=2056733 and I'd welcome any comments you have directly via email, as well.

                          Thanks for letting me have the opportunity to announce the new product.

                          Best regards,
                          Richard
                          [email protected]



                          • Open source Vs Commercial

                            Hi folks, I am new to NGS and have been tasked with evaluating commercial and open source alternatives for analysing NGS data. Looking at the pretty comprehensive list in this post (which is very useful), I am not able to make out whether we really need to pay for a solution such as Genomatix, or whether we can make do with just the free open source tools.

                            Please comment or share your thoughts in this regard.

                            I appreciate your time and input.
                            Thanks,



                            • Hi hrajasim,

                              That's a very open-ended question, and the answer depends very strongly on what type of analysis you'd like to do. As far as I'm aware, the cutting-edge development for NGS is happening in open source, so you'll probably want to fill major parts of the pipeline with open source tools (e.g. aligners, format converters, ChIP-Seq analysis, assembly, etc.).

                              However, any time you're contrasting open source versus a commercial product, you have to ask the right questions:

                              Is there a feature set that I absolutely need?

                              Will I require support for the software beyond the forums and available community?

                              Will I want to make modifications to the source code and customize it for the pipeline?

                              Without knowing what you plan to do with the software, it will be impossible for us to figure out which path is best for you.

                              Cheers!
                              The more you know, the more you know you don't know. —Aristotle



                              • Further to the above, if you're a molecular biologist without coding skills, it is far easier to stick to commercial integrated solutions; however, you may feel constrained by a lack of features in the software if your application is anything other than vanilla.
                                If you embark upon the open source route, you'll need skills in UNIX shell scripting/Perl/SQL. If you don't have them, get a bioinformatician on board. If you need to do 'counting' methods such as RNA-Seq, ChIP-Seq, Bis-Seq, etc., then it is probably wise to consult a biostatistician.
