  • new300
    Member
    • Mar 2008
    • 50

    Swift: Open source primary data analysis for Next-gen sequencers

    Right now, primary data from next-gen sequencers is processed with closed-source, proprietary tools provided by the manufacturer. That's really unfortunate, because the data is being used to draw scientific conclusions. It's difficult to trust your data and understand the artifacts in it if the data analysis algorithms are not open to peer review. It also means you can't easily change things and try out new methods.

    Until recently I was working at the Sanger Institute, where, to address this, we have been developing a primary data analysis package for next-gen sequence data. At the moment our tools are aimed at Illumina data, but it should be possible to adapt them to process SOLiD images as well.

    I've recently left Sanger to pursue a career in next-next-gen sequencing at Oxford Nanopore Technologies. I'm going to continue developing Swift, as will my colleagues at Sanger (particularly Tom Skelly, who's put a lot of work into Swift).

    While Swift is fully functional, it could do with more validation and testing. However, we've decided that we'd like to make it available to the wider community in the hope of gaining support and ideally attracting more developers.

    Right now, the post-image-analysis corrections (basecalling) in Swift work well: Swift generally produces lower error rates than the Illumina pipeline. It's probably ready for production use, so feel free to try it out and let us know what you find.

    The native image analysis works but is more of a work in progress; we'd like people to try it out too and tell us what happens.

    Swift is available under LGPL3 at: http://swiftng.sourceforge.net

    You'll need to check it out of the Subversion repository to run it, but it should be reasonably straightforward. Please email me if you have any trouble.

    I'm very interested in getting any feedback, positive or negative. You can either post here or contact me directly: new at sgenomics dot org.
  • cgb
    Member
    • May 2008
    • 50

    #2
    Cool.

    I wonder if it can be put onto a boot DVD and run on the iPar computers, with data mirrored in real time using the Sanger mirroring scripts?


    • new300
      Member
      • Mar 2008
      • 50

      #3
      Originally posted by cgb
      I wonder if it can be put onto a boot DVD and run on the iPar computers, with data mirrored in real time using the Sanger mirroring scripts?
      Yes, this absolutely should be possible and is something we'd like to look into. Users interested in doing this are encouraged to make contact.


      • dvh
        Member
        • Jul 2008
        • 35

        #4
        Could you maybe share some stats as to how Swift performs vs the current version of Bustard?
        E.g. amount of data/reads mapped, error rate for the same lane analysed both ways.
        thanks
        david


        • new300
          Member
          • Mar 2008
          • 50

          #5
          Originally posted by dvh
          Could you maybe share some stats as to how Swift performs vs the current version of Bustard?
          E.g. amount of data/reads mapped, error rate for the same lane analysed both ways.
          thanks
          david
          I'm still in the process of validating it on non-phiX data. For the phiX data I've looked at, against the 1.0 pipeline I've seen 20% more PF reads at a similar error rate.
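One common way to compare error rates on a phiX control lane is to align the reads to the known phiX reference and count mismatches per aligned base. The sketch below is illustrative only: it assumes the alignment has already been done and is ungapped, and both the helper name `error_rate` and the choice to skip no-calls (`N`) are assumptions, not how Swift or the Illumina pipeline compute their own figures.

```python
def error_rate(alignments):
    """Per-base mismatch rate for ungapped alignments.

    alignments: iterable of (read_seq, ref_seq) pairs of equal length,
    e.g. reads aligned without gaps to the phiX reference (assumed input).
    No-calls ('N') in the read are skipped (an illustrative policy choice).
    """
    mismatches = 0
    total = 0
    for read, ref in alignments:
        if len(read) != len(ref):
            raise ValueError("ungapped alignment expected")
        for r, f in zip(read, ref):
            if r == "N":
                continue  # no-call: neither a match nor an error here
            total += 1
            if r != f:
                mismatches += 1
    return mismatches / total if total else 0.0


# Hypothetical usage: run the same lane through both basecallers, align,
# then compare error_rate(swift_alignments) with error_rate(bustard_alignments).
print(error_rate([("ACGT", "ACGA"), ("AAAA", "AAAA")]))
```

Computing the rate over the same set of reference positions for both basecallers keeps the comparison fair; the number of reads passing filter should be reported alongside it, since a stricter filter can lower the error rate simply by discarding more reads.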

          In terms of runtime, a GA1 single-end run takes around 10 minutes end to end. A GA2 37-cycle paired-end run takes around an hour end to end.


          • new300
            Member
            • Mar 2008
            • 50

            #6
            In terms of memory usage, we're trying to stay within a 2 GB limit. A 37 bp paired-end run peaks at around 1 GB.


            • timread
              Member
              • Oct 2008
              • 14

              #7
              BTW, the link http://swiftng.sourceforge.net appears to be broken.

              The connection seems to be a problem only from my desktop at work (which is behind a US government firewall). From other locations I can get through OK.
              Last edited by timread; 11-18-2008, 12:44 PM. Reason: clarification of connection problem


              • cgb
                Member
                • May 2008
                • 50

                #8
                Works for me.


                • iris42
                  Junior Member
                  • Nov 2008
                  • 1

                  #9
                  Is it normal to see different output when running the same binary version of Swift multiple times on the same computer, and when running it on different computers? I've observed both. Most of the differences in the FASTQ output seem to be in the quality scores.

                  Comment

                  • new300
                    Member
                    • Mar 2008
                    • 50

                    #10
                    Originally posted by iris42
                    Is it normal to see different output when running the same binary version of Swift multiple times on the same computer, and when running it on different computers? I've observed both. Most of the differences in the FASTQ output seem to be in the quality scores.
                    Running on different computers it's quite likely that the output will vary slightly as they are likely to have different floating point implementations.
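Floating-point arithmetic is not associative, so two machines (or two algorithms) that perform the same additions in a different order can round differently. This is a generic Python illustration of that effect, not Swift code:

```python
import math

# (0.1 + 0.2) + 0.3 and 0.1 + (0.2 + 0.3) are equal in exact arithmetic,
# but rounding after each step makes the results differ in the last bit.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)


def sum_in_order(xs):
    """Add values one at a time, left to right, like a naive loop."""
    total = 0.0
    for x in xs:
        total += x
    return total


values = [1e16, 1.0, -1e16, 1.0]
forward = sum_in_order(values)            # 1.0: the first 1.0 is lost to rounding
reordered = sum_in_order(sorted(values))  # 0.0: both 1.0s are lost
exact = math.fsum(values)                 # 2.0: correctly rounded sum

print(left, right)                # 0.6000000000000001 0.6
print(forward, reordered, exact)  # 1.0 0.0 2.0
```

Differences this small are usually harmless, but if an intermediate value is later rounded to an integer quality score, it could occasionally tip over to a neighbouring value, which would be consistent with most of the differences showing up in the quality scores.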

                    On the same computer it's a little odd; how different are the results? If the difference is small, it could be down to the FFTW implementation we are using, which sometimes employs a non-deterministic algorithm.
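To put a number on "how different are the results?", it can help to compare two FASTQ outputs record by record. This is a minimal sketch that assumes both files list the same reads in the same order with exactly four lines per record; `read_fastq` and `compare_fastq` are hypothetical helpers. Quality differences are measured as the difference in character codes, which equals the difference in quality values as long as both files use the same ASCII offset.

```python
from itertools import zip_longest


def read_fastq(path):
    """Yield (header, seq, qual) tuples from a 4-line-per-record FASTQ file."""
    with open(path) as fh:
        while True:
            header = fh.readline().rstrip("\n")
            if not header:
                return
            seq = fh.readline().rstrip("\n")
            fh.readline()  # '+' separator line, ignored
            qual = fh.readline().rstrip("\n")
            yield header, seq, qual


def compare_fastq(path_a, path_b):
    """Count records whose sequence differs vs. records differing only in quality."""
    stats = {"records": 0, "seq_diff": 0, "qual_only_diff": 0, "max_qual_delta": 0}
    for rec_a, rec_b in zip_longest(read_fastq(path_a), read_fastq(path_b)):
        if rec_a is None or rec_b is None:
            raise ValueError("files have different numbers of records")
        _, seq_a, qual_a = rec_a
        _, seq_b, qual_b = rec_b
        stats["records"] += 1
        if seq_a != seq_b:
            stats["seq_diff"] += 1
        elif qual_a != qual_b:
            stats["qual_only_diff"] += 1
            # Largest per-base change in quality within this record.
            delta = max((abs(ord(x) - ord(y)) for x, y in zip(qual_a, qual_b)), default=0)
            stats["max_qual_delta"] = max(stats["max_qual_delta"], delta)
    return stats
```

If `seq_diff` is near zero and `max_qual_delta` is only one or two, that would point towards small numerical differences rather than a real change in basecalling.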


                    • new300
                      Member
                      • Mar 2008
                      • 50

                      #11
                      Originally posted by timread
                      BTW, the link http://swiftng.sourceforge.net appears to be broken.

                      The connection seems to be a problem only from my desktop at work (which is behind a US government firewall). From other locations I can get through OK.
                      Odd, you can try: http://sgenomics.org/swift/ which should also work.


                      • lparsons
                        Member
                        • Nov 2008
                        • 28

                        #12
                        I'm quite interested in using open-source software for scientific work. We have recently acquired an Illumina GAII machine, and are trying to come up with data management solutions. Right now we are planning to throw away the images after the primary analysis (base-calling) is completed. We are saving the intensity and noise files, but not the images, which seems to be fairly common. However, it seems that this software requires the original images, which makes sense, but would limit our ability to use it on past experiments.

                        Would it be feasible to use swift on the Firecrest output (intensity and noise)?

                        Do many labs actually save the image files?

                        It seems like an ideal initial setup would be to process the images with both the Illumina pipeline and Swift. Has anyone yet set this up?


                        • clivey
                          Member
                          • Jul 2008
                          • 24

                          #13
                          Sanger have it set up; talk to Tom Skelly.

                          Images are still very diagnostic of any issue with your sample, sequencer or run. Looking at images allowed Sanger to optimise their pipeline. For example: your flowcell quality goes down; an operator gets oil on the flowcell; your focusing is off and you suddenly get lots of strange new 'contaminants' in your output file as a result; your base qualities all drop halfway through your project; your data goes bad and your clusters look weird because of an issue with your cluster station; there's stuff growing in your reagents that appears as blobs on the images (but isn't visible to the naked eye); or your flowcell surface isn't there, and so on. You should keep the images for QC, then throw them away. Generally (but not in all cases), higher-throughput labs with big projects retain images for some period.


                          • new300
                            Member
                            • Mar 2008
                            • 50

                            #14
                            Originally posted by lparsons
                            I'm quite interested in using open-source software for scientific work. We have recently acquired an Illumina GAII machine, and are trying to come up with data management solutions. Right now we are planning to throw away the images after the primary analysis (base-calling) is completed. We are saving the intensity and noise files, but not the images, which seems to be fairly common. However, it seems that this software requires the original images, which makes sense, but would limit our ability to use it on past experiments.
                            Are you using iPar to process the images and then mirroring off the intensity files? Swift will process intensity files (as produced by the UNIX pipeline). I've heard the iPar intensity format is different from the one used by the UNIX pipeline; if someone wants to send me a sample file, I'll write a parser for it.

                            Originally posted by lparsons
                            Would it be feasible to use swift on the Firecrest output (intensity and noise)?
                            Yes it's feasible, I would hope the results would be comparable with the Illumina pipeline.

                            Originally posted by lparsons

                            Do many labs actually save the image files?

                            It seems like an ideal initial setup would be to process the images with both the Illumina pipeline and Swift. Has anyone yet set this up?
                            As mentioned, Sanger save the images while they do QC. The images are mirrored off as the run progresses and processed using the UNIX pipeline on a separate cluster.

                            If you're interested in trying out Swift, drop me an email at new at sgenomics dot org. It's in active development at the moment, and I'm happy to work with people on any issues that come up.


                            • bioinfosm
                              Senior Member
                              • Jan 2008
                              • 483

                              #15
                              Are there any updates on Swift? Data sizes, number of files generated, comparison with Illumina pipeline results...
                              --
                              bioinfosm
