  • SSPACE: a new stand-alone scaffolding tool for small and large genomes

    Hi all,

    During my Master's thesis I developed a stand-alone tool named SSPACE for scaffolding pre-assembled contigs using paired-read data. I wrote it because the only program I could find for this task was Bambus, and we ran into many problems with Bambus, including errors and complicated input datasets.

    Therefore, SSPACE was developed. The main features are:

    * Inputs are simple FASTA contig sequences as well as (multiple) FASTA/FASTQ paired-read data
    * High-quality scaffolds with a short runtime and limited memory requirements
    * Large reduction in the number of contigs (merged into scaffolds) and a high N50 value
    * Multiple library input of paired-end and/or mate-pair datasets
    * Optional contig extension using unmapped sequence reads
    * Easy interpretation of the final scaffolds
    * Visualization of the final scaffolds using GraphViz

    SSPACE has been tested on the E. coli, Grosmannia clavigera and giant panda genomes, where it produced fewer scaffolds and a higher N50 value than the scaffolds produced by common de novo assemblers such as ABySS and SOAPdenovo.
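
For readers unfamiliar with the N50 metric mentioned above: it is the length L such that sequences of length L or longer contain at least half of all assembled bases. A minimal sketch of the calculation follows. The function name and the example lengths are illustrative only; they are not taken from SSPACE or from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths before scaffolding, and after two joins
# (gap sizes are ignored here to keep the totals equal).
contigs = [100, 80, 60, 40, 20]
scaffolds = [180, 100, 20]
print(n50(contigs), n50(scaffolds))
```

Joining contigs into fewer, longer scaffolds raises the N50 even though the total number of bases is unchanged, which is why the two numbers are usually reported together.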

    SSPACE is freely available at


    The publication has been accepted by Bioinformatics and will be online soon. It gives more detailed information about the produced scaffolds and their quality, including runtime and memory usage.

    I hope it is useful; any comments or questions are of course welcome.

    Cheers,
    Boetsie

  • #2
    Hi all,

    The SSPACE publication is now available at:



    Boetsie



    • #3
      congrats


      It's great to hear of such an achievement.
      Is your paper freely available?
      Could you mail me a downloadable copy of the software?
      Regards,
      Ganga Jeena



      • #4
        Congrats!

        Before I get into the paper, can I ask if this tool supports 'hierarchical scaffolding' in the way that Bambus (supposedly) does? i.e. If I want to add in 'scaffolding' information based on gene synteny from some related organisms, can I add that in but with a lower priority than the true PE/MP data?

        Does it detect repeats from the graph structure like Bambus does now?

        I'm curious because Bambus promises a lot of nice functionality, which is why I keep hammering away at it. However, I'm starting to wonder if it's time to jump ship to a tool that is more robust (if perhaps less feature rich).


        Cheers,
        Dan.
        Homepage: Dan Bolser
        MetaBase the database of biological databases.



        • #5
          Nice paper! The question that arises is whether we can feed PE data directly to the algorithm, rather than having it shoehorned through Bowtie.

          For example, Bowtie may not be the best tool for aligning 454 reads to contigs, but I'd still like to use 454 PE data to scaffold my assembly. Is there some intermediate file or Bowtie-like PE format that we can feed to SSPACE?

          Unfortunately parts of http://bioinformatics.oxfordjournals.org are down, so I can't see the supplementary figure; sorry if it would have answered my question.


          • #6
            Hi Dan,

            thanks for your reply!

            It does not fully support the same hierarchical scaffolding as Bambus. We use a simple approach:

            1) Produce scaffolds using the first library
            2) Use scaffolds from 1), and produce scaffolds using the second library
            3) and so on...

            We do not assign priorities to the libraries, as Bambus does. Instead, the user determines the order in which the libraries are used.
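
The library-by-library approach described above can be sketched as a simple loop in which each round consumes the scaffolds from the previous round. This is only an illustration: `scaffold_iteratively` and `toy_scaffold_once` are hypothetical names, and the toy step merely joins named pairs where the real program aligns reads and builds links.

```python
def scaffold_iteratively(contigs, libraries, scaffold_once):
    """Apply one scaffolding round per library, in the order given."""
    current = list(contigs)
    for library in libraries:
        # Each round takes the previous round's scaffolds as its input.
        current = scaffold_once(current, library)
    return current


def toy_scaffold_once(sequences, library):
    """Stand-in for align + link + join: merges the pairs named in `library`."""
    merged, used = [], set()
    for left, right in library:
        if left in sequences and right in sequences and not {left, right} & used:
            merged.append(left + "-" + right)
            used.update((left, right))
    return merged + [s for s in sequences if s not in used]


paired_end = [("c1", "c2"), ("c3", "c4")]   # hypothetical short-insert library
mate_pair = [("c1-c2", "c3-c4")]            # hypothetical long-insert library
result = scaffold_iteratively(
    ["c1", "c2", "c3", "c4", "c5"], [paired_end, mate_pair], toy_scaffold_once)
print(result)
```

Swapping the two libraries changes the result in this toy, because the second library's links only make sense against the scaffolds built by the first. That is the practical consequence of letting the user choose the order.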

            It is able to detect repeats by counting the number of incoming and outgoing 'links' between contigs. The repeats it finds are reported in the program's output.
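
As a rough illustration of link counting, the sketch below flags any contig that links to more than one distinct partner on its incoming or outgoing side. The exact criterion SSPACE applies is not given in this thread, so the threshold of "more than one partner" and the function name are assumptions.

```python
from collections import defaultdict


def find_repeat_candidates(links):
    """Flag contigs whose outgoing or incoming side links to several partners.

    `links` is an iterable of (source, target) pairs, one per contig-to-contig
    connection supported by read pairs.
    """
    outgoing = defaultdict(set)
    incoming = defaultdict(set)
    for source, target in links:
        outgoing[source].add(target)
        incoming[target].add(source)
    contigs = set(outgoing) | set(incoming)
    # A unique contig should have at most one neighbour on each side.
    return sorted(c for c in contigs
                  if len(outgoing.get(c, ())) > 1 or len(incoming.get(c, ())) > 1)


# Hypothetical link set: contig R sits between two different paths.
links = [("A", "R"), ("B", "R"), ("R", "C"), ("R", "D"), ("C", "E")]
print(find_repeat_candidates(links))
```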

            Bambus does indeed have more functionality. However, we found that the input options were too complex for simple scaffolding purposes.

            Regarding your question about Bowtie:
            Unfortunately, only Bowtie is supported at the moment, as SSPACE was designed for Illumina input (or other short paired reads) and is based on Bowtie output.

            My question: what program do people use for aligning 454 reads, and can it produce output similar to Bowtie's?

            Cheers,
            Boetsie



            • #7
              Thanks for the clear reply, Boetsie. It's really great to hear that you do repeat filtering based on graph structure, and letting the user pick the order of the libraries seems like a nice strategy.

              I've been using Newbler to align 454 PE data to contigs. Newbler automatically handles the specifics of 454-style PE reads so, although it isn't the best aligner for 454, the results are very easy to use because they are just tab-delimited... You can read about the format of the Newbler PE data here!

              Newbler can be persuaded to output ace-like format too, but it doesn't do SAM/BAM IIRC.

              I was looking at the code, and it should be easy enough to feed in the data to SSPACE ;-)


              • #8
                Hi Boetsie,

                Does SSPACE use the SAM output format of Bowtie? If not, could it?

                Cheers,
                Shaun



                • #9
                  Hi Shaun,

                  No, it does not; it uses the standard output from Bowtie. With modifications to the script, it should be possible to use the SAM format.
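
To give an idea of what such a modification might involve: Bowtie's default output is tab-delimited and begins with read name, strand, reference name and 0-based offset, followed by the read sequence, whereas SAM encodes the strand in the FLAG field and uses 1-based positions. The sketch below maps one SAM line onto those leading fields. It is not taken from the SSPACE script, and which columns SSPACE actually reads is an assumption here; the function name is hypothetical.

```python
def sam_to_bowtie_fields(sam_line):
    """Convert one SAM alignment line to (read, strand, reference, offset, sequence).

    Returns None for header lines and unmapped reads.
    """
    if sam_line.startswith("@"):
        return None
    fields = sam_line.rstrip("\n").split("\t")
    name, flag, ref = fields[0], int(fields[1]), fields[2]
    pos, seq = int(fields[3]), fields[9]
    if flag & 0x4:          # SAM flag bit 0x4: read is unmapped
        return None
    strand = "-" if flag & 0x10 else "+"   # bit 0x10: reverse strand
    # SAM positions are 1-based; Bowtie's default output uses 0-based offsets.
    return (name, strand, ref, pos - 1, seq)


# Hypothetical alignment record for illustration.
example = "read1/1\t16\tcontig7\t101\t255\t4M\t*\t0\t0\tACGT\tIIII\n"
print(sam_to_bowtie_fields(example))
```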

                  Cheers,
                  Boetsie



                  • #10
                    BAC / Fosmid end

                    Hi boetsie,

                    Can I use additional BAC/fosmid ends for scaffolding the pre-assembled contigs or scaffolds with SSPACE? If so, is there a parameter for this purpose?

                    Thanks,
                    Corthay



                    • #11
                      Hi Corthay,

                      I'm not very familiar with BAC/fosmid ends, and SSPACE has no dedicated parameter for them. However, if:
                      - these are paired sequences
                      - the sequences' lengths are below 1024 (maximum input of Bowtie)
                      - the pairs have either orientation of --> <-- (typical paired-end) or <-- --> (typical mate pair)

                      If your data satisfy the points above, I see no reason not to give it a try.
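
The three conditions above can be checked on a pair of alignments before running the scaffolder. The sketch below is a hypothetical pre-check, not part of SSPACE: the function names are made up, and the 1024 limit is simply the figure quoted in this post.

```python
MAX_READ_LENGTH = 1024  # limit quoted in the post above; not verified against Bowtie


def reads_short_enough(seq1, seq2, limit=MAX_READ_LENGTH):
    """True if both reads of a pair are below the length limit."""
    return len(seq1) < limit and len(seq2) < limit


def classify_pair(strand1, pos1, strand2, pos2):
    """Classify a read pair mapped to the same contig.

    Strands are "+" or "-"; positions are leftmost alignment coordinates.
    Returns "FR" for --> <--, "RF" for <-- -->, or "other".
    """
    if strand1 == strand2:
        return "other"          # both reads point the same way
    # Order the two reads by position along the contig.
    (left_strand, _), (right_strand, _) = sorted(
        [(strand1, pos1), (strand2, pos2)], key=lambda read: read[1])
    if left_strand == "+" and right_strand == "-":
        return "FR"
    return "RF"


print(classify_pair("+", 100, "-", 400))   # typical paired-end layout
print(classify_pair("-", 100, "+", 400))   # typical mate-pair layout
```

Tallying the classifications over a sample of pairs that map within a single contig is a quick way to see which orientation a BAC/fosmid-end library actually has.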

                      Kind regards,
                      Boetsie



                      • #12
                        What would be great is a simple tab-delimited format for providing paired sequence alignments, rather than going via Bowtie... I had a quick look at the code, but unfortunately I couldn't work out where to add such functionality easily. I'll have another look at some point if nobody else does.


                        • #13
                          Hi Boetsie,

                          Thanks for the response.

                          I've just specified "k=2", as the clone coverage of the BAC ends is almost 5x.
                          As a result, the scaffold N50 improved a bit and the number of scaffolds was reduced. Thanks for developing such a useful tool.
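
A note for later readers on why a low "k" helps here, assuming "k" is the minimum number of read-pair links required before two contigs are joined (that meaning is not spelled out in this thread). With sparse clone coverage only a few pairs span any given junction, so a high threshold discards most joins. The sketch below uses a hypothetical function name and made-up links.

```python
from collections import Counter


def supported_joins(pair_links, k):
    """Return contig pairs backed by at least `k` read-pair links.

    `pair_links` lists one (contig_a, contig_b) tuple per read pair whose two
    ends map to different contigs.
    """
    counts = Counter(tuple(sorted(link)) for link in pair_links)
    return {pair: n for pair, n in counts.items() if n >= k}


# Hypothetical BAC-end links between four contigs.
bac_end_links = [("c1", "c2"), ("c2", "c1"), ("c2", "c3"),
                 ("c3", "c4"), ("c3", "c4"), ("c4", "c3")]
print(supported_joins(bac_end_links, k=2))   # two joins survive
print(supported_joins(bac_end_links, k=5))   # none survive
```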

                          Corthay.




                          • #14
                            Hi Dan,

                            I know what you mean, but then multiple library input couldn't be used, since we do a hierarchical clustering (first generate scaffolds using one library, then align the next library to those scaffolds and produce new scaffolds, etc.). So for each library we align the reads to the new scaffolds. Therefore, no predefined paired sequence alignments could be provided, unless only one library is used. In addition, with such an input we would be very similar to Bambus. Our purpose is to have an easy-to-use scaffolder that needs no complex input formats, only simple FASTA input.

                            Next week, I'll try to add support for another alignment tool (e.g. Newbler) to map long reads to the contigs/scaffolds.

                            Kind regards,
                            Boetsie



                            • #15
                              Hi Corthay,

                              great that it worked and that it improved your assembly a bit!

                              Kind regards,
                              Boetsie
