  • SSPACE: a new stand-alone scaffolding tool for small and large genomes

    Hi all,

    During my Master's thesis I developed a stand-alone tool named SSPACE for scaffolding pre-assembled contigs using paired-read data. I developed this program because I couldn't find one that was able to do this, apart from Bambus. However, we had lots of issues with Bambus, including errors and complicated input datasets.

    Therefore, SSPACE was developed. The main features are:

    * Inputs are simple FASTA contig sequences as well as (multiple) FASTA/FASTQ paired-read data
    * High-quality scaffolds in a short runtime and limited memory requirements
    * Large reduction in the number of contigs (by merging them into scaffolds) and a high N50 value
    * Multiple library input of both paired-end and/or mate pair datasets
    * Optional contig extension using unmapped sequence reads
    * Easy interpretation of the final scaffolds
    * Visualization of the final scaffolds using GraphViz

    SSPACE has been tested on the E. coli, Grosmannia clavigera and giant panda genomes, and produced fewer scaffolds and a higher N50 value than the scaffolds produced by common de novo assemblers such as ABySS and SOAPdenovo.
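
For readers less familiar with the metric: N50 is the length of the sequence at which the running total, counted from the longest sequence down, first reaches half of the total assembly length. The snippet below is only a minimal sketch of that definition, not code from SSPACE; the function name `n50` and the example lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    return 0  # empty input


# Hypothetical scaffold lengths, for illustration only.
scaffold_lengths = [100, 200, 300, 400, 500]
print(n50(scaffold_lengths))
```

Merging contigs into fewer, longer scaffolds raises this value, which is why it is reported alongside the scaffold count.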

    SSPACE is freely available at


    The publication has been accepted by Bioinformatics and will be online soon. It gives more detailed information about the produced scaffolds and their quality, including time and memory usage.

    I hope it will be useful. Any comments or questions are of course welcome.

    Cheers,
    Boetsie

  • #2
    Hi all,

    The SSPACE publication is now available at:



    Boetsie



    • #3
      congrats


      It's great to hear of such an achievement.
      Is your paper freely available?
      Could you mail me a downloadable copy of the software?
      Regards,
      Ganga Jeena



      • #4
        Congrats!

        Before I get into the paper, can I ask if this tool supports 'hierarchical scaffolding' in the way that Bambus (supposedly) does? i.e. If I want to add in 'scaffolding' information based on gene synteny from some related organisms, can I add that in but with a lower priority than the true PE/MP data?

        Does it detect repeats from the graph structure like Bambus does now?

        I'm curious because Bambus promises a lot of nice functionality, which is why I keep hammering away at it. However, I'm starting to wonder if it's time to jump ship to a tool that is more robust (if perhaps less feature rich).


        Cheers,
        Dan.
        Homepage: Dan Bolser
        MetaBase the database of biological databases.



        • #5
          Nice paper! The question that arises is whether we can feed PE data directly to the algorithm, rather than having it shoehorned through Bowtie.

          For example, Bowtie may not be the best tool for aligning 454 reads to contigs, but I'd still like to use 454 PE data to scaffold my assembly. Is there some intermediate file or Bowtie like PE format that we can feed to SSPACE?

          Unfortunately parts of http://bioinformatics.oxfordjournals.org are down, so I can't see the supplementary figure, sorry if that would help address my question.
          Homepage: Dan Bolser
          MetaBase the database of biological databases.



          • #6
            Hi Dan,

            thanks for your reply!

            It does not fully support the same hierarchical scaffolding as Bambus. We use a simple approach:

            1) Produce scaffolds using the first library
            2) Use scaffolds from 1), and produce scaffolds using the second library
            3) and so on...

            We do not assign priorities to the libraries as Bambus does. We let the user determine the order in which the libraries are used.
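
The library-by-library approach above can be sketched roughly as follows. This is a toy illustration, not SSPACE's actual code: a scaffold is just an ordered list of contig names, each "library" is a list of contig-to-contig links, and `scaffold_once` is a hypothetical stand-in for one full round of read alignment and scaffolding.

```python
def scaffold_once(scaffolds, links):
    """Toy stand-in for one scaffolding round: join scaffolds connected by a link."""
    scaffolds = [list(s) for s in scaffolds]
    for a, b in links:
        # "Align" each link end to the current scaffolds by looking up its contig.
        left = next(s for s in scaffolds if a in s)
        right = next(s for s in scaffolds if b in s)
        if left is not right:
            left.extend(right)
            scaffolds = [s for s in scaffolds if s is not right]
    return scaffolds


def hierarchical_scaffold(contigs, libraries):
    """Apply the libraries one after another, in the order given by the user."""
    scaffolds = [[c] for c in contigs]
    for links in libraries:
        # Each round starts from the scaffolds produced by the previous round.
        scaffolds = scaffold_once(scaffolds, links)
    return scaffolds


contigs = ["c1", "c2", "c3", "c4"]
libraries = [[("c1", "c2")], [("c2", "c3")]]  # hypothetical links per library
print(hierarchical_scaffold(contigs, libraries))
```

The point of the sketch is the outer loop: the output of one library becomes the input of the next, so the order of the libraries is the only form of prioritisation.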

            It is able to detect repeats by determining the number of incoming and outgoing 'links' between contigs. Repeats are reported by the program.
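
As a rough illustration of link-based repeat detection, the sketch below flags any contig that links to, or is linked from, more than one distinct contig. The exact rule SSPACE applies is not spelled out in this thread, so the threshold of "more than one" and the function name `flag_possible_repeats` are assumptions made for the example.

```python
from collections import defaultdict


def flag_possible_repeats(links):
    """Return contigs with more than one distinct incoming or outgoing neighbour.

    `links` is an iterable of (source_contig, target_contig) tuples.
    """
    outgoing = defaultdict(set)
    incoming = defaultdict(set)
    for source, target in links:
        outgoing[source].add(target)
        incoming[target].add(source)
    contigs = set(outgoing) | set(incoming)
    # Assumption: an ambiguous neighbourhood on either side suggests a repeat.
    return sorted(
        c for c in contigs
        if len(outgoing[c]) > 1 or len(incoming[c]) > 1
    )


# Hypothetical links: contig "r" sits between two different left and right neighbours.
links = [("c1", "r"), ("c2", "r"), ("r", "c3"), ("r", "c4"), ("c5", "c6")]
print(flag_possible_repeats(links))
```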

            Bambus has indeed more functionality. However, we found that the input options were too complex for simple scaffolding purposes.

            Regarding your question about Bowtie:
            Unfortunately, only Bowtie is supported at the moment, as SSPACE was designed for Illumina input (or other short paired reads) and is based on Bowtie output.

            My question: what program do people use for aligning 454 reads, and can it produce output similar to Bowtie's?

            Cheers,
            Boetsie



            • #7
              Thanks for the clear reply Boetsie, really great to hear that you do do repeat filtering based on graph structure, and allowing the user to pick the order of the libraries seems like a nice strategy.

              I've been using Newbler to align 454's PE data to contigs. Newbler automatically handles the specifics of the 454 style PE reads so, although it isn't the best aligner for 454, it is very easy to use the results, which are just tab delimited... You can read about the format of the Newbler PE data here!

              Newbler can be persuaded to output ace-like format too, but it doesn't do SAM/BAM IIRC.

              I was looking at the code, and it should be easy enough to feed in the data to SSPACE ;-)
              Homepage: Dan Bolser
              MetaBase the database of biological databases.



              • #8
                Hi Boetsie,

                Does SSPACE use the SAM output format of Bowtie? If not, could it?

                Cheers,
                Shaun



                • #9
                  Hi Shaun,

                  No, it does not; it uses the standard output from Bowtie. With modifications to the script, it should be possible to use the SAM format.
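
To give an idea of what such a modification might involve, here is a minimal sketch that pulls the read name, strand, reference name and 0-based offset out of a SAM line, i.e. the kind of values found in the first columns of Bowtie's default output. Which columns SSPACE actually reads is not stated in this thread, so the choice of fields and the function name `sam_to_bowtie_fields` are assumptions. Note that SAM positions are 1-based while Bowtie's default output uses 0-based offsets.

```python
def sam_to_bowtie_fields(sam_line):
    """Convert one SAM alignment line to (read_name, strand, reference, offset0).

    Returns None for unmapped reads.
    """
    fields = sam_line.rstrip("\n").split("\t")
    name, flag, reference, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
    if flag & 0x4:  # SAM flag bit 0x4: read is unmapped
        return None
    strand = "-" if flag & 0x10 else "+"  # bit 0x10: aligned to the reverse strand
    return name, strand, reference, pos - 1  # 1-based SAM POS to 0-based offset


# Hypothetical SAM record, for illustration only.
line = "read1/1\t16\tcontig_1\t101\t255\t4M\t*\t0\t0\tACGT\tIIII"
print(sam_to_bowtie_fields(line))
```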

                  Cheers,
                  Boetsie



                  • #10
                    BAC / Fosmid end

                    Hi boetsie,

                    Can I use additional BAC/Fosmid ends for scaffolding the pre-assembled contigs
                    or scaffolds with SSPACE? If it's possible, is there any parameter for this purpose?

                    Thanks,
                    Corthay



                    • #11
                      Originally posted by corthay View Post
                      Hi boetsie,

                      Can I use additional BAC/Fosmid ends for scaffolding the pre-assembled contigs
                      or scaffolds with SSPACE? If it's possible, is there any parameter for this purpose?

                      Thanks,
                      Corthay
                      Hi Corthay,

                      I'm not very familiar with BAC/fosmid ends, so there is no parameter for this purpose. However, if:
                      - these are paired sequences
                      - the sequences' lengths are below 1024 (maximum input of Bowtie)
                      - the pairs have either orientation of --> <-- (typical paired-end) or <-- --> (typical mate pair)

                      then I see no reason why you should not give it a try.
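
The three conditions above can be written down as a quick pre-check. This is only a sketch: the orientation codes "FR" (--> <--) and "RF" (<-- -->) and the function name `looks_usable` are assumptions made for this example, and the 1024 limit is simply the figure quoted in this post.

```python
MAX_BOWTIE_READ_LENGTH = 1024  # limit quoted in the post above

# Assumed codes: "FR" = --> <-- (typical paired-end), "RF" = <-- --> (typical mate pair).
SUPPORTED_ORIENTATIONS = {"FR", "RF"}


def looks_usable(read1_length, read2_length, orientation):
    """Return True if a read pair satisfies the length and orientation conditions."""
    return (
        read1_length < MAX_BOWTIE_READ_LENGTH
        and read2_length < MAX_BOWTIE_READ_LENGTH
        and orientation in SUPPORTED_ORIENTATIONS
    )


# Hypothetical BAC-end pair with 700 bp and 650 bp reads.
print(looks_usable(700, 650, "FR"))
```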

                      Kind regards,
                      Boetsie



                      • #12
                        What would be great is a simple tab delimited format for providing paired sequence alignments, rather than going via Bowtie... I had a quick look at the code, but unfortunately I couldn't work out where to add such functionality easily. I'll have another look at some point if nobody else does.
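
To make the idea concrete, here is a minimal sketch of what reading such a file could look like. The column layout (pair ID, then contig, position and strand for each read) is entirely hypothetical and is not a format that SSPACE, Bowtie or Newbler defines; the names `PairedAlignment` and `read_paired_alignments` are likewise invented for this example.

```python
import csv
import io
from typing import NamedTuple


class PairedAlignment(NamedTuple):
    pair_id: str
    contig1: str
    pos1: int
    strand1: str
    contig2: str
    pos2: int
    strand2: str


def read_paired_alignments(handle):
    """Parse the hypothetical 7-column tab-delimited format into records."""
    pairs = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        pair_id, contig1, pos1, strand1, contig2, pos2, strand2 = row
        pairs.append(
            PairedAlignment(pair_id, contig1, int(pos1), strand1,
                            contig2, int(pos2), strand2)
        )
    return pairs


example = (
    "#pair\tcontig1\tpos1\tstrand1\tcontig2\tpos2\tstrand2\n"
    "pair1\tcontig_1\t950\t+\tcontig_2\t40\t-\n"
)
pairs = read_paired_alignments(io.StringIO(example))
print(pairs[0].contig1, pairs[0].contig2)
```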
                        Homepage: Dan Bolser
                        MetaBase the database of biological databases.



                        • #13
                          Hi Boetsie,

                          Thanks for the response.

                          I've just specified "k=2" as clone coverage of BAC ends is almost 5x.
                          As a result, the scaffold N50 improved a bit and the number of scaffolds was reduced. Thanks for developing such a useful tool.
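
For anyone wondering what a setting like k=2 does: as I understand it, k is the minimum number of read-pair links required before two contigs are joined, but treat that reading as an assumption rather than a statement from the SSPACE documentation. Under that assumption, the filtering step looks roughly like this sketch (the function name `supported_links` and the example links are made up).

```python
from collections import Counter


def supported_links(pair_links, k):
    """Keep contig pairs that are supported by at least k read-pair links."""
    # Sort each pair so that (a, b) and (b, a) count as the same connection.
    counts = Counter(tuple(sorted(pair)) for pair in pair_links)
    return {pair: n for pair, n in counts.items() if n >= k}


# Hypothetical links: three pairs connect c1 and c2, only one connects c2 and c3.
pair_links = [("c1", "c2"), ("c2", "c1"), ("c1", "c2"), ("c2", "c3")]
print(supported_links(pair_links, k=2))
```

With low clone coverage, as with BAC ends, few pairs span any given gap, so a lower threshold keeps more connections at the cost of accepting weaker evidence.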

                          Corthay.




                          • #14
                            Originally posted by dan View Post
                            What would be great is a simple tab delimited format for providing paired sequence alignments, rather than going via Bowtie... I had a quick look at the code, but unfortunately I couldn't work out where to add such functionality easily. I'll have another look at some point if nobody else does.
                            Hi Dan,

                            I know what you mean, but then multiple-library input couldn't be used, since we do a hierarchical clustering: first generate scaffolds using one library, then align the next library to those scaffolds and produce new scaffolds, and so on. So for each library we align the reads to the new scaffolds. Therefore, predefined paired sequence alignments could only be provided if a single library is used. In addition, with such an input we would be very similar to Bambus. Our purpose is to have an easy-to-use scaffolder that needs only simple FASTA input rather than complex input formats.

                            Next week, I'll try to add support for another alignment tool (e.g. Newbler) to map long reads to the contigs/scaffolds.

                            Kind regards,
                            Boetsie



                            • #15
                              Originally posted by corthay View Post
                              Hi Boetsie,

                              Thanks for the response.

                              I've just specified "k=2" as clone coverage of BAC ends is almost 5x.
                              As a result, scaffolds N50 is a bit improved and the number of scaffolds is reduced. Thanks for the development of useful tool.

                              Corthay.
                              Hi Corthay,

                              great that it worked and that it improved your assembly a bit!

                              Kind regards,
                              Boetsie
