  • Pacbio scaffolding

    I am surprised that I can't find any scaffolders for PacBio data. I am looking for something that does what SSPACE does (I was surprised to see that SSPACE doesn't accept PacBio reads as input). I have a bunch of contigs generated by Velvet, and now I want to link these contigs using LONG reads, which are error-corrected PacBio reads.

    Honestly, I tried reading up on how Bambus works. They say it accepts input from any assembler, but they have made it so complicated...

  • #2
    Would PBJelly work?



    "PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in fasta format) to high-confidence draft assembles. PBJelly fills or reduces as many captured gaps as possible to produce upgraded draft genomes. Each step in PBJelly’s workflow can be run on a cluster, thus parallelizing the gap filling process for rapid turn around, even for very large eukaryotic genomes."

    • #3
      Also: A hybrid approach for the automated finishing of bacterial genomes


      PBJelly likely does a better job of getting correct sequence into the gaps, but the hybrid assembler was designed to identify tricky repeat regions and avoid misassembling them. Its utility may be limited to bacterial-sized genomes, though.

      A snippet from the paper:
      To produce the hybrid assembly, we first generated a consensus CDC contig set. Given the clonal nature of the CDC isolates (Supplementary Results), we split contigs from the minimal CDC assembly that were inconsistent with the remaining two isolates. If the split resulted in a subcontig of <1 kb in length, the subcontig was eliminated. We input the resulting 97 contigs in this set, along with 94,526 single-molecule reads from the PacBio RS with an average accuracy of 82.9% (Supplementary Fig. 2), into our hybrid assembly pipeline (Supplementary Fig. 3).
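The splitting-and-filtering step in the snippet above can be sketched roughly as follows. This is only an illustration: the function name `split_and_filter` and the `breakpoints` list are hypothetical, and the paper's actual way of locating inconsistent positions is not described in this thread. Only the 1 kb cutoff comes from the quoted text.

```python
def split_and_filter(contig_seq, breakpoints, min_len=1000):
    """Split a contig at the given positions and drop subcontigs shorter than min_len.

    breakpoints: 0-based positions where the contig is cut (hypothetical input;
    how they are found is outside the scope of this sketch).
    """
    cuts = [0] + sorted(breakpoints) + [len(contig_seq)]
    pieces = [contig_seq[a:b] for a, b in zip(cuts, cuts[1:])]
    # Subcontigs of <1 kb are eliminated, as in the quoted snippet.
    return [p for p in pieces if len(p) >= min_len]


contig = "A" * 2500
kept = split_and_filter(contig, breakpoints=[1200, 1700])
print([len(p) for p in kept])
```

Here the pieces are 1200, 500 and 800 bp long, so only the first one survives the filter.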

      • #4
        As I understand it, PBJelly closes gaps that already exist in a scaffold (the "NNNNN" stretches), but I am interested in building new contigs, or at least reducing their number.
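To make the distinction concrete, here is a minimal sketch of what "gaps that already exist" means in practice: runs of N inside a scaffold sequence. The function name `find_gaps` is made up for illustration and is not part of PBJelly. Contigs from Velvet that were never scaffolded would typically contain no such runs, so there would be nothing for a gap filler to close.

```python
import re


def find_gaps(scaffold_seq, min_len=1):
    """Return half-open (start, end) coordinates of every run of N/n of at least min_len bases."""
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(scaffold_seq)]


scaffold = "ACGTNNNNNACGTACGTNNNGG"
print(find_gaps(scaffold))             # all gaps
print(find_gaps(scaffold, min_len=4))  # only gaps of 4+ Ns
```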

        • #5
          "AHA (A Hybrid Assembler) uses PacBio's exceptionally long reads to improve existing assemblies and fill in gaps." http://www.pacificbiosciences.com/pr...re/algorithms/. I guess it's part of the smrtpipe software...

          • #6
            This is part of SMRT-pipe. Anyone have any idea how to download that package? Never doing pacbio again..... closed source...

            • #7
              PacBio's software is all open source, BSD license. See pacbiodevnet.com for downloads and links to GitHub projects.

              • #8
                Originally posted by jbingham
                PacBio's software is all open source, BSD license. See pacbiodevnet.com for downloads and links to GitHub projects.
                Seems you are right. I gave it a shot, and oh my god... why in the world are they doing this. I mean seriously? In order to use one of their tools I need to download a 1 GB file and go through extensive installation instructions outlined here:



                ?

                Can anyone please tell me why they don't just provide a generic executable for some of the software that is part of that pipeline? Why do I have to spend a day installing this? This should be simpler. Sorry for my rant, I just don't get it.

                I don't want all their fancy tools, I don't need to login via a web interface to see what's up... oh well...

                • #9
                  Maybe the Amazon image is what you need. Nothing to install, just boot up a VM.

                  Agree that it's a big download to get everything. The aligner and variant caller (blasr and quiver) are what you requested: separate installs from GitHub. See pacbiodevnet.com for links on the Compatible Software page.

                  • #10
                    Nah dude, this is what I want:

                    AHA: a hybrid assembler to scaffold existing contigs and fill gaps. Available only in SMRT Analysis. Since v1.0

                    I want to link my contigs with long reads, even where the coverage is sometimes only 1x.

                    • #11
                      In that case, you will need either the Amazon VM or the full install. Sorry!

                      • #12
                        Originally posted by jbingham
                        In that case, you will need either the Amazon VM or the full install. Sorry!
                        I will try the Amazon VM, thank you very much for your help!

                        • #13
                          Originally posted by AdrianP
                          I am surprised that I can't find any scaffolders for PacBio data. I am looking for something that does what SSPACE does (I was surprised to see that SSPACE doesn't accept PacBio reads as input). I have a bunch of contigs generated by Velvet, and now I want to link these contigs using LONG reads, which are error-corrected PacBio reads.
                          We at BaseClear (developers of SSPACE) have developed a modified version of SSPACE which accepts PacBio long reads. The method gives very nice results, but at this moment we offer this only as an internal service since the algorithm itself is still in a testing phase. Official release might follow later this year, but has not been decided yet. If you are interested in BaseClear's assembly-service please write to [email protected]

                          Kind Regards,
                          Boetsie

                          • #14
                            Originally posted by boetsie
                            We at BaseClear (developers of SSPACE) have developed a modified version of SSPACE which accepts PacBio long reads. The method gives very nice results, but at this moment we offer this only as an internal service since the algorithm itself is still in a testing phase. Official release might follow later this year, but has not been decided yet. If you are interested in BaseClear's assembly-service please write to [email protected]

                            Kind Regards,
                            Boetsie
                            I am aware of SSPACE. I used it on a different genome project that is still in progress, and I like how it works. I was surprised that it accepts mate pairs as input but not PacBio reads. PacBio reads are similar in that they carry long-range information, but unlike mate pairs they provide the actual sequence between their ends, so a gap would not necessarily have to be filled with NNNN.

                            • #15
                              Originally posted by AdrianP
                              I am aware of SSPACE. I used it on a different genome project that is still in progress, and I like how it works. I was surprised that it accepts mate pairs as input but not PacBio reads. PacBio reads are similar in that they carry long-range information, but unlike mate pairs they provide the actual sequence between their ends, so a gap would not necessarily have to be filled with NNNN.
                              Well, in principle they serve the same purpose, but the type of data is rather different. As you are probably well aware, PacBio reads have a high error rate, which makes the alignment step difficult because it leads to false-positive alignments. This can of course result in erroneous scaffolds.
                              In addition, since the alignment is based on the whole PacBio read, a single PacBio read can cover multiple contigs, while a mate pair spans at most two contigs. Because of this, the SSPACE algorithm would have to be changed, which is why adding PacBio reads is not as simple as you might think.
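The difference can be illustrated with a small sketch, assuming the alignments of one long read to the contigs are already available as (contig, read_start, read_end) tuples. The `Hit` record and `links_from_read` function are hypothetical names, not part of SSPACE; strand, alignment quality and overlapping hits are ignored here. A single read touching three contigs yields two links, something a mate pair cannot do.

```python
from collections import namedtuple

# One alignment of a contig onto a long read, in read coordinates (hypothetical record).
Hit = namedtuple("Hit", "contig read_start read_end")


def links_from_read(hits):
    """Turn the hits on one long read into (left_contig, right_contig, gap) links.

    Hits are ordered by position on the read; each adjacent pair gives one link,
    with the gap estimated as the unaligned read bases between the two hits.
    """
    ordered = sorted(hits, key=lambda h: h.read_start)
    links = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right.read_start - left.read_end
        links.append((left.contig, right.contig, gap))
    return links


hits = [
    Hit("contig_7", 5200, 9000),
    Hit("contig_2", 0, 3000),
    Hit("contig_5", 3400, 5000),
]
print(links_from_read(hits))
```

With a high error rate, some of these hits will be false positives, so a real scaffolder would want several reads supporting a link before trusting it.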

                              For now, you can of course make 'fake' paired reads from the PacBio reads and feed these into SSPACE.

                              Regards,
                              Boetsie
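A minimal sketch of the 'fake' paired-read idea mentioned above, assuming the long reads have already been error-corrected. The function names and the `insert_size`, `end_len` and `step` defaults are illustrative choices, not values recommended by the SSPACE developers. The pairs are emitted in forward-reverse orientation, which is also an assumption; the scaffolder's library settings would need to match whatever orientation and insert size are used.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A/C/G/T/N, either case)."""
    comp = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(comp)[::-1]


def fake_pairs(long_read, insert_size=2000, end_len=100, step=500):
    """Slide a window of insert_size along the read and return its two ends as a pair.

    The first mate is the leading end_len bases of the window; the second mate is the
    reverse complement of the trailing end_len bases (forward-reverse orientation).
    Reads shorter than insert_size produce no pairs.
    """
    pairs = []
    for start in range(0, len(long_read) - insert_size + 1, step):
        fragment = long_read[start:start + insert_size]
        forward = fragment[:end_len]
        reverse = reverse_complement(fragment[-end_len:])
        pairs.append((forward, reverse))
    return pairs


read = "A" * 10 + "C" * 10 + "G" * 10
for mate1, mate2 in fake_pairs(read, insert_size=20, end_len=5, step=10):
    print(mate1, mate2)
```

Because every simulated pair has exactly the chosen insert size, the expected insert-size deviation can be kept small, but residual errors in the long reads still carry over into the mates.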
