Oases: De novo transcriptome assembly of very short reads

  • Oases: De novo transcriptome assembly of very short reads

    Hello,

    I thought that this might be interesting to share. It was sent to the Velvet users' list:

    Dear Velvet users,

    Marcel Schulz (Max Planck Institute for Molecular Genetics) and I are very
    pleased to announce the beta release of the Oases transcriptome assembler.

    Many researchers wish to use their powerful next-gen sequencing machines
    to study the transcriptomes of new species. Unfortunately, Velvet is not
    designed for that task, as the repeat resolution modules rely explicitly
    on assumptions of linearity and uniform coverage distribution. This means
    that Velvet only produces fragmented transcriptome assemblies.

    This is why we jointly developed Oases. This new program takes in a
    preliminary assembly produced by Velvet, and exploits the read sequence
    and pairing information to produce transcript isoforms. When possible, it
    also detects and reports standard alternative splicing events. It is
    specifically designed to get around the issues of unequal expression
    levels and alternative splicing breakpoints.

    The code is still quite new, but it has already been thoroughly tried out
    by Marcel. He observed some very promising results on both simulated and
    experimental datasets.

    If you wish to try out Oases, simply consult the webpage at
    www.ebi.ac.uk/~zerbino/oases . All feedback and suggestions are more than
    welcome!

    Best regards,

    Daniel
    In the manual they refer to this paper: http://dx.doi.org/10.1371/journal.pcbi.1000147


    I have yet to try Oases or read the paper, but I bet that those in my lab will be interested to know if it works with bacterial transcriptomes.

    Greetings,
    Leonardo
    L. Collado Torres, Ph.D. student in Biostatistics.

  • #2
    Oases paper and Oases mailing list

    Hi Leonardo,
    first of all, thanks for posting the initial announcement of Oases in this forum.

    >In the manual they refer to this paper: http://dx.doi.org/10.1371/journal.pcbi.1000147
    We do refer to the paper by Sammeth et al., but not because Oases or any information about transcriptome assembly is described there. That paper defines a nomenclature for alternative splicing events, which we have adapted for parts of the Oases output. But it's worth reading anyhow.

    The paper about Oases is not published yet. Daniel and I put the software online because many people requested it. However, it is still in beta.
    For those interested in details, the best option is to subscribe to the new mailing list that Daniel set up, where we post all improvements and information:
    http://listserver.ebi.ac.uk/mailman/...fo/oases-users

    >I have yet to try Oases or read the paper, but I bet that those in my lab will be interested to know if it works with bacterial transcriptomes.
    Although I have not tried it, Oases should work really well for bacterial transcriptomes. It would be great if you could give us some feedback once you have tried it.

    Kind regards,
    Marcel



    • #3
      Hi Marcel,

      Does Oases support SOLiD data? Can I run velvetg_de and feed its output into Oases?

      Justin
      Justin H. Johnson | Twitter: @BioInfo | LinkedIn: http://bit.ly/LIJHJ | EdgeBio



      • #4
        Originally posted by jjohnson
        Hi Marcel,

        Does Oases support SOLiD data? Can I run velvetg_de and feed its output into Oases?

        Justin
        Hi Justin,
        yes, it does. Daniel has already tried it with one single-end SOLiD data set. We would be very interested in feedback about the quality of the results on other, possibly paired-end, SOLiD data. Below is a copy of a mail from the user list with a discussion and explanation by Daniel about using SOLiD with Oases:

        >Hi Daniel,
        >I think the point that Titus and other SOLiD users are trying to drive at is
        >that if you convert to sequence space earlier in the analysis pipeline, you
        >lose the potential benefit of sequencing in color space.
        >If you convert to sequence space at any point in the pipeline, I believe
        >there is no sequence analysis program that will refuse SOLiD data.

        >>Hello Kevin,
        >>Absolutely, but velvet_de (double encoded) does not function in sequence space.
        >>Double encoding, although it uses letters {ATCG}, is in fact colorspace in disguise.
        >>To summarize, the correct pipeline for SOLiD genomic assembly is:
        >> Colorspace data => pre-conversion to double encoding* => velvet_de => post-conversion to sequence-space*
        >>By analogy, the appropriate pipeline for SOLiD transcriptome assembly is:
        >> Colorspace data => pre-conversion to double encoding* => velvet_de => oases => post-conversion to sequence space*
        >> In effect, all velvet/oases operations are made in colorspace, under cover of double encoding.
        >> Just to repeat myself: although the Velvet step needs to be performed with the specific velvet_de executable (under penalty of seriously mixed-up sequences), Oases works equally well on double encoding or on sequence space (it just sees strings of letters). You can therefore use the same Oases executable as you would use in a sequence-space pipeline.

        >>Best regards,
        >>Daniel

        >>(* programs available at http://solidsoftwaretools.com/gf/project/denovo/ , developed by Craig Cummings and Vrunda Sheth at ABI)


        Good luck!
        Marcel
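To make Daniel's point about double encoding concrete, here is a minimal Python sketch. It is not code from Velvet, Oases or the ABI tools; it assumes the standard SOLiD 2-base colour code, in which each colour is the XOR of the 2-bit codes A=0, C=1, G=2, T=3 of two adjacent bases, and that double encoding writes colours 0-3 as the letters A, C, G, T. All function names are illustrative. The last check shows a colour-space property: the reverse complement of a read has the same colours in reverse order, so complementing a double-encoded string letter by letter would scramble it, which is presumably why the Velvet step needs the dedicated velvet_de build while Oases can treat the letters as opaque.

```python
from __future__ import annotations

# Illustrative sketch only; assumes the standard SOLiD 2-base colour code.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"
COLOUR_TO_LETTER = "ACGT"  # double encoding: colour 0,1,2,3 -> A,C,G,T
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_colourspace(bases: str) -> tuple[str, list[int]]:
    """Return the first base and the colour for each adjacent base pair."""
    bits = [BASE_TO_BITS[b] for b in bases.upper()]
    colours = [a ^ b for a, b in zip(bits, bits[1:])]
    return bases[0].upper(), colours


def double_encode(colours: list[int]) -> str:
    """Write colours as letters so letter-based programs can store them."""
    return "".join(COLOUR_TO_LETTER[c] for c in colours)


def decode(first_base: str, colours: list[int]) -> str:
    """Rebuild the base sequence from a known first base and its colours."""
    bases = [first_base.upper()]
    for c in colours:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ c])
    return "".join(bases)


def reverse_complement(bases: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(bases.upper()))


if __name__ == "__main__":
    first, colours = to_colourspace("ATGGCA")
    print(first, colours, double_encode(colours))  # A [3, 1, 0, 3, 1] TCATC
    print(decode(first, colours))                  # ATGGCA
    # Reverse complement in base space == simple reversal in colour space.
    _, rc_colours = to_colourspace(reverse_complement("ATGGCA"))
    print(rc_colours == colours[::-1])             # True
```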



        • #5
          I was checking the huge sea of blogs and posts on AGBT 2010 and found this one to be related: http://www.fejes.ca/2010/02/agbt-201...aas-broad.html

          @jjohnson
          Velvet can run with SOLiD data and so can Oases.
          L. Collado Torres, Ph.D. student in Biostatistics.



          • #6
            Originally posted by MarcelS
            Hi Justin,
            yes it does. Daniel tried it with one single-end SOLiD data set already. [...]
            Hi Marcel and Daniel,
            I'm trying to reconstruct transcripts from SOLiD RNA-seq data.

            * I converted my reads to double-encoded format
            * ran velvet_de and Oases
            * now I have the transcripts in double-encoded format and I'm trying to convert them to base space.

            Can you recommend a tool for this last step?

            I checked the link
            http://solidsoftwaretools.com/gf/project/denovo/

            But I was not able to figure out the best course of action.

            Thanks
            Last edited by cerca; 07-09-2010, 12:24 PM.
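Part of why this last step is awkward: a double-encoded transcript is a string of colours, and translating it back to bases needs a known starting base, which an assembled contig does not carry. The Python sketch below (hypothetical function names, assuming the usual SOLiD colour code with A=0, C=1, G=2, T=3 and colour = XOR of adjacent bases) decodes a double-encoded sequence under each of the four possible starting bases. It is not a replacement for the ABI post-processing tools linked in this thread, which presumably use information from the reads to choose the right translation; their exact method is not described here. Note also that with naive decoding a single wrong colour changes every base after it.

```python
from __future__ import annotations

BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"
# Inverse of double encoding: letters A,C,G,T stand for colours 0,1,2,3.
LETTER_TO_COLOUR = {"A": 0, "C": 1, "G": 2, "T": 3}


def decode_double_encoded(contig: str, first_base: str) -> str:
    """Translate a double-encoded contig to base space from an assumed first base."""
    bases = [first_base.upper()]
    for letter in contig.upper():
        if letter not in LETTER_TO_COLOUR:
            # e.g. an 'N': the colour is unknown, so later bases cannot be recovered
            raise ValueError(f"cannot decode past non-colour symbol {letter!r}")
        previous = BASE_TO_BITS[bases[-1]]
        bases.append(BITS_TO_BASE[previous ^ LETTER_TO_COLOUR[letter]])
    return "".join(bases)


def all_candidate_decodings(contig: str) -> dict[str, str]:
    """Without an anchor base, each of the four starting bases is possible."""
    return {b: decode_double_encoded(contig, b) for b in "ACGT"}


if __name__ == "__main__":
    for start, seq in all_candidate_decodings("TCATC").items():
        print(start, seq)
```

The design choice here is to refuse to guess: raising on an unknown symbol and returning all four candidates makes the missing information explicit rather than silently emitting one possibly wrong sequence.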



            • #7
              Hi guys,
              I am also very interested in how to convert from double-encoded format to base space with the denovo (or denovo2) tools once Oases has finished its job.
              Is there a manual for this? I am using a SOLiD fragment library.

              Best,
              Ying



              • #8
                Hello,
                One question.
                Is Velvet mandatory for Oases or can the contigs be derived from a different short read assembler such as ABySS?

                Cheers

                Markus



                • #9
                  Originally posted by xuying
                  Hi guys,
                  I am also very interested in how to convert from double-encoded format to base space with the denovo (or denovo2) tools once Oases has finished its job.
                  Is there a manual for this? I am using a SOLiD fragment library.

                  Best,
                  Ying
                  Hi,

                  I also got stuck on this part. Did you solve it?

                  /Stefan



                  • #10
                    In the past, I've managed to convert Velvet/Oases output to base space using this package:

                    http://solidsoftwaretools.com/gf/project/denovotools/

                    Note that it's a different package from the previously mentioned http://solidsoftwaretools.com/gf/project/denovo/



                    • #11
                      Originally posted by kopi-o
                      In the past, I've managed to convert Velvet/Oases output to base space using this package:

                      http://solidsoftwaretools.com/gf/project/denovotools/

                      Note that it's a different package from the previously mentioned http://solidsoftwaretools.com/gf/project/denovo/
                      Thanks!

                      The pre-/post-processors around Velvet/Oases worked nicely, and after running 'denovoadp' I think I'm back in base space.

                      But from what I've gathered, the default input to the post-processor is Velvet's AFG output. Can I apply the same step to the "transcripts.fa" file produced by Oases?

                      /Stefan



                      • #12
                        Thanks, kopi-o. Using denovo2, I ran solid_denovo_preprocessor.pl and completed the Oases run, but the result is transcripts.fa, which I cannot convert with solid_denovo_postprocessor.pl. I looked at "java -cp $denovo2/utils/miniAssembler.jarcom.lifetech.miniAssembler.util.FormatsTranslator <conversion_type> <sequence_file> <out_converted_file>" on page 32 of DeNovoAssemblyProtocol0060810.pdf. However, my denovo2 folder contains no miniAssembler.jarcom.lifetech.miniAssembler.util.FormatsTranslator, only miniAssembler, so I could not run the conversion.

                        I look forward to your reply.
                        Thank you



                        • #13
                          Given Velvet's practical memory limitations, how big or complex a transcriptome can realistically be assembled with Oases? I understand Curtain was released to get around Velvet's own memory issues so that progressively larger genomes can be assembled, but can it be applied here too?



                          • #14
                            question about Abyss

                            Hi

                            Questions about Velvet/Oases:

                            1. Can I use Velvet/Oases to assemble Illumina/Solexa paired-end data that was pre-paired with an in-house script before assembly? Paired-end information may be lost due to the pre-pairing. What is the best way to assemble such data de novo?

                            2. Also, what is the best way to get summary statistics from the run, such as N50 contig size, the percentage of reads used in the assembly, the number of contigs/transcripts, etc.?

                            3. How do I decide on the hash length (k-mer size)?
                            I am new to next-gen sequencing assembly, hence all these questions.
                            If you have commands that would do this, that would be very helpful.
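For question 2, the length statistics (number of transcripts, total length, longest sequence, N50) can be computed directly from a FASTA file such as the transcripts.fa that Oases writes. Below is a minimal Python sketch with illustrative function names. It does not cover the percentage of reads used, which needs read-mapping information that the FASTA file alone does not contain.

```python
from __future__ import annotations

import sys
from collections.abc import Iterable


def fasta_lengths(lines: Iterable[str]) -> list[int]:
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths: list[int] = []
    current: int | None = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)  # sequences may span several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summary(lengths: list[int]) -> dict[str, int]:
    return {
        "count": len(lengths),
        "total_length": sum(lengths),
        "max_length": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Usage: python fasta_stats.py transcripts.fa  (script name is hypothetical)
    with open(sys.argv[1]) as handle:
        print(summary(fasta_lengths(handle)))
```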

                            Thanks

                            Andy
                            Last edited by neXtGen seq; 12-20-2010, 09:10 AM. Reason: wrong title
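For question 3, the Velvet manual relates k-mer coverage Ck to ordinary base coverage C through the read length L and the hash length k: Ck = C * (L - k + 1) / L. Velvet also expects an odd hash length, and the largest usable value is a compile-time limit (MAXKMERLENGTH, 31 by default). The Python sketch below tabulates Ck for candidate odd hash lengths; the 36 bp read length and 50x coverage in the example are made-up numbers, and the min_k default is an arbitrary choice. For transcriptomes, C differs from gene to gene with expression level, so no single k suits every transcript; the table only shows the trade-off that a longer k is more specific but gives lower k-mer coverage.

```python
from __future__ import annotations


def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Ck = C * (L - k + 1) / L, the expected k-mer coverage."""
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


def candidate_ks(read_length: int, min_k: int = 15, max_k: int = 31) -> list[int]:
    """Odd hash lengths from min_k up to min(read_length, max_k)."""
    upper = min(read_length, max_k)
    start = min_k if min_k % 2 == 1 else min_k + 1
    return list(range(start, upper + 1, 2))


if __name__ == "__main__":
    # Hypothetical example: 36 bp reads at roughly 50x base coverage.
    for k in candidate_ks(36):
        print(k, round(kmer_coverage(50.0, 36, k), 1))
```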



                            • #15
                              Hi

                              I'm having problems with Oases.

                              I am performing a de novo assembly of a plant transcriptome. There is no reference genome available.

                              I am getting lots of Ns throughout the transcriptome.

                              I'd like to know how and why Oases inserts Ns, and whether there are ways to avoid them.
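A useful first step, before deciding how to avoid them, is measuring how many Ns each transcript contains and how many separate N runs they form (one long gap and many scattered Ns suggest different causes). The Python sketch below reads FASTA lines such as those in Oases' transcripts.fa; the function names are illustrative, and the sketch only measures the Ns, it does not explain why Oases inserts them.

```python
from __future__ import annotations

import re
from collections.abc import Iterable, Iterator

N_RUN = re.compile(r"N+")  # one match per contiguous stretch of Ns


def fasta_records(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: str | None = None
    chunks: list[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n_report(lines: Iterable[str]) -> list[tuple[str, int, float, int]]:
    """Return (header, N count, N fraction, number of N runs) per sequence."""
    report = []
    for header, seq in fasta_records(lines):
        seq = seq.upper()
        n = seq.count("N")
        fraction = n / len(seq) if seq else 0.0
        report.append((header, n, fraction, len(N_RUN.findall(seq))))
    return report
```

Sorting the report by N fraction, for example, gives a quick list of the transcripts most affected.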

                              I'm also interested in estimating expression levels. I aligned the reads to the transcripts with BWA and used SAMtools to calculate the mean coverage of each transcript. I am not confident in this strategy, because BWA places a read at random when it has two equally good alignment positions. Any suggestions for other strategies? I've seen some tools that can estimate expression levels, but they all require a reference genome.

                              Thanks for your help.
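On the expression question: instead of relying on a random placement of reads that align equally well to several transcripts, one alternative is to share each such read between its candidate transcripts. The Python sketch below does this, starting from an equal split and then re-splitting each read in proportion to each transcript's current length-normalised abundance (a simple expectation-maximisation loop). This is a swapped-in technique, not something BWA or SAMtools do. It assumes you have already collected, for every read, the list of transcripts it aligns to equally well; building that `hits` table from the alignment files is not shown. The names and the reads-per-kilobase output are illustrative choices. Because it works on the assembled transcripts, no reference genome is needed.

```python
from __future__ import annotations

from collections import defaultdict


def estimate_abundance(
    hits: dict[str, list[str]],
    lengths: dict[str, int],
    iterations: int = 50,
) -> dict[str, float]:
    """Return reads per kilobase for each transcript, sharing multi-mapped reads.

    hits: read name -> transcripts it aligns to equally well (assumed input).
    lengths: transcript name -> length in bases.
    """
    density = {t: 1.0 for t in lengths}  # equal starting abundance per base
    counts: dict[str, float] = {}
    for _ in range(max(1, iterations)):
        counts = defaultdict(float)
        # E-step: split each read in proportion to the current density.
        for targets in hits.values():
            weights = [density[t] for t in targets]
            total = sum(weights)
            for t, w in zip(targets, weights):
                counts[t] += w / total
        # M-step: update per-base abundance from the shared read counts.
        density = {t: counts[t] / lengths[t] for t in lengths}
    return {t: 1000.0 * counts[t] / lengths[t] for t in lengths}


if __name__ == "__main__":
    lengths = {"t1": 1000, "t2": 1000}
    hits = {"r1": ["t1"], "r2": ["t1"], "r3": ["t1", "t2"]}
    print(estimate_abundance(hits, lengths, iterations=1))  # {'t1': 2.5, 't2': 0.5}
    print(estimate_abundance(hits, lengths))
```

With iterations=1 this is a plain equal split. More iterations move shared reads toward transcripts that also have reads of their own, so in the toy example the shared read drifts almost entirely to t1.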

