  • Input files for the Celera Assembler?

    Hi,

    I have sequence reads as a fasta file with a fasta quality file. I am converting these into .frg input for the Celera Assembler (CA) using AMOS:

    Code:
    toAmos -s test.fasta -q test.qual -o test.afg
    and then:

    Code:
    amos2frg -i test.afg

    However, the above strategy does not recognize the 'mate pairs' in my data (paired-end reads) that are 'linked' using the St Louis naming convention (basically read0012.b is paired with read0012.g, the .b and the .g denoting the forward and the reverse read, respectively).
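A small Python sketch of what "recognizing" the St Louis convention involves: group the read names by template and pair each .b read with its .g mate. It assumes the read name is the first token of each FASTA header and follows <template>.b / <template>.g, optionally with an .abi suffix; pair_reads and read_names_from_fasta are hypothetical helpers, not part of AMOS or CA.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical naming rule: "<template>.b" is the forward read and
# "<template>.g" the reverse read; an optional ".abi" suffix is tolerated.
NAME_RE = re.compile(r"^(?P<template>.+)\.(?P<end>[bg])(?:\.abi)?$")


def read_names_from_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield the first whitespace-delimited token of every FASTA header."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def pair_reads(names: Iterable[str]) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Return (pairs, unpaired); each pair is (template, forward_read, reverse_read)."""
    by_template: Dict[str, Dict[str, str]] = {}
    unpaired: List[str] = []
    for name in names:
        match = NAME_RE.match(name)
        if match is None:
            unpaired.append(name)  # name does not follow the convention
            continue
        by_template.setdefault(match.group("template"), {})[match.group("end")] = name
    pairs: List[Tuple[str, str, str]] = []
    for template in sorted(by_template):
        ends = by_template[template]
        if "b" in ends and "g" in ends:
            pairs.append((template, ends["b"], ends["g"]))
        else:
            unpaired.extend(ends.values())  # mate missing from the input
    return pairs, unpaired


if __name__ == "__main__":
    fasta = [">read0012.b", "ACGT", ">read0012.g", "TTGA", ">read0013.b", "GGCC"]
    pairs, unpaired = pair_reads(read_names_from_fasta(fasta))
    print(pairs)     # [('read0012', 'read0012.b', 'read0012.g')]
    print(unpaired)  # ['read0013.b']
```

Reads whose mate is missing are reported separately rather than silently dropped, so they can still go into the assembly as unmated fragments.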

    On the AMOS mailing list I got a reply to a similar question from Sven Klages, who told me that CA is usually given a separate "linkage information" file in addition to the .frg input created above. His suggestion was to create the linkage data in 'Trace Archive' format*, and then convert that to CA input, again using AMOS.

    Has anyone tackled this problem before? Is there any code 'off the shelf' to create the mate pair data?


    * http://www.ncbi.nlm.nih.gov/Traces/t...#header-global


    Cheers,
    Dan.


    P.S. Any other good mailing lists for sequence assembly?

    If you know any, I'll add them here:

    Homepage: Dan Bolser
    MetaBase the database of biological databases.

  • #2
    Are you planning to use the Celera assembler with short reads? Because I don't think it would work very well.



    • #3
      @Dan,

      my suggestion was to either use the bambus mate pair output or a TraceArchive formatted XML file for use with "toAmos", like

      $ toAmos -s x.fasta -q x.qual -m test.mates -o test.a.afg

      or

      $ toAmos -s x.fasta -q x.qual -x test.xml -o test.a.afg

      but, of course, you have to create these files first.

      Probably the easiest way is to write your own "your_data2TraceArchive.pl" and then convert its output to whatever you want, e.g. CA format using "tracedb-to-frg.pl".
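Along the lines Sven suggests, here is a rough Python stand-in for a hypothetical "your_data2TraceArchive" step. It turns (template, forward, reverse) triples, such as those produced by pairing the .b/.g names, into a TraceInfo-style XML string that could be passed to toAmos with -x. The element names (trace_volume, trace, trace_name, template_id, trace_end, insert_size, insert_stdev) are an assumption based on the NCBI Trace Archive fields and should be checked against the Trace Archive documentation; the insert size values are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def pairs_to_traceinfo_xml(pairs: Iterable[Tuple[str, str, str]],
                           insert_size: int, insert_stdev: int) -> str:
    """Build a minimal TraceInfo-style XML string from (template, forward, reverse) triples.

    The element names below are an assumption modelled on NCBI Trace Archive
    TraceInfo fields; verify them against the Trace Archive documentation.
    """
    volume = ET.Element("trace_volume")
    for template, forward, reverse in pairs:
        for name, end in ((forward, "F"), (reverse, "R")):
            trace = ET.SubElement(volume, "trace")
            ET.SubElement(trace, "trace_name").text = name
            ET.SubElement(trace, "template_id").text = template  # shared id links the mates
            ET.SubElement(trace, "trace_end").text = end          # F = forward, R = reverse
            ET.SubElement(trace, "insert_size").text = str(insert_size)
            ET.SubElement(trace, "insert_stdev").text = str(insert_stdev)
    return ET.tostring(volume, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical library statistics; use the real insert size of your library.
    print(pairs_to_traceinfo_xml([("read0012", "read0012.b", "read0012.g")], 3000, 300))
```

Saved as test.xml, the output would go into the command Sven gives above: toAmos -s x.fasta -q x.qual -x test.xml -o test.a.afg. Both mates carry the same template_id, which is what lets a downstream converter treat them as a pair.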

      @new300,

      Celera Assembler has been optimised for at least 454FLX length reads; I have used it for shorter GS20 as well, but it may fail. All these assemblies were hybrid assemblies, not 454-only ones.

      hth,
      Sven



      • #4
        Originally posted by sklages
        Celera Assembler has been optimised for at least 454FLX length reads; I have used it for shorter GS20 as well, but it may fail. All these assemblies were hybrid assemblies, not 454-only ones.

        That's interesting; I'm surprised it copes with the elevated indel/homopolymer-run error rate.



        • #5
          Originally posted by sklages
          @Dan,

          my suggestion was to either use the bambus mate pair output or a TraceArchive formatted XML file for use with "toAmos", like

          $ toAmos -s x.fasta -q x.qual -m test.mates -o test.a.afg

          or

          $ toAmos -s x.fasta -q x.qual -x test.xml -o test.a.afg

          but, of course, you have to create these files first.

          Probably the easiest way is to write your own "your_data2TraceArchive.pl" and then convert its output to whatever you want, e.g. CA format using "tracedb-to-frg.pl".

          Yeah, that's what I'm struggling with... Any reference for the Bambus format, on the off chance that it's simpler? I can't help thinking there is a canned solution in the AMOS docs, if only I could find it.


          • #6
            Seems the Bambus docs will help:



            • #7
              Yeah, that's what I wanted to send just about now.


              Btw, I meant Bambus *input*, not output ...

              cheers,
              Sven



              • #8
                Sadly it doesn't seem to be working...

                Code:
                toAmos -s x.fasta -q x.qual  -o a.afg
                Code:
                toAmos -s x.fasta -q x.qual -m test.mates -o b.afg

                Gives:

                Code:
                diff a.afg b.afg
                5c5
                < Thu Dec  4 12:24:40 2008
                ---
                > Thu Dec  4 12:27:07 2008
                Following the info on the link, I set test.mates to the following:

                Code:
                pair	\W+\.b\.abi	\W+\.g\.abi
                There is no error from either command.


                • #9
                  I should try before I reply... Setting test.mates to

                  Code:
                  pair	\.b\.abi	\.g\.abi
                  Seems to have had the desired effect!

                  Now I just need to rerun the assembly and check it in Hawkeye.


                  • #10
                    Code:
                    pair    \W+\.b\.abi    \W+\.g\.abi
                    If this is a Perl regex, "\W+" means "anything except letters, digits and _", which is
                    probably not what you want.

                    Your sample name looks like this: 065I03X00001.b.abi

                    Better this way (with a tab as the separator):
                    Code:
                    pair	(.*)\.b\.abi$	(.*)\.g\.abi$
                    hth,
                    Sven
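To see why the first pattern in this thread failed and the later ones work, the three expressions can be tried against the sample name in Python, whose re module treats \W, \., (.*) and $ the same way Perl does for an ASCII name like this. The sketch assumes toAmos looks for the pattern anywhere in the read name (search rather than full-match semantics); that is an assumption, not something confirmed here. describe() and MATES_LINE are illustrative names only.

```python
import re

SAMPLE = "065I03X00001.b.abi"  # read name quoted in the post above


def describe(pattern: str, name: str) -> str:
    """Report whether `pattern` is found in `name` (re.search semantics), plus any captured prefix."""
    match = re.search(pattern, name)
    if match is None:
        return "no match"
    result = f"match {match.group(0)!r}"
    if match.groups():
        result += f", prefix {match.group(1)!r}"
    return result


# The suggested mates-file line, built with real tab characters as separators.
MATES_LINE = "\t".join(["pair", r"(.*)\.b\.abi$", r"(.*)\.g\.abi$"])

if __name__ == "__main__":
    for pattern in (r"\W+\.b\.abi", r"\.b\.abi", r"(.*)\.b\.abi$"):
        print(f"{pattern:16} -> {describe(pattern, SAMPLE)}")
    print(repr(MATES_LINE))
```

\W+ needs one or more non-word characters immediately followed by ".b.abi"; the only non-word characters in the name are the dots, and each is followed by a letter rather than another dot, so the pattern never matches. \.b\.abi matches the suffix alone, and (.*)\.b\.abi$ additionally captures the template part, 065I03X00001.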



                    • #11
                      Originally posted by new300
                      Are you planning to use the Celera assembler with short reads? Because I don't think it would work very well.
                      Have a look here,



                      • #12
                        How do I create an .frg file from SRA data?

                        I want to run Celera Assembler using SRA 454 data, but SRA doesn't provide SFF files; it only provides FASTA and FASTQ.
                        So, what should I do? I could convert the FASTQ to a FASTA file and a quality file and then convert those to an .frg file, but this is roundabout and I would have to write some scripts. Let me know if you have any ideas.

                        Thanks in advance.
                        Last edited by gengen; 06-20-2009, 08:24 PM. Reason: mistype
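The roundabout route described above is short to script. Below is a Python sketch that splits a FASTQ file into a FASTA file and a space-separated quality file, the kind of input passed to toAmos with -s and -q in the first post of this thread. It assumes unwrapped four-line FASTQ records and Phred+33 (Sanger-style) quality encoding, which is what SRA FASTQ normally uses; change offset if your file differs. parse_fastq and fastq_to_fasta_qual are hypothetical names.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from unwrapped 4-line FASTQ records."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        name = (header[1:].split() or [""])[0]  # keep only the read id
        yield name, seq, qual


def fastq_to_fasta_qual(fastq: Iterable[str], fasta_out: TextIO, qual_out: TextIO,
                        offset: int = 33) -> int:
    """Write FASTA and space-separated .qual records; return the number of reads."""
    count = 0
    for name, seq, qual in parse_fastq(fastq):
        fasta_out.write(f">{name}\n{seq}\n")
        scores = " ".join(str(ord(ch) - offset) for ch in qual)  # Phred+33 by default
        qual_out.write(f">{name}\n{scores}\n")
        count += 1
    return count


if __name__ == "__main__":
    fasta, qual = io.StringIO(), io.StringIO()
    n = fastq_to_fasta_qual(io.StringIO("@read1 extra\nACGT\n+\nII5!\n"), fasta, qual)
    print(n, fasta.getvalue(), qual.getvalue(), sep="\n")
```

For real data, open the FASTQ, FASTA and .qual files with open() and pass the handles in place of the StringIO objects; the resulting pair can then go through toAmos and amos2frg as in the first post.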



                        • #13
                          To all members

                          Dear All

                          My PCR assay has suddenly begun producing these curious structures; I have attached an image to show the effect. Can anyone please explain what may be going on and what I can do to get rid of it? Before this, the PCR was working fine.


                          • #14
                            To all members

                            I have attached two further images showing this effect.


                            • #15
                              @gengen,

                              Have a look at Celera Input Formatting

                              Sven
                              Last edited by sklages; 06-21-2009, 02:13 AM. Reason: multiple issues in one thread
