  • dan
    wiki wiki
    • Jul 2008
    • 194

    Input files for the Celera Assembler?

    Hi,

    I have sequence reads as a FASTA file with a matching FASTA quality file. I am converting these into .frg input for the Celera Assembler (CA) using AMOS:

    Code:
    toAmos -s test.fasta -q test.qual -o test.afg
    and then:

    Code:
    amos2frg -i test.afg

    However, this approach does not recognize the 'mate pairs' in my data (paired-end reads), which are 'linked' using the St Louis naming convention (basically read0012.b is paired with read0012.g, the .b and the .g denoting the forward and the reverse read, respectively).
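A minimal sketch of what this naming convention implies, in Python (not code from the thread): strip the .b/.g suffix and group reads that share the same template prefix. The optional ".abi" extension (taken from read names later in the thread), the regex and the helper names are assumptions; the result is a plain Python dict, not a file format that CA or AMOS reads.

```python
import re
from collections import defaultdict

# Hypothetical pattern: "<template>.b" or "<template>.g", optionally followed
# by ".abi"; assumes .b marks the forward read and .g the reverse read.
SUFFIX = re.compile(r"^(?P<template>.+)\.(?P<end>[bg])(?:\.abi)?$")


def pair_reads(names):
    """Return {template: (forward_name, reverse_name)} for complete pairs only."""
    ends = defaultdict(dict)
    for name in names:
        m = SUFFIX.match(name)
        if m is None:
            continue  # not a .b/.g read: leave it unpaired
        ends[m.group("template")][m.group("end")] = name
    return {t: (e["b"], e["g"]) for t, e in ends.items() if "b" in e and "g" in e}


def fasta_names(path):
    """Yield the read name (first word after '>') of each FASTA record."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                yield line[1:].split()[0]


if __name__ == "__main__":
    demo = ["read0012.b", "read0012.g", "read0013.b"]
    print(pair_reads(demo))  # {'read0012': ('read0012.b', 'read0012.g')}
```

Running `pair_reads(fasta_names("test.fasta"))` on real data is a quick way to see how many reads actually have a mate before building any linkage file.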

    On the AMOS mailing list I got a reply to a similar question from Sven Klages, who told me that CA is usually passed a separate "linkage information" file in addition to the .frg input created above. His suggestion was to create the linkage data in 'Trace Archive' format*, and then convert that to input for CA, again using AMOS.

    Has anyone tackled this problem before? Is there any code 'off the shelf' to create the mate pair data?


    * http://www.ncbi.nlm.nih.gov/Traces/t...#header-global


    Cheers,
    Dan.


    P.S. Any other good mailing lists for sequence assembly?

    If you know any, I'll add them here:

    Homepage: Dan Bolser
    MetaBase the database of biological databases.
  • new300
    Member
    • Mar 2008
    • 50

    #2
    Are you planning to use the Celera assembler with short reads? Because I don't think it would work very well.


    • sklages
      Senior Member
      • May 2008
      • 628

      #3
      @Dan,

      my suggestion was to either use the bambus mate pair output or a TraceArchive formatted XML file for use with "toAmos", like

      $ toAmos -s x.fasta -q x.qual -m test.mates -o test.a.afg

      or

      $ toAmos -s x.fasta -q x.qual -x test.xml -o test.a.afg

      but, of course, you have to create these files first.

      The easiest way is probably to write your own "your_data2TraceArchive.pl" and then convert its output to whatever you want, e.g. CA format using "tracedb-to-frg.pl".
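One way such a "your_data2TraceArchive" step could look, sketched in Python rather than Perl. It assumes the reads are already grouped by template (e.g. {"read0012": ("read0012.b", "read0012.g")}) and that the lowercase TRACEINFO element names trace_volume, trace, trace_name, template_id and trace_end (F/R), plus optional insert_size/insert_stdev, are what toAmos -x and tracedb-to-frg.pl expect. Check that against the Trace Archive format description before relying on it; the function names are hypothetical and the insert-size values are yours to supply.

```python
import xml.etree.ElementTree as ET


def build_trace_volume(pairs, insert_size=None, insert_stdev=None):
    """Return a <trace_volume> element for {template: (forward, reverse)}.

    Assumption: element names follow the NCBI Trace Archive TRACEINFO
    fields; verify them against the spec and against what toAmos -x or
    tracedb-to-frg.pl actually accept.
    """
    volume = ET.Element("trace_volume")
    for template, (fwd, rev) in sorted(pairs.items()):
        for name, end in ((fwd, "F"), (rev, "R")):  # .b = forward, .g = reverse
            trace = ET.SubElement(volume, "trace")
            ET.SubElement(trace, "trace_name").text = name
            ET.SubElement(trace, "template_id").text = template
            ET.SubElement(trace, "trace_end").text = end
            # Library insert size is only written when both values are given.
            if insert_size is not None and insert_stdev is not None:
                ET.SubElement(trace, "insert_size").text = str(insert_size)
                ET.SubElement(trace, "insert_stdev").text = str(insert_stdev)
    return volume


def write_trace_xml(pairs, path, **library):
    """Write the trace volume to path, with an XML declaration."""
    tree = ET.ElementTree(build_trace_volume(pairs, **library))
    tree.write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    demo = {"read0012": ("read0012.b", "read0012.g")}
    print(ET.tostring(build_trace_volume(demo), encoding="unicode"))
```

The resulting file would then take the place of test.xml in the `toAmos -s x.fasta -q x.qual -x test.xml -o test.a.afg` call above.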

      @new300,

      Celera Assembler has been optimised for at least 454FLX length reads; I have used it for shorter GS20 as well, but it may fail. All these assemblies were hybrid assemblies, not 454-only ones.

      hth,
      Sven


      • new300
        Member
        • Mar 2008
        • 50

        #4
        Originally posted by sklages
        Celera Assembler has been optimised for at least 454FLX length reads; I have used it for shorter GS20 as well, but it may fail. All these assemblies were hybrid assemblies, not 454-only ones.
        That's interesting, I'm surprised it copes with the elevated indel/homopolymer run error rate.


        • dan
          wiki wiki
          • Jul 2008
          • 194

          #5
          Originally posted by sklages
          It is probably the easiest way to write your own "your_data2TraceArchive.pl" and then convert it to whatever you want, e.g. CA format using "tracedb-to-frg.pl".
          Yeah, that's what I'm struggling with... Is there any reference for the Bambus format, on the off chance that it's simpler? I can't help thinking there is a canned solution in the AMOS docs, if only I could find it.

          • dan
            wiki wiki
            • Jul 2008
            • 194

            #6
            Seems the Bambus docs will help:


            • sklages
              Senior Member
              • May 2008
              • 628

              #7
              Yeah, that's just what I was about to send.

              Btw, I meant Bambus *input*, not output ...

              cheers,
              Sven


              • dan
                wiki wiki
                • Jul 2008
                • 194

                #8
                Sadly it doesn't seem to be working...

                Code:
                toAmos -s x.fasta -q x.qual -o a.afg
                Code:
                toAmos -s x.fasta -q x.qual -m test.mates -o b.afg

                Comparing the two outputs gives:

                Code:
                diff a.afg b.afg
                5c5
                < Thu Dec  4 12:24:40 2008
                ---
                > Thu Dec  4 12:27:07 2008
                Following the info at the link, I set test.mates to the following:

                Code:
                pair	\W+\.b\.abi	\W+\.g\.abi
                There is no error from either command.

                • dan
                  wiki wiki
                  • Jul 2008
                  • 194

                  #9
                  I should try before I reply... Setting test.mates to

                  Code:
                  pair	\.b\.abi	\.g\.abi
                  Seems to have had the desired effect!

                  Now I just need to rerun the assembly and check it in Hawkeye.

                  • sklages
                    Senior Member
                    • May 2008
                    • 628

                    #10
                    Code:
                    pair    \W+\.b\.abi    \W+\.g\.abi
                    If this is a Perl regular expression, "\W+" means "one or more characters that are not letters, digits or _"; that is probably not what you want.

                    Your sample name looks like this: 065I03X00001.b.abi

                    Better this way (with \t as the separator):
                    Code:
                     pair	(.*)\.b\.abi$	(.*)\.g\.abi$
                    hth,
                    Sven
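Sven's point can be checked directly. Python's re module treats \W, ., (.*) and $ the same way Perl does for these particular patterns, so the sketch below (Python, using the sample read name from this post) shows why the \W+ version matches nothing and what the suggested patterns capture. How toAmos applies the two regexes internally is not covered here; the last lines only show how to build a mates line with genuine tab separators, which are easily lost when pasting into a forum post or editor. The helper name is hypothetical.

```python
import re

NAME = "065I03X00001.b.abi"  # sample read name from this post

# Dan's first attempt: \W+ needs one or more NON-word characters
# (anything except letters, digits and _) right before ".b.abi".
# The character there is "1", a word character, so nothing matches.
assert re.search(r"\W+\.b\.abi", NAME) is None

# Suggested patterns: capture everything before the suffix, anchored at the end.
fwd = re.compile(r"(.*)\.b\.abi$")
rev = re.compile(r"(.*)\.g\.abi$")


def template_of(name):
    """Return the shared template prefix if name is a .b or .g read, else None."""
    for pattern in (fwd, rev):
        m = pattern.search(name)
        if m:
            return m.group(1)
    return None


# A mates line with real tab characters between the three fields.
mates_line = "\t".join(["pair", fwd.pattern, rev.pattern]) + "\n"

if __name__ == "__main__":
    print(template_of(NAME))                  # 065I03X00001
    print(template_of("065I03X00001.g.abi"))  # 065I03X00001
    print(repr(mates_line))
```

Both reads of a pair reduce to the same prefix, which is what makes the captured groups useful for linking them.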


                    • dan
                      wiki wiki
                      • Jul 2008
                      • 194

                      #11
                      Originally posted by new300
                      Are you planning to use the Celera assembler with short reads? Because I don't think it would work very well.
                      Have a look here,

                      Discussion of any scientific study related to high content or next generation genomics. Whole genome association, metagenomics, digital gene expression, etc.

                      • gengen
                        Junior Member
                        • May 2009
                        • 4

                        #12
                        How do I create an .frg file from SRA data?

                        I want to run Celera Assembler using SRA 454 data, but the SRA doesn't provide SFF data; it only provides FASTA and FASTQ.
                        So, what should I do? I could convert the FASTQ to a FASTA file plus a quality file and then convert those to an .frg file, but this is very roundabout and I would also have to write some scripts. Let me know if you have any ideas on this.

                        thanks in advance
                        Last edited by gengen; 06-20-2009, 08:24 PM. Reason: mistype
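For reference, the FASTQ-splitting step gengen describes is short. A minimal Python sketch, assuming four-line FASTQ records (no wrapped sequence or quality lines) and Phred+33 quality encoding; it writes a FASTA file and a space-separated FASTA quality file that could then go through the toAmos/amos2frg route from the first post. The function name and file names are hypothetical.

```python
import io


def fastq_to_fasta_qual(fq, fa, qu, offset=33):
    """Copy FASTQ records from fq into FASTA (fa) and FASTA-quality (qu) handles.

    Assumes one sequence line and one quality line per record and
    Phred+offset encoding (33 by default). Returns the number of records.
    """
    count = 0
    while True:
        header = fq.readline().rstrip("\r\n")
        if not header:
            return count  # end of input (a blank line also stops parsing)
        seq = fq.readline().rstrip("\r\n")
        plus = fq.readline().rstrip("\r\n")
        qual = fq.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at {header!r}")
        name = header[1:]
        fa.write(f">{name}\n{seq}\n")
        # Each quality character becomes an integer Phred score.
        qu.write(f">{name}\n" + " ".join(str(ord(c) - offset) for c in qual) + "\n")
        count += 1


if __name__ == "__main__":
    fq = io.StringIO("@r1\nACGT\n+\nII5#\n")
    fa, qu = io.StringIO(), io.StringIO()
    fastq_to_fasta_qual(fq, fa, qu)
    print(fa.getvalue(), qu.getvalue(), sep="")
```

For real files, pass open handles, e.g. `with open("reads.fastq") as fq, open("reads.fasta", "w") as fa, open("reads.qual", "w") as qu:` followed by `fastq_to_fasta_qual(fq, fa, qu)`.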


                        • novice2
                          Junior Member
                          • Jun 2009
                          • 2

                          #13
                          To all members

                          Dear All

                          My PCR assay has suddenly begun producing these curious structures; I have attached an image to show the effect. Can anyone explain what may be going on and what I can do to get rid of it? Before this, the PCR was working fine.


                          • novice2
                            Junior Member
                            • Jun 2009
                            • 2

                            #14
                            To all members

                            I have attached a further 2 images showing this effect.

                            • sklages
                              Senior Member
                              • May 2008
                              • 628

                              #15
                              @gengen,

                              Have a look at Celera Input Formatting

                              Sven
                              Last edited by sklages; 06-21-2009, 02:13 AM. Reason: multiple issues in one thread
