  • dan
    wiki wiki
    • Jul 2008
    • 194

    Input files for the Celera Assembler?

    Hi,

    I have sequence reads as a fasta file with a fasta quality file. I am converting these into .frg input for the Celera Assembler (CA) using AMOS,

    Code:
    toAmos -s test.fasta -q test.qual -o test.afg
    and then:

    Code:
    amos2frg -i test.afg

    However, the above strategy does not recognize the 'mate pairs' in my data (paired end reads) that are 'linked' using the St Louis naming convention (basically read0012.b is paired with read0012.g, the .b and the .g denoting the forward and the backward read, respectively).
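To make the naming convention above concrete, here is a minimal Python sketch that groups read names into mate pairs by their shared stem. It is only an illustration: the function name `pair_reads` and the sample read names are made up for this example, and it assumes every mate carries exactly a `.b` or `.g` suffix as described.

```python
import re
from collections import defaultdict

# Assumption: ".b" marks the forward read and ".g" the backward read,
# and the suffix is the last part of the read name.
SUFFIX = re.compile(r"^(?P<stem>.+)\.(?P<end>[bg])$")

def pair_reads(names):
    """Return (pairs, unpaired) for a list of read names.

    Two reads are mates when they share a stem and differ only
    in the .b/.g suffix. pair_reads is a hypothetical helper.
    """
    by_stem = defaultdict(dict)
    unpaired = []
    for name in names:
        m = SUFFIX.match(name)
        if m:
            by_stem[m.group("stem")][m.group("end")] = name
        else:
            unpaired.append(name)  # no recognisable suffix
    pairs = []
    for ends in by_stem.values():
        if "b" in ends and "g" in ends:
            pairs.append((ends["b"], ends["g"]))
        else:
            unpaired.extend(ends.values())  # mate is missing
    return pairs, unpaired

reads = ["read0012.b", "read0012.g", "read0013.b", "read0099"]
pairs, unpaired = pair_reads(reads)
print(pairs)
print(unpaired)
```

Reads whose mate is missing, or whose name has no suffix, end up in `unpaired` so they can be inspected rather than silently dropped.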

    On the AMOS mailing list I got a reply to a similar question from Sven Klages, which told me that CA is usually passed a separate "linkage information" file in addition to the .frg input created above. His suggestion was to create linkage data in 'Trace Archive' format*, and then convert that to input for CA using AMOS again.

    Has anyone tackled this problem before? Is there any code 'off the shelf' to create the mate pair data?


    * http://www.ncbi.nlm.nih.gov/Traces/t...#header-global


    Cheers,
    Dan.


    P.S. Any other good mailing lists for sequence assembly?

    If you know any, I'll add them here:

    Homepage: Dan Bolser
    MetaBase the database of biological databases.
  • new300
    Member
    • Mar 2008
    • 50

    #2
    Are you planning to use the Celera assembler with short reads? Because I don't think it would work very well.

    Comment

    • sklages
      Senior Member
      • May 2008
      • 628

      #3
      @Dan,

      my suggestion was to either use the bambus mate pair output or a TraceArchive formatted XML file for use with "toAmos", like

      $ toAmos -s x.fasta -q x.qual -m test.mates -o test.a.afg

      or

      $ toAmos -s x.fasta -q x.qual -x test.xml -o test.a.afg

      but, of course, you have to create these files before.

      It is probably the easiest way to write your own "your_data2TraceArchive.pl" and then convert it to whatever you want, e.g. CA format using "tracedb-to-frg.pl".

      @new300,

      Celera Assembler has been optimised for at least 454FLX length reads; I have used it for shorter GS20 as well, but it may fail. All these assemblies were hybrid assemblies, not 454-only ones.

      hth,
      Sven

      Comment

      • new300
        Member
        • Mar 2008
        • 50

        #4
        Originally posted by sklages View Post
        Celera Assembler has been optimised for at least 454FLX length reads; I have used it for shorter GS20 as well, but it may fail. All these assemblies were hybrid assemblies, not 454-only ones.
        That's interesting, I'm surprised it copes with the elevated indel/homopolymer run error rate.

        Comment

        • dan
          wiki wiki
          • Jul 2008
          • 194

          #5
          Originally posted by sklages View Post
          @Dan,

          my suggestion was to either use the bambus mate pair output or a TraceArchive formatted XML file for use with "toAmos", like

          $ toAmos -s x.fasta -q x.qual -m test.mates -o test.a.afg

          or

          $ toAmos -s x.fasta -q x.qual -x test.xml -o test.a.afg

          but, of course, you have to create these files before.

          It is probably the easiest way to write your own "your_data2TraceArchive.pl" and then convert it to whatever you want, e.g. CA format using "tracedb-to-frg.pl".
          Yeah, that's what I'm struggling with... Is there any reference for the Bambus format, on the off chance that it's simpler? I can't help thinking there is a canned solution in the AMOS docs, if only I could find it.

          Comment

          • dan
            wiki wiki
            • Jul 2008
            • 194

            #6
            Seems the Bambus docs will help:


            Comment

            • sklages
              Senior Member
              • May 2008
              • 628

              #7
              Yeah, that's what I wanted to send just about now


              Btw, I meant Bambus *input*, not output ...

              cheers,
              Sven

              Comment

              • dan
                wiki wiki
                • Jul 2008
                • 194

                #8
                Sadly it doesn't seem to be working...

                Code:
                toAmos -s x.fasta -q x.qual  -o a.afg
                Code:
                toAmos -s x.fasta -q x.qual -m test.mates -o b.afg

                Gives:

                Code:
                diff a.afg b.afg
                5c5
                < Thu Dec  4 12:24:40 2008
                ---
                > Thu Dec  4 12:27:07 2008
                Following the info on the link I set test.mates to the following:

                Code:
                pair	\W+\.b\.abi	\W+\.g\.abi
                There is no error from either command.

                Comment

                • dan
                  wiki wiki
                  • Jul 2008
                  • 194

                  #9
                  I should try before I reply... Setting test.mates to

                  Code:
                  pair	\.b\.abi	\.g\.abi
                  Seems to have had the desired effect!

                  Now I just need to rerun the assembly and check in hawkeye.

                  Comment

                  • sklages
                    Senior Member
                    • May 2008
                    • 628

                    #10
                    Code:
                    pair    \W+\.b\.abi    \W+\.g\.abi
                    If this is a Perl RE, "\W+" means "one or more characters that are not letters, digits or _"; that is
                    probably not what you want.

                    Your sample name looks like this: 065I03X00001.b.abi

                    better this way (\t as separator):
                    Code:
                     pair  (.*)\.b\.abi$ (.*)\.g\.abi$
                    hth,
                    Sven
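For anyone who wants to see the difference between the two patterns, here is a small Python sketch. It assumes the patterns in the mates file are applied as Perl-style regular expressions to the read names (Python's `re` treats `\W`, `.*` and `$` the same way for names like these); how toAmos applies them internally is not shown here.

```python
import re

# Sample names following the pattern given above
name_b = "065I03X00001.b.abi"
name_g = "065I03X00001.g.abi"

# \W+ needs at least one non-word character directly before ".b.abi".
# In the sample name that position holds "1", so nothing matches.
broken = re.compile(r"\W+\.b\.abi")

# (.*) captures the shared stem; $ anchors the suffix at the end of the name.
fwd = re.compile(r"(.*)\.b\.abi$")
rev = re.compile(r"(.*)\.g\.abi$")

broken_match = broken.search(name_b)
stem_b = fwd.search(name_b).group(1)
stem_g = rev.search(name_g).group(1)
is_pair = stem_b == stem_g  # mates share the captured stem

print(broken_match, stem_b, is_pair)
```

The captured group is what ties the forward and reverse read together, which is why the version with `(.*)` is the safer one to use.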

                    Comment

                    • dan
                      wiki wiki
                      • Jul 2008
                      • 194

                      #11
                      Originally posted by new300 View Post
                      Are you planning to use the Celera assembler with short reads? Because I don't think it would work very well.
                      Have a look here,

                      Discussion of any scientific study related to high content or next generation genomics. Whole genome association, metagenomics, digital gene expression, etc.

                      Comment

                      • gengen
                        Junior Member
                        • May 2009
                        • 4

                        #12
                        How to create a frg file from SRA data?

                        I want to run Celera Assembler using SRA 454 data, but SRA doesn't provide SFF data, only FASTA and FASTQ.
                        So, what should I do? I could convert the FASTQ to a FASTA file plus a quality file and then convert those to a frg file, but this is roundabout and I would also have to write some scripts. Let me know if you have any ideas on this.

                        thanks in advance
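The FASTQ to FASTA plus quality conversion mentioned above could be sketched in Python roughly as follows. This is only a sketch under stated assumptions: four-line FASTQ records (no wrapped sequences), Phred+33 quality encoding, and a made-up function name `fastq_to_fasta_qual`; check the encoding of the actual SRA download before relying on it.

```python
import io

def fastq_to_fasta_qual(fastq, fasta, qual, offset=33):
    """Split four-line FASTQ records into FASTA and .qual records.

    Hypothetical helper. fastq, fasta and qual are open text file
    objects; offset=33 assumes Phred+33 encoded qualities.
    Returns the number of records written.
    """
    count = 0
    while True:
        header = fastq.readline().rstrip("\n")
        if not header:
            break  # end of input (or a blank line)
        seq = fastq.readline().rstrip("\n")
        fastq.readline()  # "+" separator line, ignored
        quals = fastq.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(quals):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        name = header[1:]
        fasta.write(f">{name}\n{seq}\n")
        # Quality file: space-separated integer scores per base
        scores = " ".join(str(ord(c) - offset) for c in quals)
        qual.write(f">{name}\n{scores}\n")
        count += 1
    return count

# Tiny in-memory demo with two made-up reads
demo = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGG\n+\n!5\n")
fasta_out, qual_out = io.StringIO(), io.StringIO()
n = fastq_to_fasta_qual(demo, fasta_out, qual_out)
print(fasta_out.getvalue())
print(qual_out.getvalue())
```

With real data the three arguments would be files opened with `open(...)`, and the resulting FASTA and quality files could then go through the toAmos route discussed earlier in the thread.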
                        Last edited by gengen; 06-20-2009, 08:24 PM. Reason: mistype

                        Comment

                        • novice2
                          Junior Member
                          • Jun 2009
                          • 2

                          #13
                          To all members

                          Dear All

                          My PCR assay has suddenly begun producing these curious structures. I have attached an image to show this effect. Can anyone please explain what may be going on and what I can do to get rid of it? Before this the PCR was working fine.
                          Attached Files

                          Comment

                          • novice2
                            Junior Member
                            • Jun 2009
                            • 2

                            #14
                            To all members

                            I have attached a further 2 images showing this effect.
                            Attached Files

                            Comment

                            • sklages
                              Senior Member
                              • May 2008
                              • 628

                              #15
                              @gengen,

                              Have a look at Celera Input Formatting

                              Sven
                              Last edited by sklages; 06-21-2009, 02:13 AM. Reason: multiple issues in one thread

                              Comment
