  • Input files for the Celera Assembler?

    Hi,

    I have sequence reads as a fasta file with a fasta quality file. I am converting these into .frg input for the Celera Assembler (CA) using AMOS,

    Code:
    toAmos -s test.fasta -q test.qual -o test.afg
    and then:

    Code:
    amos2frg -i test.afg

    However, this strategy does not recognize the 'mate pairs' in my data (paired-end reads), which are 'linked' using the St Louis naming convention (basically, read0012.b is paired with read0012.g, the .b and the .g denoting the forward and the backward read, respectively).
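
For illustration, the naming convention described here can be turned into an explicit list of mates with a few lines of Python. This is only a rough sketch, not an AMOS or CA tool: the function names (`read_names`, `find_mates`) and the tab-separated output are made up for this example, and it assumes every read name ends in exactly `.b` or `.g`.

```python
import re
from collections import defaultdict

# Assumed suffixes for the St Louis convention as described in the post:
# ".b" marks the forward read and ".g" the backward read of one template.
SUFFIX = re.compile(r"^(?P<template>.+)\.(?P<end>[bg])$")


def read_names(fasta_lines):
    """Yield the read name (first word after '>') of each FASTA record."""
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def find_mates(names):
    """Return sorted (forward, backward) name pairs that share a template."""
    ends = defaultdict(dict)
    for name in names:
        match = SUFFIX.match(name)
        if match:
            ends[match.group("template")][match.group("end")] = name
    return sorted((e["b"], e["g"]) for e in ends.values() if "b" in e and "g" in e)


# Hypothetical toy input; read0013.b has no partner and stays unpaired.
fasta = [
    ">read0012.b", "ACGT",
    ">read0012.g", "TTGA",
    ">read0013.b", "GGCC",
]
for forward, backward in find_mates(read_names(fasta)):
    print(forward, backward, sep="\t")
```

The resulting list still has to be converted into whatever linkage format the assembler expects; that step is not covered here.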

    On the AMOS mailing list I got a reply to a similar question from Sven Klages, which told me that CA is usually passed a separate "linkage information" file in addition to the .frg input created above. His suggestion was to create linkage data in 'Trace Archive' format*, and then convert that to input for CA using AMOS again.

    Has anyone tackled this problem before? Is there any code 'off the shelf' to create the mate pair data?


    * http://www.ncbi.nlm.nih.gov/Traces/t...#header-global


    Cheers,
    Dan.


    P.S. Any other good mailing lists for sequence assembly?

    If you know any, I'll add them here:

    Homepage: Dan Bolser
    MetaBase the database of biological databases.

  • #2
    Are you planning to use the Celera assembler with short reads? Because I don't think it would work very well.



    • #3
      @Dan,

      my suggestion was to either use the bambus mate pair output or a TraceArchive formatted XML file for use with "toAmos", like

      $ toAmos -s x.fasta -q x.qual -m test.mates -o test.a.afg

      or

      $ toAmos -s x.fasta -q x.qual -x test.xml -o test.a.afg

      but, of course, you have to create these files before.

      It is probably the easiest way to write your own "your_data2TraceArchive.pl" and then convert it to whatever you want, e.g. CA format using "tracedb-to-frg.pl".
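
A home-made "your_data2TraceArchive" step could look roughly like the Python sketch below. Treat it as an assumption-laden outline: the element names (`trace_volume`, `trace`, `trace_name`, `template_id`, `trace_end`, `insert_size`, `insert_stdev`) follow the NCBI Trace Archive field names as I understand them, the insert size values are placeholders, and whether "toAmos -x" or "tracedb-to-frg.pl" needs further fields should be checked against their documentation.

```python
import xml.etree.ElementTree as ET


def mates_to_trace_xml(pairs, insert_size, insert_stdev):
    """Build a TraceArchive-style XML string for (forward, backward) read pairs.

    The template id is taken to be the read name without its last
    dot-separated suffix (an assumption matching names like read0012.b).
    """
    root = ET.Element("trace_volume")
    for forward, backward in pairs:
        template = forward.rsplit(".", 1)[0]
        for name, end in ((forward, "F"), (backward, "R")):
            trace = ET.SubElement(root, "trace")
            ET.SubElement(trace, "trace_name").text = name
            ET.SubElement(trace, "template_id").text = template
            ET.SubElement(trace, "trace_end").text = end
            ET.SubElement(trace, "insert_size").text = str(insert_size)
            ET.SubElement(trace, "insert_stdev").text = str(insert_stdev)
    return ET.tostring(root, encoding="unicode")


# Placeholder library statistics; replace with the real insert size.
xml_text = mates_to_trace_xml([("read0012.b", "read0012.g")], 3000, 300)
print(xml_text)
```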

      @new300,

      Celera Assembler has been optimised for at least 454FLX length reads; I have used it for shorter GS20 as well, but it may fail. All these assemblies were hybrid assemblies, not 454-only ones.

      hth,
      Sven



      • #4
        Originally posted by sklages
        Celera Assembler has been optimised for at least 454FLX length reads; I have used it for shorter GS20 as well, but it may fail. All these assemblies were hybrid assemblies, not 454-only ones.
        That's interesting, I'm surprised it copes with the elevated indel/homopolymer run error rate.



        • #5
          Originally posted by sklages
          @Dan,

          my suggestion was to either use the bambus mate pair output or a TraceArchive formatted XML file for use with "toAmos", like

          $ toAmos -s x.fasta -q x.qual -m test.mates -o test.a.afg

          or

          $ toAmos -s x.fasta -q x.qual -x test.xml -o test.a.afg

          but, of course, you have to create these files before.

          It is probably the easiest way to write your own "your_data2TraceArchive.pl" and then convert it to whatever you want, e.g. CA format using "tracedb-to-frg.pl".
          Yeah, that's what I'm struggling with... Is there any reference for the Bambus format, on the off chance that it's simpler? I can't help thinking there is a canned solution in the AMOS docs, if only I could find it.


          • #6
            Seems the Bambus docs will help:



            • #7
              Yeah, that's what I wanted to send just about now


              Btw, I meant Bambus *input*, not output ...

              cheers,
              Sven



              • #8
                Sadly it doesn't seem to be working...

                Code:
                toAmos -s x.fasta -q x.qual -o a.afg
                Code:
                toAmos -s x.fasta -q x.qual -m test.mates -o b.afg

                Gives:

                Code:
                diff a.afg b.afg
                5c5
                < Thu Dec  4 12:24:40 2008
                ---
                > Thu Dec  4 12:27:07 2008
                Following the info on the link I set test.mates to the following:

                Code:
                pair	\W+\.b\.abi	\W+\.g\.abi
                There is no error from either command.


                • #9
                  I should try before I reply... Setting test.mates to

                  Code:
                  pair	\.b\.abi	\.g\.abi
                  Seems to have had the desired effect!

                  Now I just need to rerun the assembly and check in hawkeye.


                  • #10
                    Code:
                    pair    \W+\.b\.abi    \W+\.g\.abi
                    If this is a Perl RE, "\W+" means "one or more characters that are not letters, digits or _"; that is
                    probably not what you want.

                    Your sample name looks like this: 065I03X00001.b.abi

                    better this way (\t as separator):
                    Code:
                     pair  (.*)\.b\.abi$ (.*)\.g\.abi$
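
The difference between the three patterns can be checked quickly outside of AMOS. The sketch below uses Python's `re` module, which treats `\W`, `.*` and `$` the same way as Perl for these cases; the helper name `template_of` is made up, and how toAmos itself uses the captured group is an assumption here.

```python
import re

name = "065I03X00001.b.abi"

# \W+ needs at least one non-word character directly before ".b.abi",
# but the character there is the digit "1", so nothing matches.
assert re.search(r"\W+\.b\.abi", name) is None

# Without \W+ the suffix alone is found in the name.
assert re.search(r"\.b\.abi", name) is not None

# Anchored patterns with a capture group: the group holds the part of
# the name that both mates are expected to share.
FORWARD = re.compile(r"(.*)\.b\.abi$")
BACKWARD = re.compile(r"(.*)\.g\.abi$")


def template_of(read_name):
    """Return the shared prefix of a read name, or None if neither pattern fits."""
    for pattern in (FORWARD, BACKWARD):
        match = pattern.match(read_name)
        if match:
            return match.group(1)
    return None


print(template_of("065I03X00001.b.abi"))
print(template_of("065I03X00001.g.abi"))
```

Both calls give the same prefix, which is what lets the two reads be recognised as mates.
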
                    hth,
                    Sven



                    • #11
                      Originally posted by new300
                      Are you planning to use the Celera assembler with short reads? Because I don't think it would work very well.
                      Have a look here,



                      • #12
                        How to create an frg file from SRA data?

                        I want to run Celera Assembler using SRA 454 data, but SRA doesn't provide SFF files. They only provide FASTA and FASTQ.
                        So, what should I do? I could convert the FASTQ to a FASTA file plus a quality file and then convert those to an frg file, but this is roundabout and I would also have to write some scripts. Let me know if you have any ideas on this.

                        thanks in advance
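
For reference, the FASTQ to FASTA plus quality file step mentioned in this post is short to script. The Python sketch below is only an illustration under stated assumptions: it expects plain four-line FASTQ records (no wrapped sequence lines) and Phred+33 encoded qualities, and the function name `fastq_to_fasta_qual` is made up. The two resulting files could then be passed to a command such as the toAmos call shown earlier in the thread.

```python
def fastq_to_fasta_qual(fastq_lines, offset=33):
    """Convert four-line FASTQ records into (fasta_text, qual_text).

    Assumes each record is exactly: @name, sequence, +, quality string,
    with qualities encoded as ASCII characters at the given offset.
    """
    lines = [line.rstrip("\n") for line in fastq_lines if line.strip()]
    fasta, qual = [], []
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, quals = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near line {i + 1}")
        if len(seq) != len(quals):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        name = header[1:]
        scores = " ".join(str(ord(char) - offset) for char in quals)
        fasta.append(f">{name}\n{seq}\n")
        qual.append(f">{name}\n{scores}\n")
    return "".join(fasta), "".join(qual)


# Hypothetical single-record input.
record = ["@r1\n", "ACGT\n", "+\n", "II5!\n"]
fasta_text, qual_text = fastq_to_fasta_qual(record)
print(fasta_text, qual_text, sep="")
```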
                        Last edited by gengen; 06-20-2009, 08:24 PM. Reason: mistype



                        • #13
                          To all members

                          Dear All

                          My PCR assay has suddenly begun producing these curious structures; I have attached an image to show the effect. Can anyone please explain what may be going on and what I can do to get rid of it? Before this, the PCR was working fine.


                          • #14
                            To all members

                            I have attached a further 2 images showing this effect.


                            • #15
                              @gengen,

                              Have a look at Celera Input Formatting

                              Sven
                              Last edited by sklages; 06-21-2009, 02:13 AM. Reason: multiple issues in one thread
