  • Input files for the Celera Assembler?

    Hi,

    I have sequence reads as a fasta file with a fasta quality file. I am converting these into .frg input for the Celera Assembler (CA) using AMOS,

    Code:
    toAmos -s test.fasta -q test.qual -o test.afg
    and then:

    Code:
    amos2frg -i test.afg

    However, the above strategy does not recognize the 'mate pairs' in my data (paired-end reads) that are 'linked' using the St Louis naming convention (basically read0012.b is paired with read0012.g, the .b and the .g denoting the forward and the backward read, respectively).
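
As an illustration of the naming convention described above, here is a minimal Python sketch that groups read names into mate pairs by their shared prefix. The function name `pair_reads` and the example read names are hypothetical; only the `.b` (forward) and `.g` (backward) suffixes come from the post.

```python
from collections import defaultdict


def pair_reads(read_names, fwd_suffix=".b", rev_suffix=".g"):
    """Group read names into (forward, reverse) pairs by shared template prefix.

    Returns (pairs, unpaired). Names that carry neither suffix, or whose
    mate is missing, end up in `unpaired`.
    """
    ends = defaultdict(dict)
    unpaired = []
    for name in read_names:
        for suffix, end in ((fwd_suffix, "F"), (rev_suffix, "R")):
            if name.endswith(suffix):
                template = name[: -len(suffix)]
                ends[template][end] = name
                break
        else:
            # No recognised suffix: cannot be assigned to a template.
            unpaired.append(name)

    pairs = []
    for template, found in sorted(ends.items()):
        if "F" in found and "R" in found:
            pairs.append((found["F"], found["R"]))
        else:
            unpaired.extend(found.values())
    return pairs, unpaired


if __name__ == "__main__":
    names = ["read0012.b", "read0012.g", "read0013.b"]
    pairs, unpaired = pair_reads(names)
    print(pairs)     # [('read0012.b', 'read0012.g')]
    print(unpaired)  # ['read0013.b']
```

A list of pairs like this is only an intermediate step; it still has to be written out in whatever linkage format the downstream tool expects.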

    On the AMOS mailing list I got a reply to a similar question from Sven Klages, which told me that CA is usually passed a separate "linkage information" file in addition to the .frg input created above. His suggestion was to create linkage data in 'Trace Archive' format*, and then convert that to input for CA using AMOS again.

    Has anyone tackled this problem before? Is there any code 'off the shelf' to create the mate pair data?


    * http://www.ncbi.nlm.nih.gov/Traces/t...#header-global


    Cheers,
    Dan.


    P.S. Any other good mailing lists for sequence assembly?

    If you know any, I'll add them here:

    Homepage: Dan Bolser
    MetaBase the database of biological databases.

  • #2
    Are you planning to use the Celera assembler with short reads? Because I don't think it would work very well.



    • #3
      @Dan,

      my suggestion was to either use the bambus mate pair output or a TraceArchive formatted XML file for use with "toAmos", like

      $ toAmos -s x.fasta -q x.qual -m test.mates -o test.a.afg

      or

      $ toAmos -s x.fasta -q x.qual -x test.xml -o test.a.afg

      but, of course, you have to create these files beforehand.

      Probably the easiest way is to write your own "your_data2TraceArchive.pl" and then convert its output to whatever you want, e.g. CA format using "tracedb-to-frg.pl".
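
A rough Python sketch of what such a "your_data2TraceArchive" converter might look like follows. The element names (`trace_volume`, `trace`, `trace_name`, `template_id`, `trace_end`, `insert_size`, `insert_stdev`) are my reading of the Trace Archive XML layout and should be checked against the NCBI documentation and against what toAmos actually parses. The insert size and standard deviation are hypothetical placeholders, not values from this thread.

```python
import xml.etree.ElementTree as ET


def build_trace_xml(read_names, insert_size=3000, insert_stdev=300):
    """Build Trace Archive style XML describing mate-pair linkage.

    ASSUMPTIONS: element names follow the Trace Archive layout as I
    understand it; insert_size / insert_stdev are placeholder values
    that must be replaced with the real library statistics.
    """
    root = ET.Element("trace_volume")
    for name in read_names:
        if name.endswith(".b"):
            template, end = name[:-2], "F"   # .b = forward read
        elif name.endswith(".g"):
            template, end = name[:-2], "R"   # .g = backward read
        else:
            continue  # skip names that do not follow the convention
        trace = ET.SubElement(root, "trace")
        ET.SubElement(trace, "trace_name").text = name
        ET.SubElement(trace, "template_id").text = template
        ET.SubElement(trace, "trace_end").text = end
        ET.SubElement(trace, "insert_size").text = str(insert_size)
        ET.SubElement(trace, "insert_stdev").text = str(insert_stdev)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_trace_xml(["read0012.b", "read0012.g"]))
```

The resulting text would be written to a file such as test.xml and passed to toAmos with -x, as in the command shown above.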

      @new300,

      Celera Assembler has been optimised for at least 454FLX length reads; I have used it for shorter GS20 as well, but it may fail. All these assemblies were hybrid assemblies, not 454-only ones.

      hth,
      Sven



      • #4
        Originally posted by sklages
        Celera Assembler has been optimised for at least 454FLX length reads; I have used it for shorter GS20 as well, but it may fail. All these assemblies were hybrid assemblies, not 454-only ones.
        That's interesting, I'm surprised it copes with the elevated indel/homopolymer run error rate.



        • #5
          Originally posted by sklages
          @Dan,

          my suggestion was to either use the bambus mate pair output or a TraceArchive formatted XML file for use with "toAmos", like

          $ toAmos -s x.fasta -q x.qual -m test.mates -o test.a.afg

          or

          $ toAmos -s x.fasta -q x.qual -x test.xml -o test.a.afg

          but, of course, you have to create these files beforehand.

          Probably the easiest way is to write your own "your_data2TraceArchive.pl" and then convert its output to whatever you want, e.g. CA format using "tracedb-to-frg.pl".
          Yeah, that's what I'm struggling with... Any reference for the Bambus format, on the off chance that it's simpler? I can't help thinking there is a canned solution in the AMOS docs, if only I could find it.
          Homepage: Dan Bolser
          MetaBase the database of biological databases.



          • #6
            Seems the Bambus docs will help:

            Homepage: Dan Bolser
            MetaBase the database of biological databases.



            • #7
              Yeah, that's what I wanted to send just about now


              Btw, I meant Bambus *input*, not output ...

              cheers,
              Sven



              • #8
                Sadly it doesn't seem to be working...

                Code:
                toAmos -s x.fasta -q x.qual  -o a.afg
                Code:
                toAmos -s x.fasta -q x.qual -m test.mates -o b.afg

                Gives:

                Code:
                diff a.afg b.afg
                5c5
                < Thu Dec  4 12:24:40 2008
                ---
                > Thu Dec  4 12:27:07 2008
                Following the info on the link I set test.mates to the following:

                Code:
                pair	\W+\.b\.abi	\W+\.g\.abi
                There is no error on either command.
                Homepage: Dan Bolser
                MetaBase the database of biological databases.



                • #9
                  I should try before I reply... Setting test.mates to

                  Code:
                  pair	\.b\.abi	\.g\.abi
                  Seems to have had the desired effect!

                  Now I just need to rerun the assembly and check in hawkeye.
                  Homepage: Dan Bolser
                  MetaBase the database of biological databases.



                  • #10
                    Code:
                    pair    \W+\.b\.abi    \W+\.g\.abi
                    If this is a Perl RE, "\W+" means "one or more characters that are not letters, digits or _"; that is
                    probably not what you want.

                    Your sample name looks like this: 065I03X00001.b.abi

                    It is better this way (with \t as the separator):
                    Code:
                    pair	(.*)\.b\.abi$	(.*)\.g\.abi$
                    hth,
                    Sven
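
To see why the first pattern in this thread never matched, here is a small check using Python's `re` module as a stand-in for Perl regular expressions (for these particular patterns `\W`, `\.` and `$` behave the same way in both). The sample read name is the one quoted above. The helper `template_name` is hypothetical, and the assumption that the patterns are applied as an unanchored search is mine.

```python
import re

name = "065I03X00001.b.abi"

# First attempt: \W+ demands at least one NON-word character directly
# before ".b.abi", but the name has the word character "1" there.
broken = re.compile(r"\W+\.b\.abi")
# Second attempt: no prefix requirement, so the suffix is found.
loose = re.compile(r"\.b\.abi")
# Suggested form: captures the shared prefix and anchors at the end.
FWD = re.compile(r"(.*)\.b\.abi$")
REV = re.compile(r"(.*)\.g\.abi$")


def template_name(pattern, read_name):
    """Return the captured prefix, or None if the pattern does not match."""
    m = pattern.search(read_name)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(broken.search(name))                       # None
    print(loose.search(name) is not None)            # True
    print(template_name(FWD, name))                  # 065I03X00001
    print(template_name(REV, "065I03X00001.g.abi"))  # 065I03X00001
```

With the anchored form, both mates yield the same captured prefix, which gives an unambiguous key for linking them.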



                    • #11
                      Originally posted by new300
                      Are you planning to use the Celera assembler with short reads? Because I don't think it would work very well.
                      Have a look here,

                      Discussion of any scientific study related to high content or next generation genomics. Whole genome association, metagenomics, digital gene expression, etc.
                      Homepage: Dan Bolser
                      MetaBase the database of biological databases.



                      • #12
                        How to create an frg file from SRA data?

                        I want to run Celera Assembler using SRA 454 data, but SRA doesn't provide SFF data; they only provide FASTA and FASTQ.
                        So, what should I do? I could convert the FASTQ to a FASTA file and a quality file and then convert those to an frg file, but this is roundabout and I would also have to write some scripts. Let me know if you have any ideas on this issue.
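
For the FASTQ to FASTA-plus-quality step mentioned above, a minimal Python sketch might look like the following. It assumes unwrapped four-line FASTQ records and Phred+33 quality encoding; both are assumptions to verify for the actual SRA download. The function name `fastq_to_fasta_qual` is hypothetical.

```python
def fastq_to_fasta_qual(fastq_lines, offset=33):
    """Convert FASTQ text lines into (fasta_text, qual_text).

    ASSUMPTIONS: each record is exactly four lines (no line wrapping)
    and qualities are ASCII-encoded with the given offset (Phred+33).
    """
    lines = [ln.rstrip("\r\n") for ln in fastq_lines]
    # Drop trailing empty lines so a final newline does not break parsing.
    while lines and lines[-1] == "":
        lines.pop()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input is not a multiple of four lines")

    fasta, qual = [], []
    for i in range(0, len(lines), 4):
        header, seq, plus, quals = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(quals):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        name = header[1:]
        scores = " ".join(str(ord(c) - offset) for c in quals)
        fasta.append(f">{name}\n{seq}\n")
        qual.append(f">{name}\n{scores}\n")
    return "".join(fasta), "".join(qual)


if __name__ == "__main__":
    record = ["@r1\n", "ACGT\n", "+\n", "II5!\n"]
    fa, q = fastq_to_fasta_qual(record)
    print(fa, end="")  # >r1 / ACGT
    print(q, end="")   # >r1 / 40 40 20 0
```

The two output strings can be written to files such as x.fasta and x.qual and then fed to toAmos with -s and -q, as in the commands earlier in this thread.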

                        thanks in advance
                        Last edited by gengen; 06-20-2009, 08:24 PM. Reason: mistype



                        • #13
                          To all members

                          Dear All

                          My PCR assay has suddenly begun producing these curious structures; I have attached an image to show this effect. Can anyone please explain what may be going on and what I can do to get rid of it? Before this, the PCR was working fine.
                          Attached Files



                          • #14
                            To all members

                            I have attached a further 2 images showing this effect.
                            Attached Files



                            • #15
                              @gengen,

                              Have a look at Celera Input Formatting

                              Sven
                              Last edited by sklages; 06-21-2009, 02:13 AM. Reason: multiple issues in one thread
