  • Raj
    Member
    • Jan 2009
    • 15

    Assembling .sff files from 454 and finishing

    Hi, can anybody suggest good assembly programs, other than Newbler and MIRA, that can use .sff files directly as input rather than FASTA?

    Also, I have generated an .ace file from Newbler which is not fully compatible with consed (I can open the file in consed, but for some reason the contig numbers look different). Could anybody suggest good programs that I can use to finish a 454-generated genome? Something that will allow me to view the scaffolds and join or break them where needed.
    I've tried consed and Staden; any others would be greatly appreciated!!


    Thanks in advance!

    Raj
  • Raj
    Member
    • Jan 2009
    • 15

    #2
    ...I was informed yesterday that the new version of consed (v18) should now be fully compatible with 454 data.
    Also, the proposed release of Gap5 should resolve the incompatibility issues that many programs seem to have when trying to finish 454-generated data.

    MIRA and Newbler seem to be the best methods for assembling 454 data, so that the paired-end data can be taken full advantage of.

    Finishing is still the bottleneck, which I hope the new versions of consed and Gap can resolve...


    • v_kisand
      Member
      • Jan 2009
      • 38

      #3
      Yes, consed 18 has been out for a few weeks; you need the update for phrap as well.
      I did not have any problems with installation (32-bit Fedora 10).

      Anyway, it does not perform de novo assembly of 454 reads, right? However, it reads Newbler .sff files and allows you to assemble 454 reads against a reference sequence.

      Please correct me if I am wrong...


      • sklages
        Senior Member
        • May 2008
        • 628

        #4
        ... and it can directly read Newbler-created ace files. So if you like Newbler, no problem.
        Maybe it's a good starting point for finishing a (shotgun) project if there is no Sanger
        backbone.

        A good alternative might be MIRA, which writes a CAF file (which can be easily converted
        to gap4). But gap4 might slow down if you have a huge dataset ...

        For larger assemblies you might want to have a look at Celera Assembler, which in our
        hands does a good job with Sanger/454 (FLX) hybrid assemblies in the bacterial genome
        size range.

        Just my 2p,
        Sven


        • mjleaks
          Junior Member
          • Jan 2009
          • 6

          #5
          assembly issues

          Has anyone assembled 454 data with consed package version 19? I'm having some issues with reading the .sff files and am wondering if anyone has completed an assembly of 454 data (not using .ace files produced by the Roche software). I'm using "add454Reads.perl reference.ace sff.fof reference.fa", where the fof specifies the location of the sff files to assemble. Although the script runs, I get an error "doesn't existile /shared/BNFinal/mapping/consed/sff_dir/FPDLD6P02.sff", and the 454 reads are not brought into the assembly; it basically assembles with only the reference sequence. Someone mentioned needing to update phrap, which I will look into, but any other thoughts on this?
          Thanks,
          Liz


          • sklages
            Senior Member
            • May 2008
            • 628

            #6
            Hi Liz,

            Well, it seems that there is no /shared/BNFinal/mapping/consed/sff_dir/FPDLD6P02.sff ... have you checked the location of your SFF file(s)?

            You should update to the current version of phrap, because cross_match is updated along with it. Phrap itself is not involved in aligning 454 reads against your refseq; cross_match is used for that.

            cheers,
            Sven
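A quick way to confirm that a file at the expected location really is an SFF file is to read its first few bytes. The following is a minimal Python sketch, not part of consed; the function name `read_sff_header` is made up for illustration, and the field layout is my reading of the published SFF common header, so please verify it against the format description before relying on it.

```python
import struct

SFF_MAGIC = 0x2E736666  # the ASCII bytes ".sff"

# Assumed big-endian layout of the start of the SFF common header:
# magic (uint32), version (4 bytes), index offset (uint64),
# index length (uint32), number of reads (uint32), header length (uint16),
# key length (uint16), flows per read (uint16), flowgram format code (uint8).
HEADER_LAYOUT = ">I4sQIIHHHB"


def read_sff_header(path):
    """Return a few basic fields from the common header of an SFF file.

    Raises ValueError if the file is too short or lacks the SFF magic number.
    """
    size = struct.calcsize(HEADER_LAYOUT)
    with open(path, "rb") as handle:
        data = handle.read(size)
    if len(data) < size:
        raise ValueError(f"{path}: too short to be an SFF file")
    (magic, version, _index_offset, _index_length, number_of_reads,
     _header_length, key_length, flows_per_read, _format_code) = struct.unpack(
        HEADER_LAYOUT, data)
    if magic != SFF_MAGIC:
        raise ValueError(f"{path}: missing the .sff magic number")
    return {
        "version": version,
        "number_of_reads": number_of_reads,
        "key_length": key_length,
        "flows_per_read": flows_per_read,
    }
```

Calling `read_sff_header` with the exact path the script complains about separates "file not found" (an OSError from `open`) from "found but not an SFF file" (a ValueError).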


            • mjleaks
              Junior Member
              • Jan 2009
              • 6

              #7
              Hi Sven. Thanks for the post. I checked that a few times to make sure I'm not going crazy, and yes, the sff file is where I specified in the fof. Here are the steps I'm following. Any help much appreciated:

              1. Ran gsMapper (through the UI) using the option to create a complete consed folder.

              2. Deleted the .consedrc file that Newbler created in edit_dir (per v19 instructions).

              3. Deleted the phd.ball link in edit_dir (per v19 instructions).

              4. Checked that the current version of sff2scf is the one in use: "sff2scf -v" gives "080714".

              5. Created an .ace file from the appropriate FASTA-format reference sequence: fasta2Ace.perl reference.fa

              6. Created an sff.fof containing the names of the appropriate sff files. I used a single .sff file, so sff.fof contains only the file name "FMAAUWB12.sff", with no path. The sff.fof file is located in edit_dir, and from there the FMAAUWB12.sff file is in ../sff_dir.

              7. Added the reads by running, from the edit_dir directory: add454Reads.perl reference.ace sff.fof reference.fa

              8. Got:
              doesn't existile FMAAUWB12.sff
              0.0 minutes to until done with alignments
              now using alignments to add reads to ace file
              executing: /usr/local/genome/bin/consed -ace reference.ace -addReads alignmentFiles090603_134426.fof -chem 454
              -addReads will be run.
              no ~/.consedrc file so no user resources will be used--that's ok
              no ./.consedrc file so no project-specific resources--that's ok
              couldn't open readOrder.txt--that's ok
              50% done. 1 reads read so far...
              Now setting quality values
              opening ../phdball_dir/phd.ball.1
              read phd files in ../phdball_dir/phd.ball.1 found: 1 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 2 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 3 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 4 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 5 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 6 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 7 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 8 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 9 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 1000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 2000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 3000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 4000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 5000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 6000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 7000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 8000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 9000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 10,000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 20,000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 30,000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 40,000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 50,000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 60,000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 70,000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 80,000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 90,000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 100,000 totals: used: 0 need: 1
              Number of phd blocks used from ../phdball_dir/phd.ball.1: 0
              exception thrown: RatReninRegion has no phd file

              ace file: RatReninRegion.ace
              Version 19.0 (090206)
              RatReninRegion has no phd file

              Version 19.0 (090206)
              ace file: RatReninRegion.ace
              Number of individual phd files read: 0
              Total reads in assembly: 1
              Finished setting quality values in 3 seconds
              total errors on consed startup: 1
              now saving assembly... 3
              writing ./RatReninRegion.ace.1
              See new ace file RatReninRegion.ace.1
              done 0
              0.0 minutes cross_match and fasta time
              0.1 minutes consed time
              0.1 minutes total time

              Again, any assistance much appreciated,
              Liz
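The steps above leave two things about sff.fof that can be checked mechanically: whether each entry resolves to an existing file, and whether the entries carry stray characters. A message that prints as "doesn't existile ..." would be consistent with a carriage return at the end of the file name (for example from Windows line endings), though that is only a guess and nothing in the thread confirms it. The following is a minimal Python sketch, not part of consed; `check_fof` is a made-up name, and resolving relative names against the directory holding the fof is an assumption, since the thread does not say how add454Reads.perl resolves them.

```python
import os


def check_fof(fof_path, base_dir=None):
    """Return a list of (entry, problems) for each non-empty line of a fof.

    base_dir is where relative entries are looked up; by default (an
    assumption) it is the directory that contains the fof itself.
    """
    if base_dir is None:
        base_dir = os.path.dirname(os.path.abspath(fof_path))
    report = []
    # newline="" disables newline translation so carriage returns stay visible
    with open(fof_path, newline="") as handle:
        raw_lines = handle.read().split("\n")
    for raw in raw_lines:
        if not raw.strip():
            continue
        problems = []
        if "\r" in raw:
            problems.append("carriage return in entry (CRLF line endings?)")
        name = raw.replace("\r", "")
        if name != name.strip():
            problems.append("leading or trailing whitespace")
        clean = name.strip()
        full = clean if os.path.isabs(clean) else os.path.join(base_dir, clean)
        if not os.path.isfile(full):
            problems.append("not found at " + full)
        report.append((clean, problems))
    return report
```

Run from edit_dir as `check_fof("sff.fof")`, an entry such as FMAAUWB12.sff with no path would be reported as not found there; passing `base_dir="../sff_dir"` checks the other location described in step 6.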
