  • flobpf
    Member
    • Apr 2010
    • 76

    Split mate pair 454 for newbler

    Hi,

    I have a number of 454 3 kb mate-pair files in SFF format that I'd like to assemble with Newbler. However, I want Newbler to use only the reads that contain a linker sequence. Is there a parameter I can give Newbler to do this?

    Thanks
  • flxlex
    Moderator
    • Nov 2008
    • 412

    #2
    The short answer is 'no'. Newbler checks each read for the linker and splits the reads that contain one, but it uses the reads without a linker as shotgun reads.
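To make the split-or-shotgun behaviour concrete, here is a minimal Python sketch of that decision for a single read. It is an illustration only, not Newbler's implementation: it uses an exact string match, `split_at_linker` is a hypothetical name, and the linker in the example is made up (real 454 linkers are longer, and reads carry sequencing errors).

```python
from typing import Optional, Tuple


def split_at_linker(seq: str, linker: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Classify one read by exact linker search (illustrative only).

    Returns ("paired", left, right) when the linker is found, where left and
    right are the sequences flanking its first occurrence, otherwise
    ("shotgun", None, None).
    """
    pos = seq.find(linker)
    if pos == -1:
        # No linker: the read would be treated as an ordinary shotgun read.
        return ("shotgun", None, None)
    left = seq[:pos]
    right = seq[pos + len(linker):]
    return ("paired", left, right)
```

For example, `split_at_linker("AAAAGGCCTTCCCC", "GGCCTT")` gives `("paired", "AAAA", "CCCC")`. A real pipeline would probably also require a minimum length on each half before treating the read as a pair.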

    One way to achieve what you want would be to run a regular Newbler assembly, extract the IDs of the reads containing the linker from 454PairStatus.txt (only reads with linkers are listed there), put these IDs in a text file, and pass this file to the -fi option so that Newbler assembles only those reads.
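A minimal sketch of the ID-extraction step. It assumes 454PairStatus.txt is tab-separated with one header row and the read accession in the first column; check that against your own file before relying on it. `linker_read_ids` and `write_id_file` are hypothetical helper names, and the file written at the end is the one you would pass to -fi.

```python
import csv
from typing import Iterable, List


def linker_read_ids(pair_status_lines: Iterable[str]) -> List[str]:
    """Collect unique read IDs from the first column of 454PairStatus.txt.

    Assumes a tab-separated file with one header row and the read
    accession in the first column.
    """
    reader = csv.reader(pair_status_lines, delimiter="\t")
    next(reader, None)  # skip the assumed header row
    ids = {row[0].strip() for row in reader if row and row[0].strip()}
    return sorted(ids)


def write_id_file(ids: Iterable[str], path: str) -> None:
    """Write one accession per line (the layout assumed for the -fi file)."""
    with open(path, "w") as fh:
        for read_id in ids:
            fh.write(read_id + "\n")
```

Usage would look like `write_id_file(linker_read_ids(open("454PairStatus.txt")), "linker_reads.txt")`; the set removes any duplicate IDs before writing.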


    • flobpf
      Member
      • Apr 2010
      • 76

      #3
      Hi Flxlex,

      Thanks for your response. That is what I'm attempting to do. However, my Newbler run is taking a very long time (100 hours, 30 GB, and still only 4% complete!). I have 7 plates of 3 kb mate pairs.

      I was thinking of the following alternative approach. Would that work?
      1) Use sff_extract to identify "linkered" sequences
      2) Split them into .f and .r reads based on the linkers, and quality-clip the sequences
      3) Generate FASTQ files containing only the sequences with .f and .r
      4) Convert FASTQ to FASTA
      5) Use the FASTA as input to Newbler.
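Step 4 of this plan is the simplest part. A minimal sketch, assuming unwrapped FASTQ with exactly four lines per record; `fastq_to_fasta` is a hypothetical helper that keeps the header and sequence and drops the quality line. If Biopython is installed, `SeqIO.convert` does the same job.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Convert 4-line FASTQ records into 2-line FASTA records.

    Assumes unwrapped FASTQ: header, sequence, '+' line, quality line.
    Yields FASTA lines without trailing newlines.
    """
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"not a FASTQ header: {header!r}")
        try:
            seq = next(it)
            next(it)  # '+' separator line
            next(it)  # quality line, not needed in FASTA
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        yield ">" + header[1:]
        yield seq
```

Writing the result is then `out.write("\n".join(fastq_to_fasta(fh)) + "\n")` for open input and output handles.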

      Would be glad to know if that'd work. Also, is there a way to speed up my Newbler run? I'm using the steps mentioned in your post here and here:


      Thanks for your help!
      Last edited by flobpf; 04-28-2011, 07:00 AM. Reason: added another link.


      • kmcarr
        Senior Member
        • May 2008
        • 1181

        #4
        To me that method seems overly complicated, and you would lose one of the advantages of Newbler, namely that it performs its alignments in "flow space" rather than "base space". The main problem now seems to be getting Newbler to finish the first assembly so that you can generate a list of the reads which are truly paired and pass it to a second Newbler assembly. I would suggest an alternative method of identifying the paired reads.

        1. Dump FASTA format sequence files from your SFF files using the Roche sffinfo tool.

        2. Using your favorite nucleotide pattern matching program (cross_match, SSAHA2, fuzznuc (EMBOSS)) search the FASTA files for reads containing the PE linker sequence.

        3. Save the list of accessions for reads with the PE linker to a text file.

        4. Use this text file with the -fi option as described above.

        This is really just a modification of the method you are currently trying but using, perhaps, a faster method of identifying the paired reads.
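Steps 2 and 3 can also be approximated with a short script. The sketch below is a stand-in for cross_match, SSAHA2 or fuzznuc, not a replacement: it reads the FASTA dumped by sffinfo, looks for the linker on both strands allowing up to `max_mismatches` substitutions, and returns the accessions to write to the -fi file. The function names and the mismatch threshold are assumptions; supply the PE linker sequence for your chemistry, which is not given here.

```python
from typing import Iterable, Iterator, List, Tuple

# Complement table for DNA; N maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (accession, sequence); the accession is the first word of the header."""
    acc, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if acc is not None:
                yield acc, "".join(chunks)
            acc, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if acc is not None:
        yield acc, "".join(chunks)


def has_linker(seq: str, linker: str, max_mismatches: int = 2) -> bool:
    """True if the linker or its reverse complement occurs with at most
    max_mismatches substitutions (insertions and deletions are not handled)."""
    seq = seq.upper()
    k = len(linker)
    for probe in (linker.upper(), revcomp(linker)):
        for i in range(len(seq) - k + 1):
            mismatches = 0
            for a, b in zip(seq[i:i + k], probe):
                if a != b:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break  # this window cannot match; try the next one
            if mismatches <= max_mismatches:
                return True
    return False


def linker_accessions(fasta_lines: Iterable[str], linker: str,
                      max_mismatches: int = 2) -> List[str]:
    """Accessions (in file order) of reads that appear to contain the linker."""
    return [acc for acc, seq in read_fasta(fasta_lines)
            if has_linker(seq, linker, max_mismatches)]
```

Because 454 errors are mostly insertions and deletions in homopolymers, a substitution-only search will miss some linkers that a gapped aligner would find, and pure-Python scanning is slow at the scale of several plates. Treat this as a quick check rather than a substitute for the tools listed above.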

        I am a little surprised, though, by how long Newbler is taking, and if a significant fraction of your reads are truly paired (i.e. you won't be eliminating the majority of your input reads), it may still stump Newbler.


        • flobpf
          Member
          • Apr 2010
          • 76

          #5
          Thanks!

          Ah, Kevin.

          Thanks. That's actually way simpler. Will do it that way.


          • flxlex
            Moderator
            • Nov 2008
            • 412

            #6
            Hi,

            If you have an assembly running, you will notice that after the phase where Newbler reads all the sequence files, it writes the 454ReadStatus.txt file. You can use this file to find the reads with the linker even before the assembly has finished: these are marked _left and _right. That saves you from having to search for the linker yourself...
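A minimal sketch of pulling those reads out of 454ReadStatus.txt. It assumes the file is tab-separated with a header row and the read accession in the first column, and that split halves appear as `<accession>_left` and `<accession>_right`. `paired_read_ids` is a hypothetical helper; it strips the suffix and returns each base accession once. Whether the -fi file should list base accessions or the suffixed names is worth checking against the Newbler manual.

```python
import csv
from typing import Iterable, List


def paired_read_ids(read_status_lines: Iterable[str]) -> List[str]:
    """Return base accessions of reads that were split at a linker.

    Assumes a tab-separated file with one header row, the accession in the
    first column, and split halves named <accession>_left / <accession>_right.
    """
    reader = csv.reader(read_status_lines, delimiter="\t")
    next(reader, None)  # skip the assumed header row
    ids = set()
    for row in reader:
        if not row:
            continue
        acc = row[0].strip()
        for suffix in ("_left", "_right"):
            if acc.endswith(suffix):
                ids.add(acc[: -len(suffix)])  # drop the suffix, keep the read
                break
    return sorted(ids)
```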

            About the assembly speed: have you tried using more CPUs (with the -cpu flag) and the -large option?
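A small sketch of building the command line with these options. It assumes the Newbler command is `runAssembly` and that `-o` sets the output directory; neither is stated in this thread, so check them against your Newbler manual. -cpu, -large and -fi are the options discussed above, and `newbler_command` is a hypothetical helper.

```python
import os
from typing import List, Optional, Sequence


def newbler_command(sff_files: Sequence[str], out_dir: str,
                    include_list: Optional[str] = None,
                    cpus: Optional[int] = None,
                    large: bool = True) -> List[str]:
    """Build an argv list for a Newbler run (assumes the CLI is runAssembly)."""
    cmd = ["runAssembly", "-o", out_dir]  # assumed command name and output flag
    if cpus is None:
        cpus = os.cpu_count() or 1  # default to every CPU the OS reports
    cmd += ["-cpu", str(cpus)]
    if large:
        cmd.append("-large")
    if include_list:
        cmd += ["-fi", include_list]  # restrict the assembly to listed reads
    cmd.extend(sff_files)
    return cmd
```

The list can be handed directly to `subprocess.run(cmd, check=True)`, which avoids shell-quoting problems with file names.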


            • flobpf
              Member
              • Apr 2010
              • 76

              #7
              Solved!

              Hi Flxlex,

              Thanks for your response. I ran it with the -cpu and -large options. That made all the difference, and my assembly finished at a blazing speed.
