  • 454 Sequencing Assembly/MIDs

    I'm fairly new to 454 sequencing and genome assembly, and I currently have six bacterial genomes that were sequenced by both whole-genome shotgun and paired-end sequencing. They were run together using barcodes (MIDs). I have assembled the scaffolds/sequences using Newbler/gsAssembler, and each one still has 200+ contigs. I was told that one way to help fill in the gaps is in silico, using different assembler software. I have been trying to use Mosaik to assemble the genomes, but I cannot figure out how to filter for the MIDs. So, I have two questions: 1) Is there assembler software that is "better" than the Roche Newbler/gsAssembler software, and if so, where can I obtain it? 2) How do you parse out the MID barcode sequences when using software other than Roche Newbler/gsAssembler, such as Mosaik, so that you can assemble with that software?

    Any help that you could provide would be very much appreciated, as I mentioned I'm fairly new to the genome sequencing world.

  • #2
    Originally posted by clostridium40
    I'm fairly new to 454 sequencing and genome assembly, and I currently have six bacterial genomes that were sequenced by both whole-genome shotgun and paired-end sequencing. They were run together using barcodes (MIDs). I have assembled the scaffolds/sequences using Newbler/gsAssembler, and each one still has 200+ contigs. I was told that one way to help fill in the gaps is in silico, using different assembler software. I have been trying to use Mosaik to assemble the genomes, but I cannot figure out how to filter for the MIDs. So, I have two questions: 1) Is there assembler software that is "better" than the Roche Newbler/gsAssembler software, and if so, where can I obtain it? 2) How do you parse out the MID barcode sequences when using software other than Roche Newbler/gsAssembler, such as Mosaik, so that you can assemble with that software?

    Any help that you could provide would be very much appreciated, as I mentioned I'm fairly new to the genome sequencing world.

    Before assembly you should separate your data by MID. This is usually done with Roche's SFF Tools (which should be freely available upon request).
    You can then feed the resulting SFF files to Newbler, which (usually) does a pretty good job on 454 data, at least when you are working de novo.
    Newbler can also (optionally) produce a complete consed folder for further editing, if needed.
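
    For readers who want to see what "separating by MID" means outside the Roche tools, here is a minimal Python sketch. It is not how sfffile works internally; it only illustrates the idea under simplifying assumptions: reads are already plain (name, sequence) pairs with the key removed, the barcode sits at the very start of the read, and only exact matches count. The function name `split_by_mid` and the barcodes `tagA`/`tagB` are made up for illustration and are not real Roche MIDs.

```python
from collections import defaultdict

def split_by_mid(reads, mids):
    """Group reads by an exact-match 5' barcode and strip that barcode.

    reads: iterable of (name, sequence) pairs
    mids:  dict mapping a barcode label to its sequence (hypothetical values)
    Reads that match no barcode are collected under "unassigned".
    """
    bins = defaultdict(list)
    for name, seq in reads:
        for mid_name, mid_seq in mids.items():
            if seq.upper().startswith(mid_seq):
                # Keep only the part of the read after the barcode.
                bins[mid_name].append((name, seq[len(mid_seq):]))
                break
        else:
            # No barcode matched this read.
            bins["unassigned"].append((name, seq))
    return dict(bins)

if __name__ == "__main__":
    # Made-up barcodes and reads, for illustration only.
    example_mids = {"tagA": "ACGTAC", "tagB": "TGCATG"}
    example_reads = [
        ("r1", "ACGTACGGGTTT"),
        ("r2", "TGCATGCCCAAA"),
        ("r3", "NNNNNNGGG"),
    ]
    for label, members in split_by_mid(example_reads, example_mids).items():
        print(label, members)
```

    Real demultiplexers usually tolerate a small number of mismatches in the barcode; this sketch deliberately does not.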

    Mosaik is a reference-guided aligner, not a de novo assembler.

    You might want to try MIRA3 for de novo assembly; it does a pretty good job as well.
    MIRA creates CAF files, which lead you down the "Staden Package" trail (for editing purposes).

    hth, Sven



    • #3
      Thanks sklages,

      I really appreciate the advice. I will get the SFF Tools and will look into the MIRA3 assembler. Thanks again.



      • #4
        If you still need to remove the MIDs from your data, you can do it using SFF tools. It's kind of cumbersome, but it can be done. First, you need to separate the reads by MID using SFF tools. Then, use the -t trimfile option to remove the MID. This is the cumbersome part. You need to create a file listing every read in the file and the trim points. Here's how you do it:
        1. Create individual .sff files using your MID list
        2. Use sffinfo to create tab-delimited text files from each .sff file (sffinfo -s -t [sff file name] > [output file name].txt)
        3. Open the text file in Excel and use the LEN command to get the length of each sequence
        4. Use that information to create a file with three columns: read name, 5' trim position (length of your MID +4), and length of the read (from step 3). Save as tab delimited text. (Excel may insert quotes around your read names. If so, use a text editor to delete them all.)
        5. Now, use sfffile with the -t option and your newly created file as the trimfile.
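
        If you would rather not go through Excel for steps 3 and 4, the same three-column file can be built with a short script. This is only a sketch of the recipe as posted: it assumes the sffinfo output has one read per line with the read name and the sequence separated by a tab, and it uses "MID length + 4" as the 5' trim position exactly as in step 4. The function name `build_trim_lines` and the parameter `offset` are made up for illustration.

```python
def build_trim_lines(sffinfo_tab_text, mid_length, offset=4):
    """Build trim-file lines: read name, 5' trim position, read length.

    sffinfo_tab_text: text with one "name<TAB>sequence" record per line
                      (assumed layout of the tab-delimited sffinfo output)
    mid_length:       length of the MID barcode in bases
    offset:           the "+4" from step 4 of the recipe above
    """
    lines = []
    for raw in sffinfo_tab_text.splitlines():
        raw = raw.strip()
        if not raw or "\t" not in raw:
            continue  # skip blank or malformed lines
        name, seq = raw.split("\t", 1)
        name = name.strip('"')          # drop quotes a spreadsheet may add
        start = mid_length + offset     # 5' trim position, as in step 4
        end = len(seq.strip())          # read length, as in step 3
        lines.append(f"{name}\t{start}\t{end}")
    return lines

if __name__ == "__main__":
    # Made-up records, for illustration only.
    example = "READ1\tACGTACGTACGTACGTACGT\nREAD2\tACGTACGTACGTACGTACGTAAAA\n"
    print("\n".join(build_trim_lines(example, mid_length=10)))
```

        Write the resulting lines to a text file and pass it to sfffile as the trimfile in step 5.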



        • #5
          Originally posted by ajthomas
          If you still need to remove the MIDs from your data, you can do it using SFF tools. It's kind of cumbersome, but it can be done. First, you need to separate the reads by MID using SFF tools. Then, use the -t trimfile option to remove the MID. This is the cumbersome part. You need to create a file listing every read in the file and the trim points. Here's how you do it:
          1. Create individual .sff files using your MID list
          2. Use sffinfo to create tab-delimited text files from each .sff file (sffinfo -s -t [sff file name] > [output file name].txt)
          3. Open the text file in Excel and use the LEN command to get the length of each sequence
          4. Use that information to create a file with three columns: read name, 5' trim position (length of your MID +4), and length of the read (from step 3). Save as tab delimited text. (Excel may insert quotes around your read names. If so, use a text editor to delete them all.)
          5. Now, use sfffile with the -t option and your newly created file as the trimfile.

          Why? sfffile does all of this for you. There is no need to work with -t.
          The newly created SFF files already have the 5' trim offset shifted past the MID, so if you use sffinfo to extract the sequences, the MIDs are no longer present ...

          Sven



          • #6
            Originally posted by sklages
            Why? sfffile does all of this for you. There is no need to work with -t.
            The newly created SFF files already have the 5' trim offset shifted past the MID, so if you use sffinfo to extract the sequences, the MIDs are no longer present ...

            Sven

            Well, I didn't know that. Thanks for telling me. I've used sfffile to separate by MID once or twice and I've used it to trim bases, but never on the same data set. I wasn't aware that it removed the MIDs.

            You learn something every day.



            • #7
              I have a question about the MID separation. Would you tell me how to use sfffile to set up my modified MIDconfig.parse file to be used by the assembly program? For the life of me I can't figure it out.
              Thanks in advance. Please help.



              • #8
                Originally posted by clackbeatr
                I have a question about the MID separation. Would you tell me how to use sfffile to set up my modified MIDconfig.parse file to be used by the assembly program?

                Just typing 'sfffile' gives you a lot of usage information, e.g.
                Code:
                -mcf filename    Use this MID configuration file for multiplex info
                ... assuming you already have a modified parse file.

                If not, create one by having a look at the examples in
                Code:
                ROCHE_SW_INSTALLDIR/apps/gsSeqTools/config/MIDConfig.parse
                What do you mean by "modified MIDconfig.parse file to be used by the assembly program"? Which assembly program is to be used? gsAssembler?

                If you are asking how to provide a modified parse file to your assembly program (gsAssembler/gsMapper), then it is simple (assuming you are using the GUI):
                - create your project
                - add GS Reads, in the selection dialog tick 'Use multiplex filtering'
                - then either provide your "MID Config File" or use "Custom Multiplexing"

                Hope I understood you correctly ;-)

                Sven
