  • 454 Sequencing Assembly/MIDs

    I'm fairly new to 454 sequencing and genome assembly. I currently have six bacterial genomes that were sequenced by both whole-genome shotgun and paired-end sequencing. They were run together using barcodes (MIDs). I have assembled the scaffolds/sequences using Newbler/gsAssembler, and each one still has 200+ contigs. I was told that one way to help fill in the gaps is in silico, using different assembler software. I have been trying to use Mosaik to assemble the genomes, but I cannot figure out how to filter for the MIDs. So I have two questions: 1) Is there assembler software that is "better" than the Roche Newbler/gsAssembler software, and if so, where can I obtain it? 2) How do you parse out the MID barcode sequences when using software other than Roche Newbler/gsAssembler, such as Mosaik, so that you can assemble with that software?

    Any help that you could provide would be very much appreciated, as I mentioned I'm fairly new to the genome sequencing world.

  • #2
    Before assembly you should separate your data by MID. This is usually done with Roche's SFF Tools (which should be freely available upon request).
    You can then feed the resulting SFF files to Newbler, which usually does a pretty good job on 454 data, at least when you are working de novo.
    Optionally, it will also give you a complete consed folder for further editing, if needed.

    Mosaik is a reference-guided aligner, not a de novo assembler.

    You might want to try MIRA3 for de novo assembly; it does a pretty good job as well.
    MIRA creates CAF files, which leads you down the "Staden Package" trail (for editing purposes).

    hth, Sven

    • #3
      Thanks sklages,

      I really appreciate the advice. I will get the SFF Tools and look into the MIRA3 assembler. Thanks again.

      • #4
        If you still need to remove the MIDs from your data, you can do it using SFF tools. It's kind of cumbersome, but it can be done. First, you need to separate the reads by MID using SFF tools. Then, use the -t trimfile option to remove the MID. This is the cumbersome part. You need to create a file listing every read in the file and the trim points. Here's how you do it:
        1. Create individual .sff files using your MID list
        2. Use sffinfo to create tab-delimited text files from each .sff file (sffinfo -s -t [sff file name] > [output file name].txt)
        3. Open the text file in Excel and use the LEN function to get the length of each sequence
        4. Use that information to create a file with three columns: read name, 5' trim position (length of your MID +4), and length of the read (from step 3). Save it as tab-delimited text. (Excel may insert quotes around your read names. If so, use a text editor to delete them all.)
        5. Now, use sfffile with the -t option and your newly created file as the trimfile.
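
Steps 2 to 4 above can also be scripted instead of done by hand in Excel. The following is a minimal Python sketch, not part of the original post. It assumes the `sffinfo -s -t` output has one read per line in the form read name, tab, sequence, and it reproduces the three columns exactly as step 4 describes them (read name, MID length + 4, sequence length). The function names and the `KEY_LEN` constant are hypothetical, and the offsets are taken from the post as written, so check them against your own data before using the file with `sfffile -t`.

```python
import csv
import io

# The "+4" from step 4 of the post (assumption: it accounts for the 4-base
# key sequence that precedes the MID in each read).
KEY_LEN = 4


def build_trim_rows(sffinfo_tab_text, mid_len):
    """Return (read_name, trim_start, trim_end) tuples as laid out in step 4.

    sffinfo_tab_text: text assumed to hold one "name<TAB>sequence" pair per line.
    mid_len: length of the MID barcode in bases.
    """
    rows = []
    reader = csv.reader(io.StringIO(sffinfo_tab_text), delimiter="\t")
    for record in reader:
        if len(record) < 2 or not record[0].strip():
            continue  # skip blank or malformed lines
        # csv.reader already drops surrounding double quotes, which covers
        # the Excel quoting problem mentioned in step 4.
        name = record[0].strip()
        seq = record[1].strip()
        rows.append((name, mid_len + KEY_LEN, len(seq)))
    return rows


def format_trim_file(rows):
    """Render the rows as tab-delimited text, one read per line."""
    return "".join(f"{name}\t{start}\t{end}\n" for name, start, end in rows)


if __name__ == "__main__":
    # Made-up reads for illustration only.
    example = "READ0001\tACGAGTGCGTTTGACCA\nREAD0002\tACGAGTGCGTGGCATTCAGGA\n"
    print(format_trim_file(build_trim_rows(example, mid_len=10)), end="")
```

The printed text would be saved to a file and passed as the trimfile in step 5.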

        • #5
          Why? sfffile does everything for you. There is no need to work with -t.
          The newly created SFF files already contain the shifted 5' offset. If you use sffinfo to extract the sequences, the MIDs are no longer present.

          Sven
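
To illustrate the idea of binning reads by MID and moving the 5' start past the barcode, here is a small conceptual Python sketch. It is not how sfffile is implemented and it does not read SFF files: it works on plain (name, sequence) pairs, matches barcodes exactly with no error tolerance, and uses a made-up barcode table (`EXAMPLE_MIDS`) and a hypothetical function name (`split_by_mid`). For real data, sfffile with the appropriate MID configuration remains the tool to use.

```python
from collections import defaultdict

# Hypothetical barcode table for illustration only; real MID sequences come
# from your MID configuration file, not from this sketch.
EXAMPLE_MIDS = {
    "MID_A": "ACGAGTGCGT",
    "MID_B": "ACGCTCGACA",
}


def split_by_mid(reads, mids):
    """Group reads by their leading barcode and drop the barcode from each read.

    reads: iterable of (name, sequence) pairs, with the sequence starting at the MID.
    mids: mapping of MID name to barcode sequence.
    Returns {mid_name: [(name, trimmed_sequence), ...]}. Reads that match no
    barcode exactly are collected, untrimmed, under the key None.
    """
    bins = defaultdict(list)
    for name, seq in reads:
        for mid_name, barcode in mids.items():
            if seq.startswith(barcode):
                # Shifting the start past the barcode is the "5' offset" idea.
                bins[mid_name].append((name, seq[len(barcode):]))
                break
        else:
            bins[None].append((name, seq))
    return dict(bins)


if __name__ == "__main__":
    demo_reads = [
        ("r1", "ACGAGTGCGTTTGACC"),
        ("r2", "ACGCTCGACAGGGA"),
        ("r3", "TTTTTTTTTTTT"),
    ]
    for mid_name, members in split_by_mid(demo_reads, EXAMPLE_MIDS).items():
        print(mid_name, members)
```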

          • #6
            Well, I didn't know that. Thanks for telling me. I've used sfffile to separate by MID once or twice and I've used it to trim bases, but never on the same data set. I wasn't aware that it removed the MIDs.

            You learn something every day.

            • #7
              I have a question about the MID separation. Would you tell me how to use sfffile to set up my modified MIDconfig.parse file to be used by the assembly program? For the life of me I can't figure it out.
              Thanks in advance-- Please help

              • #8
                Just typing 'sfffile' gives you a lot of usage information, e.g.
                Code:
                -mcf filename    Use this MID configuration file for multiplex info
                ... assuming you already have a modified parse file.

                If not, create one by having a look at the examples in
                Code:
                ROCHE_SW_INSTALLDIR/apps/gsSeqTools/config/MIDConfig.parse
                What do you mean by "modified MIDconfig.parse file to be used by the assembly program"? Which assembly program is to be used? gsAssembler?

                If you are asking how to provide a modified parse file to your assembly program (gsAssembler/gsMapper), then it is simple (assuming you are using the GUI):
                - create your project
                - add GS Reads, and in the selection dialog tick 'Use multiplex filtering'
                - then either provide your "MID Config File" or use "Custom Multiplexing"

                Hope I got you right ;-)

                Sven
