
  • Metagenomics w/ 454 tips?

    Hi y'all,

    Quick question for you. Does anyone have tips, tricks or recommendations for metagenomic assembly and binning programs? I'm working with two 200K-read (100-350 bp) datasets from microbial communities that are relatively simple (predicted to have fewer than 100 taxa, with a handful of dominant organisms). What are your favorites? Any pitfalls to avoid?

    Cheers,
    Lizzy

  • #2
    assembly for 454 data

    454 data is a mess, but it's the only long-read technology as of today.
    Before you try assembly, be strict about the front-end cleaning of your data. You must screen your reads hard: if you barcoded any samples, use TagCleaner to remove the tags. Removing reads with Ns and low quality scores would also be helpful. You could try a denoising program if it is amplicon data, but I have not tried it for metas.
    Once you have removed all the homopolymers etc., Forge or MIRA would be a good start for your assembly.
    What percentage of your reads are 100 bp?
    If it is around 50%, then try ABySS or Velvet.

    More details would help.
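
A minimal sketch of the read screening described above, assuming the 454 reads have already been converted to FASTQ with Phred+33 qualities. The function names and thresholds (`min_mean_q`, `max_n`, `min_len`) are illustrative only and are not taken from TagCleaner or any other tool. It also reports the fraction of reads at or below 100 bp, which is the number asked about above.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip stray blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not qual:
        return 0.0
    return mean(ord(c) - offset for c in qual)


def screen_reads(records, min_mean_q=20, max_n=0, min_len=50):
    """Keep reads that are long enough, have few Ns, and a decent mean quality."""
    kept = []
    for header, seq, qual in records:
        if len(seq) < min_len:
            continue
        if seq.upper().count("N") > max_n:
            continue
        if mean_phred(qual) < min_mean_q:
            continue
        kept.append((header, seq, qual))
    return kept


def fraction_at_or_below(records, length=100):
    """Fraction of reads whose length is <= `length` bp."""
    records = list(records)
    if not records:
        return 0.0
    return sum(len(seq) <= length for _, seq, _ in records) / len(records)
```

Typical use would be `records = list(parse_fastq(open("reads.fastq")))` (a hypothetical file name), followed by `screen_reads(records)` and `fraction_at_or_below(records, 100)`.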



    • #3
      You can try QIIME to process the data: http://qiime.sourceforge.net/index.html



      • #4
        QIIME is not for metas!!!

        Not for metas!



        • #5
          Ah yes, it's not 16S metagenomics. Definitely need another cup of coffee.



          • #6
            I have been following the literature and it seems a new metagenome binning or taxonomy program comes out every month. It would be nice to see a comparison.

            I have used MEGAN; I think it is one of the more widely used tools. It parses BLASTx results using the NCBI taxonomy, SEED, and KEGG. The BLASTx search is computationally intensive: a 275-megabase Illumina data set took about 1600 hours of computer time on our local cluster.



            • #7
              I would not try to assemble the data at all. 200k 454 reads seems very low to get any decent assembly even in very simple communities (or even in single genomes).

              200,000 reads x 250 bp read length = 50 Mb of sequence.

              50 Mb of sequence = 10x coverage of a single genome of about 5 Mb.
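
The back-of-the-envelope calculation above can be sketched as follows. The 5 Mb genome size is an assumption (it is what the 10x figure implies), and the `abundance` parameter is a hypothetical addition to show how quickly coverage drops for a community member that receives only a fraction of the reads.

```python
def total_bases(n_reads, mean_read_len_bp):
    """Total sequenced bases for a run."""
    return n_reads * mean_read_len_bp


def expected_coverage(n_reads, mean_read_len_bp, genome_size_bp, abundance=1.0):
    """Expected fold coverage of one genome.

    `abundance` is the assumed fraction of reads that come from this genome
    (1.0 means every read belongs to it).
    """
    return total_bases(n_reads, mean_read_len_bp) * abundance / genome_size_bp


# Figures from the post: 200,000 reads of ~250 bp, one 5 Mb genome (assumed size).
print(total_bases(200_000, 250))                    # 50000000, i.e. 50 Mb
print(expected_coverage(200_000, 250, 5_000_000))   # 10.0

# Same run, but the organism makes up only 10% of the reads (illustrative value).
print(expected_coverage(200_000, 250, 5_000_000, abundance=0.1))
```

With 10% of the reads the same genome ends up at roughly 1x, which is the reason to expect contigs only from the most dominant organisms.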

              The easy way is to upload your data to the MG-RAST server (http://metagenomics.anl.gov/).

              It automatically annotates your sample to various databases and allows for comparison with a lot of public metagenomes.

              In addition to MG-RAST I've been using MEGAN, and I very much like the reasoning behind the approach. But if you do not have a reasonable computer cluster available, it will take too long to BLASTX 200k reads against e.g. NCBI nr.

              rgds
              Mads



              • #8
                Metagenomic binning?

                Originally posted by cliffbeall
                I have been following the literature and it seems a new metagenome binning or taxonomy program comes out every month. It would be nice to see a comparison.

                I have used MEGAN; I think it is one of the more widely used tools. It parses BLASTx results using the NCBI taxonomy, SEED, and KEGG. The BLASTx search is computationally intensive: a 275-megabase Illumina data set took about 1600 hours of computer time on our local cluster.
                Cliff, did you assemble the Illumina data set with ABySS or Velvet first?
                BlastX has a hard time with 76 bp or 100 bp read lengths.
                MetaVelvet looks like a sexy new way to assemble short-read meta data.
                MEGAN is a good one and is the most used, but it does have a HIGH false positive rate. For microbes, SOrt-ITEMS and various IMER binning programs are around. ProViDE is great for viral metas. However, there are many others. I would be interested in a program that can take both 454 and Illumina data as input without the flowgrams from 454.
                The other way I have thought about it is to assemble, then find protein ORFs, then use blastp to compare various binning programs.
                BlastX takes forever!!
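
A rough sketch of the "find protein ORFs" step mentioned above, so that the peptides can be searched with blastp instead of running BlastX on the nucleotide sequences. It assumes the standard genetic code and treats any stretch between stop codons as an ORF (no start codon required, since contigs and reads are often fragments). The function names and the `min_aa` cutoff are illustrative; a dedicated gene caller would normally do this job.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (non-ACGT left as-is)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate in frame 0; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_orfs(seq, min_aa=30):
    """Return (strand, frame, peptide) for every stop-free stretch >= min_aa."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for pep in translate(s[frame:]).split("*"):
                if len(pep) >= min_aa:
                    orfs.append((strand, frame, pep))
    return orfs
```

The peptides returned by `six_frame_orfs(contig)` could then be written out as FASTA and used as blastp queries.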



                • #9
                  Originally posted by raw937
                  Cliff, did you assemble the Illumina data set with ABySS or Velvet first?
                  BlastX has a hard time with 76 bp or 100 bp read lengths.
                  MetaVelvet looks like a sexy new way to assemble short-read meta data.
                  MEGAN is a good one and is the most used, but it does have a HIGH false positive rate. For microbes, SOrt-ITEMS and various IMER binning programs are around. ProViDE is great for viral metas. However, there are many others. I would be interested in a program that can take both 454 and Illumina data as input without the flowgrams from 454.
                  The other way I have thought about it is to assemble, then find protein ORFs, then use blastp to compare various binning programs.
                  BlastX takes forever!!
                  In the example I was quoting I didn't assemble first. I have done assembly with SOAPdenovo, but I didn't have enough coverage except for the most abundant sequences. Fortunately I get free time on the cluster (way to go, Ohio!).



                  • #10
                    To add a data point, I did a quick benchmark with USEARCH. In my hands it is about 10X faster than blastx for searching Illumina reads against nr.

                    The drawbacks are that it uses more memory than blast so I had to split the database, and the results are not directly importable into MEGAN, though that should be doable with some work.
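
A minimal sketch of the database-splitting workaround mentioned above, assuming the database is available as a FASTA file. It only shows the splitting; the function names and the `max_residues` budget are hypothetical and no USEARCH options are implied. Each part would be searched separately and the hits merged afterwards, keeping in mind that statistics that depend on database size (such as E-values) may not be directly comparable across parts.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_residues(records, max_residues):
    """Group consecutive records into parts of at most `max_residues` letters.

    A single record longer than the budget still gets a part of its own.
    """
    part, size = [], 0
    for header, seq in records:
        if part and size + len(seq) > max_residues:
            yield part
            part, size = [], 0
        part.append((header, seq))
        size += len(seq)
    if part:
        yield part


def write_fasta(records, handle, width=60):
    """Write records to an open text handle, wrapping sequence lines."""
    for header, seq in records:
        handle.write(header + "\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

For example, looping over `enumerate(split_by_residues(read_fasta(open("nr.fasta")), max_residues))` and calling `write_fasta` on each part would produce one file per chunk (the file name is a placeholder).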
