
  • Metagenomics w/ 454 tips?

    Hi y'all,

    Quick question for you. Does anyone have tips, tricks or recommendations for metagenomic assembly and binning programs? I'm working with two 200K-read (100-350 bp) datasets from microbial communities that are relatively simple (predicted to have fewer than 100 taxa, with a handful of dominant organisms). What are your favorites? Any pitfalls to avoid?

    Cheers,
    Lizzy

  • #2
    assembly for 454 data

    454 data is a mess, but it's the only long-read technology as of today.
    Before you try assembly, be strict with the front-end cleaning of your data. You must screen your reads hard: if you barcoded any samples, use TagCleaner to remove the tags. Removing Ns and low-quality reads would also help. You could try a denoising program if it is amplicon data, but I have not tried that for metas.
    Once you have removed all the homopolymers etc., Forge or MIRA would be a good start for your assembly.
    What percentage of your reads are 100 bp?
    If 50%, then try ABySS or Velvet.

    More details would help.
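A minimal sketch of the screening step described above, assuming the reads have already been parsed into (id, sequence, Phred scores) tuples. The function names and all thresholds (`min_len`, `min_mean_q`, `max_n`) are hypothetical placeholders, not values recommended in this thread. The last helper addresses the question of what fraction of the reads is only around 100 bp long.

```python
def passes_screen(seq, quals, min_len=50, min_mean_q=20, max_n=0):
    """Return True if a read survives length, N-content and quality filters.

    All thresholds are illustrative assumptions; tune them for your data.
    """
    if len(seq) < min_len:
        return False
    if seq.upper().count("N") > max_n:
        return False
    mean_q = sum(quals) / len(quals)
    return mean_q >= min_mean_q


def screen_reads(reads, **thresholds):
    """Keep the ids of reads that pass the screen.

    `reads` is an iterable of (read_id, sequence, phred_scores) tuples.
    """
    return [rid for rid, seq, quals in reads if passes_screen(seq, quals, **thresholds)]


def fraction_at_most(lengths, cutoff=100):
    """Fraction of reads whose length is <= cutoff (0.0 for empty input)."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n <= cutoff) / len(lengths)


# Toy reads, made up for illustration only.
toy_reads = [
    ("r1", "ACGT" * 30, [30] * 120),  # clean read
    ("r2", "ACGN" * 30, [30] * 120),  # contains Ns
    ("r3", "ACGT" * 30, [10] * 120),  # low mean quality
    ("r4", "ACGT" * 5, [30] * 20),    # too short
]
kept = screen_reads(toy_reads)
short_fraction = fraction_at_most(len(seq) for _, seq, _ in toy_reads)
```

If a large share of the reads falls at or below the cutoff, that is the situation where the short-read assemblers mentioned above come into play.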



    • #3
      You can try QIIME to process the data: http://qiime.sourceforge.net/index.html



      • #4
        QIIME is not for metas!!!

        Not for metas!



        • #5
          Ah yes, it's not 16S metagenomics. Definitely need another cup of coffee.



          • #6
            I have been following the literature and it seems a new metagenome binning or taxonomy program comes out every month. It would be nice to see a comparison.

            I have used MEGAN, which I think is one of the more widely used tools. It parses BLASTX results using the NCBI taxonomy, SEED, and KEGG. The BLASTX search is computationally intensive: a 275-megabase Illumina data set took about 1600 hours of computer time on our local cluster.
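For context, MEGAN bins reads with a lowest common ancestor (LCA) approach: each read is assigned to the deepest taxonomy node shared by all of its significant hits. Below is a toy sketch of that idea only, not MEGAN's actual code. The lineages are simplified, hand-written examples, and `lowest_common_ancestor` is a hypothetical helper name.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages.

    Each lineage is a root-to-leaf list of taxon names for one hit.
    Returns None if there are no lineages or nothing is shared.
    """
    lca = None
    if not lineages:
        return lca
    # Walk down from the root, one rank at a time, until the hits disagree.
    for taxa_at_rank in zip(*lineages):
        if len(set(taxa_at_rank)) != 1:
            break
        lca = taxa_at_rank[0]
    return lca


# Simplified, illustrative lineages for the hits of two imaginary reads.
read_a_hits = [
    ["root", "Bacteria", "Proteobacteria", "Gammaproteobacteria", "Escherichia"],
    ["root", "Bacteria", "Proteobacteria", "Gammaproteobacteria", "Salmonella"],
]
read_b_hits = [
    ["root", "Bacteria", "Proteobacteria"],
    ["root", "Archaea", "Euryarchaeota"],
]

bin_a = lowest_common_ancestor(read_a_hits)
bin_b = lowest_common_ancestor(read_b_hits)
```

Reads whose hits disagree high up in the taxonomy end up binned near the root, which is one reason short reads with ambiguous hits are hard to place precisely.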



            • #7
              I would not try to assemble the data at all. 200k 454 reads seem far too few to give any decent assembly, even in very simple communities (or even for single genomes).

              200,000 reads x 250 bp read length = 50 Mb of sequence.

              50 Mb of sequence = 10x coverage of 1 genome (assuming a genome of about 5 Mb).
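The arithmetic above as a small sketch. The 5 Mb genome size is the assumption implied by the 10x figure. The relative abundances in the second part are made-up numbers, only there to show how quickly coverage drops for the rarer members of a community; the function names are hypothetical.

```python
def fold_coverage(n_reads, mean_read_len_bp, genome_size_bp):
    """Expected average fold coverage of a single genome."""
    return n_reads * mean_read_len_bp / genome_size_bp


def per_taxon_coverage(n_reads, mean_read_len_bp, genome_size_bp, abundances):
    """Coverage per taxon if reads are split in proportion to abundance.

    Assumes every taxon has the same genome size, which is a simplification.
    """
    total = fold_coverage(n_reads, mean_read_len_bp, genome_size_bp)
    return {taxon: total * share for taxon, share in abundances.items()}


# Numbers from the post: 200,000 reads of ~250 bp, one ~5 Mb genome (assumed).
single_genome = fold_coverage(200_000, 250, 5_000_000)

# Hypothetical community: one dominant organism and one rare one.
community = per_taxon_coverage(
    200_000, 250, 5_000_000, {"dominant": 0.5, "rare": 0.01}
)
```

Under these assumptions even an organism making up half the community would only reach about 5x, and a 1% member about 0.1x.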

              The easy way is to upload your data to the MG-RAST server (http://metagenomics.anl.gov/).

              It automatically annotates your sample to various databases and allows for comparison with a lot of public metagenomes.

              In addition to MG-RAST, I've been using MEGAN, and I very much like the reasoning behind the approach. But if you do not have a reasonable computer cluster available, it will take too long to BLASTX 200k reads against e.g. NCBI nr.

              rgds
              Mads



              • #8
                Metagenomic binning?

                Originally posted by cliffbeall
                I have been following the literature and it seems a new metagenome binning or taxonomy program comes out every month. It would be nice to see a comparison.

                I have used MEGAN, which I think is one of the more widely used tools. It parses BLASTX results using the NCBI taxonomy, SEED, and KEGG. The BLASTX search is computationally intensive: a 275-megabase Illumina data set took about 1600 hours of computer time on our local cluster.

                Cliff, did you assemble the Illumina data set with ABySS or Velvet first?
                BLASTX has a hard time with 76 bp or 100 bp read lengths.
                MetaVelvet looks like a sexy new way to assemble short-read meta data.
                MEGAN is a good one and is the most used, but it does have a HIGH false positive rate. For microbes, SOrt-ITEMS and various IMER binning programs are around. ProViDE is great for viral metas. There are many others, however. I would be interested in a program that can take both 454 and Illumina data as input without needing the flowgrams from 454.
                The other way I have thought about it is to assemble, then find protein ORFs, then use BLASTP to compare various binning programs.
                BLASTX takes forever!!
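The "assemble, then find ORFs" idea could start from something like this naive six-frame ORF scan. It is only a sketch under simple assumptions: the standard genetic code, an ORF taken from the first M in each stop-free stretch, and stretches running off the end of a contig kept as partial ORFs. A dedicated gene caller would be the better choice for real data. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def find_orfs(seq, min_aa=30):
    """Return (strand, frame, protein) for ORFs of at least min_aa residues."""
    seq = seq.upper()
    orfs = []
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(template[frame:])
            # Stop codons ('*') split the translation into stop-free stretches.
            for stretch in protein.split("*"):
                start = stretch.find("M")
                if start != -1 and len(stretch) - start >= min_aa:
                    orfs.append((strand, frame, stretch[start:]))
    return orfs


# Tiny made-up contig: ATG, five GCT codons, then a TAA stop.
toy_orfs = find_orfs("ATGGCTGCTGCTGCTGCTTAA", min_aa=3)
```

The resulting protein sequences could then be written out as FASTA and searched with BLASTP instead of running BLASTX on the raw reads.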



                • #9
                  Originally posted by raw937
                  Cliff, did you assemble the Illumina data set with ABySS or Velvet first?
                  BLASTX has a hard time with 76 bp or 100 bp read lengths.
                  MetaVelvet looks like a sexy new way to assemble short-read meta data.
                  MEGAN is a good one and is the most used, but it does have a HIGH false positive rate. For microbes, SOrt-ITEMS and various IMER binning programs are around. ProViDE is great for viral metas. There are many others, however. I would be interested in a program that can take both 454 and Illumina data as input without needing the flowgrams from 454.
                  The other way I have thought about it is to assemble, then find protein ORFs, then use BLASTP to compare various binning programs.
                  BLASTX takes forever!!

                  In the example I was quoting, I didn't assemble first. I have done assembly with SOAPdenovo, but I didn't have enough coverage except for the most abundant sequences. Fortunately I get free time on the cluster (way to go, Ohio!).



                  • #10
                    To add a data point, I did a quick benchmark with USEARCH. In my hands it is about 10x faster than BLASTX for searching Illumina reads against nr.

                    The drawbacks are that it uses more memory than BLAST, so I had to split the database, and that the results are not directly importable into MEGAN, though that should be doable with some work.
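Splitting the database can be scripted. A rough sketch, assuming a plain FASTA database: records are streamed one at a time, and each goes to whichever chunk currently holds the fewest residues. `read_fasta` and `assign_chunks` are hypothetical helper names, and none of this is part of USEARCH itself. Keep in mind that E-values generally depend on database size, so hits from separate chunks are not directly comparable without accounting for that.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif line and header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def assign_chunks(records, n_chunks):
    """Yield (chunk_index, header, sequence), balancing residues per chunk.

    Greedy rule: each record goes to the chunk with the fewest residues so far.
    Only the running totals are kept in memory, not the records themselves.
    """
    sizes = [0] * n_chunks
    for header, seq in records:
        idx = sizes.index(min(sizes))
        sizes[idx] += len(seq)
        yield idx, header, seq


# Toy database, made up for illustration.
toy_fasta = ">a\nAAAAAAAAAA\n>b\nCCCC\n>c\nGGG\n>d\nTTTTT\n"
assignment = [
    (idx, header)
    for idx, header, _ in assign_chunks(read_fasta(toy_fasta.splitlines()), 2)
]
```

In practice each yielded record would be appended to its own output file (one file per chunk index), and the search run once per chunk.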
