**raw937** · 05-22-2011, 12:42 PM

assembly for 454 data

454 data is a mess, but it's the only long-read technology as of today.
Before you try assembly, be strict with the front-end cleaning of your data. Screen your reads hard: if you barcoded any samples, use TagCleaner to remove the tags. Removing Ns and low-quality sequence would also be helpful. You could try a denoising program if it is amplicon data, but I have not tried it for metas.
Once you have removed all the homopolymers etc., Forge or MIRA would be a good start for your assembly.
What percentage of your reads are 100 bp?
If 50%, then try ABySS or Velvet.

More details would help.
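
As a rough illustration of the screening step above, here is a minimal Python sketch that drops reads containing Ns or with a low mean quality, and reports what fraction of reads are at or below a length cutoff. The function names and the thresholds (mean Q20, zero Ns allowed, 100 bp cutoff) are assumptions chosen for the example, not settings recommended in this thread or taken from any particular tool.

```python
def mean_quality(quals):
    """Average Phred score of one read (0.0 for an empty read)."""
    return sum(quals) / len(quals) if quals else 0.0


def passes_screen(seq, quals, min_mean_q=20, max_n=0):
    """True if the read has at most max_n Ns and mean quality >= min_mean_q.

    Both thresholds are illustrative assumptions.
    """
    if seq.upper().count("N") > max_n:
        return False
    return mean_quality(quals) >= min_mean_q


def fraction_at_most(lengths, cutoff=100):
    """Fraction of reads whose length is <= cutoff bp."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n <= cutoff) / len(lengths)


# Toy reads: (sequence, per-base Phred scores)
reads = [
    ("ACGTACGTAC", [30] * 10),  # clean read, kept
    ("ACGTNCGTAC", [30] * 10),  # contains an N, dropped
    ("ACGTACGTAC", [10] * 10),  # low mean quality, dropped
]
kept = [seq for seq, quals in reads if passes_screen(seq, quals)]
short_fraction = fraction_at_most([80, 100, 250, 400])
```

In practice the sequences and scores would come from the FASTA/QUAL files converted from the SFF output, and a dedicated tool would do the barcode removal.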

**rwenang** · 05-22-2011, 06:26 PM

You can try QIIME to process the data: http://qiime.sourceforge.net/index.html

**raw937** · 05-22-2011, 06:29 PM

QIIME!!! It is not for metas!!!

Not for metas!

**rwenang** · 05-22-2011, 06:53 PM

Ah yes, it's not 16S metagenomics. Definitely need another cup of coffee.

**cliffbeall** · 05-26-2011, 09:55 AM

I have been following the literature and it seems a new metagenome binning or taxonomy program comes out every month. It would be nice to see a comparison.

I have used MEGAN, which I think is one of the more widely used tools. It parses BLASTx results using the NCBI taxonomy, SEED, and KEGG. The BLASTx search is computationally intensive: a 275-megabase Illumina data set took about 1600 hours of computer time on our local cluster.

**MadsAlbertsen** · 05-26-2011, 11:01 AM

I would not try to assemble the data at all. 200k 454 reads seems very low to get any decent assembly even in very simple communities (or even in single genomes).

200,000 reads x 250 bp read length = 50 Mb of sequence.

50 Mb of sequence = 10x coverage of 1 genome (assuming a genome of about 5 Mb).
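
The arithmetic above can be written out as a small Python sketch. The 5 Mb genome size is the value implied by "50 Mb = 10x coverage"; the 100-genome community with equal abundances is a purely hypothetical case added to show how quickly per-genome coverage drops.

```python
def expected_coverage(n_reads, read_len_bp, genome_size_bp):
    """Average fold coverage: total sequenced bases / genome size."""
    return n_reads * read_len_bp / genome_size_bp


n_reads = 200_000
read_len = 250          # bp, typical 454 read length used in the post
genome_size = 5_000_000  # bp, assumed (implied by 50 Mb = 10x)

total_bp = n_reads * read_len                                  # 50,000,000 bp = 50 Mb
single_genome = expected_coverage(n_reads, read_len, genome_size)  # 10.0

# Hypothetical community: 100 genomes of 5 Mb each at equal abundance.
per_genome = expected_coverage(n_reads, read_len, 100 * genome_size)  # 0.1
```

At 0.1x per genome most positions are never sequenced at all, which is the reasoning behind not attempting an assembly with this amount of data.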

The easy way is to upload your data to the MG-RAST server (http://metagenomics.anl.gov/).

It automatically annotates your sample to various databases and allows for comparison with a lot of public metagenomes.

In addition to MG-RAST I've been using MEGAN, and I very much like the reasoning behind the approach. But if you do not have a reasonable computer cluster available, it will take too long to BLASTX 200k reads against e.g. NCBI nr.

rgds
Mads

**raw937** · 05-26-2011, 12:07 PM

Metagenomic binning?

Cliff, did you assemble the Illumina data set with ABySS or Velvet first?
BLASTX has a hard time with 76 bp or 100 bp read lengths.
MetaVelvet looks like a sexy new way to assemble short-read meta data.
MEGAN is a good one and is the most used, but it does have a HIGH false positive rate. For microbes, SOrt-ITEMS and various IMER binning programs are around. ProViDE is great for viral metas. There are many others, however. I would be interested in a program that can take both 454 and Illumina data as input without the flowgrams from 454.
The other way I have thought about it is to assemble, then find protein ORFs, then use BLASTP to compare various binning programs.
BLASTX takes forever!!
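
For the "assemble, then find ORFs, then BLASTP" idea, here is a minimal Python sketch of the ORF step, assuming the standard genetic code and a simple stop-to-stop definition of an open frame across all six reading frames. The function names and the `min_aa` cutoff are assumptions for illustration; a real gene finder does considerably more than this.

```python
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse complement; characters other than ACGT (e.g. N) are kept."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate in frame 0; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_peptides(seq, min_aa=30):
    """Stop-to-stop peptides of at least min_aa residues from all six frames.

    min_aa=30 is an arbitrary illustrative cutoff.
    """
    seq = seq.upper()
    peptides = []
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            for pep in translate(strand[frame:]).split("*"):
                if len(pep) >= min_aa:
                    peptides.append(pep)
    return peptides


# Toy contig, with a tiny cutoff so the example returns something.
peptides = six_frame_peptides("ATGGCTGCTAAATAA", min_aa=4)
```

The resulting peptides could then be written to a FASTA file and searched with BLASTP, which is far cheaper per query than BLASTX on raw reads.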

**cliffbeall** · 05-27-2011, 11:32 AM

In the example I was quoting, I didn't assemble first. I have done assembly with SOAPdenovo, but I didn't have enough coverage except for the most abundant sequences. Fortunately I get free time on the cluster (way to go, Ohio!).

**cliffbeall** · 06-02-2011, 07:17 AM

To add a data point, I did a quick benchmark with USEARCH. In my hands it is about 10x faster than BLASTX for searching Illumina reads against nr.

The drawbacks are that it uses more memory than BLAST, so I had to split the database, and that the results are not directly importable into MEGAN, though that should be doable with some work.
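
Splitting the database as described above can be sketched in Python like this. It is an illustration only, not the procedure actually used in the benchmark: the function names and the per-chunk residue limit are assumptions, and the records are read from in-memory lines to keep the example self-contained.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_by_residues(records, max_residues):
    """Group records into consecutive chunks of at most max_residues letters.

    A single record longer than the limit gets a chunk of its own.
    """
    chunk, size = [], 0
    for header, seq in records:
        if chunk and size + len(seq) > max_residues:
            yield chunk
            chunk, size = [], 0
        chunk.append((header, seq))
        size += len(seq)
    if chunk:
        yield chunk


# Toy protein database with three entries.
fasta_lines = ">a\nMKV\n>b\nMKVLA\n>c\nMK\n".splitlines()
chunks = list(split_by_residues(read_fasta(fasta_lines), max_residues=8))
chunk_ids = [[header for header, _ in chunk] for chunk in chunks]
```

One caveat: E-values depend on database size, so hits from separately searched chunks are not directly comparable unless that is accounted for when merging the results.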

Metagenomics w/ 454 tips?
