**zhangju** · 05-12-2013, 03:09 PM

Hi Brady,

I closed two small bacterial genomes (about 2 Mbp each) using 454 PE pyrosequencing reads and a similar strategy (without SIS, but with manual closure based on viewing the alignment of reads to the scaffolds generated by Newbler).

1. Newbler assembly of the 454 sff files to generate 454 scaffolds
2. GapFiller
3. Bowtie alignment of reads to the gap-filled scaffolds
4. Manually close some gaps based on the alignment
5. Iterate steps 2-4.
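
To decide which gaps are still open after each round of steps 2-4, it helps to list the runs of N in the gap-filled scaffolds. The following is only a minimal sketch, not part of the workflow above as it was actually run: it assumes the scaffolds are in plain FASTA with gaps written as runs of N, and the function names (`read_fasta`, `find_gaps`, `gap_report`) are hypothetical.

```python
import re


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_gaps(sequence, min_len=1):
    """Return 0-based, half-open (start, end) coordinates of runs of N."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", sequence)
        if m.end() - m.start() >= min_len
    ]


def gap_report(lines, min_len=1):
    """Map each scaffold name to the list of gaps still present in it."""
    return {name: find_gaps(seq, min_len) for name, seq in read_fasta(lines)}


if __name__ == "__main__":
    # Toy input; in practice pass an open file handle for the scaffold FASTA.
    demo = [">scaffold1 demo", "ACGTNNNNAC", "GTNNACGT", ">scaffold2", "ACGT"]
    for scaffold, gaps in gap_report(demo).items():
        print(scaffold, gaps)
```

Running the report after each GapFiller pass shows whether the gap list is shrinking, and the coordinates tell you where to look in the read alignment when closing gaps by hand.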

Your case is somewhat different, but this strategy (plus your SIS) would definitely be an efficient one. I would like to try SIS in the future.

Justin

**bcress** · 05-12-2013, 03:39 PM

Thanks for your response, Justin. Can you tell me the read length and insert size of your library? I think the short read length and insert size of my Illumina library puts me at a disadvantage when compared to the longer reads and insert sizes from a typical 454 library that can span longer repetitive regions.

SIS appears to be a nice tool, but it looks like it threw away ~100kb from my de novo assembly. I haven't looked at the content, so I can't really comment on that aspect yet. I'd be interested to hear if any seasoned veterans have used SIS yet. It would be nice to know what pitfalls to avoid with SIS with respect to bacterial genome assemblies.

Brady

**zhangju** · 05-13-2013, 06:54 AM

454 read length and insert size

My 454 reads are 300 to 500 bp long. I cannot say the insert size for sure, but I can give you an estimate from the Newbler output: 7523.38, std 8572.91.

For de novo assembly, Illumina paired-end reads alone usually won't give you nice long scaffolds, so you end up with a large number of contigs.

Let's keep each other updated on any progress with the SIS tool. I will try it out on my next genome.

**krobison** · 05-13-2013, 12:29 PM

You should map out the cost of each of these strategies and then compare it to running long libraries on the PacBio. For E. coli-sized genomes, with the new single-library pipelines (e.g. http://www.ncbi.nlm.nih.gov/pubmed/23644548) and the new RS II instrument, it should take just 2-3 SMRTcells per genome, or about $1K-1.5K/genome at typical core facility charges. As shown recently (http://arxiv.org/abs/1304.3752), for the majority of bacterial genomes this strategy will give you a single contig; only a few known bacterial repeats are too large to resolve.
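
A rough way to map out the cost is to scale the per-genome range by the number of genomes. This is only a back-of-the-envelope sketch: the only figures taken from the post are the 2-3 SMRTcells and $1K-1.5K per genome, the function names are hypothetical, and pairing the low cost with 2 cells and the high cost with 3 cells is an assumption. Costs for the other strategies are deliberately left for you to fill in from your own quotes.

```python
def project_cost_range(n_genomes, per_genome_low, per_genome_high):
    """Return the (low, high) total cost for n_genomes at a per-genome range."""
    if per_genome_low > per_genome_high:
        raise ValueError("per_genome_low must not exceed per_genome_high")
    return n_genomes * per_genome_low, n_genomes * per_genome_high


def implied_cost_per_cell(per_genome_cost, cells_per_genome):
    """Per-cell charge implied by a per-genome cost (assumes an even split)."""
    return per_genome_cost / cells_per_genome


if __name__ == "__main__":
    # Figures quoted in the post: $1K-1.5K per genome, 2-3 SMRTcells per genome.
    low, high = project_cost_range(10, 1000.0, 1500.0)
    print(f"10 genomes on PacBio: ${low:,.0f} to ${high:,.0f}")
    # Assumption: the low end corresponds to 2 cells, the high end to 3.
    print(implied_cost_per_cell(1000.0, 2), implied_cost_per_cell(1500.0, 3))
```

Calling `project_cost_range` once per strategy with your own quotes gives ranges that can be compared side by side.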

Illumina paired ends will be cheaper, but you'll have many more contigs. In my experience, Velvet is not as aggressive as other assemblers out there such as Ray or MIRA, and it would seem you can tolerate an aggressive assembler. You can get some large contigs this way, but you will have quite a few of them.

454 should do better than Illumina, but will be inferior to PacBio and cost more.

**bcress** · 05-16-2013, 10:02 AM

Just saw your post. Thanks for that. These are the kinds of insights that would have been useful before we started sequencing. I will try MIRA and Ray, but I think resequencing is out of the question at this point. In your experience, what benefits have you seen from closing bacterial genomes in terms of informational content (other than the satisfaction of knowing that the genome is tidy, of course)?

**krobison** · 05-17-2013, 05:24 AM

For my work, I don't actually require the genome to be closed. But I do need large gene clusters to each reside on a single contig, and it turns out that natural product gene clusters are extremely hard to assemble. So I still have a bit of a local view (I need specific regions fully assembled, not the whole bug), but a tough standard there.

Another assembler to look at is MaSuRCA, though in my initial test Ray beat it (see also the GAGE-B paper).


Your perspectives on assembling bacterial genomes with one set of reads
