**maubp** · 09-19-2014, 05:27 AM

First, ask yourself why you want or need a finished genome. How much time and money can you spend on getting one?

If you only care about one or two regions of interest, it may be cost-effective to do it the old-fashioned way (PCR and "Sanger" capillary sequencing to close the gaps).

**Tom_C** · 09-19-2014, 05:44 AM

Thanks for the reply!

I had assumed a closed, or mostly closed genome would make downstream applications much easier. We plan to do ChIP-Seq and possibly RNA-Seq with this bacterium later on, and figured having a mostly closed genome would be best.

That being said, if a closed genome is not required for these experiments we would still like to join as many contigs as possible to publish a decent draft genome. And that is where we need some expert advice.

**maubp** · 09-19-2014, 05:57 AM

A closed circle is of course nice, but if all you care about is gene content you may be fine as it is. Finishing it will cost time and money whichever route you take.

**JohnN** · 09-19-2014, 07:01 AM

There are several approaches of varying complexity and cost:

The easiest, in my recent experience, is to get PacBio sequencing done. With the Illumina reads mapped to a PacBio assembly, you can close and finish the genome in about two days of solid work. But (and there are at least two big buts) it will cost you about $1500 for the sequencing, and the PacBio assembly process is not that easy or automated, so you may have to outsource that too. But it works, and we have done it for about 30 reference genomes needed for diagnostic purposes.

You can find a very closely related genome or two and use synteny to help you arrange your contigs (Mauve, MUMmer, or reference mapping would help here), and then PCR-close the smaller PCRable gaps. The rRNA regions will be difficult; you could either ignore them, because they are not really that important for many studies, or generate primer sets to stitch the rRNA reads together. I've done it; it's a pain, but that's what we did in the old days.
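
To illustrate the reference-ordering idea, here is a minimal Python sketch. It assumes you have already reduced your aligner output (for example from MUMmer) to one best hit per contig; the `Hit` record and its field names are hypothetical and not the output format of any particular tool.

```python
from collections import namedtuple

# Hypothetical record: the single best alignment of one contig on the reference.
# Coordinates are 1-based and inclusive on the reference sequence.
Hit = namedtuple("Hit", ["contig", "ref_start", "ref_end", "strand"])


def order_contigs(hits):
    """Return (contig, strand) pairs sorted by position on the reference."""
    ordered = sorted(hits, key=lambda h: h.ref_start)
    return [(h.contig, h.strand) for h in ordered]


def gap_estimates(hits):
    """Estimate the gap in bp between neighbouring contigs.

    A negative value means the two contigs overlap on the reference.
    """
    ordered = sorted(hits, key=lambda h: h.ref_start)
    return [
        (a.contig, b.contig, b.ref_start - a.ref_end - 1)
        for a, b in zip(ordered, ordered[1:])
    ]


# Made-up example hits, purely for illustration.
hits = [
    Hit("contig_3", 250_001, 400_000, "-"),
    Hit("contig_1", 1, 120_000, "+"),
    Hit("contig_2", 121_501, 249_000, "+"),
]

for left, right, gap in gap_estimates(hits):
    print(f"{left} -> {right}: about {gap} bp to close")
```

The gap sizes are only as good as the synteny with the reference, so treat them as a rough guide for deciding which gaps look PCRable.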

Or, as mentioned above, you can simply use your contig set in your downstream experiments. A large proportion of the genes involved with virulence, etc., are there already. The assembler typically stops extending a contig when the reads it is extending with are shorter than the repeated region they would need to span. A quick way of assessing the quality of your assembly is to auto-annotate the genome with something like Prokka and look at what you have. You could probably use Gap5 to join a few contigs which have some overlap, and to fix the odd frameshift, but you likely have what you need to continue your studies.
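
Alongside annotation, a quick look at contig count and N50 can help judge how fragmented the contig set is. This is a rough sketch, not part of Prokka or Gap5; the function names are made up for illustration, and it assumes a plain FASTA file.

```python
def fasta_lengths(lines):
    """Map each FASTA record name to its sequence length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Fall back to a positional name if the header is empty.
            name = fields[0] if fields else f"record_{len(lengths) + 1}"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


# Tiny made-up FASTA, purely for illustration.
example = [">c1 demo", "ACGTACGT", "ACGT", ">c2", "ACGT", ">c3", "AC"]
sizes = fasta_lengths(example)
print(len(sizes), "contigs, total", sum(sizes.values()), "bp, N50", n50(sizes.values()))
```

For a real assembly you would pass an open file handle instead of the `example` list, e.g. `fasta_lengths(open("contigs.fasta"))`, where the file name is a placeholder.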

**Brian Bushnell** · 09-19-2014, 11:00 AM

You already have a very good assembly, and closing the 28 remaining gaps probably won't affect many downstream programs. You will almost certainly need more data for a significant improvement: either a long-mate-pair library for better scaffolding, or PacBio for gap-filling. If you go PacBio, you may as well just run 2-3 SMRT cells and try for a complete single-contig PacBio-only assembly.

**bastianwur** · 09-22-2014, 02:01 AM

I'd first try to scaffold it against a reference, and use that to estimate how much could be missing and whether it is relevant.
Because if, e.g., 3/4 of the gaps are likely to consist of 23S rRNA or stretches of tRNA, then just ignore them.

If the missing parts seem to be more relevant, then there are a few things to consider:
- is repeat structure a problem (doesn't seem so)
- how much is missing? If a large amount is missing, you might need to consider a second run with higher coverage
- is the raw material still there? Because I think (not a lab person) that a PE jumping library (4 - 8 kb should get over the rRNAs; as suggested above) can be made from the same input material, so that would save time.

You should also do some QC on your genome. It can happen (I had that with Ray, HGAP and other assemblers as well) that parts are duplicated, which might not be obvious at first. E.g. during some other processing of one of our genomes, it turned out that it had the right size (5 Mb) and the right number of proteins (5k), but not the right number of "unique" proteins (4k). Why? One of the scaffolds was simply duplicated in the output.
Check as well that there's no obvious contamination in the assembly. It doesn't help you if a good part is E. coli (or whatever).
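
A simple check for the duplicated-scaffold case could look like the Python sketch below. It only catches scaffolds that are exact copies of each other (on either strand); partial duplications or contamination would need an alignment or BLAST-style search instead. The function names are made up for illustration.

```python
import hashlib

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_key(seq):
    """Strand-independent fingerprint: hash the smaller of seq and its revcomp."""
    seq = seq.upper()
    return hashlib.sha1(min(seq, revcomp(seq)).encode()).hexdigest()


def find_duplicates(records):
    """Group record names whose sequences are identical on either strand.

    `records` maps name -> sequence; only groups with 2+ members are returned.
    """
    groups = {}
    for name, seq in records.items():
        groups.setdefault(canonical_key(seq), []).append(name)
    return [names for names in groups.values() if len(names) > 1]


# Made-up scaffolds: s3 is the reverse complement of s2.
records = {
    "s1": "ACGTTGCA",
    "s2": "GGGCCCAT",
    "s3": revcomp("GGGCCCAT"),
}
print(find_duplicates(records))
```

The same idea applies to predicted protein sequences: counting distinct sequences versus total sequences would expose the "5k proteins but only 4k unique" situation described above.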

**Tom_C** · 09-23-2014, 08:34 AM

Thanks for the input everyone!

Unfortunately, additional large-scale sequencing is not in the budget for this project, so we will not be able to use mate-pair or PacBio reads to close the genome. The number of Illumi However, we now know to use PacBio for all future genome projects.

Running the initial assembly through RAST indicates it is a fairly complete genome, with the correct number of proteins and a full complement of rRNAs and tRNAs. At the suggestion of those in this thread, we plan to go ahead with ChIP-Seq and RNA-Seq using the current assembly.


Help towards closing a genome?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News