
**EricHaugen** · 07-25-2013, 10:48 PM

If you really need to do genome assembly, I'd look into what PacBio data would cost. That seems ideally suited to bacterial genome assembly, but I have no experience with it yet; I've just been eagerly awaiting that technology.

My experience with assembling Illumina reads has not been very successful. Many assemblers are fine for producing contigs of a few kb to tens of kb, but none gets anywhere near a full genome assembly automatically. This might be good enough to see if there are new genes not represented in the type strain, though.

Illumina reads are great for just aligning to a reference and producing an updated consensus sequence, which you might then use as the reference in another iteration. Discordantly mapping read pairs will point to larger differences (or errors in the reference assembly).
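As a rough illustration of the discordant-pair idea, here is a minimal Python sketch. It assumes you have already pulled one record per read pair out of your alignments as `(name, insert_size, proper_orientation)`; that tuple layout, the function name and the `n_mads` cutoff are all made up for the example, and a real aligner reports this information in its own way.

```python
from statistics import median

def discordant_pairs(pairs, n_mads=5.0):
    """Return names of read pairs that look discordant.

    pairs: list of (name, insert_size, proper_orientation) tuples
           (a hypothetical, simplified record format).
    A pair is flagged if its orientation is wrong, or if its insert size
    lies more than n_mads median absolute deviations from the median.
    """
    # Estimate the typical insert size from correctly oriented pairs only.
    sizes = [size for _, size, ok in pairs if ok]
    med = median(sizes)
    # Fall back to 1.0 if all sizes are identical, to avoid a zero cutoff.
    mad = median([abs(s - med) for s in sizes]) or 1.0

    flagged = []
    for name, size, ok in pairs:
        if not ok or abs(size - med) > n_mads * mad:
            flagged.append(name)
    return flagged

# Toy data: four ordinary pairs, one spanning a large gap, one mis-oriented.
example_pairs = [
    ("p1", 300, True),
    ("p2", 310, True),
    ("p3", 295, True),
    ("p4", 305, True),
    ("p5", 5200, True),   # far too long: possible deletion or misassembly
    ("p6", 300, False),   # wrong orientation: possible inversion
]
flagged = discordant_pairs(example_pairs)
print(flagged)
```

Clusters of flagged pairs at the same locus are what you would follow up on; a single odd pair is usually just noise.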

**krobison** · 07-26-2013, 06:14 AM

On PacBio using HGAP, Celera assembler (or MIRA) and Quiver, you can probably get a very high quality sequence (quite likely a single contig, with probably <10 base substitutions or indels) from a single SMRT cell, or around $1000 USD. I'm not sure about access to providers Down Under -- it looks like Millennium Science is a commercial provider (expect to go up a bit on my price estimate if they are like most commercial shops). The long reads will give you a good ability to detect structural variations, especially changes in repeats.

My standard advice for these projects can be found at http://omicsomics.blogspot.com/2013/...-for-help.html . In particular, I'd advise you to think about what questions you are going to ask & how the continuity of the sequence might affect those.

Depending on the G+C content of your organism, HiSeq may generate quite good data, but it won't be able to span long repeats such as Insertion Sequences and ribosomal RNA genes. With a moderate G+C organism, with 100X coverage you may have some contigs well over 200Kb, but the N50 of the assembly will probably be more in the 20-50Kb range or so -- but that's a rough guess. Some of that also depends on which assembler you use.
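For anyone unfamiliar with N50, a small Python sketch of the usual definition follows: the length of the contig at which the running total, going from longest to shortest, first reaches half of the assembly size. The contig lengths are invented to show how an assembly can contain a contig over 200 kb and still have an N50 in the tens of kb.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly

# Made-up contig lengths in bp: one 220 kb contig plus many shorter ones.
contigs = [220_000, 50_000, 45_000, 40_000, 40_000, 35_000, 30_000,
           30_000, 25_000, 25_000, 20_000, 20_000, 10_000, 10_000]
print(max(contigs), n50(contigs))
```

Here the longest contig is 220 kb, but it takes the first three contigs to cover half of the 600 kb total, so the N50 is 45 kb.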

**mcnelson.phd** · 07-26-2013, 08:25 AM

If your goal is to identify the genes on a presence/absence type basis, then the Illumina reads will be more than sufficient. The best route would be to map them against your reference, and use a variant caller to look for differences between each strain and your reference. You can also assemble the reads de novo and get a pretty good idea of what's present in your strains but not in the reference.

Where PacBio is best suited is if you need to know physical arrangement of the genes. The Illumina data may be able to tell you if there are any rearrangements in the genome that affect your genes of interest, but it's a lot harder than if you have the long reads that PacBio can give you.

All that said, $700/strain seems like a pretty high price to me. I don't know what sort of facilities are available to you and what their overhead is, but you could easily make all of the libraries using Nextera XT (~$150/library) and sequence them on a MiSeq for ~$1000. That would give you 2x250bp reads, which would give you better rearrangement information than the 2x100bp reads from the HiSeq, and if you're only doing 3 strains then you should be able to get ~2Gbp of data per sample with the MiSeq.
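To make that arithmetic explicit, here is a quick back-of-the-envelope sketch in Python. The prices and the ~2 Gbp per sample are the rough figures from this post; the 5 Mbp genome size is purely an assumption for illustration, so substitute your own organism's size.

```python
def miseq_estimate(n_strains, library_cost, run_cost,
                   bases_per_sample, genome_size):
    """Rough cost and coverage for one MiSeq run shared by n_strains."""
    total_cost = n_strains * library_cost + run_cost
    return {
        "total_cost": total_cost,
        "cost_per_strain": total_cost / n_strains,
        # Expected fold coverage = sequenced bases / genome size.
        "coverage": bases_per_sample / genome_size,
    }

estimate = miseq_estimate(
    n_strains=3,
    library_cost=150,        # ~$150 per Nextera XT library (from the post)
    run_cost=1000,           # ~$1000 per MiSeq run (from the post)
    bases_per_sample=2e9,    # ~2 Gbp per sample (from the post)
    genome_size=5e6,         # ASSUMED 5 Mbp genome, not from the thread
)
quoted_total = 3 * 700       # the $700/strain quote, for comparison
print(estimate, quoted_total)
```

Under these assumptions the run comes to about $1450 in total versus $2100 at the quoted price, and the coverage is far more than an assembly or SNP calling would need.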

**coralnerd** · 07-28-2013, 10:13 PM

Hi guys,

Thanks for the helpful replies. Just to give you a better idea of what I'm hoping to achieve here are some more details of what I'm working with.

Ideally I'd be comparing a strain that can't degrade the plant derived compounds we're interested in to one that can, but unfortunately that isn't quite the case. The three strains that I want to sequence are all phenotypically similar in that they are all able to degrade one or both of the two plant toxins, but they do it to differing degrees. One strain is particularly bad at it and is barely able to degrade the second compound at all.

As a very simple first step to look for genomic differences between the strains I've tried producing restriction digest fingerprints, which show subtle, but noticeable differences.

Our thinking therefore is that the genes involved in this pathway are present in all of the strains, but that there might be SNPs or other small differences between them. Of course these differences might be located elsewhere like in regulatory genes etc. At this stage we have virtually no information at all about where these genes might be located in the genome or what homologous sequences we should be looking for.

So - whatever sequencing method we end up using needs to be able to produce data that allows us to resolve potentially small differences between the genomes. With my limited experience in genome assembly and analysis I don't know how feasible it would be to map short Illumina reads to the reference genome and use this to try to identify SNPs.

Based on what I've read and the replies I've received here so far, it sounds like PacBio might be a good option. I'll happily jump on the latest technological bandwagon if it can produce the results we're after. I've contacted Millennium to see what they can do for us in terms of price.

**swbarnes2** · 07-31-2013, 03:16 PM

Illumina data is fine for SNPs. With enough coverage, you could likely de novo assemble any genes present in your samples not present in the reference.
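To give a feel for why coverage matters for SNP calling, here is a toy Python sketch. It is not a real variant caller: it assumes the reads are already aligned and summarised as a pileup (position mapped to the string of bases observed there), and the function name and the `min_depth` and `min_fraction` thresholds are made up for the example. In practice you would use an established aligner and variant caller.

```python
from collections import Counter

def call_snps(reference, pileup, min_depth=10, min_fraction=0.9):
    """Toy haploid SNP caller.

    reference: reference sequence as a string (0-based positions).
    pileup:    dict of position -> string of read bases seen at that position
               (a hypothetical, simplified input format).
    Reports (position, ref_base, alt_base, depth) where a non-reference base
    dominates a sufficiently deep column.
    """
    snps = []
    for pos, bases in sorted(pileup.items()):
        depth = len(bases)
        if depth < min_depth:
            continue  # too few reads to trust this position
        base, count = Counter(bases).most_common(1)[0]
        if base != reference[pos] and count / depth >= min_fraction:
            snps.append((pos, reference[pos], base, depth))
    return snps

reference = "ACGTACGT"
pileup = {
    2: "T" * 19 + "G",  # 19 of 20 reads disagree with the reference G
    5: "C" * 12,        # agrees with the reference
    6: "A" * 3,         # differs, but only 3 reads deep, so it is skipped
}
snps = call_snps(reference, pileup)
print(snps)
```

The high `min_fraction` reflects that a bacterial isolate is haploid, so a true SNP should be supported by nearly all reads at that position.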

Looking for advice on bacterial de novo genome sequencing
