CGAL for genome assembly comparison
Hi,
I'm quite new to bioinformatics, so please excuse the simplicity of this post.
I've sequenced a fungal genome (<40 Mbp) using the Ion Torrent platform. I created a fragment library, which means that I should now have single-end reads (right?).
MIRA seemed like a good choice of assembler so I used that as well as CLC to assemble the reads into contigs, but now I'm stuck. I'd like to compare the qualities of the MIRA and CLC assemblies using CGAL, but I have no idea how to use the program.
I've read the CGAL paper, but I'm not sure where to begin running this program on the cluster at my school and I can't find much info on this program anywhere else. Does anyone have any experience/suggestions as to how I should proceed?
Thanks in advance!
-
Whole-genome sequencing is becoming commonplace, but the accuracy and completeness of variant calling by the most widely used platforms from Illumina and Complete Genomics have not been reported. Here we sequenced the genome of an individual with both technologies to a high average coverage of ∼76×, and compared their performance with respect to sequence coverage and calling of single-nucleotide variants (SNVs), insertions and deletions (indels). Although 88.1% of the ∼3.7 million unique SNVs were concordant between platforms, there were tens of thousands of platform-specific calls located in genes and other genomic regions. In contrast, 26.5% of indels were concordant between platforms. Target enrichment validated 92.7% of the concordant SNVs, whereas validation by genotyping array revealed a sensitivity of 99.3%. The validation experiments also suggested that >60% of the platform-specific variants were indeed present in the genome. Our results have important implications for understanding the accuracy and completeness of the genome sequencing platforms.
-
A non-computational technique that you might want to look at is optical mapping.
I heard a presentation by Opgen and it looked useful.
-
Let me suggest something simple. If a genome of a related species is available (there should be something out there that is close to whatever you have sequenced), you could compare your "genome" to it.
Something like "mauve" (http://gel.ahabs.wisc.edu/mauve/) would be a simple start if there is a closely related genus/species available at NCBI http://www.ncbi.nlm.nih.gov/genome/browse/.
-
Originally posted by krobison: "The suggestions for Dr. McNelson are good as well; coverage excesses could indicate collapsed direct repeats which cannot be resolved with the sequence technology you used."
-
There are a number of published programs that assess assemblies given the read data. Trying them out is on my to-do list, so I can't make a specific recommendation:
ALE: Assembly Likelihood Evaluator
CGAL: Computing Genome Assembly Likelihoods
QUAST: Quality Assessment Tool for Genome Assemblies
REAPR
Plantagora
LAP
Mauve
AMOSvalidate
(not claiming this is the full list)
The suggestions for Dr. McNelson are good as well; coverage excesses could indicate collapsed direct repeats which cannot be resolved with the sequence technology you used.
You should also consider reading the GAGE, Assemblathon 1 & Assemblathon 2 papers, which evaluated a number of assembly programs and can illustrate some of the errors for which to watch.
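To make the coverage-excess idea concrete, here is a minimal Python sketch (not taken from any of the tools listed above). It assumes you have already mapped your reads back to the assembly and written per-base depth as three whitespace-separated columns (contig, 1-based position, depth), which is the layout `samtools depth` produces. The function name, window size, and fold threshold are hypothetical choices for illustration only.

```python
from statistics import median


def high_coverage_windows(depth_lines, window=1000, fold=1.8):
    """Flag windows whose mean depth is at least `fold` times the median
    window depth. Input lines are assumed to be: contig, position, depth."""
    sums = {}  # (contig, window_index) -> [total_depth, positions_seen]
    for line in depth_lines:
        contig, pos, depth = line.split()[:3]
        key = (contig, (int(pos) - 1) // window)
        entry = sums.setdefault(key, [0, 0])
        entry[0] += int(depth)
        entry[1] += 1

    means = {key: total / n for key, (total, n) in sums.items()}
    if not means:
        return []

    typical = median(means.values())
    flagged = []
    for (contig, idx), mean_depth in sorted(means.items()):
        if typical > 0 and mean_depth >= fold * typical:
            # Report 1-based, inclusive window coordinates.
            flagged.append((contig, idx * window + 1, (idx + 1) * window, mean_depth))
    return flagged
```

You could call it as `high_coverage_windows(open("depth.txt"))`, where `depth.txt` is a placeholder file name. A window at roughly twice the typical depth is only a hint that two repeat copies may have been collapsed into one; it is worth inspecting the alignments by eye before drawing conclusions.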
-
You might want to ask your supervisor to clarify, but he might mean that you map all of your read data back to your closed/circularized genome and see if you have any possible mapping issues (areas of low/no coverage, areas where paired reads lose their mates, etc.)
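As a rough illustration of the low/no coverage check, the Python sketch below merges consecutive under-covered positions into intervals. It assumes the reads have already been mapped back to the finished genome and that per-base depth was exported as contig, 1-based position, and depth columns, for example with `samtools depth -a` so that zero-depth positions are included. The function name and the `min_depth` cutoff are hypothetical, and the sketch does not look at read pairing at all.

```python
def low_coverage_intervals(depth_lines, min_depth=5):
    """Merge consecutive positions with depth < min_depth into
    (contig, start, end) intervals, using 1-based inclusive coordinates."""
    intervals = []
    current = None  # [contig, start, end] of the interval being extended
    for line in depth_lines:
        contig, pos, depth = line.split()[:3]
        pos, depth = int(pos), int(depth)
        if depth < min_depth:
            if current and current[0] == contig and pos == current[2] + 1:
                current[2] = pos  # extend the open interval by one base
            else:
                if current:
                    intervals.append(tuple(current))
                current = [contig, pos, pos]  # start a new interval
        elif current:
            intervals.append(tuple(current))  # well-covered base closes it
            current = None
    if current:
        intervals.append(tuple(current))
    return intervals
```

For example, `low_coverage_intervals(open("depth.txt"), min_depth=5)` (with `depth.txt` as a placeholder name) returns the stretches worth checking by PCR or in a genome browser. If the depth file omits zero-coverage positions, gaps will be silently missed, which is why the `-a` style output is assumed here.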
Only other option might be to call ORFs and then annotate and see if you're missing any conserved genes that might suggest assembly issues or if you have multiple copies of confirmed single copy genes.
P.S. Your post is in the wrong sub-forum; this one is for discussion surrounding the company Complete Genomics, which has been taken over by BGI.
-
Complete genome validation
Hi guys,
I have a question regarding how to validate a completed bacterial genome. The sequencing technology used was the Illumina GAIIX, and the annotations were done in CLC bio.
I've recently finished the gap closing, and I've confirmed the alignment using CLCbio and ClustalOmega.
My supervisor insists that I validate the genome, but I have absolutely no clue how to do that. I've completely closed all the gaps (resulting in a final single fasta file output), and there are no longer any ambiguous nucleotides.
Is there something I'm missing?
Thanks.