-
Originally posted by erhuangzi: I hadn't been able to get MSR-CA running. Can you run it? I want to use this software; how can I use it? What are the steps? Thanks.
It could be useful to read the GAGE recipes too: http://gage.cbcb.umd.edu/recipes/msrca.html
Ole
-
cool poster quick question
Originally posted by ians: I thought I'd share with everyone our AGBT poster, which outlines the success we had consolidating multi-platform sequence data to produce hybrid assemblies.
We outline our methods and conclusions for dealing with various types of genomes. Enjoy:
AGBT Poster
-
Originally posted by Godevil: I cannot see your document.
Our genome assembly is bad. I think that's because of low GC content, big genome size and high repetitiveness.
I'm now taking a training course in BGI in China. I hope I can get some useful information.
Soon, I'll have two more chances to assemble planarian (both sexual and asexual). Since then, we've uncovered some heavy adapter contamination in our LIMP libraries. After re-sequencing, we'll see if this makes any difference.
Planarian remains a very difficult genome to assemble, but we'll see if we can get any closer.
-
Originally posted by ians: I thought I'd share with everyone our AGBT poster, which outlines the success we had consolidating multi-platform sequence data to produce hybrid assemblies.
We outline our methods and conclusions for dealing with various types of genomes. Enjoy:
AGBT Poster
In your poster, you found that for genomes >10Mb it is better to pre-assemble the 454 reads and then combine the 454 pre-assembled fragments with the Illumina reads for the final assembly.
A few questions on how to do this:
* What were the 454/Newbler pre-assembled fragments? The contigs produced by Newbler?
* How did the 454/Newbler pre-assembled fragments get included with the Illumina reads as input to SOAPdenovo for the final assembly? As extra, super-long "reads" in the input FASTQ/FASTA files?
* Were the 454/Newbler pre-assembled fragments used for contig _and_ scaffold assembly, or just one (just contig or just scaffold assembly)?
* Did you have to change the SOAPdenovo parameters in some way to account for the very low effective "coverage" (effectively coverage==1) from the 454/Newbler pre-assembled fragments?
-
Originally posted by d f: What were the 454/Newbler pre-assembled fragments? The contigs produced by Newbler?
Originally posted by d f: How did the 454/Newbler pre-assembled fragments get included with the Illumina reads as input to SOAPdenovo for the final assembly? As extra, super-long "reads" in the input FASTQ/FASTA files?
Originally posted by d f: Were the 454/Newbler pre-assembled fragments used for contig _and_ scaffold assembly, or just one (just contig or just scaffold assembly)?
Originally posted by d f: Did you have to change the SOAPdenovo parameters in some way to account for the very low effective "coverage" (effectively coverage==1) from the 454/Newbler pre-assembled fragments?
BTW, we will be hosting a webinar on de novo assembly soon. If you've read the poster, you'll be familiar with a large part of the presentation; however, we will also be going over some specific library prep R&D we've done (e.g. LIMP libraries), as well as some cool visualization and metrics we use.
-
Thanks for the info! I will give it a try.
I already created separate 454/Newbler and Illumina/SGA assemblies, and when I BLAT aligned the shorter Illumina/SGA scaffolds against the longer 454/Newbler scaffolds, I noticed many gaps in the 454/Newbler scaffolds that could be closed with the Illumina/SGA contigs. So I have been looking for an easy way to combine the information from 454 and Illumina into one assembly, rather than use one to correct the other.
I will try your method with the SGA assembler first, since I have experience with it. If anyone is interested in ideas on how to implement this using SGA, check out the sga-users list:
d f
-
Hi Ians.
I don't understand this completely, for example I don't understand what you said here:
Originally posted by ians: We used them as input reads. This serves as a sort of prescaffolding.
But if this is correct, then the reason you get a better assembly could be that 454 sequences some parts of the genome that Illumina doesn't, and vice versa, so you get a more complete read/k-mer set and can therefore assemble the genome better. Could this be the case? Have you tried just using both Illumina and 454 reads in Newbler or Celera, and comparing that with your Newbler contigs + Illumina reads in SOAP approach?
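One rough way to check the "more complete k-mer set" idea would be to count how many k-mers are seen by only one platform. The snippet below is just a sketch of that comparison, not what anyone in this thread actually ran: the function names are made up for illustration, it holds every k-mer in memory (so it is only practical on a subsample), and it treats a k-mer and its reverse complement as the same thing.

```python
# Hypothetical helper names; a toy comparison, not a production k-mer counter.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_set(reads, k):
    """Collect the canonical k-mers present in an iterable of read strings."""
    found = set()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            found.add(canonical(read[i:i + k]))
    return found


def compare_kmer_sets(reads_a, reads_b, k):
    """Count k-mers unique to each read set and shared by both."""
    a = kmer_set(reads_a, k)
    b = kmer_set(reads_b, k)
    return {"only_a": len(a - b), "only_b": len(b - a), "shared": len(a & b)}


if __name__ == "__main__":
    # Tiny made-up reads, only to show the call.
    print(compare_kmer_sets(["ACGTA"], ["GTACC"], k=3))
```

If "only_a" and "only_b" are both large on real data, that would be consistent with each platform covering regions the other misses, although sequencing errors also produce platform-specific k-mers, so the counts would need an abundance filter before reading much into them.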
Ole
-
Originally posted by Ole: SOAP does not know that the first k-mer on a read is actually connected to the last k-mer. If there's a repeat longer than the k-mer between the first and last, SOAP will not connect them. At least that is my impression. Please correct me if I'm wrong.
Originally posted by Ole: But if this is correct, then the reason you get a better assembly could be that 454 sequences some parts of the genome that Illumina doesn't, and vice versa, so you get a more complete read/k-mer set and can therefore assemble the genome better. Could this be the case? Have you tried just using both Illumina and 454 reads in Newbler or Celera, and comparing that with your Newbler contigs + Illumina reads in SOAP approach?
-
This is weirdly prescient as it is exactly what I am doing now with a Blastocladiella genome - Chytrid fungi.
We have had some success in following the assembly steps from the Fire Ant Genome.
This pre-assembles the Illumina data and then reads that in as pseudo-reads into the Newbler package with the 454 reads.
I followed a similar process: I assembled our Illumina data (41,083,984 sequences) in Velvet, broke the contigs into 400bp pieces with 200bp overlaps using EMBOSS splitter, and then used those pseudo-reads as input to Newbler, assembling them together with the rest of our 454 data (3kb and 20kb PE libraries).
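For anyone who wants to see what the shredding step amounts to, here is a minimal Python sketch of cutting a contig into overlapping pseudo-reads. It is not the EMBOSS splitter implementation: the function name is invented, and the way the last piece is handled (one extra full-length window flush with the contig end) is my own assumption, so the tail fragments may differ from what splitter emits.

```python
def shred(seq, length=400, overlap=200):
    """Cut one contig into overlapping pseudo-reads of a fixed length.

    Hypothetical helper for illustration; tail handling is an assumption.
    """
    step = length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than length")
    if len(seq) <= length:
        # Contigs no longer than one window are passed through unchanged.
        return [seq]
    last_start = len(seq) - length
    pieces = [seq[i:i + length] for i in range(0, last_start + 1, step)]
    if last_start % step != 0:
        # Add a final window ending exactly at the contig end so no bases are lost.
        pieces.append(seq[-length:])
    return pieces


if __name__ == "__main__":
    contig = "ACGT" * 275  # made-up 1100 bp contig
    for number, piece in enumerate(shred(contig), start=1):
        print(f">pseudo_read_{number} length={len(piece)}")
```

With the defaults, every internal base ends up in two pseudo-reads, which is presumably why the 200bp overlap gives the overlap-based assembler enough to stitch the pieces back together.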
We took the decision to do it this way as we were short on RAM - 32GB max and could not assemble combined 454+Illumina in any package.
Recently we have bought a server with 512GB RAM and have been able to use Newbler 2.6 to assemble both datasets together.
Illumina pseudo-reads + 454 (3kb+20kb) with Newbler 2.5: Scaffolds N50=298598 N=603; Contigs N50=4182 N=13220
Illumina raw reads + 454 (3kb+20kb) with Newbler 2.6: Scaffolds N50=158032 N=777; Contigs N50=3738 N=29613
We also tried using the CLC workbench program - although this was done in another lab and I don't know the exact settings...
Illumina raw reads + 454 (3kb+20kb) with CLC: Scaffolds N50=8049 N=11067; Contigs N50=1483 N=13847
So we actually got better results with the first method! CLC seems particularly bad.
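In case it helps anyone compare numbers like these across assemblers, this is the usual N50 definition written out as a small Python sketch: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of all assembled bases. The function name is mine, and different tools may apply minimum-length cutoffs before computing it, so treat it as an assumption-laden illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Made-up lengths, only to show the call.
    print(n50([80, 70, 50, 40, 30, 20]))
```

Note that N50 rewards long scaffolds regardless of how many Ns they contain, so it is worth reading it alongside the gap content.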
We do however have a large %N problem in the final scaffold with all the assemblies that include the 454 3kb library - having issues dealing with this to be honest as the number is anywhere from 16-25% Ns!
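A quick way to put a number on the %N problem is to count N characters over all scaffold sequences and to list the individual gap run lengths. The sketch below assumes the scaffolds have already been read into plain Python strings; the function names are made up for illustration.

```python
import re


def percent_n(sequences):
    """Percentage of all bases across the given sequences that are N (either case)."""
    sequences = list(sequences)
    total = sum(len(s) for s in sequences)
    if total == 0:
        return 0.0
    n_count = sum(s.upper().count("N") for s in sequences)
    return 100.0 * n_count / total


def gap_lengths(sequence):
    """Lengths of each uninterrupted run of N in one scaffold, in order."""
    return [len(match.group()) for match in re.finditer(r"[Nn]+", sequence)]


if __name__ == "__main__":
    scaffolds = ["ACGTNNNNACGT", "ACNNNGTNAC"]  # made-up examples
    print(percent_n(scaffolds))
    print([gap_lengths(s) for s in scaffolds])
```

Looking at the distribution of gap lengths as well as the total can show whether the Ns come from a few very large gaps or from many small ones.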
I am computing a few more assemblies, currently using MIRA3 to see what that can do, and I might look into some of the suggested strategies from Nick Loman's blog, here.
Last edited by guyleonard; 06-15-2012, 07:03 AM.
-
Originally posted by guyleonard: This pre-assembles the Illumina data and then reads that in as pseudo-reads into the Newbler package with the 454 reads.
Originally posted by guyleonard: Breaking the contigs into 400bp with 200bp overlaps with EMBOSS splitter.
Originally posted by guyleonard: We do however have a large %N problem in the final scaffold with all the assemblies that include the 454 3kb library - having issues dealing with this to be honest as the number is anywhere from 16-25% Ns!
Originally posted by guyleonard: I am computing a few more assemblies, currently using MIRA3 to see what that can do
-
Originally posted by ians: I'm curious, did you choose 400bp because you didn't sequence with FLX+? I would think that larger frags would be advantageous.
At the moment I can only find the results from another run I did with a 500bp split and 200bp overlap, which resulted in an N50 of 285142 and 621 scaffolds. So, similar, but 400bp seemed to be the best from what I can remember.
Originally posted by ians: Yeah, in my experience this is pretty normal. Ultimately, large LIMPs are there for orientation. The huge gaps may need manual methods to fill during genome finishing. As a cheap first step, you may look into reusing your paired end reads with IMAGE (Iterative Mapping and Assembly for Gap Elimination) to try to extend those contigs. The software is a little difficult to get moving, but I've had some decent results.
-
Hi !
This is a really interesting thread. I'm no longer working on genome assembly, but that won't always be the case!
guyleonard, you may be interested in GapCloser, developed by the SOAPdenovo team (http://soap.genomics.org.cn/soapdenovo.html). It uses paired-end Illumina reads to close gaps in scaffolds. We used it on a genome that had 24% Ns, and after 4 iterations of GapCloser we were down to 13% gaps.
I think that SSPACE is a scaffolder but not a "gap closer", but maybe I'm wrong!
Maria
-
This might be a rather long post and end up being quite complicated, but I thought I would report back a little about my experience anyway.
We had three sets of data from two sequencing technologies: one Illumina HiSeq paired-end library (reads 20,541,992 + 20,541,992), one 454 3kb PE library (reads 550,181 + 549,498) and one 454 20kb library (reads 189,318, not paired).
The first set of tables describes a few statistics for the contigs of various programs and datasets.
Illumina Only
454 3kb Only
454 20kb Only
454 3kb and 20kb
Pseudo-reads + 454 3kb and 20kb
Pseudo-reads were created by taking, at the time, the best contig assembly of our Illumina reads - from Velvet - and passing them through the EMBOSS program 'splitter'. This was done numerous times with different lengths and overlaps, but the two shown (overlap 200bp) seemed to produce the best results. Why? No idea.
Raw Illumina, 454 3kb and 20kb
Newbler is currently running this dataset group...waiting.
Scaffolds
Illumina Only
454 3kb Only
454 20kb Only
454 3kb and 20kb Only
Pseudo-reads + 454 3kb and 20kb
Raw Illumina Only, 454 3kb and 20kb
Okay, well those tables took a while to build. So far though, either I am seriously getting MIRA/SOAP etc. wrong (best results shown of a few settings variations), or Newbler is very good at doing what it does and shredding pre-assembled Illumina contigs seems to help in scaffold generation...
MIRA scaffolds are intentionally left blank as I cannot get BAMBUS to scaffold the contigs - it flakes out at the "grommit" stage with an incomprehensible error. SSPACE seems to scaffold them, but with an N50 of about 1300, which is useless compared to everything else, so I haven't bothered with the rest of the stats from that...
I am running another combined assembly in Newbler at the moment, but with a lot of tweaks to the standard settings, building contigs/scaffolds as we speak...
I might give Ray or Celera a go, and I might also try taking the best individual assemblies and merging them with MINIMUS to see what that returns...
Last edited by guyleonard; 07-10-2012, 06:37 AM.