I have two Solexa data sets, with read lengths of 35 bp and 75 bp respectively. The insert sizes also differ. How should I assemble them?
-
Originally posted by Chien-Yuan Chen: If you use CLC Genome Workbench, the software can handle this problem. But you should specify the insert length to prevent incorrect alignment.
Maybe there is a free or open-source assembler that is suited to this task. I tried ALLPATHS, but it ended with a fatal error. I would like to know whether any other tool can do the same job!
-
Have you tried Maq's mapmerge?
I am guessing you could build a map for the 35 bp and 75 bp reads separately, then merge them. Or maybe try samtools merge: align with BWA or your favorite aligner, then merge the SAM/BAM files.
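A rough sketch of the "align each library separately, then merge" idea, written as a small Python helper that only builds the shell commands (it does not run them). It assumes paired-end reads, a reference called ref.fa, and made-up FASTQ file names; `bwa aln`, `bwa sampe`, `samtools sort` and `samtools merge` are real subcommands, but check the options against your installed versions.

```python
# Sketch only: builds (does not run) the shell commands for
# "align each library separately, then merge the BAM files".
# All file names below are hypothetical placeholders.

def align_and_merge_commands(ref, libraries, merged_bam="merged.bam"):
    """Return a list of shell command strings.

    libraries maps a label to a (reads_1.fq, reads_2.fq) pair.
    """
    cmds = [f"bwa index {ref}"]
    sorted_bams = []
    for label, (fq1, fq2) in libraries.items():
        sai1, sai2 = f"{label}_1.sai", f"{label}_2.sai"
        bam = f"{label}.sorted.bam"
        # Align each end of the pair on its own, then pair them up.
        cmds.append(f"bwa aln {ref} {fq1} > {sai1}")
        cmds.append(f"bwa aln {ref} {fq2} > {sai2}")
        cmds.append(
            f"bwa sampe {ref} {sai1} {sai2} {fq1} {fq2} "
            f"| samtools sort -o {bam} -"
        )
        sorted_bams.append(bam)
    # One merged, coordinate-sorted BAM for both libraries.
    cmds.append(f"samtools merge {merged_bam} " + " ".join(sorted_bams))
    return cmds


if __name__ == "__main__":
    libs = {
        "lib35": ("lib35_1.fq", "lib35_2.fq"),
        "lib75": ("lib75_1.fq", "lib75_2.fq"),
    }
    for cmd in align_and_merge_commands("ref.fa", libs):
        print(cmd)
```

Keeping the two libraries in separate alignment runs lets each one use its own insert-size estimate before the results are combined.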
-
Originally posted by caddymob: Have you tried Maq's mapmerge? I am guessing you could build a map for the 35 bp and 75 bp reads separately, then merge them. Or maybe try samtools merge: align with BWA or your favorite aligner, then merge the SAM/BAM files.
http://samtools.sourceforge.net/samtools.shtml
I am trying to assemble de novo. I think I would like to assemble the two sets separately with Velvet or Edena, then assemble the resulting contigs with CAP3 or Phrap?
-
In an ideal world you'd have an assembler that just understands short-read data, mixed libraries with varying insert sizes, etc., and gives you the optimal answer. Some of the tools make a fair stab at this (e.g. Velvet), but the system resources required can be HUGE.
Therefore a more pragmatic approach used by many is to start with some sort of basic "read extension", where you lose track of the individual fragments but build up contig consensus sequences by following overlapping k-mers for as long as there are no branch points, much like SSAKE, FuzzyPath, etc.
From here you can either take these contigs as-is or throw them into another assembly tool more appropriate for longer sequences to attempt to resolve them further.
Finally, map your individual reads (both 75 bp and 35 bp) back to your consensus sequences to get a true assembly rather than just consensus sequences.
You could even iterate: find reads that uniquely overlap contig ends, use them to edit and extend the "reference", and remap the reads that failed to map previously. This technique also works in more "usual" cases where the reference doesn't precisely match the organism you're mapping against it. Not pretty, though.
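To make the "read extension" step above concrete, here is a toy Python sketch. It is not how SSAKE or any other real tool is implemented; it ignores reverse complements, sequencing errors and coverage, and the function name `extend_right` is made up for illustration. It only shows the core idea: keep adding a base while exactly one k-mer continues the contig, and stop at a dead end or a branch point.

```python
from collections import defaultdict


def extend_right(seed, reads, k):
    """Extend seed to the right one base at a time using k-mers from reads.

    Stops when no k-mer continues the contig, when more than one
    different base could follow (a branch point), or when a cycle is hit.
    The seed should be at least k-1 bases long.
    """
    # Map each (k-1)-mer to the set of bases seen immediately after it.
    followers = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            followers[kmer[:-1]].add(kmer[-1])

    contig = seed
    seen = {contig[-(k - 1):]}
    while True:
        candidates = followers.get(contig[-(k - 1):], set())
        if len(candidates) != 1:   # dead end or branch point
            break
        contig += next(iter(candidates))
        tail = contig[-(k - 1):]
        if tail in seen:           # repeat/cycle: stop rather than loop forever
            break
        seen.add(tail)
    return contig


if __name__ == "__main__":
    reads = ["ATGCCGT", "CCGTAAG", "TAAGCT"]
    print(extend_right("ATG", reads, k=4))
```

With the three toy reads above the seed grows across all of them; adding a read such as "CCGTC" introduces a second possible base after "CGT", and the extension stops there instead of guessing.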
-
Originally posted by anyone1985: I have two Solexa data sets. The length of Solexa data is 35 and 75 individually. The insert length is also different. How should I assemble them?
MIRA knows how to treat Solexa data, handles many things almost automatically (like clipping), and even knows about sequencing-technology-dependent errors (like the "GGC" problem in Solexa data).
However, I would try this only for organisms of bacterial size and on a machine with lots and lots of memory.
And you might want to try assembling the 75-mers first: if you have an average coverage of >= 30x with the 75-mers and the insert size of the 75-mer library is larger than that of the 35-mer library, the 35-mers probably won't improve the assembly.
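The rule of thumb above is easy to check with a few lines of Python. This is only a sketch: the read count, genome size and insert sizes in the example are invented, and the 30x threshold is simply the figure quoted in the post.

```python
def mean_coverage(n_reads, read_length, genome_size):
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def short_library_probably_redundant(cov_long, insert_long, insert_short,
                                     min_cov=30.0):
    """True if the longer-read library alone meets the rule of thumb:
    enough coverage AND a larger insert size than the short-read library."""
    return cov_long >= min_cov and insert_long > insert_short


if __name__ == "__main__":
    # Hypothetical numbers: 2 million 75-mers on a 5 Mb bacterial genome.
    cov75 = mean_coverage(2_000_000, 75, 5_000_000)
    print(cov75, short_library_probably_redundant(cov75, 400, 200))
```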
PS: Disclaimer: I wrote MIRA and might not be objective
-
I'd have to say that Velvet is still your best bet for de novo assembly. It accepts different read lengths with no problem, and "out of the box" you can feed it two different sets of paired reads with two different insert sizes. You can also make a trivial change to the source code and recompile so that it accepts more than two paired libraries.
Also note that the insert length you give Velvet (-ins_length 280) must be the entire length of the fragment, reads included: 280 would correspond to two 40 bp reads separated by a 200 bp "insert".
Consult the velvet-users list for details on these two issues.
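As a sketch of the two points above, the Python below computes the fragment length the way Velvet expects it and assembles the two command lines for a two-library run. The helper names, file names, k value and second insert size are hypothetical; `-shortPaired`, `-shortPaired2`, `-ins_length`, `-ins_length2` and `-exp_cov auto` are Velvet options, but classic Velvet expects each paired library as a single interleaved file, so check the manual for your version.

```python
def velvet_ins_length(read_length, inner_gap):
    """Fragment length as Velvet expects it: both reads plus the gap between."""
    return 2 * read_length + inner_gap


def velvet_commands(out_dir, k, lib1, lib2):
    """Build velveth/velvetg command strings for two paired libraries.

    Each lib is (interleaved_fastq, ins_length). Nothing is executed.
    """
    fq1, ins1 = lib1
    fq2, ins2 = lib2
    velveth = (f"velveth {out_dir} {k} "
               f"-shortPaired -fastq {fq1} -shortPaired2 -fastq {fq2}")
    velvetg = (f"velvetg {out_dir} "
               f"-ins_length {ins1} -ins_length2 {ins2} -exp_cov auto")
    return [velveth, velvetg]


if __name__ == "__main__":
    # The example from the post: two 40 bp reads with a 200 bp gap.
    print(velvet_ins_length(40, 200))
    for cmd in velvet_commands("asm", 25, ("lib75.fq", 280), ("lib35.fq", 150)):
        print(cmd)
```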
-
Originally posted by bioinfosm: Are there any de novo assembly tools that can iteratively assemble reads instead of eating up a whole lot of RAM?
My limit is less than 60 GB of RAM for a 1 Gb+ organism, to be assembled de novo from 20x Solexa coverage worth of reads.
But just to be sure I understood you right: you have ~550 million 36-mers that you want to assemble de novo? That's (in terms of reads) almost 15-20 times more reads than the Human Genome Project or Celera had ... and they had *large* computing clusters to tackle the problem.
Even memory-optimised programs with very simple assembly logic would need to keep lots of data in memory to be even decently efficient ... and you would still be in for *a lot* of disk reads/writes, which would probably mean it'd literally take ages to get the thing assembled.
Correct me if I'm wrong or if you found some program which performs such a wonder ... but I don't think this is possible with 60 GB of RAM.
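The read count above follows from simple arithmetic, sketched here in Python. The 2-bits-per-base figure is an assumption used only to get a floor for holding the raw bases; it ignores quality values, pairing information and, above all, the k-mer tables or overlap graphs an assembler actually needs, which are far larger.

```python
def reads_for_coverage(genome_size, coverage, read_length):
    """Number of reads needed to reach a given average coverage."""
    return genome_size * coverage / read_length


def packed_read_bytes(n_reads, read_length, bits_per_base=2):
    """Bytes needed just to store the bases, tightly packed (assumed 2 bits each)."""
    return n_reads * read_length * bits_per_base / 8


if __name__ == "__main__":
    # 1 Gb genome, 20x coverage, 36 bp reads (figures from the posts above).
    n = reads_for_coverage(1_000_000_000, 20, 36)
    print(f"{n / 1e6:.0f} million reads")
    print(f"{packed_read_bytes(n, 36) / 1e9:.1f} GB for the packed bases alone")
```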
Regards,
B.
-
Well, parallel assemblers like ABySS could possibly work if you have enough machines in a cluster. It's far cheaper and easier to get lots of small machines than a few truly humongous ones. However, I've no idea what the upper limit is on an ABySS assembly.
The iterative approach sounds more sensible, though. I'm not aware of any official programs that do a decent job of this yet, although many people have manually done similar things by successive rounds of mapping to closely related genomes, shredding closely related genomic data, etc.
James
-
I am new to this as well, and I am trying to set up an RNA-Seq pipeline for my lab. I've run into an issue, though: I'm confused about why one would run Velvet and then run Phrap on the resulting contigs. Why not just go straight to Phrap? Any help would be appreciated.
Cheers,
Addison