Pacbio scaffolding - SEQanswers

GenoMax replied

03-30-2015, 09:09 AM
@rhall: There is a new thread with some additional information. http://seqanswers.com/forums/showthread.php?t=51427
rhall replied

03-30-2015, 09:03 AM
What is the subread length distribution (N50, mean)? Why is the yield so low? Even at a conservative sequencing yield, 10 SMRT cells should give ~800x coverage of a 6 Mb genome.
I suspect your assembly is limited by your library.
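rhall's first question (subread N50 and mean) takes only a few lines to answer. The sketch below computes both from a list of subread lengths. The example lengths are made up, and reading the lengths from your own subread FASTA/FASTQ is left out. N50 here means the length L such that reads of length at least L hold at least half of all sequenced bases.

```python
from statistics import mean


def n50(lengths: list[int]) -> int:
    """Length L such that reads of length >= L contain at least half of all bases."""
    if not lengths:
        raise ValueError("no lengths given")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length  # bases contained in reads this long or longer
        if running >= half:
            return length
    return 0  # unreachable for non-empty input


def summarize(lengths: list[int]) -> dict:
    """Count, total bases, mean length and N50 of a set of subreads."""
    return {
        "count": len(lengths),
        "total_bases": sum(lengths),
        "mean": mean(lengths),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    example = [1000, 2000, 3000, 4000, 10000]  # hypothetical subread lengths
    print(summarize(example))
```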
manjari.deshmukh replied

03-29-2015, 08:11 PM
Hi,
Yes, I should have posted this in a new thread.
Anyway, I am working on a bacterium whose genome size is approximately 6 Mb. The coverage provided by 10 SMRT cells is 64x. I am using HGAP 3 with mostly default parameters, only changing the genome size and experimenting with the subread length.

Thanks and regards,

Manjari
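Fold coverage is sequenced bases divided by genome size. Comparing the two coverage figures in this thread therefore comes down to comparing per-cell yields. The sketch below does only that arithmetic. It assumes a 6 Mb genome and assumes both figures count the same kind of bases, which the thread does not state.

```python
GENOME_SIZE = 6_000_000  # ~6 Mb bacterial genome, as stated in the thread
N_CELLS = 10             # number of SMRT cells, as stated in the thread


def coverage(total_bases: float, genome_size: float) -> float:
    """Fold coverage = sequenced bases / genome size."""
    return total_bases / genome_size


def bases_for_coverage(fold: float, genome_size: float) -> float:
    """Sequenced bases implied by a given fold coverage."""
    return fold * genome_size


reported_total = bases_for_coverage(64, GENOME_SIZE)   # 64x reported
reported_per_cell = reported_total / N_CELLS
expected_total = bases_for_coverage(800, GENOME_SIZE)  # ~800x suggested
expected_per_cell = expected_total / N_CELLS

print(f"64x  -> {reported_total / 1e6:.0f} Mb total, {reported_per_cell / 1e6:.1f} Mb per cell")
print(f"800x -> {expected_total / 1e6:.0f} Mb total, {expected_per_cell / 1e6:.1f} Mb per cell")
```

For 64x, this works out to 384 Mb in total, or 38.4 Mb per SMRT cell. For the ~800x rhall expected, it is 4.8 Gb in total, or 480 Mb per cell. That is a 12.5-fold difference in yield.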
gconcepcion replied

03-27-2015, 01:02 PM
Originally posted by manjari.deshmukh

Hi all,
Need help. When doing a PacBio assembly in SMRT Portal 2.3.0 with 10 SMRT cells using HGAP, I got 245 contigs, which is very high. I want to know how to reduce this number to 1 or 2.

With the amount of information that you've provided so far, the best help I can give you is: "You need to tweak some parameters."

If you would like help with an assembly, you would be better off starting a new thread (rather than continuing a two-year-old stale thread) with much more information about what you've tried so far.

What is the organism?
What is the expected genome size?
Is it diploid or haploid?
Approximately how much coverage of the genome did 10 SMRT cells get you?
How was the library prepared?
What sequencing chemistry did you use?
Which protocols have you tried running so far?
With what parameters?
manjari.deshmukh replied

03-25-2015, 03:26 AM
Hi all,
Need help. When doing a PacBio assembly in SMRT Portal 2.3.0 with 10 SMRT cells using HGAP, I got 245 contigs, which is very high. I want to know how to reduce this number to 1 or 2.
boetsie replied

02-19-2013, 01:45 AM
At the moment, you can simply align the PacBio reads to the contigs with a tool like MUMmer, and either build scaffolds yourself or feed the pairing information to SSPACE or Bambus.

Originally posted by AdrianP

The PacBio reads that I have were filtered through a pipeline using Illumina reads. Most PacBio reads were "junked", but the rest were corrected into high-quality (HQ) reads, so they should be much better in terms of error rate and so on.
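boetsie's do-it-yourself route is to align the reads to the contigs and then derive links between contigs. It can be sketched as follows. This is not SSPACE's or Bambus's algorithm. It is a minimal illustration that rests on several assumptions:

- The aligner output (for example from MUMmer) has already been converted into hypothetical whitespace-separated rows of the form `read_id contig_id read_start read_end strand`.
- Each hit reaches the end of its contig, so contig overhangs are ignored when estimating gaps.
- Links are simply counted across reads.

```python
from collections import defaultdict
from statistics import median

FLIP = {"+": "-", "-": "+"}


def parse_rows(lines):
    """Parse 'read_id contig_id read_start read_end strand' rows (hypothetical format)."""
    hits = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        read_id, contig_id, start, end, strand = line.split()
        hits.append((read_id, contig_id, int(start), int(end), strand))
    return hits


def canonical(c1, o1, c2, o2):
    """A+ -> B+ seen on one read is B- -> A- on a read from the other strand;
    pick one representation so both observations are counted together."""
    forward = (c1, o1, c2, o2)
    reverse = (c2, FLIP[o2], c1, FLIP[o1])
    return min(forward, reverse)


def build_links(hits, min_support=1):
    """Return {(contig_a, strand_a, contig_b, strand_b): (support, median_gap)}.

    Hits on each read are ordered by read_start, and each pair of consecutive
    contigs becomes one link, so a read spanning k contigs yields k - 1 links.
    The gap is the next hit's read_start minus the previous hit's read_end;
    a negative gap suggests the contig ends overlap.
    """
    by_read = defaultdict(list)
    for read_id, contig_id, start, end, strand in hits:
        by_read[read_id].append((start, end, contig_id, strand))

    gaps = defaultdict(list)
    for placements in by_read.values():
        placements.sort()
        for prev, nxt in zip(placements, placements[1:]):
            _, prev_end, c1, o1 = prev
            nxt_start, _, c2, o2 = nxt
            if c1 == c2:
                continue  # two hits to the same contig do not join anything
            gaps[canonical(c1, o1, c2, o2)].append(nxt_start - prev_end)

    return {
        key: (len(values), median(values))
        for key, values in gaps.items()
        if len(values) >= min_support
    }


rows = [
    "r1 ctgA 0 5000 +",
    "r1 ctgB 5200 9000 +",
    "r1 ctgC 9100 12000 -",
    "r2 ctgB 100 3000 -",
    "r2 ctgA 3300 8000 -",
]
print(build_links(parse_rows(rows), min_support=2))
```

For these example rows, requiring two supporting reads keeps only the ctgA/ctgB link, with a median gap of 250 bp. A real scaffolder would still have to resolve conflicting links and order the contigs into paths.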
AdrianP replied

02-18-2013, 07:55 AM
The PacBio reads that I have were filtered through a pipeline using Illumina reads. Most PacBio reads were "junked", but the rest were corrected into high-quality (HQ) reads, so they should be much better in terms of error rate and so on.
boetsie replied

02-18-2013, 07:53 AM
Originally posted by AdrianP

I am aware of SSPACE; I used it on a different genome project that is still in progress, and I like how it works. I was surprised that it accepts mate pairs as input but not PacBio reads. PacBio reads are similar in that they provide long-range information, but unlike mate pairs they have a definite length and would not necessarily fill the gap with NNNN.

Well, in general they are the same, but the type of data is rather different. Keep in mind that PacBio reads have a high error rate. This complicates alignment because it leads to false-positive alignments, which can of course result in erroneous scaffolds.
In addition, because the alignment is based on the whole PacBio read, a single read can span multiple contigs, whereas a mate pair spans at most two. The SSPACE algorithm would therefore have to be changed, which is why adding PacBio support is not as simple as you might think.
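One way to reduce the false-positive alignments boetsie mentions is to filter hits before building any links. The sketch below works in two passes:

1. It drops hits that are too short or below an identity threshold.
2. It discards whole reads whose remaining hits overlap each other on the read by more than `max_overlap` bases, because such reads place two contigs on the same stretch of sequence.

The `Hit` record layout is hypothetical. The identity, length and overlap thresholds are illustrative values, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One read-to-contig alignment (hypothetical record layout)."""
    read_id: str
    contig_id: str
    read_start: int  # 0-based start on the read
    read_end: int    # end-exclusive end on the read
    identity: float  # percent identity reported by the aligner


def filter_hits(hits, min_identity=80.0, min_length=1000, max_overlap=200):
    """Drop weak hits, then drop reads whose remaining hits overlap each
    other on the read by more than max_overlap bases."""
    good = [
        h for h in hits
        if h.identity >= min_identity and h.read_end - h.read_start >= min_length
    ]

    by_read: dict[str, list[Hit]] = {}
    for h in good:
        by_read.setdefault(h.read_id, []).append(h)

    kept = []
    for read_hits in by_read.values():
        read_hits.sort(key=lambda h: h.read_start)
        furthest_end = None  # largest read_end among hits seen so far
        conflict = False
        for h in read_hits:
            # Earlier hits all start at or before h, so the largest overlap
            # with any of them is min(furthest_end, h.read_end) - h.read_start.
            if furthest_end is not None and min(furthest_end, h.read_end) - h.read_start > max_overlap:
                conflict = True
                break
            furthest_end = h.read_end if furthest_end is None else max(furthest_end, h.read_end)
        if not conflict:
            kept.extend(read_hits)
    return kept


hits = [
    Hit("r1", "ctgA", 0, 5000, 92.0),
    Hit("r1", "ctgB", 5100, 9000, 90.5),
    Hit("r2", "ctgA", 0, 4000, 85.0),
    Hit("r2", "ctgC", 3000, 7000, 88.0),  # overlaps ctgA by 1000 bp on r2
    Hit("r3", "ctgD", 0, 800, 95.0),      # too short
    Hit("r3", "ctgE", 1000, 6000, 70.0),  # identity too low
]
print([(h.read_id, h.contig_id) for h in filter_hits(hits)])
```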

For now, you can of course make 'fake' paired reads from the PacBio reads and feed these into SSPACE.

Regards,
Boetsie
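boetsie's 'fake' paired-read workaround can be sketched as follows:

- Take the two ends of each long read.
- Reverse-complement the second end so the pair faces inward.
- Record the full read length as the insert size.

The end length, the `/1`/`/2` read naming and the forward-reverse orientation are assumptions made here. Check SSPACE's own documentation for the library format and orientation it expects.

```python
from collections import defaultdict

# Complement table for DNA; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def fake_pair(read_id: str, seq: str, end_len: int = 500):
    """Cut the two ends of one long read into a pseudo read pair.

    Returns (name1, end1, name2, end2, insert_size), or None when the read is
    shorter than 2 * end_len. end2 is reverse-complemented so the pair points
    inward (forward-reverse), and insert_size is the full read length.
    """
    if len(seq) < 2 * end_len:
        return None
    left = seq[:end_len]
    right = revcomp(seq[-end_len:])
    return (f"{read_id}/1", left, f"{read_id}/2", right, len(seq))


def bin_by_length(pairs, bin_width: int = 2000):
    """Group pairs by insert size so each bin can be supplied to the
    scaffolder as a separate library with its own insert size."""
    bins = defaultdict(list)
    for pair in pairs:
        bins[pair[4] // bin_width * bin_width].append(pair)
    return dict(bins)


if __name__ == "__main__":
    print(fake_pair("read1", "AAAACCCCGGGT", end_len=4))
```

PacBio read lengths vary widely. If your scaffolder describes each library by a single insert size plus an allowed deviation (as SSPACE's library file does, as far as I know), grouping the pairs into length bins and supplying each bin as its own library keeps that deviation small.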
AdrianP replied

02-18-2013, 05:21 AM
Originally posted by boetsie

We at BaseClear (developers of SSPACE) have developed a modified version of SSPACE that accepts PacBio long reads. The method gives very nice results, but at the moment we offer it only as an internal service, since the algorithm itself is still in a testing phase. An official release might follow later this year, but this has not been decided yet. If you are interested in BaseClear's assembly service, please write to [email protected]

Kind Regards,
Boetsie

I am aware of SSPACE; I used it on a different genome project that is still in progress, and I like how it works. I was surprised that it accepts mate pairs as input but not PacBio reads. PacBio reads are similar in that they provide long-range information, but unlike mate pairs they have a definite length and would not necessarily fill the gap with NNNN.
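AdrianP's point about gap filling comes down to what each kind of data knows about the gap. When a read spans the junction between two contigs, the bases between them can be copied from the read. A mate-pair-based join only knows an estimated gap size, so it pads the gap with Ns. The minimal sketch below assumes:

- both contigs align to the read on the forward strand;
- the alignments reach the contig ends;
- read coordinates are 0-based and end-exclusive.

The function names are hypothetical.

```python
def join_with_read(contig_a: str, contig_b: str, read: str,
                   a_end_on_read: int, b_start_on_read: int) -> str:
    """Join two contigs using the read bases that lie between them."""
    gap = b_start_on_read - a_end_on_read
    if gap >= 0:
        # Copy the bases between the two alignments straight from the read.
        return contig_a + read[a_end_on_read:b_start_on_read] + contig_b
    # Negative gap: the contig ends overlap on the read, so trim the
    # overlapping bases from the start of contig_b.
    return contig_a + contig_b[-gap:]


def join_with_ns(contig_a: str, contig_b: str, estimated_gap: int) -> str:
    """Mate-pair style join: pad the estimated gap with Ns (at least one N,
    a choice made here so the join stays visible)."""
    return contig_a + "N" * max(estimated_gap, 1) + contig_b


if __name__ == "__main__":
    read = "GAAAACGCTTTTG"
    print(join_with_read("AAAA", "TTTT", read, a_end_on_read=5, b_start_on_read=8))
    print(join_with_ns("AAAA", "TTTT", estimated_gap=3))
```

The copied bases carry the read's own error rate. This approach is therefore mainly attractive with corrected reads like the ones AdrianP describes.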
boetsie replied

02-18-2013, 03:47 AM
Originally posted by AdrianP

I am surprised not to be able to find any scaffolders for PacBio data. I am looking for something like what SSPACE does (I was surprised to see that SSPACE doesn't accept PacBio reads as input). I have a bunch of contigs generated by Velvet, and now I want to link these contigs using LONG reads, which are error-corrected PacBio reads.

We at BaseClear (developers of SSPACE) have developed a modified version of SSPACE that accepts PacBio long reads. The method gives very nice results, but at the moment we offer it only as an internal service, since the algorithm itself is still in a testing phase. An official release might follow later this year, but this has not been decided yet. If you are interested in BaseClear's assembly service, please write to [email protected]

Kind Regards,
Boetsie
AdrianP replied

02-17-2013, 02:44 PM
Originally posted by jbingham

In that case, you will need either the Amazon VM or the full install. Sorry!

I will try the Amazon VM, thank you very much for your help!
jbingham replied

02-17-2013, 02:43 PM
In that case, you will need either the Amazon VM or the full install. Sorry!
AdrianP replied

02-17-2013, 02:41 PM
Nah dude, this is what I want:

AHA: a hybrid assembler to scaffold existing contigs and fill gaps. Available only in SMRT Analysis. Since v1.0

I want to link my contigs with long reads, which are sometimes at only 1x coverage.
jbingham replied

02-17-2013, 02:32 PM
Maybe the Amazon image is what you need. Nothing to install, just boot up a VM.

Agree that it's a big download to get everything. The aligner and variant caller (blasr and quiver) are what you requested: separate installs from GitHub. See pacbiodevnet.com for links on the Compatible Software page.
AdrianP replied

02-17-2013, 02:15 PM
Originally posted by jbingham

PacBio's software is all open source, BSD license. See pacbiodevnet.com for downloads and links to GitHub projects.

Seems you are right. I gave it a shot, and oh my god... why in the world are they doing this? I mean, seriously? In order to use one of their tools, I need to download a 1 GB file and go through the extensive installation instructions outlined here:


http://pacb.com/devnet/files/software/smrtanalysis/1.4/doc/SMRT%20Analysis%20Software%20Installation%20%28v1.4.0%29.pdf

?

Can anyone please tell me why they don't just provide generic executables for some of the software in that pipeline? Why do I have to spend a day installing this? This should be simpler. Sorry for my rant; I just don't get it.

I don't want all their fancy tools; I don't need to log in via a web interface to see what's up... oh well...