
**samanta** · 07-29-2013, 10:32 PM

Hello,

Seems like an interesting problem. Here is what you need to do.

(i) Please plot a k-mer distribution of the Illumina reads. I think your Illumina coverage (30x) is slightly on the low side, but we will not know until we see the chart. You can generate the k-mer distribution with SOAPdenovo, DSK (http://minia.genouest.org/dsk/), or one of many other k-mer counting packages.

Efficient Methods for Counting K-mers

http://www.homolog.us/blogs/blog/2012/03/28/efficient-methods-for-counting-k-mers/
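To illustrate what a "k-mer distribution" is, here is a minimal in-memory sketch. It is a toy for illustration, not how DSK or SOAPdenovo work internally, and the function names are hypothetical. It counts canonical k-mers (a k-mer and its reverse complement are counted together) and then builds the histogram of abundance versus the number of distinct k-mers seen that many times. It assumes reads are already loaded as plain A/C/G/T strings.

```python
from collections import Counter

# Complement table for A/C/G/T; k-mers containing other symbols are skipped below.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k: int) -> Counter:
    """Count canonical k-mers across all reads (a small in-memory toy, not DSK)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _VALID:  # skip k-mers containing N or other ambiguity codes
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts: Counter) -> dict:
    """Map each abundance to the number of distinct k-mers observed that many times."""
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGT"]  # toy reads
    hist = kmer_histogram(count_kmers(reads, 4))
    for abundance, n_kmers in hist.items():
        print(abundance, n_kmers)
```

On real Illumina data the histogram typically shows a spike at very low abundance (mostly k-mers from sequencing errors) and a main peak near the k-mer coverage. That main peak is what tells you whether the coverage is adequate. A Python `Counter` will not scale to a full Illumina run, which is why disk-based counters such as DSK exist.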


(ii) Use any de Bruijn graph-based assembler to assemble the Illumina reads, first only up to the contig level. My favorites are SOAPdenovo, because it can handle paired-end reads, and Minia (http://minia.genouest.org/), because it is lightweight. Ideally, you should run the assembly at multiple k-mer values.
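One reason to try several k values is that the effective coverage seen by a de Bruijn assembler drops as k grows. The Velvet manual gives the relationship between base coverage C and k-mer coverage for reads of length L as C_k = C * (L - k + 1) / L. The sketch below applies that formula to the 30x coverage mentioned above. The read length of 100 bp is an assumption for illustration only, since the thread does not state it.

```python
def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Expected k-mer coverage: C_k = C * (L - k + 1) / L."""
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    BASE_COVERAGE = 30  # 30x Illumina coverage, as stated in the thread
    READ_LENGTH = 100   # ASSUMPTION: read length is not given in the thread
    for k in (21, 31, 41, 51, 61):
        ck = kmer_coverage(BASE_COVERAGE, READ_LENGTH, k)
        print(f"k={k}: expected k-mer coverage ~ {ck:.1f}x")
```

Under these assumptions, k-mer coverage falls from 24x at k=21 to 12x at k=61. Larger k values resolve more repeats but leave fewer k-mers to bridge low-coverage regions, so comparing assemblies across k values shows where that tradeoff lands for your data.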

(iii) Once you have assembled the Illumina reads, use BLASR (a tool distributed by PacBio) to map the Illumina contigs onto the long PacBio reads.

Only after we have the results of this step can we talk about error correction of the PacBio reads.

Also check the following commentary and the discussion in its comment section.

Pacbio: Why We Stopped Using PacBioToCA and Lived Happily Thereafter

http://www.homolog.us/blogs/blog/2013/07/24/pacbio-why-we-stopped-using-pacbiotoca-and-lived-happily-thereafter/

When we started working on PacBio data a year ago, everyone recommended PacBioToCA. Pause for a moment to imagine what the summer of 2012 was like. Everyone was talking about Illumina, 454, de Bruijn graphs, the Velvet assembler and so on, and then these ‘weird’ reads showed up out of nowhere. To use an analogy: everyone is talking about pizza, and BioMickWatson shows five other foods that are like genome assembly, namely Eton mess, spaghetti Bolognese, Marmite, ‘macaroni’ cheese and anchovite. The initial impulse is to turn all of them into pizza toppings to make them attractive.

If all of this seems too complicated, please email me at samanta at homolog.us and we can discuss it further.

**horvathdp** · 07-30-2013, 08:35 AM

Thanks for the reply! The README file is really sparse, and I could not find a link to a manual (not even in the associated paper published in BMC). Any chance you have a link to the manual? Also, let me run my process by you to see if I am on the right track:

Open an instance in iPlant Atmosphere (Ubuntu or linuxbiocloud-32bit?), then run:

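```shell
# Download the DSK 1.5280 source tarball (wget follows HTTP redirects by default)
wget "http://minia.genouest.org/dsk/dsk-1.5280.tar.gz"
# Unpack it and enter the extracted source directory
tar -xzf dsk-1.5280.tar.gz
cd ./dsk
# Compile DSK
make
```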

From here I am pretty lost. Without a manual, I am not even sure what format my input files need to be in, and I don't know which arguments the program takes or in what order. If you have an example script, I'd be most appreciative. From the paper, it seems clear that I need to convert my FASTQ files to FASTA (no problem there). However, it is unclear whether the paired-end reads should (or could) be interlaced, and whether I should (or could) combine my four libraries (two have inserts of about 270 bases and two of about 390 bases). Any thoughts or suggestions?

**samanta** · 07-30-2013, 07:26 PM

Originally posted by horvathdp:

From here I am pretty lost. Without a manual, I am not even sure what format my input files need to be in, and I don't know which arguments the program takes or in what order. If you have an example script, I'd be most appreciative. From the paper, it seems clear that I need to convert my FASTQ files to FASTA (no problem there). However, it is unclear whether the paired-end reads should (or could) be interlaced, and whether I should (or could) combine my four libraries (two have inserts of about 270 bases and two of about 390 bases). Any thoughts or suggestions?

Oh well.

Please send me an email and I will try to walk you through the steps. Maybe we need to start in a different way.

**rchikhi** · 08-25-2013, 04:13 PM

Originally posted by horvathdp:

```shell
wget "http://minia.genouest.org/dsk/dsk-1.5280.tar.gz"
tar -xzf dsk-1.5280.tar.gz
cd ./dsk
make
```

From here I am pretty lost. Without a manual, I am not even sure what format my input files need to be in, and I don't know which arguments the program takes or in what order. If you have an example script, I'd be most appreciative. From the paper, it seems clear that I need to convert my FASTQ files to FASTA (no problem there). However, it is unclear whether the paired-end reads should (or could) be interlaced, and whether I should (or could) combine my four libraries (two have inserts of about 270 bases and two of about 390 bases). Any thoughts or suggestions?

Hello,

I'm sorry you had issues running DSK. You were on the right track, though.
For anyone who reads this later and wonders about the answers to your questions:

The input data need not be FASTA. The README file provides some guidance:

* File input can be fasta, fastq, gzipped or not.
* To pass several files as input, create a file listing the file names (one per line) and pass this file to DSK.

As for the format of the paired-end reads (interlaced or not) and whether to combine libraries with different insert sizes: how the reads are paired does not matter, because DSK sees the reads only as a multiset of k-mers.
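Both points can be shown in a short Python sketch. The first part writes the one-file-name-per-line list that the README describes. The four library file names are hypothetical placeholders, and the exact DSK command line is deliberately not shown. The second part shows why pairing does not matter: a k-mer multiset is a count of k-mers, so listing mates as two separate files or interlacing them yields the same counts. For simplicity it counts raw, non-canonical k-mers, which is enough to demonstrate order-independence.

```python
import tempfile
from collections import Counter
from pathlib import Path


def write_file_list(read_files, list_path):
    """Write one input file name per line, the list format described in the DSK README."""
    Path(list_path).write_text("".join(f"{name}\n" for name in read_files))


def kmer_multiset(reads, k):
    """Count every k-mer in every read; read order and pairing play no role."""
    return Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))


if __name__ == "__main__":
    # HYPOTHETICAL file names for four libraries (two ~270 bp inserts, two ~390 bp)
    libraries = ["lib270_a.fastq.gz", "lib270_b.fastq.gz",
                 "lib390_a.fastq.gz", "lib390_b.fastq.gz"]
    with tempfile.TemporaryDirectory() as tmp:
        list_path = Path(tmp) / "reads_list.txt"
        write_file_list(libraries, list_path)
        print(list_path.read_text(), end="")

    # Toy mates: the same reads supplied as two separate files or interlaced
    r1 = ["ACGTTGCA", "GGATCCAT"]
    r2 = ["TTGACCGA", "CATGGATC"]
    separate = r1 + r2
    interlaced = [read for pair in zip(r1, r2) for read in pair]
    print(kmer_multiset(separate, 5) == kmer_multiset(interlaced, 5))  # True
```

The same reasoning applies to libraries with different insert sizes. Insert size only describes how far apart two mates are, and a k-mer counter discards that information.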

However, rather than reinventing the wheel, there are easier ways to correct PacBio reads with Illumina data. There are at least two existing tools, PacBioToCA and LSC:

https://github.com/PacificBioscience...na-short-reads

Novice looking to use PacBio data

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News