**Brian Bushnell** · 07-14-2016, 02:14 PM

Hi Jeff,

Unfortunately, I don't have a good method for this. I've tried kmercountexact, and it does not work on raw PacBio reads due to the high error rate. I do not know of a better method for genome size estimation than assembling, with Falcon, for example. Sorry!

If you have multiple files, though, you can enter them comma-delimited, like this:

Code:

kmercountexact.sh in=filtered1.fq,filtered2.fq

Not all tools support that, but Tadpole, KmerCountExact, and Dedupe do.

**jpummil** · 07-14-2016, 06:48 PM

Thanks for the quick response, Brian!

Good to know about the comma-delimited method for multiple entries. Unfortunate to hear about the PacBio error issue when trying to determine genome size. I thought about this a bit and am wondering whether the pre-processing Canu does to the data could be used before trying kmercountexact. It writes a couple of files during its run, first correcting the reads and then trimming them:

<filename>.correctedReads.fasta.gz

then

<filename>.trimmedReads.fasta.gz

Of course, they have been processed WITH the genomeSize estimate provided at run time and I'm not certain of how much that might have influenced any trimming or correction. I might try and contact Phillippy or Koren and inquire further ;-)

**wdecoster** · 07-15-2016, 02:46 AM

Perhaps a quick and dirty assembly with miniasm can give you an idea? https://github.com/lh3/miniasm

**jpummil** · 07-15-2016, 06:51 AM

Originally posted by wdecoster:

Perhaps a quick and dirty assembly with miniasm can give you an idea? https://github.com/lh3/miniasm

Thanks for the suggestion wdecoster

I think I've avoided miniasm thus far because it appears to only output .gfa files? Kind of limits further evaluation of the assembly as most common tools seem to still only take .fasta.

Update: Found a note in another thread about converting .gfa to .fasta. Trying it now...

awk '/^S/{print ">"$2"\n"$3}' in.gfa | fold > out.fa
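The same S (segment) lines can also be summed to get a total assembly length, which is the rough genome size figure the miniasm suggestion is after. This is only a sketch: `gfa_total_bp` is a hypothetical helper name, it assumes a tab-separated GFA with the sequence in the third column, and because miniasm does not polish its output the total should be treated as approximate.

```shell
#!/bin/sh
# Sum the sequence lengths of all segment (S) lines in a GFA file.
# gfa_total_bp is a hypothetical helper, not part of miniasm.
gfa_total_bp() {
    awk -F'\t' '
        $1 == "S" { total += length($3); n++ }   # column 3 holds the sequence
        END { printf "%d segments, %.0f bp\n", n, total }
    ' "$1"
}

# Usage (file name is a placeholder):
#   gfa_total_bp in.gfa
```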

**jsoghigian** · 10-03-2016, 05:27 AM

Originally posted by jpummil:

Thanks for the suggestion wdecoster

I think I've avoided miniasm thus far because it appears to only output .gfa files? Kind of limits further evaluation of the assembly as most common tools seem to still only take .fasta.

Update: Found a note in another thread about converting .gfa to .fasta. Trying it now...

awk '/^S/{print ">"$2"\n"$3}' in.gfa | fold > out.fa

About to try this method myself - jpummil, were you successful in estimating genome size from your raw reads?

**jpummil** · 10-06-2016, 09:44 AM

Originally posted by jsoghigian:

About to try this method myself - jpummil, were you successful in estimating genome size from your raw reads?

The assembly itself using miniasm and the conversion script from gfa to fasta worked fine, though the assembly isn't as "good" as from Canu.

Still no really good way to estimate a genome size from the PacBio reads. Schatz put together a really nice tool called GenomeScope, but it currently only works with Illumina reads.

**Markiyan** · 10-10-2016, 05:00 AM

Try analysing the reads from the short inserts (multipass ones).

You can try extracting the long raw reads from the short library inserts, which pass the insert multiple times (CCS-like reads), doing self error correction, and then using k-mer counting software designed for Illumina, 454 or Sanger data.
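One way to find candidate multipass reads is to count subreads per ZMW. The sketch below assumes a subread FASTA whose headers follow the `>movie/zmw/start_end` naming pattern; `multipass_zmws` and the default threshold of 3 are illustrative choices, not an established cutoff. The first and last subreads of a ZMW are usually partial passes, so a higher threshold may be safer.

```shell
#!/bin/sh
# List ZMWs that produced at least MIN subreads (default 3).
# Assumes FASTA headers of the form >movie/zmw/start_end.
# multipass_zmws is a hypothetical helper name.
multipass_zmws() {
    awk -v min="${2:-3}" '
        /^>/ {
            split(substr($1, 2), parts, "/")      # drop ">" and split the name
            if (parts[2] != "") count[parts[1] "/" parts[2]]++
        }
        END {
            for (z in count)
                if (count[z] >= min) print z "\t" count[z]
        }
    ' "$1" | sort
}

# Usage (file name is a placeholder):
#   multipass_zmws subreads.fasta 5
```

The output is one `movie/zmw` identifier per line with its subread count, which can then be used to pull those subreads out for self-correction.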

Also, please be aware that you may have to screen out the high copy number DNA (mitochondrial/plastid genomes) before doing k-mer counting.
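Once corrected reads give a usable k-mer histogram, a back-of-the-envelope estimate is total k-mers divided by the depth of the main peak. The sketch below assumes a plain two-column "depth count" histogram sorted by depth (check your counter's actual output layout); `estimate_genome_size` is a hypothetical helper. The optional depth cap is a crude stand-in for screening organelle reads, not the read-level screening described above, and it will also discard genuine nuclear repeats.

```shell
#!/bin/sh
# Rough genome size from a k-mer depth histogram.
# Assumed input: whitespace-separated "depth count" rows sorted by depth;
# lines starting with "#" are skipped. Optional second argument: ignore
# k-mers deeper than this value (0 = no cap).
estimate_genome_size() {
    awk -v max="${2:-0}" '
        /^#/ || NF < 2 { next }
        { d[++n] = $1 + 0; c[n] = $2 + 0 }
        END {
            if (n == 0) { print "NA"; exit }
            # First rise in counts marks the end of the low-depth error peak.
            valley = 1
            for (i = 2; i <= n; i++)
                if (c[i] > c[i-1]) { valley = i - 1; break }
            # Main peak: highest count from the valley onwards.
            peak = valley
            for (i = valley; i <= n; i++) {
                if (max > 0 && d[i] > max) break
                if (c[i] > c[peak]) peak = i
            }
            # Total k-mer occurrences past the valley, up to the cap.
            total = 0
            for (i = valley; i <= n; i++) {
                if (max > 0 && d[i] > max) break
                total += d[i] * c[i]
            }
            if (d[peak] == 0) { print "NA"; exit }
            printf "%.0f\n", total / d[peak]
        }
    ' "$1"
}

# Usage (file name and cap are placeholders):
#   estimate_genome_size khist.txt 500
```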

You could also get some PCR-free MiSeq data to complement your PacBio assembly. (This can be cheaper if your coverage is still too low.)

**kartika** · 10-06-2019, 08:15 AM

Thank you.

Genome Size Estimation from PacBio Raw Reads

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News