-
Try analysing the reads from the short inserts (multipass ones).
You can try extracting the long raw reads from the short library inserts, which pass over the insert multiple times (CCS-like reads), doing self error correction, and then using k-mer counting software designed for Illumina, 454 or Sanger data.
Also, please be aware that you may have to screen out the high-copy-number DNA (mitochondrial/plastid genomes) before doing k-mer counting.
You may also want to get some PCR-free MiSeq data to complement your PacBio assembly (this can be cheaper if your coverage is still too low).
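To judge whether coverage is "still too low", a rough check is total sequenced bases divided by the expected genome size. The sketch below is only an illustration: the tiny FASTQ file and the genome_size value are made up, and it assumes plain 4-line FASTQ records (no wrapped sequence lines).

```shell
# Build a tiny FASTQ file for illustration (two reads of 8 and 12 bases).
fq=$(mktemp)
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTACGTACGT\n+\nIIIIIIIIIIII\n' > "$fq"

# Sum the lengths of the sequence lines (line 2 of every 4-line record).
total_bases=$(awk 'NR % 4 == 2 { n += length($0) } END { print n + 0 }' "$fq")

# Hypothetical genome size in bases; replace with your own rough estimate.
genome_size=10

# Coverage = total sequenced bases / genome size.
coverage=$(awk -v b="$total_bases" -v g="$genome_size" 'BEGIN { printf "%.1f", b / g }')
echo "total bases: $total_bases, coverage: ${coverage}x"

rm -f "$fq"
```

With real data, point the awk command at your own FASTQ and set genome_size to whatever estimate you are working with.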
-
Originally posted by jsoghigian: About to try this method myself. jpummil, were you successful in estimating genome size from your raw reads?
There is still no really good way to estimate genome size from PacBio reads. Schatz put together a really nice tool called GenomeScope, but it currently only works with Illumina reads.
-
Originally posted by jpummil: Thanks for the suggestion, wdecoster. I think I've avoided miniasm thus far because it appears to only output .gfa files? That kind of limits further evaluation of the assembly, as most common tools still seem to take only .fasta.
Update: Found a note in another thread about converting .gfa to .fasta. Trying it now...
awk '/^S/{print ">"$2"\n"$3}' in.gfa | fold > out.fa
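For anyone wondering what the one-liner above does: it takes every segment ("S") line of the GFA and writes the name and sequence as a FASTA record, and fold wraps long lines at 80 columns. Here is a small self-contained sketch on a made-up GFA file (the unitig names and sequences are invented), which also sums the segment lengths as a crude size figure from a quick assembly.

```shell
# Made-up GFA: tab-separated fields, "S <name> <sequence>" for segments,
# "L" lines for links between segments.
gfa=$(mktemp)
fa=$(mktemp)
printf 'H\tVN:Z:1.0\nS\tutg1\tACGTACGTAC\nS\tutg2\tGGGTTT\nL\tutg1\t+\tutg2\t+\t0M\n' > "$gfa"

# Same conversion as the one-liner: one FASTA record per S line,
# wrapped at 80 columns by fold.
awk '/^S/{print ">"$2"\n"$3}' "$gfa" | fold > "$fa"

# Number of FASTA records written.
n_seqs=$(grep -c '^>' "$fa")

# Total assembled length = sum of the segment sequence lengths.
total_len=$(awk -F'\t' '$1 == "S" { n += length($3) } END { print n + 0 }' "$gfa")

echo "sequences: $n_seqs, total length: $total_len bp"
rm -f "$gfa" "$fa"
```

Note that fold also wraps header lines longer than 80 characters, so very long sequence names would need a different wrapping step.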
-
Originally posted by wdecoster: Perhaps a quick and dirty assembly with miniasm can give you an idea? https://github.com/lh3/miniasm
-
Perhaps a quick and dirty assembly with miniasm can give you an idea? https://github.com/lh3/miniasm
-
Thanks for the quick response, Brian!
Good to know about the comma-delimited method for multiple entries. Unfortunate to hear about the PacBio error issue when trying to determine genome size. I thought about this a bit and am wondering if the pre-processing Canu does on the data could be used prior to trying kmercountexact? It outputs a couple of files during its run, as it corrects and then trims the reads:
<filename>.correctedReads.fasta.gz
then
<filename>.trimmedReads.fasta.gz
Of course, they have been processed WITH the genomeSize estimate provided at run time and I'm not certain of how much that might have influenced any trimming or correction. I might try and contact Phillippy or Koren and inquire further ;-)
-
Hi Jeff,
Unfortunately, I don't have a good method for this. I've tried kmercountexact, and it does not work on raw PacBio reads due to the high error rate. I do not know of a better method for genome size estimation than assembling, with Falcon, for example. Sorry!
If you have multiple files, though, you can enter them comma-delimited, like this:
Code:
kmercountexact.sh in=filtered1.fq,filtered2.fq
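If there are many files, the comma-delimited list can be built rather than typed. A minimal sketch, assuming the inputs sit in one directory and end in .fq (the temporary directory and empty files here are placeholders); it only prints the resulting command rather than running it.

```shell
# Placeholder directory with two empty files standing in for real reads.
dir=$(mktemp -d)
touch "$dir/filtered1.fq" "$dir/filtered2.fq"

# Join the file names with commas (ls sorts them alphabetically).
inputs=$(ls "$dir"/*.fq | paste -sd, -)

# Print the command that would be run.
echo "kmercountexact.sh in=$inputs"

rm -rf "$dir"
```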
-
Genome Size Estimation from PacBio Raw Reads
So, I'm working on a de novo assembly using Canu, and it seems to be VERY sensitive to the genomeSize=XXX parameter, which is required. As it is a new project, no one has an actual "size" for it (I checked T. Ryan Gregory's site; nothing similar there either).
So, I am using the BBMap suite, specifically the "kmercountexact.sh" component. I'm waiting on a compute node with >64GB of RAM to run it, but have it set up as follows: kmercountexact.sh in=filtered_subreads.fastq khist=khist.txt peaks=peaks.txt out=genomesize.txt
As Brian Bushnell is active on here, I was hoping to ask about using this on PacBio data specifically. Is there anything I need to be more specific about in the options? Also, can I specify both of my PacBio files as arguments? I have both a .fastq of the long reads and a .fasta of much shorter reads supplied by the sequencer people. I know it can take PE files as in= and in2=, but what about essentially "single" reads?
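For reference, the usual back-of-the-envelope estimate from a k-mer histogram is total k-mers divided by the depth of the main peak, after dropping the low-depth error k-mers. The sketch below runs on a made-up histogram and assumes a two-column, tab-separated depth/count layout with "#" header lines; the real khist file may have more columns, so check it first. The min_depth cutoff is a hypothetical value chosen for this toy data. As noted in the replies, this only makes sense on low-error or corrected reads, not raw PacBio reads.

```shell
# Made-up histogram: depth <TAB> number of distinct k-mers at that depth.
khist=$(mktemp)
printf '#Depth\tCount\n1\t5000\n2\t100\n9\t50\n10\t100\n11\t50\n' > "$khist"

# Hypothetical cutoff: depths below this are treated as sequencing errors.
min_depth=3

# Find the peak depth (largest count at or above the cutoff) and divide the
# total k-mer count (sum of depth * count) by it.
result=$(awk -F'\t' -v min="$min_depth" '
  /^#/ { next }                          # skip header lines
  $1 >= min {
    total += $1 * $2                     # k-mers contributed at this depth
    if ($2 > best) { best = $2; peak = $1 }
  }
  END { if (peak > 0) printf "%d %d", peak, total / peak }
' "$khist")

peak=${result% *}   # first field: peak depth
est=${result#* }    # second field: estimated genome size in bases
echo "peak depth: $peak, estimated genome size: $est bp"

rm -f "$khist"
```

High-copy sequences such as organelle genomes show up as counts at very high depths, so an upper cutoff could be added in the same way if they inflate the total.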