I'm working on a de novo assembly with Canu, and it seems to be VERY sensitive to the required genomeSize=XXX parameter. Since this is a new project, no one has an actual genome size for it (I checked T. Ryan Gregory's site and found nothing similar there either).
So I am using the BBMap suite, specifically the kmercountexact.sh component. I'm waiting on a compute node with >64 GB of RAM to run it, but I have it set up as follows: kmercountexact.sh in=filtered_subreads.fastq khist=khist.txt peaks=peaks.txt out=genomesize.txt
Since Brian Bushnell is active on here, I was hoping to ask about using this on PacBio data specifically: is there anything I need to be more specific about in the options? Also, can I specify both of my PacBio files as inputs? I have a .fastq of the long reads as well as a .fasta of much shorter reads supplied by the sequencing people. I know it can take paired-end files as in= and in2=, but what about two files of essentially "single" reads?
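For anyone wanting to sanity-check the number by hand once khist.txt exists, here is a rough sketch of the usual k-mer arithmetic: genome size is approximately the total number of non-error k-mers divided by the depth of the main peak. It assumes the histogram is plain text with a depth column and a count column and '#' header lines, which is my assumption about the format rather than something confirmed here. The function names (parse_khist, find_valley, estimate_genome_size) are made up for illustration and are not part of BBMap. With raw PacBio error rates the main peak may be hard to see at all, so treat the result as a ballpark at best.

```python
def parse_khist(lines):
    """Parse 'depth count' rows into a dict, skipping blank and '#' lines.

    Assumes the first two whitespace-separated columns are depth and count.
    """
    hist = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        depth, count = int(fields[0]), int(fields[1])
        hist[depth] = hist.get(depth, 0) + count
    return hist


def find_valley(hist):
    """Return the depth just before counts first start rising again.

    Low-depth k-mers are mostly sequencing errors; the first rise marks
    the start of the real coverage peak. Falls back to the lowest depth
    if the counts never rise.
    """
    depths = sorted(hist)
    for prev, cur in zip(depths, depths[1:]):
        if hist[cur] > hist[prev]:
            return prev
    return depths[0]


def estimate_genome_size(hist):
    """Estimate genome size as (k-mers at or above the valley) / peak depth."""
    valley = find_valley(hist)
    kept = {d: c for d, c in hist.items() if d >= valley}
    peak = max(kept, key=kept.get)          # depth with the most k-mers
    total_kmers = sum(d * c for d, c in kept.items())
    return round(total_kmers / peak)


if __name__ == "__main__":
    # Toy histogram, not real data: error tail at depth 1-2, peak at depth 6.
    toy = ["#Depth\tCount", "1\t1000", "2\t200", "3\t50", "4\t80",
           "5\t150", "6\t300", "7\t150", "8\t80", "9\t20"]
    print(estimate_genome_size(parse_khist(toy)))
```

On a real run you would pass the lines of khist.txt instead of the toy list, e.g. `estimate_genome_size(parse_khist(open("khist.txt")))`, and compare the result against whatever ends up in peaks.txt.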
I think I've avoided miniasm so far because it appears to output only .gfa files? That rather limits further evaluation of the assembly, as most common tools still seem to take only .fasta.
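That said, the .gfa limitation may be less of a blocker than it looks, since a GFA segment ("S") line carries the sequence name in its second tab-separated field and the sequence in its third. A minimal sketch of pulling those out as FASTA follows. The function name gfa_to_fasta and the file names in the usage note are hypothetical, and I have not run this against actual miniasm output.

```python
def gfa_to_fasta(gfa_lines, width=80):
    """Yield FASTA lines for every segment ('S') record in a GFA file.

    Segments whose sequence is '*' (sequence not stored) are skipped.
    """
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue                      # skip headers, links, paths, etc.
        fields = line.rstrip("\n").split("\t")
        name, seq = fields[1], fields[2]
        if seq == "*":
            continue
        yield f">{name}"
        # Wrap the sequence at a fixed line width.
        for i in range(0, len(seq), width):
            yield seq[i:i + width]


if __name__ == "__main__":
    # Toy records, not real assembler output.
    demo = ["H\tVN:Z:1.0\n",
            "S\tutg000001l\tACGTACGT\tLN:i:8\n",
            "L\tutg000001l\t+\tutg000002l\t-\t4M\n"]
    print("\n".join(gfa_to_fasta(demo, width=4)))
```

For a real file, something like `open("assembly.fasta", "w").write("\n".join(gfa_to_fasta(open("assembly.gfa"))) + "\n")` would do, with both file names being placeholders.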