I am looking for a truly fast (high-performance computing / HPC) FASTA or FASTQ parsing program. I just want the simplest statistics imaginable:
- number of reads
- total number of bases
Other statistics, such as average length or ACGT composition, would be nice but are not required.
I searched the software page, tried some packages, and wrote my own parsers, but they are all slow.
I am looking for something written in C, which I hope can be very fast.
I also tried this simple bash command:
"time grep -v '^>' ./test.fa | wc -m -l"
which is 'fast' (30 seconds to scan a 1 GB FASTA file, with the file already in memory).
My simple Python script takes over a minute to scan this file. I hope this can be done faster, or all in one script.
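For illustration, a single-pass count of both statistics in one script could look like the sketch below. It is a hypothetical example, not the script timed above: the function name `count_fasta` is an assumption, it handles FASTA only (not FASTQ), and no speed claim is made for it. Unlike `wc -m`, it does not count line terminators as bases.

```python
def count_fasta(path):
    """Return (number_of_reads, total_bases) for a FASTA file.

    Hypothetical helper for illustration only. Assumes every header
    line starts with '>' and every other line is sequence data.
    """
    reads = 0
    bases = 0
    # Binary mode avoids text decoding; lines are compared as bytes.
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                reads += 1
            else:
                # Strip LF / CRLF so line terminators are not counted as bases.
                bases += len(line.rstrip(b"\r\n"))
    return reads, bases
```

Calling `count_fasta("test.fa")` would return the two numbers as a tuple, for example to print them side by side.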
When scanning gigabytes of files, it would be nice to have a very fast parser.
Is anyone aware of such a program? Or, what is the fastest program you know of?