Hello
I am dealing with a rather common problem which gives me real headaches with PacBio data:
How can you filter your reads prior to assembly?
Note: I am not talking about subread filtering, quality filtering and so on, but filtering for "contamination".
For example: assembling a genome after filtering out reads that match known metagenomes, or keeping only the mitochondrial reads from a sequenced organism and assembling its mitochondrial genome.
So, what I am looking for is a way to filter the *.h5 files and use only certain reads for the assembly. I guess one would have to parse the files, but working with the h5 format is a little beyond my scope.
For some modules (e.g. Allora) a FASTA file can be provided as input, but for others (e.g. HGAP) it cannot.
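To make concrete what I mean for the FASTA case, here is a minimal sketch in plain Python. It assumes I already have a set of read names to discard (for example from mapping the reads against a contaminant reference), and that the read name is the first word of each FASTA header. The function names `parse_fasta` and `filter_reads` are my own, not part of any PacBio tool, and this does not touch the *.h5 files at all.

```python
def parse_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_reads(records, unwanted_ids):
    """Yield only records whose read name is not in unwanted_ids.

    Assumption: the read name is the first whitespace-separated
    word of the header line.
    """
    for header, seq in records:
        words = header.split()
        read_id = words[0] if words else ""
        if read_id not in unwanted_ids:
            yield header, seq


if __name__ == "__main__":
    # Hypothetical toy input; real data would come from open("reads.fasta").
    example = [">read1 some description\n", "ACGT\n", "AC\n",
               ">read2\n", "GGGG\n"]
    contaminants = {"read2"}  # hypothetical list of reads to drop
    for header, seq in filter_reads(parse_fasta(example), contaminants):
        print(">" + header)
        print(seq)
```

To keep only a subset instead (e.g. the mitochondrial reads), the same idea works with the membership test inverted. What I am missing is the equivalent step for the h5 input.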
Does anybody have any ideas?