I'm interested in identifying bacterial pathogens from shotgun metagenomic data. Crudely, I envisage the pipeline to be something like this:
1) QC reads
2) Filter out human sequences
3) Identify pathogen sequences
Typically it seems people use the following types of tools for the corresponding steps above:
1) Trimmomatic/Cutadapt
2) Bowtie2/SNAP/BWA against human reference genome(s)
3) Bowtie2/SNAP/BWA against reference bacterial genomes database
My question is: can steps 2) and/or 3) be replaced with a sequence-composition-based approach (e.g. a k-mer method such as Kraken, LMAT or CLARK)?
As I understand it, k-mer methods are faster but less sensitive than read alignment with a tool like Bowtie2. Additionally, read alignment uses paired-read and quality-score information that k-mer approaches don't use. However, there seems to be an increasing trend towards 'alignment-free' or 'pseudoalignment' approaches, and I'm not sure how to evaluate the trade-offs. I'd be grateful for any advice. Thanks.
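To make concrete what I mean by a k-mer method, here is a toy Python sketch of exact k-mer matching. It is only my illustration of the general idea, not how Kraken, LMAT or CLARK are actually implemented; the reference sequences, the `min_hits` threshold and all function names are made up for the example.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(references, k):
    """Map each k-mer to the set of reference labels that contain it."""
    index = {}
    for label, seq in references.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(label)
    return index

def classify(read, index, k, min_hits=1):
    """Vote with k-mers that occur in exactly one reference.

    Returns (label, hits) for the best-supported label,
    or (None, 0) if no label reaches min_hits.
    """
    votes = Counter()
    for km in kmers(read, k):
        labels = index.get(km, ())
        if len(labels) == 1:  # ignore k-mers shared between references
            votes[next(iter(labels))] += 1
    if not votes:
        return None, 0
    label, hits = votes.most_common(1)[0]
    return (label, hits) if hits >= min_hits else (None, 0)

# Toy, made-up sequences standing in for real reference genomes.
K = 8
references = {
    "human": "ACGTACGTTAGCCGATAGCTAGGCTA",
    "pathogen": "TTGACCGGTATCGATCGGAATTCCGG",
}
index = build_index(references, K)

exact_read = "CCGGTATCGATCGGAA"    # exact substring of the toy pathogen
mutated_read = "CCGGTATCTATCGGAA"  # same read with one substitution

print(classify(exact_read, index, K))
print(classify(mutated_read, index, K))
print(classify(mutated_read, index, K, min_hits=2))
```

In this toy case a single substitution in the middle of the read removes most of the matching k-mers, because every k-mer overlapping the mismatch fails the exact lookup. That is the loss of sensitivity I'm worried about, whereas an aligner would simply report one mismatch.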