Hello experts,
I want to classify several million pathogen DNA/cDNA reads into taxonomic groups (e.g. bacterial, viral, fungal, ...).
My pipeline is as follows:
Sequencer: HiSeq 2000, paired-end, Nextera XT library prep kit
Removing human reads: BOWTIE2
Mapping to the nt/nr databases: BLAST+ (-max_target_seqs 1)
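Even with -max_target_seqs 1, BLAST+ tabular output can contain several lines for one read (one per HSP against the chosen subject), so it may be worth collapsing the output to a single best hit per read before classification. A minimal sketch, assuming the default 12-column `-outfmt 6` layout; `best_hits` is a hypothetical helper and the toy reads and GIs are made up:

```python
import csv
import io
from typing import Dict, Iterable

# Default column order of BLAST+ "-outfmt 6" (assumed; adjust if a custom -outfmt string was used).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Keep only the highest-bitscore line per query read (hypothetical helper)."""
    best: Dict[str, Dict[str, str]] = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        hit = dict(zip(COLUMNS, row))
        query = hit["qseqid"]
        if query not in best or float(hit["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = hit
    return best


if __name__ == "__main__":
    # Toy input: two HSPs for read1 against the same subject, one HSP for read2.
    example = io.StringIO(
        "read1\tgi|123|gb|AB000001.1|\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-40\t180\n"
        "read1\tgi|123|gb|AB000001.1|\t95.0\t40\t2\t0\t101\t140\t300\t339\t1e-10\t70\n"
        "read2\tgi|456|gb|AB000002.1|\t98.0\t90\t2\t0\t1\t90\t5\t94\t1e-35\t160\n"
    )
    for query, hit in best_hits(example).items():
        print(query, hit["sseqid"], hit["bitscore"])
```

On the toy input this keeps the 180-bit HSP for read1 and the single hit for read2.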
Then I classified the tabular BLAST output into taxonomic groups with MEGAN5, using the auxiliary mapping file its developers provide (http://www-ab.informatik.uni-tuebing...March2015X.zip).
This great software assigned about 90% of the reads to specific taxa, but about 10% of the reads were "Not assigned".
However, I found some correctly BLASTed reads in this "Not assigned" group.
These reads have hits to specific pathogen GIs with known taxon IDs, yet MEGAN reports them as "Not assigned".
I guess this is because either:
(1) NCBI does not provide taxon IDs for the nt/nr databases, so MEGAN misses a certain number of correct hits, or
(2) MEGAN does not support the latest GI-to-taxon mapping, so it misses recently added GIs.
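One way to test guess (2) is to check whether the GIs hit by the "Not assigned" reads are present in a GI-to-taxon mapping at all. A minimal sketch, assuming the mapping is available as two-column `gi<TAB>taxid` text (the layout of NCBI's gi_taxid_nucl.dmp; MEGAN's own mapping file may use a different format) and that BLAST subject IDs look like `gi|123456|gb|...|`. The helper names are hypothetical and the toy data is made up:

```python
import re
from typing import Dict, Iterable, Set

# Assumed subject-ID layout in the BLAST output, e.g. "gi|123456|gb|AB000001.1|".
GI_PATTERN = re.compile(r"^gi\|(\d+)\|")


def load_gi_taxid(lines: Iterable[str]) -> Dict[int, int]:
    """Parse two-column 'gi<TAB>taxid' lines (hypothetical helper)."""
    mapping: Dict[int, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            mapping[int(parts[0])] = int(parts[1])
    return mapping


def unmapped_gis(subject_ids: Iterable[str], mapping: Dict[int, int]) -> Set[int]:
    """Return the GIs in BLAST subject IDs that have no entry in the mapping."""
    missing: Set[int] = set()
    for sid in subject_ids:
        match = GI_PATTERN.match(sid)
        if match and int(match.group(1)) not in mapping:
            missing.add(int(match.group(1)))
    return missing


if __name__ == "__main__":
    # Toy data; in practice read the mapping file and the sseqid column of the BLAST output.
    mapping = load_gi_taxid(["123\t562\n", "456\t10239\n"])
    hits = ["gi|123|gb|AB000001.1|", "gi|789|gb|AB000003.1|"]
    print(sorted(unmapped_gis(hits, mapping)))  # [789]
```

If the GIs of the "Not assigned" reads show up in this set, a newer mapping file would likely help; if they are all present, the cause probably lies elsewhere.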
Thanks,