Hi All,
I'm working with a library with significant bacterial contamination, and I've spent a lot of time trying to remove it without much success. The organism of interest is an obligate root pathogen (not bacterial), and I'm afraid I've sequenced many non-target, associated bacterial species. Eventually, I hope to do some de novo assembly of the cleaned reads. Well, I've already done some de novo assembly, but find many bacterial sequences in my blast results. I still have some things I want to try, but wanted to ask for suggestions so that I might optimize my strategy and time. Anyway, so far I have tried:
- Mapping raw reads with DeconSeq against the included bacterial databases. This hasn't worked particularly well, and the program frequently crashes on our system anyway (even after recompiling as suggested).
- Mapping raw reads with bwa mem to NCBI's complete bacterial genome database. That is, I downloaded the all_fna.tar.gz file for bacterial genomes, concatenated the genomes, split the result into reasonably sized files, and indexed those as references for bwa mem. I then wrote a script to pull out any unmapped sequences from the resulting SAM files. I realize this is a nearly identical approach to DeconSeq, but it seems to work a little better (and is much more stable!).
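For anyone following along, the unmapped-read step above could be sketched roughly like this in Python. This is not the actual script, just a minimal illustration under a few assumptions: each reference chunk produced its own SAM file, a read counts as clean only if it is unmapped (SAM FLAG bit 0x4 set) against every chunk, and for paired data a pair is dropped if either mate maps. The function names are made up for the example.

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_names(sam_lines):
    """Return read names for which every record in one SAM stream is unmapped."""
    seen, mapped = set(), set()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # skip malformed or blank lines
            continue
        name, flag = fields[0], int(fields[1])
        seen.add(name)
        if not flag & UNMAPPED:
            mapped.add(name)  # any mapped record (either mate) marks the name
    return seen - mapped


def clean_read_names(sam_streams):
    """Keep only names that are unmapped against every reference chunk."""
    keep = None
    for stream in sam_streams:
        names = unmapped_names(stream)
        keep = names if keep is None else keep & names
    return keep if keep is not None else set()


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python filter_unmapped.py chunk1.sam chunk2.sam
    handles = [open(path) for path in sys.argv[1:]]
    try:
        for name in sorted(clean_read_names(handles)):
            print(name)
    finally:
        for handle in handles:
            handle.close()
```

The printed names can then be used to pull the matching records out of the original FASTQ files. Taking the intersection across chunks matters here: a read that is unmapped against one chunk may still hit a genome in another.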
Via BLAST I'm still finding bacterial contamination in my resulting contigs, so whatever I'm doing isn't working well enough. I've checked the forums and it seems like BBMap/BBSplit is a logical next step, so I'll be trying that soon. I've got some questions for you:
- With BBSplit, can I use my concatenated NCBI bacterial genome FASTA as my reference?
- I've been using the default algorithm parameters for bwa mem. Is there something that I might change to make that pipeline more effective?
- Any other suggestions?
Obviously I've learned my lesson, and I'm trying to acquire some much cleaner template right now. However, I'd like to not waste all the data I've already received.
Thanks for your help; this website is such a great resource!