Apologies if this has been covered elsewhere; I couldn't easily find a satisfactory answer.
The problem: I have HiSeq 2500 PE reads from a microbial culture that contains ONE cyanobacterial genome of interest and several contaminating genomes. My understanding is that by blasting my assembled contigs against a local reference database containing only cyanobacterial genomes, I could sort them into two bins: those that contain any cyanobacterial genes and those that do not.
Further analysis of G-C content and tetranucleotide frequencies could then be used to eliminate chimeric contigs, leaving me with a draft genome.
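To make that second step concrete, here is a rough sketch of how GC content and tetranucleotide frequencies could be computed per contig using only the Python standard library. The function names are my own, and it counts overlapping 4-mers on the given strand only, skipping any window with an ambiguous base; binning tools often also merge reverse-complement k-mers, which this deliberately leaves out.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / acgt


def tetranucleotide_freqs(seq):
    """Relative frequency of each of the 256 possible 4-mers.

    Uses overlapping windows on the given strand only; windows that
    contain anything other than A, C, G or T are skipped.
    """
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=4)}
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return {kmer: 0.0 for kmer in counts}
    return {kmer: n / total for kmer, n in counts.items()}
```

Contigs whose GC content or tetranucleotide profile sits far from the bulk of the 'keep' pile would then be the ones to inspect as possible chimeras or contaminants.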
Could anybody point me in the direction of resources to help me write a BLAST script to perform this task, maybe using Biopython (I have just started learning Python)? I don't need long stretches of sequence to align; the presence of a single gene with a good match anywhere in a contig would be enough to put it in the 'keep' pile.
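Roughly what I have in mind, as a sketch rather than working pipeline code: assume the contigs have already been searched against the local cyanobacterial database with BLAST+ and the results saved in tabular form (`-outfmt 6`, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). A contig goes in the 'keep' pile if it has at least one hit passing some cutoff. The function names and the cutoff values below are placeholders I made up, not recommended settings.

```python
import csv


def contigs_with_hits(blast_lines, max_evalue=1e-10, min_pident=70.0):
    """Return the set of query (contig) IDs with at least one good hit.

    blast_lines: an open file or any iterable of lines in BLAST tabular
    format (-outfmt 6, default columns). The cutoffs are placeholders.
    """
    keep = set()
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid = row[0]
        pident = float(row[2])   # column 3: percent identity
        evalue = float(row[10])  # column 11: e-value
        if evalue <= max_evalue and pident >= min_pident:
            keep.add(qseqid)
    return keep


def bin_contigs(all_contig_ids, keep):
    """Split contig IDs into (kept, dropped), preserving input order."""
    kept = [c for c in all_contig_ids if c in keep]
    dropped = [c for c in all_contig_ids if c not in keep]
    return kept, dropped
```

With a results file (say a hypothetical `hits.tsv`), this would be called as `keep = contigs_with_hits(open("hits.tsv"))` and then `bin_contigs(contig_ids, keep)`, where `contig_ids` is the list of all contig names in the assembly.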
I'm new to bioinformatics and essentially teaching myself so any pointers much appreciated...
Cheers
Nathan