Blasting contigs against reference database


  • Blasting contigs against reference database

    Apologies if this has been covered elsewhere; I couldn't easily find a satisfactory answer.

    The problem: I have HiSeq 2500 paired-end (PE) reads from a microbial culture that contains ONE cyanobacterial genome of interest and several contaminating genomes. My understanding is that by blasting against a local reference database containing only cyanobacterial genomes, I could bin my contigs into those that contain any cyanobacterial genes and those that do not.

    Further analysis of GC content and tetranucleotide frequencies could then be used to eliminate chimeric contigs, leaving me with a draft genome.
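
    To make the GC content and tetranucleotide frequency step concrete, here is a minimal sketch in plain Python. The function names are hypothetical, and the simplifications are assumptions rather than an established method: ambiguous bases such as N are ignored, and reverse-complement 4-mers are not merged, which dedicated binning tools often do.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C among the unambiguous (A/C/G/T) bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # no unambiguous bases to count
    return (seq.count("G") + seq.count("C")) / acgt


def tetranucleotide_freqs(seq):
    """Relative frequency of each of the 256 A/C/G/T 4-mers in seq.

    Overlapping windows are counted; windows containing any other
    character (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


if __name__ == "__main__":
    contig = "ATGGCGCGATTAGCGCGCTAGCTAGGCGCGATATCGCG"  # made-up sequence
    print(f"GC content: {gc_content(contig):.2f}")
    top = sorted(tetranucleotide_freqs(contig).items(), key=lambda kv: -kv[1])[:3]
    print("Most frequent 4-mers:", top)
```

    Contigs whose GC content or 4-mer profile sits far from the bulk of the 'keep' pile would be the ones to inspect by hand; where to draw that line is a judgment call that this sketch does not make.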

    Could anybody point me in the direction of resources to help me write a BLAST script to perform this task, maybe using Biopython (I have just started learning Python)? I don't need long stretches of sequence to align; just the presence of a single gene with a good match in a whole contig would be enough to put it in the 'keep' pile.
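
    As a rough illustration of the 'keep' pile idea, the sketch below assumes BLAST has already been run with tabular output, for example `blastn -query contigs.fasta -db cyano_db -outfmt 6 -out hits.tsv`, where `contigs.fasta`, `cyano_db` and `hits.tsv` are placeholder names. It uses only the standard library rather than Biopython, the function names are hypothetical, and the identity, length and e-value cut-offs are arbitrary starting points that would need tuning.

```python
import csv
import io

# Default BLAST tabular (-outfmt 6) columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore


def contigs_with_hits(blast_tab, min_identity=80.0, min_length=300, max_evalue=1e-10):
    """Return the set of contig (query) IDs with at least one hit passing all cut-offs.

    blast_tab is any iterable of tab-separated lines, e.g. an open file.
    The default thresholds are illustrative assumptions, not recommendations.
    """
    keep = set()
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid = row[0]
        pident = float(row[2])
        length = int(row[3])
        evalue = float(row[10])
        if pident >= min_identity and length >= min_length and evalue <= max_evalue:
            keep.add(qseqid)
    return keep


def bin_contigs(all_contig_ids, keep_ids):
    """Split contig IDs into (keep, discard) lists, preserving input order."""
    keep = [c for c in all_contig_ids if c in keep_ids]
    discard = [c for c in all_contig_ids if c not in keep_ids]
    return keep, discard


if __name__ == "__main__":
    # Made-up hits standing in for a real hits.tsv file.
    example = io.StringIO(
        "contig_1\tcyanoA\t95.2\t1200\t50\t2\t1\t1200\t300\t1499\t0.0\t1800\n"
        "contig_2\tcyanoB\t71.0\t150\t40\t3\t10\t159\t5\t154\t1e-3\t60.2\n"
    )
    hits = contigs_with_hits(example)
    keep, discard = bin_contigs(["contig_1", "contig_2", "contig_3"], hits)
    print("keep:", keep)
    print("discard:", discard)
```

    With a real run, the `io.StringIO` example would be replaced by `open("hits.tsv")`, and the list of all contig IDs would come from the headers of the assembly FASTA.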

    I'm new to bioinformatics and essentially teaching myself, so any pointers would be much appreciated.


  • #2
    Use a program like Bowtie2 or BBDuk to bin reads.


    • #3
      Bowtie2 looks useful, certainly. However, wouldn't this only keep reads that mapped directly to the reference genome, losing some good reads from my genome of interest? I was going along the lines of assembling contigs first and searching within them for matches to the reference.


      • #4
        This is a classic case for using BBSplit. Use the cyanobacterial genome(s) as the reference and the reads will be binned automatically. If you need help with the actual command line, let us know.


        • #5
          BBSplit looks great, thanks! Can't believe I hadn't seen it before; will try it out.