Hi Keith
At the moment, HTSeq can natively only work with SAM files. Adding BAM support is on my to-do list, and of course, I would do it by simply wrapping the samtools.
For now, however, you can simply call the 'samtools' binary from Python and pipe its output to HTSeq's SAM_Reader class. The following function does that for you:
With this, you can now do, e.g.:
Maybe not the optimum in performance but should do the job.
Cheers
Simon
Originally posted by krobison
View Post
For now, however, you can simply call the 'samtools' binary from Python and pipe its output to HTSeq's SAM_Reader class. The following function does that for you:
Code:
import subprocess import HTSeq def BAM_Region_Reader( bamfile, region=None, samtools_exec="samtools" ): """Get Reader object for part of a BAM file bamfile -- a string, either a file name or an URL to a BAM file region -- a GenomicInterval samtools_exec -- file name (with path, if necessary) of the 'samtools' executable Returns a SAM_Reader object. """ cmd_line = ( samtools_exec , "view", bamfile ) if region is not None: cmd_line += ( "%s:%d-%d" % ( region.chrom, region.start, region.end ), ) samtools_view = subprocess.Popen( cmd_line, stdout = subprocess.PIPE ) return HTSeq.SAM_Reader( samtools_view.stdout )
Code:
bamfile = "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/HG00155/alignment/HG00155.chrom12.ILLUMINA.bwa.GBR.low_coverage.20100517.bam" region = HTSeq.GenomicInterval( "12", 90000, 91000, "." ) for alignment in BAM_Region_Reader( bamfile, region ): print alignment
Cheers
Simon
Comment