Originally posted by bbl
View Post
Here my answer to Bbl:
First of all, HTSeq is not a "program" but a framework that allows you to conveniently write your own scripts. Hence, I have to assume that you are a bioinformatician with at least a bit of experience in programming. If you are not, this might be difficult.
If you already know some Python, read the tutorial (Tour) in the HTSeq documentation. If you don't know Python yet but do know Perl or something similar, maybe look first through the first few pages of the Python Tutorial on the Python web site. You don't need to know much for HTSeq.
Given a SAM file (not BAM), you can calculate the coverage in a HTSeq.GenomicArray, as shown in the section "Calculating coverage vectors" of the Tour.
These here are the lines in the Tour that you need to use and adapt:
Code:
import HTSeq yeast_chroms = [ "2-micron", "MT", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI" ] alignment_file = HTSeq.SAM_Reader( "yeast_RNASeq_excerpt.sam" ) cvg = HTSeq.GenomicArray( yeast_chroms, stranded=True, typecode='i' ) for alngt in alignment_file: if alngt.aligned: cvg.add_value( 1, alngt.iv )
Code:
iv = HTSeq.GenomicInterval( "II", 120100, 120120, "+" ) print list( cvg[iv] )
You can now go through your BED file and do this for each interval given there. I assume that your BED file has four columns, for chromosome, start, end, and name of the exon. (If you have another format, you have to change the line below with the 'split' method.)
Code:
for line in open( "mybedfile.bed" ): # Split the line into the four fields: chrom, start, end, name = line.rstrip().split( "\t" ) # Make two intervals, one for '+', one for '-': ivPlus = HTSeq.GenomicInterval( chrom, start, end, "+" ) ivMinus = HTSeq.GenomicInterval( chrom, start, end, "-" ) # Write out the results: print "Exon", name, ":" print "+:", list( cvg[ivPlus] ) print "-:", list( cvg[ivMinus] ) print
In any case, I guess your work is not done here; you might want to calculate statistics or make plots from the output. This should be easy wth Python (incl numpy/scipy and matplotlib), too, so it's worth learning it.
Simon
Comment