I try to get strand specific coverage for paired-end RNA reads (using the first read in pair as the counting strand) from a sam-file (sorted by readname).
So far i found a solution using R, but for larger files (starting from 5 Gbyte), its just not feasible to do it with R as it takes ages.
By looking at HTSeq which can handle paired-end reads as bundles, i thought i could get strand specific coverage using the HTSeq.pair_SAM_alignments() method but didnt succeed so far...
This gives me the strand specific coverage of the LEFT reads in pair.
I would like to:
- Find FIRST read in pair
- get its strand direction and change its mate's (LAST read in pair) strand direction to that of the FIRST read in pair
-count coverage of FIRST and LAST read in pair according to strand direction
Could anyone help me help with this task?
Maybe someone could show how to find FIRST read in pair with HTSeq?
Thank you,
Michel
So far i found a solution using R, but for larger files (starting from 5 Gbyte), its just not feasible to do it with R as it takes ages.
By looking at HTSeq which can handle paired-end reads as bundles, i thought i could get strand specific coverage using the HTSeq.pair_SAM_alignments() method but didnt succeed so far...
Code:
import HTSeq import pysam import itertools sam_reader = HTSeq.SAM_Reader( "test.sam") cover = HTSeq.GenomicArray( "auto", stranded=True, typecode='i' ) for bundle in HTSeq.pair_SAM_alignments( sam_reader, bundle=True ): if len(bundle) != 1: continue left_almnt, right_almnt = bundle[0] # extract pair if not left_almnt.aligned and right_almnt.aligned: continue cover[ left_almnt.iv ] += 1 #left_almnt equals left read from 5' end of pair NOT first read in pair cover.write_bedgraph_file( "test-plus.wig", "+" ) cover.write_bedgraph_file( "test-minus.wig", "-" )
I would like to:
- Find FIRST read in pair
- get its strand direction and change its mate's (LAST read in pair) strand direction to that of the FIRST read in pair
-count coverage of FIRST and LAST read in pair according to strand direction
Could anyone help me help with this task?
Maybe someone could show how to find FIRST read in pair with HTSeq?
Thank you,
Michel
Comment