Hi,
I am working on an RNA-seq analysis, following the steps outlined in the Next Generation Sequencing wikibook at http://en.wikibooks.org/wiki/Next_Ge...cing_(NGS)/RNA, and I have run into a problem I cannot solve: the countOverlaps function seems to count hits for a gene even when the gene lies wholly in the gap between two reads. As an example, consider the IGV.pdf file attached below. It shows the region occupied by the PPP1R26P2 pseudogene, which is located at chr22:20,499,890-20,503,484. For 43 read pairs, PPP1R26P2 lies entirely between the two reads of the pair, and countOverlaps(tx_by_gene, reads) returns 43 for this gene.
I would expect the result to be 0, because "reads" does not contain any information about read pairs, only the location of each read separately, and the locations of PPP1R26P2 and the neighbouring reads are disjoint.
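To make the expectation concrete, here is a minimal sketch in plain Python. It is not the Bioconductor implementation, only an illustration of the interval logic I have in mind. The gene coordinates are the ones given above; the read coordinates and the function names are made up for illustration.

```python
def overlaps(a, b):
    """Return True if closed intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] <= b[1] and b[0] <= a[1]


def count_overlaps(features, reads):
    """For each feature, count how many read intervals overlap it."""
    return [sum(overlaps(f, r) for r in reads) for f in features]


# PPP1R26P2 on chr22, coordinates taken from the post.
gene = (20499890, 20503484)

# Hypothetical read pair flanking the gene (coordinates invented for this sketch).
mate1 = (20499700, 20499800)
mate2 = (20503600, 20503700)

# Each read treated as its own interval: neither touches the gene.
separate = count_overlaps([gene], [mate1, mate2])

# The pair treated as one interval from the start of mate1 to the end of mate2:
# the gene now falls inside that span and is counted.
spanning = count_overlaps([gene], [(mate1[0], mate2[1])])

print(separate, spanning)
```

With separate intervals the gene gets no hits, while a single interval spanning the whole pair produces one hit per pair. The second case matches the behaviour I am seeing, which makes me suspect that the ranges being compared somehow cover the gap, but I have not been able to confirm that.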
Do you have any idea how to make this function work properly or what to use instead?
Best regards,
Marcin.