Originally posted by lmf_bill
Here I have pasted a few splice junctions, with the number of junction reads identified without gene annotation and with gene annotation, respectively.
Code:
Chr#  left_site  right_site  #juncReadWithoutGeneAnnotation  #juncReadWithGeneAnnotation  GeneName
chrY  21147161   21150881    4                               0                            EIF1AY
chrY  21147409   21150881    1                               1                            intragenic
chrY  21150965   21153863    4                               4                            EIF1AY
chrY  21153967   21155747    14                              14                           EIF1AY
chrY  21155798   21159297    9                               8                            EIF1AY
chrY  21159379   21160757    17                              17                           EIF1AY
chrY  21160849   21163614    8                               8                            EIF1AY
chrY  2769668    2770205     0                               28                           RPS4Y1
chrY  2770283    2772117     0                               21                           RPS4Y1
chrY  2772298    2773686     36                              17                           RPS4Y1
chrY  2773784    2782640     58                              0                            RPS4Y1
chrY  2782812    2793128     65                              65                           RPS4Y1
chrY  2793286    2794833     43                              43                           RPS4Y1
chrY  7284271    7295396     0                               1                            PRKY
chrY  8577625    8578201     4                               4                            intergenic
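To see at a glance which junctions change between the two runs, the pasted rows can be parsed and the two count columns compared. This is only a minimal Python sketch: it assumes whitespace-separated columns in the order shown above, and the names `parse_junctions` and `discordant_junctions` are made up for illustration, not part of any aligner.

```python
# Rows copied from the post. Columns (assumed order): chrom, left_site,
# right_site, reads without annotation, reads with annotation, gene name.
TABLE = """\
chrY 21147161 21150881 4 0 EIF1AY
chrY 21147409 21150881 1 1 intragenic
chrY 21150965 21153863 4 4 EIF1AY
chrY 21153967 21155747 14 14 EIF1AY
chrY 21155798 21159297 9 8 EIF1AY
chrY 21159379 21160757 17 17 EIF1AY
chrY 21160849 21163614 8 8 EIF1AY
chrY 2769668 2770205 0 28 RPS4Y1
chrY 2770283 2772117 0 21 RPS4Y1
chrY 2772298 2773686 36 17 RPS4Y1
chrY 2773784 2782640 58 0 RPS4Y1
chrY 2782812 2793128 65 65 RPS4Y1
chrY 2793286 2794833 43 43 RPS4Y1
chrY 7284271 7295396 0 1 PRKY
chrY 8577625 8578201 4 4 intergenic
"""


def parse_junctions(text):
    """Turn the whitespace-separated table into a list of typed tuples."""
    rows = []
    for line in text.strip().splitlines():
        chrom, left, right, n_without, n_with, gene = line.split()
        rows.append((chrom, int(left), int(right),
                     int(n_without), int(n_with), gene))
    return rows


def discordant_junctions(rows):
    """Keep junctions whose read count differs between the two runs."""
    return [r for r in rows if r[3] != r[4]]


if __name__ == "__main__":
    for chrom, left, right, n_without, n_with, gene in discordant_junctions(
            parse_junctions(TABLE)):
        print(f"{chrom}:{left}-{right}\t{gene}\t"
              f"without={n_without}\twith={n_with}")
```

Junctions with a zero in one column and a nonzero count in the other are the ones found by only one of the two runs.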
On the mismatch setting, I strongly agree with you: there is no single best setting. Intuitively, though, there are fewer splice junction reads than exon reads, so calling a read a splice junction read carries a higher risk of error than calling it an exon read, especially when no other evidence supports the corresponding junction.