What do you think of my plan to mask the pseudoautosomal regions of the human Y chromosome prior to running bowtie on an RNA-Seq project?
Since the pseudoautosomal regions of human chromosomes X and Y are identical in sequence, any alignment strategy that keeps only unique alignments (e.g. using the `-m 1` option to bowtie) will discard all alignments to these regions, as each aligning read will have two matches. Thus reads from the 24 known genes in these regions won't be counted.
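To make the reasoning concrete, here is a toy sketch in Python. It uses exact substring matching as a stand-in for alignment, so it is not how bowtie works internally, and the sequences and function names are made up for illustration. It only shows that a read from a segment shared by two references has two hits until one copy is replaced with Ns.

```python
def count_exact_hits(read, references):
    """Count exact occurrences of read across all reference sequences."""
    hits = 0
    for seq in references.values():
        pos = seq.find(read)
        while pos != -1:
            hits += 1
            pos = seq.find(read, pos + 1)
    return hits


def keep_unique(read, references):
    """Mimic a 'unique alignments only' policy: keep reads with exactly one hit."""
    return count_exact_hits(read, references) == 1


# Hypothetical toy references sharing one identical segment (a stand-in for a PAR).
shared = "GATTACAGGC"
refs = {
    "chrX": "TTTT" + shared + "CCCC",
    "chrY": "AAAA" + shared + "GGGG",
}
# Same references, with the shared segment on chrY hard masked.
refs_masked = dict(refs, chrY="AAAA" + "N" * len(shared) + "GGGG")

read = "GATTACA"
print(keep_unique(read, refs))         # read hits both copies, so it is dropped
print(keep_unique(read, refs_masked))  # only the chrX copy remains, so it is kept
```

With the chrY copy masked, the read is attributed to chrX only, which is the intended effect of the plan.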
Therefore, I plan to use EMBOSS' `maskseq` to "hard mask" (replace with 'N') the following regions of chrY prior to building the bowtie indices:
chrY:10001-2649520
chrY:59034050-59363566
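For anyone who would rather not depend on `maskseq`, the same hard masking can be sketched in a few lines of Python. This is only a minimal sketch under assumptions: the function name `hard_mask` is made up, the coordinates are taken to be 1-based and inclusive, and they are assumed to match the assembly being indexed.

```python
# PAR coordinates as given above (assumed 1-based, inclusive).
PAR_REGIONS_CHRY = [(10001, 2649520), (59034050, 59363566)]


def hard_mask(sequence, regions, mask_char="N"):
    """Return sequence with each 1-based inclusive (start, end) region replaced by mask_char."""
    masked = bytearray(sequence, "ascii")
    fill = mask_char.encode("ascii")
    for start, end in regions:
        if start < 1 or end < start:
            raise ValueError(f"invalid region: {start}-{end}")
        if start > len(masked):
            continue  # region lies entirely beyond the sequence
        end = min(end, len(masked))  # clip to the sequence length
        masked[start - 1:end] = fill * (end - start + 1)
    return masked.decode("ascii")


# Small demonstration on a toy sequence rather than the real chrY.
print(hard_mask("ACGTACGTAC", [(3, 5), (9, 10)]))  # ACNNNCGTNN
```

In practice the chrY sequence would be read from the reference FASTA (with line breaks removed), passed through `hard_mask` with `PAR_REGIONS_CHRY`, and written back out before building the index.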
Does anyone see a problem with this approach?
Related note: the description of the `--ntoa` option in the bowtie manual explicitly states that "By default, Ns are simply excluded from the index and bowtie will not report alignments that overlap them." Does anyone know if the same is true for Xs? Are they simply excluded from the index as well?
Finally, do you agree that a new bowtie-build option that causes it to ignore specified portions of <reference_in> would be a sensible feature to request (for just such cases)?
Thanks for thinking!
Malcolm Cook
Stowers Institute for Medical Research
P.S. Please excuse the repeat if you noticed that I previously posted this at the tail of a longer thread: http://seqanswers.com/forums/showpos...&postcount=293