Hi,
Good morning.
I have used bowtie to map reads to a bacterial genome. This genome contains a family of around 10 genes that share very high sequence similarity among themselves (>95% identity). In practice they are nearly the same sequence, differing at only a small number of base pairs.
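To see why reads from such a family end up ambiguous, here is a minimal Python sketch (not part of my actual pipeline). It counts how many read-length windows are identical between two copies of a gene. It assumes the two copies are already aligned base-for-base with no indels, and the sequences and the 10 bp read length are made-up toy values.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions where two pre-aligned, equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def ambiguous_windows(seq_a: str, seq_b: str, read_len: int) -> list[int]:
    """Start positions whose read_len-long window is identical in both copies.

    A read drawn from such a window matches both genes equally well.
    Assumes no indels between the copies (a simplification).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        i
        for i in range(len(seq_a) - read_len + 1)
        if seq_a[i:i + read_len] == seq_b[i:i + read_len]
    ]


# Toy copies of a gene (hypothetical): 40 bp, differing at a single position.
seq_a = "ACGT" * 10
seq_b = seq_a[:20] + "C" + seq_a[21:]  # seq_a[20] is "A"

print(percent_identity(seq_a, seq_b))            # 97.5
print(len(ambiguous_windows(seq_a, seq_b, 10)))  # 21 of 31 windows
```

With one mismatch in 40 bp (97.5% identity), 21 of the 31 possible 10 bp windows are identical in both copies. The same logic applies at real read lengths: any read that falls entirely within a stretch where the copies are identical cannot be placed uniquely.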
Many of these genes come up as differentially expressed in my pipeline: bowtie mapping, followed by read counting with HTSeq and differential-expression testing with edgeR.
Using IGV, I can see much higher read coverage over the highly similar regions of these genes in the experimental condition than in the control, which leads to them being called differentially expressed.
Can someone explain (or point me to previous posts that explain) how bowtie distributes reads among the various genes, given that their sequences are essentially the same?
Does it just divide the reads according to the number of genes?
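The hypothesis above can be made concrete with a small simulation. This is a hypothetical model, not documented bowtie behavior: it assumes each read that matches all ~10 family members equally well is reported at exactly one of them, chosen uniformly at random. The gene names, read count and seed are placeholders.

```python
import random
from collections import Counter


def assign_multimappers(n_reads: int, genes: list[str], seed: int = 0) -> Counter:
    """Hypothetical model: each ambiguous read is assigned to one of its
    equally good target genes, picked uniformly at random."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return Counter(rng.choice(genes) for _ in range(n_reads))


# Placeholder names for the ~10-gene family.
genes = [f"gene_{i}" for i in range(1, 11)]
counts = assign_multimappers(10_000, genes)

for g in genes:
    print(g, counts[g])  # each close to 10_000 / 10 = 1000
```

Under this model each gene receives roughly a tenth of the ambiguous reads. It also means that if the experimental condition produces twice as many reads from the shared sequence, every family member's count roughly doubles, even if only one copy is actually more highly expressed.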
I am not using the "-a" option and hence assume that my reads are uniquely mapped.
Appreciate any response.
Cheers,
Nandan