Hello bioinformatics pioneers!
While digging into the bowtie/tophat documentation, I realized that the algorithm works (I think) differently from what I would have assumed a priori. I am using bowtie1 because my reads are very short (29 bp after barcode trimming) and single-end. I am running bowtie 1.4.1, not directly but called from within tophat, with the --transcriptome-index flag so that reads are mapped to annotated cDNAs first, since I am working with a very well-annotated genome (Arabidopsis thaliana).
My intuition says that, for a well-annotated genome such as mine, one should map to the annotated cDNAs first, convert those alignments to chromosome coordinates, and then map the leftover reads onto the chromosomes. (This is not perfectly satisfying; I should probably want to map in a more unbiased way, but I am assuming that what I care about is quantifying the expression of known genes rather than discovering new splice forms. Perhaps this is a big assumption.)
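To make the middle step concrete, here is a toy sketch of what I mean by converting a cDNA alignment into chromosome coordinates. It is only an illustration of the idea, not how tophat implements it: the function name `transcript_to_genome` and the example exon coordinates are made up, and it assumes a forward-strand transcript whose exons are sorted, with 0-based half-open intervals.

```python
from typing import List, Tuple

# Hypothetical representation: 0-based, half-open genomic interval [start, end)
Exon = Tuple[int, int]


def transcript_to_genome(exons: List[Exon], t_start: int, read_len: int) -> List[Exon]:
    """Project a read aligned at transcript offset t_start onto genomic blocks.

    Assumes a forward-strand transcript with exons sorted by position.
    A read that crosses an exon-exon junction yields more than one block.
    """
    blocks = []
    offset = 0  # transcript coordinate of the current exon's first base
    t_end = t_start + read_len
    for g_start, g_end in exons:
        exon_len = g_end - g_start
        # Overlap between the read and this exon, in transcript coordinates
        lo = max(t_start, offset)
        hi = min(t_end, offset + exon_len)
        if lo < hi:
            blocks.append((g_start + lo - offset, g_start + hi - offset))
        offset += exon_len
    # Every base of the read must land in an exon; otherwise the input was invalid
    if sum(end - start for start, end in blocks) != read_len:
        raise ValueError("read extends outside the transcript")
    return blocks


# Made-up two-exon transcript: exon 1 is 50 bp, exon 2 is 60 bp
exons = [(100, 150), (200, 260)]
print(transcript_to_genome(exons, 0, 29))   # read entirely inside exon 1
print(transcript_to_genome(exons, 40, 29))  # read spanning the junction
```

A 29 bp read starting 40 bases into this transcript covers the last 10 bases of the first exon and the first 19 of the second, so it comes back as two genomic blocks separated by the intron.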
But, as far as I can tell, bowtie applies another filter: it only accepts end-to-end alignments (I know bowtie2 has made this optional, making unconnected local alignments possible). Most notably, this eliminates reads that happen to fall at low abundance in introns and at the edges of UTRs. I presume this acts as a noise filter. Does anyone know of any quantification (from your own analysis, or in a paper) of the effects of this filtering?
Cheers,
~Rachel