I was watching a video of Lior Pachter's talk at CSHL last year, where he discusses what to do with multi-mapped reads, and it makes no sense to me. What I think he is saying is that by assigning multi-mapped reads to transcripts in proportion to the abundance of the uniquely mapped reads, you can reduce technical variance.
The way I see it, there are two situations:
1) You have two genes with a lot of uniquely mapped reads. They share a pool of reads that map to both of them. In this case you can accurately assign the shared reads. But it does not matter: since both genes already have a sufficiently high read count, adding more reads to them will not decrease the technical variance.
2) Situation two is more problematic in my mind. You have two genes with a low number of uniquely mapped reads (say fewer than 10). They share a pool of reads that map to both of them. Since the uniquely mapped counts are low, there is no way to accurately work out the proportion in which to divide the shared reads. And once you assign the reads based on that small number of uniquely mapped reads, the read counts become artificially high even though the multi-mappers could not be assigned accurately, which would then cause an under-representation of variance when looking at differential expression.
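To put rough numbers on the two situations above, here is a minimal Python sketch. It is not the method from the talk: the function names are made up for illustration, the split is a plain proportional assignment, and treating the unique reads as binomial draws to get a standard error for the split fraction is my own simplifying assumption.

```python
import math


def split_shared_reads(unique_a, unique_b, shared):
    """Divide a pool of shared reads between two genes in proportion
    to their uniquely mapped read counts (hypothetical helper)."""
    total_unique = unique_a + unique_b
    if total_unique == 0:
        # No unique evidence at all: fall back to an even split.
        return shared / 2, shared / 2
    frac_a = unique_a / total_unique
    return shared * frac_a, shared * (1 - frac_a)


def fraction_standard_error(unique_a, unique_b):
    """Approximate standard error of the estimated fraction for gene A,
    assuming (for illustration only) the unique reads are binomial draws."""
    n = unique_a + unique_b
    p = unique_a / n
    return math.sqrt(p * (1 - p) / n)


# Situation 1: many unique reads, same 60/40 ratio.
high_split = split_shared_reads(600, 400, 100)
high_se = fraction_standard_error(600, 400)

# Situation 2: few unique reads, same 60/40 ratio.
low_split = split_shared_reads(6, 4, 100)
low_se = fraction_standard_error(6, 4)

print(f"high counts: split={high_split}, SE of fraction={high_se:.3f}")
print(f"low counts:  split={low_split}, SE of fraction={low_se:.3f}")
```

Both cases hand out the shared reads in the same 60/40 ratio, but under this approximation the uncertainty in that ratio is about ten times larger when it is estimated from 10 unique reads than from 1000, and in the low-count case the 100 assigned reads dwarf the unique evidence they were assigned from.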
This is with regard to differential expression analysis, which I think is the most common use of RNA-seq. I can see an argument for using multi-mappers when trying to identify relative abundance within a single sample.
There's some other stuff in the presentation that doesn't really make sense to me with my high-school level of math education. But if someone has any insight here, I'd be interested to hear it.