I have an RNA-seq data set in which the majority (90%) of the reads are multi-reads (multi-hits / multi-mapped reads). The data is from sperm, so the high proportion of multi-reads may not be a sign that something is terribly wrong (they could come from, e.g., piRNAs).
But I am unsure how best to handle these reads. I do not particularly care about alternative splicing, so I was thinking of using edgeR/DESeq rather than Cufflinks/Cuffdiff.
Simple probabilistic assignment of the reads would not work with edgeR/DESeq. In this post, http://seqanswers.com/forums/showthread.php?t=26661, using Cufflinks to assign the reads and estimate raw counts is suggested, but does that work with the models used in edgeR/DESeq?
The other options seem to be:
1. randomly assigning the multi-reads, which is what bowtie2 usually does, if I am correct (but how does that affect the models when so many reads are randomly assigned?)
2. keeping the n best hits and using that mapping, but I believe edgeR/DESeq assume each count is a unique read, so that might not be good either.
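To make the difference between these options concrete, here is a minimal toy sketch in Python. The read names, gene names, and helper functions are all hypothetical (this is not my real data and not what any aligner or counting tool does internally); it only illustrates how the per-gene counts change depending on how multi-reads are treated.

```python
import random
from collections import Counter

# Hypothetical toy data: each read maps to one or more candidate genes.
read_hits = {
    "read1": ["geneA"],
    "read2": ["geneA", "geneB"],
    "read3": ["geneB", "geneC", "geneD"],
    "read4": ["geneC"],
    "read5": ["geneA", "geneB"],
}

def count_unique_only(hits):
    """Discard multi-reads entirely; count only uniquely mapped reads."""
    counts = Counter()
    for genes in hits.values():
        if len(genes) == 1:
            counts[genes[0]] += 1
    return counts

def count_random(hits, seed=0):
    """Option 1: assign each multi-read to one of its hits at random."""
    rng = random.Random(seed)
    counts = Counter()
    for genes in hits.values():
        counts[rng.choice(genes)] += 1
    return counts

def count_all_hits(hits):
    """Option 2: keep every reported hit, so one read can be counted several times."""
    counts = Counter()
    for genes in hits.values():
        counts.update(genes)
    return counts

def count_fractional(hits):
    """Simple probabilistic assignment: split each read evenly over its hits."""
    counts = Counter()
    for genes in hits.values():
        for gene in genes:
            counts[gene] += 1 / len(genes)
    return counts

for name, func in [
    ("unique only", count_unique_only),
    ("random", count_random),
    ("all hits", count_all_hits),
    ("fractional", count_fractional),
]:
    print(name, dict(func(read_hits)))
```

In this toy case, random assignment keeps the total equal to the number of reads (5) with integer counts, keeping all hits inflates the total to 9, and the fractional split preserves the total but gives non-integer counts, which is the part I assume clashes with the count models in edgeR/DESeq.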
How would you handle such data?