- Quantification (single library) - ENCODE HepG2 cell line from the Caltech RNA-seq data (2x75 bp, Caltech HepG2 rep2)
validation data: NanoString nCounter data from Steijger et al. 2013. http://www.nature.com/nmeth/journal/...meth.2714.html
- Pairwise DE - ALEXA-seq data. In-house libraries for two cell lines: MIP101 and MIP5FU
validation data: qPCR from Griffith et al. (in the ALEXA-seq paper). Expression values for each library, as well as fold changes, are provided. http://www.nature.com/nmeth/journal/...meth.1503.html
- Group DE - 6 in-house libraries from the MAGIC project, group3 and group4 (3 libs each)
validation data: Microarray expression results from the first MAGIC paper (Northcott et al. 2010). http://jco.ascopubs.org/content/earl....4324.abstract This data is a list of differentially expressed genes between groups 3 and 4, not a quantification on the isoform level. It doesn't provide a real "ground truth" set, but rather just a subset of genes and transcripts that may be biologically interesting to look at.
The first two datasets have wet-lab experimental transcript expression values. The third dataset doesn't have actual validated transcript expression values to compare to, so I just did pairwise comparisons between the tools for a subset of transcripts from interesting genes.
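To make the comparison concrete, here is a minimal sketch of how tool estimates could be scored against wet-lab values and against each other, using Spearman rank correlation. This is an assumption on my part about a reasonable metric, not necessarily what produced the plots; the tool names and all numbers below are made-up placeholders, not results from these datasets.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five transcripts (placeholders only).
validation = [1.0, 5.0, 20.0, 100.0, 3.0]      # e.g. wet-lab measurements
estimates = {
    "tool_a": [0.8, 6.1, 18.0, 250.0, 2.5],    # hypothetical tool output
    "tool_b": [6.0, 0.9, 18.0, 250.0, 2.5],    # hypothetical tool output
}

# Datasets 1 and 2: score each tool against the validation values.
vs_validation = {name: spearman(validation, est) for name, est in estimates.items()}

# Dataset 3 (no ground truth): compare the tools with each other instead.
tool_vs_tool = {
    (a, b): spearman(estimates[a], estimates[b])
    for a, b in combinations(sorted(estimates), 2)
}

print(vs_validation)
print(tool_vs_tool)
```

Rank correlation is used here because expression estimates from different tools and platforms are on different scales, so only the ordering of transcripts is directly comparable.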
The results are in this Google document. I didn't spend any time making it look nice, so you may need to zoom in to see the plots.
For single-library quantification, Sailfish wins, since it's the fastest and its results are qualitatively equal to those of the next best option, eXpress. For pairwise DE it's less clear. For group DE we can't really draw any conclusions because there's no ground truth.
** Note: Cufflinks was sometimes run twice, with different alignments: "cuff gsc" means Cufflinks was run on in-house spliced alignments, "cuff tophat" means Cufflinks was run on TopHat alignments.