
Gene-level Quantification


  • Gene-level Quantification

    Currently I still use htseq-count for gene-level quantification (and RSEM for transcript-level, by the way), but I find htseq-count overly cumbersome to use: it requires re-sorting the alignments, and the script itself takes extremely long to run.

    Right now I'm considering faster alternatives such as Kallisto or Salmon. So what is your experience with:
    • Using them for gene-level differential expression analyses with tools such as edgeR or DESeq2? I understand these programs generate transcript-level output, but it can easily be aggregated to the gene level. I also wonder how STAR's gene-count output file compares to these.
    • For Salmon users: I'd still have to generate an alignment anyway for sequence analyses. Have you noticed any difference between quasi-mapping and alignment-based modes?
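    The "easily aggregated" step can be sketched as follows. This is a minimal illustration, not what tximport or any other package does internally: it assumes a Salmon quant.sf table (tab-separated, with `Name` and `NumReads` columns) and a transcript-to-gene mapping you would build yourself from the same GTF. The function names `parse_salmon_quant` and `aggregate_to_genes` and the toy input are hypothetical.

    ```python
    # Hypothetical helpers for illustration: sum transcript-level estimated
    # counts (Salmon's NumReads column) into gene-level counts.
    from collections import defaultdict


    def parse_salmon_quant(lines):
        """Parse the lines of a Salmon quant.sf table into {transcript_id: NumReads}."""
        rows = iter(lines)
        header = next(rows).rstrip("\n").split("\t")
        name_idx = header.index("Name")
        reads_idx = header.index("NumReads")
        counts = {}
        for line in rows:
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\n").split("\t")
            counts[fields[name_idx]] = float(fields[reads_idx])
        return counts


    def aggregate_to_genes(tx_counts, tx2gene):
        """Sum transcript counts per gene; also report transcripts missing from tx2gene."""
        gene_counts = defaultdict(float)
        missing = []
        for tx_id, count in tx_counts.items():
            gene_id = tx2gene.get(tx_id)
            if gene_id is None:
                missing.append(tx_id)  # not in the mapping, so it is not counted
                continue
            gene_counts[gene_id] += count
        return dict(gene_counts), missing


    if __name__ == "__main__":
        # Toy input, made up purely for illustration.
        quant_sf = [
            "Name\tLength\tEffectiveLength\tTPM\tNumReads\n",
            "tx1\t1500\t1320.5\t120.0\t10.0\n",
            "tx2\t900\t720.2\t80.0\t5.5\n",
            "tx3\t2000\t1820.0\t10.0\t2.0\n",
        ]
        tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
        genes, missing = aggregate_to_genes(parse_salmon_quant(quant_sf), tx2gene)
        print(genes)    # {'geneA': 15.5, 'geneB': 2.0}
        print(missing)  # []
    ```

    Plain summing is the simplest option. Summed estimated counts are usually fine as input to edgeR or DESeq2, but tximport can additionally supply per-gene average transcript lengths as offsets, which this sketch does not attempt.
    
    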

    Thanks for your input in advance!

  • #2
    I can't comment on using Kallisto or Salmon, but STAR's gene counts seem to match htseq-count's three strandedness settings (-s no/yes/reverse) exactly, as long as you used the same GTF when indexing the genome.
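    One way to check that claim on your own data is sketched below. It assumes the column layout described in the STAR manual for ReadsPerGene.out.tab (gene ID, then unstranded counts, counts equivalent to htseq-count `-s yes`, and counts equivalent to `-s reverse`, with summary rows prefixed `N_`) and htseq-count's two-column output with summary rows prefixed `__`. The helper names and the toy tables are hypothetical.

    ```python
    # Sketch: compare STAR's ReadsPerGene.out.tab with htseq-count output.
    # 0-based column in the STAR table that matches each htseq-count -s setting.
    STAR_COLUMN_FOR_HTSEQ_STRAND = {"no": 1, "yes": 2, "reverse": 3}


    def read_star_gene_counts(lines, strandedness="no"):
        """Return ({gene: count}, {summary_key: count}) for one strandedness setting."""
        col = STAR_COLUMN_FOR_HTSEQ_STRAND[strandedness]
        genes, summary = {}, {}
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 4:
                continue  # skip malformed or blank lines
            target = summary if fields[0].startswith("N_") else genes
            target[fields[0]] = int(fields[col])
        return genes, summary


    def read_htseq_counts(lines):
        """Return ({gene: count}, {summary_key: count}); summary rows start with '__'."""
        genes, summary = {}, {}
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 2:
                continue
            target = summary if fields[0].startswith("__") else genes
            target[fields[0]] = int(fields[1])
        return genes, summary


    def mismatched_genes(star_genes, htseq_genes):
        """Genes whose counts differ, or that appear in only one of the two tables."""
        all_genes = star_genes.keys() | htseq_genes.keys()
        return sorted(g for g in all_genes if star_genes.get(g) != htseq_genes.get(g))


    if __name__ == "__main__":
        # Toy tables, made up purely for illustration.
        star = [
            "N_unmapped\t5\t5\t5\n",
            "N_multimapping\t3\t3\t3\n",
            "N_noFeature\t2\t40\t8\n",
            "N_ambiguous\t1\t0\t1\n",
            "geneA\t30\t2\t28\n",
            "geneB\t12\t1\t11\n",
        ]
        htseq = ["geneA\t28\n", "geneB\t11\n", "__no_feature\t8\n", "__ambiguous\t1\n"]
        star_genes, _ = read_star_gene_counts(star, strandedness="reverse")
        htseq_genes, _ = read_htseq_counts(htseq)
        print(mismatched_genes(star_genes, htseq_genes))  # []
    ```
    
    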


    • #3
      FYI, featureCounts is a faster drop-in replacement for htseq-count, so you can also just use that instead.

      The counts from STAR or featureCounts are pretty similar to what you'll get from salmon. Below is a scatter plot that one of our post-docs made a while back. I asked him to look into the genes that were highly discordant between the two, but I don't recall what he found (or if he looked).
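      If you want to pull out the highly discordant genes from such a comparison, a simple approach is to compute a log2 ratio with a pseudocount and flag genes above a cutoff. This is only a sketch: the pseudocount of 1, the minimum count of 10 and the |log2 ratio| > 1 threshold are arbitrary assumptions, and the function names and toy counts are hypothetical.

      ```python
      # Sketch: flag genes whose counts disagree strongly between two methods
      # (e.g. STAR/featureCounts vs. Salmon aggregated to gene level).
      import math


      def log2_ratio(a, b, pseudocount=1.0):
          """log2((a + pc) / (b + pc)); the pseudocount avoids division by zero."""
          return math.log2((a + pseudocount) / (b + pseudocount))


      def discordant_genes(counts_a, counts_b, min_count=10, max_abs_log2=1.0):
          """Return (gene, a, b, log2 ratio) tuples, most discordant first."""
          flagged = []
          for gene in sorted(counts_a.keys() & counts_b.keys()):
              a, b = counts_a[gene], counts_b[gene]
              if max(a, b) < min_count:
                  continue  # too few reads to judge
              lr = log2_ratio(a, b)
              if abs(lr) > max_abs_log2:
                  flagged.append((gene, a, b, lr))
          flagged.sort(key=lambda row: abs(row[3]), reverse=True)
          return flagged


      if __name__ == "__main__":
          # Toy counts, made up purely for illustration.
          star_counts = {"g1": 100, "g2": 50, "g3": 3, "g4": 200}
          salmon_counts = {"g1": 95, "g2": 10, "g3": 0, "g4": 20}
          for gene, a, b, lr in discordant_genes(star_counts, salmon_counts):
              print(gene, a, b, round(lr, 2))
      ```

      Genes sharing sequence with paralogs or with many overlapping isoforms are natural first suspects when inspecting such a list, since that is where unique-read counting and probabilistic assignment diverge most.
      
      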

      Feeding BAM files into Salmon seems to work nicely. Rob has usually suggested that this gives slightly better results, so if you can spare the memory and a bit of time to run STAR first, then that's the way to go.

      BTW, for getting counts for edgeR or DESeq2, have a look at the tximport package in Bioconductor.


      • #4
        Originally posted by dpryan
        FYI, featureCounts is a faster drop-in replacement for htseq-count, so you can also just use that instead.
        Well, as I mentioned, I'd still need STAR alignments for variant calling, so if STAR gives sufficiently accurate gene counts, I'll probably go with it.

        As far as transcript-level counting is concerned, though, it's a different story, as we all know.


        • #5
          In terms of aligner choice for mapping RNA-seq data, you may want to take a look at this paper that just came out this week: