  • gene_x
    Senior Member
    • May 2010
    • 108

    featureCounts option question

    I'm new to using featureCounts for RNA-seq analysis. My data are polyA-enriched, stranded, single-end Illumina reads.

    My goal is to do differential expression analysis between control and case groups. I plan to use DESeq2 for the DE analysis after featureCounts.

    I have a few questions:

    1. I'm wondering whether it's best to use the -M --fraction options, the --primary option, or neither. I understand that in ChIP-seq people often keep only uniquely mapped reads, but I'm not sure about RNA-seq, or whether to keep only primary alignments. My feeling is that it's best to use the --primary option.

    -M
    If specified, multi-mapping reads/fragments will be counted. A multi-mapping read will be counted up to N times if it has N reported mapping locations. The program uses the ‘NH’ tag to find multi-mapping reads.

    --fraction
    If specified, a fractional count 1/n will be generated for each multi-mapping read, where n is the number of alignments (indicated by ‘NH’ tag) reported for the read. This option must be used together with the ‘-M’ option.

    --primary
    If specified, only primary alignments will be counted. Primary and secondary alignments are identified using bit 0x100 in the Flag field of SAM/BAM files. All primary alignments in a dataset will be counted no matter they are from multi-mapping reads or not (ie. ‘-M’ is ignored).
    2. I've read in many sources that a high level of duplicated reads is normal for RNA-seq. So is it best not to use the --ignoreDup option?

    3. My current command line looks like this:

    Code:
    featureCounts -t exon -g gene_id -a genes.gtf -F GTF -o outfile.txt -s 1 --primary input.bam
    Please let me know if there are other options I should use.

    Thanks!
    Last edited by gene_x; 08-09-2016, 09:44 AM.
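
To make the three option descriptions quoted in the post above concrete, here is a minimal Python sketch of how each mode would tally a toy set of alignments. It only mirrors the manual text as quoted (NH tag for multi-mappers, flag bit 0x100 for secondary alignments). The `count` function, the `ALIGNMENTS` records, and the gene names are hypothetical illustrations, not featureCounts code.

```python
from collections import defaultdict

SECONDARY = 0x100  # SAM flag bit that marks a secondary alignment

# Toy records: (read name, SAM flag, NH tag, gene overlapped). All values are made up.
ALIGNMENTS = [
    ("r1", 0,     1, "geneA"),  # uniquely mapped read
    ("r2", 0,     2, "geneA"),  # multi-mapper, primary record
    ("r2", 0x100, 2, "geneB"),  # multi-mapper, secondary record
    ("r3", 16,    1, "geneB"),  # uniquely mapped read on the reverse strand
]

def count(alignments, multi=False, fraction=False, primary=False):
    """Tally alignments per gene under the rules quoted from the manual."""
    counts = defaultdict(float)
    for _name, flag, nh, gene in alignments:
        if primary:
            # --primary: keep every primary record, ignore the multi-mapping status
            if flag & SECONDARY:
                continue
            counts[gene] += 1
        elif nh > 1:
            # Multi-mapper: dropped unless -M is given
            if not multi:
                continue
            # --fraction: each reported location gets 1/NH instead of 1
            counts[gene] += 1 / nh if fraction else 1
        else:
            counts[gene] += 1
    return dict(counts)

print("default    ", count(ALIGNMENTS))
print("-M         ", count(ALIGNMENTS, multi=True))
print("-M fraction", count(ALIGNMENTS, multi=True, fraction=True))
print("primary    ", count(ALIGNMENTS, primary=True))
```

On this toy input the default mode gives each gene 1, -M gives each gene 2, -M with --fraction gives each gene 1.5, and --primary gives geneA 2 and geneB 1. Note that fractional counts are not integers, which matters if the downstream tool expects integer counts.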
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    How did you handle the multimappers in your alignment program? Did you use one of these options (for example this is what BBMap allows)

    Code:
    best    (use the first best site)
    toss    (consider unmapped)
    random  (select one top-scoring site randomly)
    all     (retain all top-scoring sites)
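
As a rough illustration of what those four policies mean for a single read, here is a minimal Python sketch. It is a toy model based only on the one-line descriptions above, not BBMap's implementation. The `resolve_ambiguous` function and the site tuples are hypothetical.

```python
import random

def resolve_ambiguous(sites, policy, rng=None):
    """Pick which candidate sites to keep for one read.

    sites  : list of (location, score) tuples, in the order the aligner found them
    policy : "best", "toss", "random" or "all"
    """
    if not sites:
        return []
    top_score = max(score for _loc, score in sites)
    top_sites = [s for s in sites if s[1] == top_score]
    if len(top_sites) == 1:
        return top_sites              # not ambiguous, every policy keeps it
    if policy == "best":
        return top_sites[:1]          # first of the top-scoring sites
    if policy == "toss":
        return []                     # treat the read as unmapped
    if policy == "random":
        return [(rng or random).choice(top_sites)]
    if policy == "all":
        return top_sites              # keep every top-scoring site
    raise ValueError(f"unknown policy: {policy}")

# Hypothetical read with two equally good sites and one weaker site
sites = [("chr1:100", 50), ("chr2:200", 50), ("chr3:300", 40)]
for policy in ("best", "toss", "random", "all"):
    print(policy, resolve_ambiguous(sites, policy, rng=random.Random(0)))
```

Only "all" produces more than one record per read, which is the case where the counting step has to decide how to handle multi-mappers.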


    • gene_x
      Senior Member
      • May 2010
      • 108

      #3
      Good point.

      I used HISAT2 for the alignment, and I think it ran with the default -k setting:

      -k <int>
      It searches for at most <int> distinct, primary alignments for each read. Primary alignments mean alignments whose alignment score is equal or higher than any other alignments.

      Default: 5 (HFM)

      Then I guess I don't really need the --primary option here, because all the reported alignments are primary.

      But I'm still not sure whether I should keep these multi-mapping reads at all. I read in a best-practice paper that tools such as featureCounts often discard multi-mapping reads, whereas newer ones (Sailfish/Salmon, Kallisto, RSEM) keep them.



      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        With k set to 5, at most five positions are reported and counted per read, even if there are more. Using the "random" option in BBMap does not throw information away, and it does not overcount either.
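
The arithmetic behind that point can be sketched in a few lines of Python. This is a toy model under stated assumptions: every locus of a repeat is an equally good hit, and the names `reported_alignments` and `assign_random` are hypothetical, not part of HISAT2 or BBMap.

```python
import random

def reported_alignments(n_loci, k):
    """Alignments reported for one read when the aligner stops at k."""
    return min(n_loci, k)

def assign_random(loci, n_reads, seed=0):
    """Give each read to one randomly chosen locus, so each read counts once."""
    rng = random.Random(seed)
    totals = {locus: 0 for locus in loci}
    for _ in range(n_reads):
        totals[rng.choice(loci)] += 1
    return totals

n_loci, k, n_reads = 8, 5, 1000
loci = [f"copy{i}" for i in range(n_loci)]

# Counting every reported alignment: each read adds up to k counts
print("counts added if all reported alignments are counted:", n_reads * reported_alignments(n_loci, k))

# Random assignment: total counts equal the number of reads
totals = assign_random(loci, n_reads)
print("counts added with random assignment:", sum(totals.values()))
```

With eight equally good loci and k set to 5, counting every reported alignment adds 5000 counts for 1000 reads, while random assignment adds exactly 1000 spread across the copies.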

        If approximate "mapping" of the reads is acceptable in place of full alignment, then the newer tools you mention are a fast option.


        • dpryan
          Devon Ryan
          • Jul 2011
          • 3478

          #5
          One clarification: in (classical) RNA-seq, multimappers are excluded (I'm counting Salmon/Kallisto/et al. as non-classical). In ChIP-seq, primary alignments from multimappers are typically included.


          • gene_x
            Senior Member
            • May 2010
            • 108

            #6
            Really? Could you provide a reference for the treatment of multimappers in ChIP-seq? On the contrary, I believe they are discarded and only uniquely mapped reads are kept.


            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              #7
              I'll see if I can find a reference when I'm in the office tomorrow. Using only "unique alignments" prevents finding peaks in genes with upstream repeats (there are a number of them) and expressed repeats (we have a large group working on them).
