  • Joshua.Urrutia
    Junior Member
    • Nov 2015
    • 7

    Problem generating counts with featureCounts

    Hello, I am learning how to do RNA-seq analysis following this tutorial:

    Last week I ran a one-day workshop on RNA-seq data analysis in the UVA Health Sciences Library. I set up an AWS public EC2 image with all the necessary software installed. Participants logged into AWS, launched the image, and we kicked off the morning ...


    I'm having a problem generating a table of gene counts using the program featureCounts on my annotation file (GTF) and aligned sequences (BAM).

    Here's what I'm typing into the unix shell:

    Code:
    $ featureCounts -a Homo_sapiens.GRCh38.82.gtf -o counts.txt -t exon -g gene_name -T 4 */accepted_hits.bam
    I don't get any errors, and the program seems to handle everything OK. Here is an example of the output for one of the files:

    Code:
     Process BAM file trimmed_uvb3.fastq_tophat/accepted_hits.bam...           
    ||    Single-end reads are included.                                          
    ||    Assign reads to features...                                             
    ||    Total reads : 129858                                                    
    ||    Successfully assigned reads : 98419 (75.8%)                             
    ||    Running time : 0.00 minutes
    BUT when I open my counts.txt file, I don't have any hits. All I get back is a table full of zeros for every gene.

    Does anyone know why this is happening?
    Last edited by Joshua.Urrutia; 11-23-2015, 11:29 AM.
  • Joshua.Urrutia
    Junior Member
    • Nov 2015
    • 7

    #2
    Also, here is what the summary file looks like:

    Code:
    $ less counts.txt.summary 
    
    Status    trimmed_ctl1.fastq_tophat/accepted_hits.bam    trimmed_ctl2.fastq_tophat/accepted_hits.bam    trimmed_ctl3.fastq_tophat/accepted_hits.bam    trimmed_uvb1.fastq_tophat/accepted_hits.bam    trimmed_uvb2.fastq_tophat/accepted_hits.bam    trimmed_uvb3.fastq_tophat/accepted_hits.bam
    Assigned        44995   50830   46270   100230  85736   98419
    Unassigned_Ambiguity    2702    2694    2944    5802    4149    4907
    Unassigned_MultiMapping 5171    4789    6165    10614   8150    8599
    Unassigned_NoFeatures   7766    8067    8877    16690   16698   17933
    Unassigned_Unmapped     0       0       0       0       0       0
    Unassigned_MappingQuality       0       0       0       0       0       0
    Unassigned_FragmentLength       0       0       0       0       0       0
    Unassigned_Chimera      0       0       0       0       0       0
    Unassigned_Secondary    0       0       0       0       0       0
    Unassigned_Nonjunction  0       0       0       0       0       0
    Unassigned_Duplicate    0       0       0       0       0       0
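As a quick sanity check on a summary like the one above, the assigned percentage per library can be recomputed from the file itself. The sketch below is only illustrative: the function name `assigned_fraction` is made up here, and it assumes the summary is tab-separated with one header row and that its rows partition the reads of each library. That assumption is consistent with the numbers posted in this thread, where the last column sums to 129858 and 98419 / 129858 is 75.8%, matching the log.

```shell
#!/bin/sh
# assigned_fraction SUMMARY_FILE   (hypothetical helper, not part of featureCounts)
# For each library column of a tab-separated featureCounts summary, print:
#   library name, assigned reads, column total, assigned percentage.
assigned_fraction() {
    awk -F'\t' '
        # First row holds the library names in columns 2..NF.
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
        {
            for (i = 2; i <= NF; i++) {
                total[i] += $i                        # sum every status row
                if ($1 == "Assigned") assigned[i] = $i
            }
        }
        END {
            for (i = 2; i <= ncol; i++)
                if (total[i] > 0)
                    printf "%s\t%d\t%d\t%.1f%%\n", name[i], assigned[i], total[i], 100 * assigned[i] / total[i]
        }
    ' "$1"
}
```

It would be called as `assigned_fraction counts.txt.summary`.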

    Comment

    • Cedric
      Junior Member
      • Jul 2011
      • 1

      #3
      Hi Joshua,

      Replace "-g gene_name" by "-g gene_id" in your command.

      Comment

      • Joshua.Urrutia
        Junior Member
        • Nov 2015
        • 7

        #4
        Thank you for your suggestion!

        Unfortunately, it did not work. It did change the gene names to gene ids in the output table, but there are still zero values for every gene id.

        Comment

        • shi
          Wei Shi
          • Feb 2010
          • 236

          #5
          This sounds strange. Apparently featureCounts has successfully produced counts for genes. Could you please show the first few rows of your counting result?

          Comment

          • Joshua.Urrutia
            Junior Member
            • Nov 2015
            • 7

            #6
            Certainly, here's the head of the counts.txt file:

            Code:
            # Program:featureCounts v1.5.0; Command:"featureCounts" "-a" "Homo_sapiens.GRCh38.82.gtf" "-o" "counts3.txt" "-t" "exon" "-g" "gene_name" "-T" "4" "trimmed_ctl1.fastq_tophat/accepted_hits.bam" "trimmed_ctl2.fastq_tophat/accepted_hits.bam" "trimmed_ctl3.fastq_tophat/accepted_hits.bam" "trimmed_uvb1.fastq_tophat/accepted_hits.bam" "trimmed_uvb2.fastq_tophat/accepted_hits.bam" "trimmed_uvb3.fastq_tophat/accepted_hits.bam" 
            Geneid	Chr	Start	End	Strand	Length	trimmed_ctl1.fastq_tophat/accepted_hits.bam	trimmed_ctl2.fastq_tophat/accepted_hits.bam	trimmed_ctl3.fastq_tophat/accepted_hits.bam	trimmed_uvb1.fastq_tophat/accepted_hits.bam	trimmed_uvb2.fastq_tophat/accepted_hits.bam	trimmed_uvb3.fastq_tophat/accepted_hits.bam
            DDX11L1	1;1;1;1	11869;12613;12975;13221	12227;12721;13052;14409	+;+;+;+	1735	0
            WASH7P	1;1;1;1;1;1;1;1;1;1;1;12;12;12;12;12;12;12;12;12;12;12	14404;15005;15796;16607;16858;17233;17606;17915;18268;24738;29534;14522;15085;15913;16722;16969;17348;17723;18037;18373;26801;31878	14501;15038;15947;16765;17055;17368;17742;18061;18366;24891;29570;14944;15153;16065;16880;17170;17483;17859;18183;18471;26954;32015	-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-	3168	0	0
            MIR6859-1	1	17369	17436	-	68	0	0	0	0
            RP11-34P13.3	1;1;1	29554;30267;30976	30039;30667;31109	+;+;+	1021	0	0	0	0	0	0
            MIR1302-2	1	30366	30503	+	138	0	0	0	0
            FAM138A	1;1;1	34554;35245;35721	35174;35481;36081	-;-;-	1219	0
            OR4G4P	1	52473	53312	+	840	0	0	0	0	0
            OR4G11P	1	62948	63887	+	940	0	0	0
            It's clearer after I load the table into R: the rightmost columns contain only zeros.
            Last edited by Joshua.Urrutia; 11-23-2015, 07:15 PM. Reason: clarity

            Comment

            • shi
              Wei Shi
              • Feb 2010
              • 236

              #7
              Thanks Joshua.Urrutia. But we still couldn't figure out what may cause the problem.

              Could you send me your counts.txt file and also one of the bam files? You may send them offline.

              Comment

              • Joshua.Urrutia
                Junior Member
                • Nov 2015
                • 7

                #8
                Most definitely, thank you for the help. I emailed you a tar file with the .bam file and counts.txt file inside.

                Comment

                • shi
                  Wei Shi
                  • Feb 2010
                  • 236

                  #9
                  Thanks for sending the files. I took a look at your counts.txt file and found that not all of the genes have zero counts. Below is the number of genes that had at least 1 count in each library:

                  trimmed_ctl1.fastq_tophat.accepted_hits.bam    394
                  trimmed_ctl2.fastq_tophat.accepted_hits.bam    390
                  trimmed_ctl3.fastq_tophat.accepted_hits.bam    402
                  trimmed_uvb1.fastq_tophat.accepted_hits.bam    451
                  trimmed_uvb2.fastq_tophat.accepted_hits.bam    400
                  trimmed_uvb3.fastq_tophat.accepted_hits.bam    489

                  The total number of counts in each library is the same as that reported in featureCounts summary file.
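A check of this kind can also be run directly on counts.txt from the shell. This is a minimal sketch, not the method actually used in this thread: the function name `nonzero_genes` is hypothetical, and it assumes the usual featureCounts layout of a leading `#` comment line, a header row, six annotation columns (Geneid through Length), and one tab-separated count column per BAM file from column 7 onward.

```shell
#!/bin/sh
# nonzero_genes COUNTS_FILE   (hypothetical helper, not part of featureCounts)
# For each sample column (7 onward), print:
#   sample name, number of genes with a count above zero, sum of all counts.
nonzero_genes() {
    awk -F'\t' '
        /^#/ { next }                                  # skip the "# Program:..." line
        !seen_header {                                 # first non-comment row is the header
            for (i = 7; i <= NF; i++) name[i] = $i
            ncol = NF; seen_header = 1; next
        }
        {
            for (i = 7; i <= NF; i++) {
                if ($i > 0) nonzero[i]++
                total[i] += $i
            }
        }
        END {
            for (i = 7; i <= ncol; i++)
                printf "%s\t%d\t%d\n", name[i], nonzero[i], total[i]
        }
    ' "$1"
}
```

Run as `nonzero_genes counts.txt`; the third column of the output can then be compared with the `Assigned` row of counts.txt.summary.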

                  Comment

                  • Joshua.Urrutia
                    Junior Member
                    • Nov 2015
                    • 7

                    #10
                    Thanks for your help and sorry for my confusion.

                    The problem must be with the way I am using R to find the non-zero values, and not with featureCounts.

                    Comment

                    • Joshua.Urrutia
                      Junior Member
                      • Nov 2015
                      • 7

                      #11
                      I figured it out: I was improperly using the subset function in R.

                      Sorry again for my confusion. I can't believe I spent days trying to figure that out.
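The R code in question is not shown in the thread, so the following is not a reconstruction of it. As an independent cross-check outside R, the genes with at least one non-zero count can be listed from the shell. The function name `expressed_genes` is hypothetical, and the sketch assumes the same layout as the counts.txt shown earlier: a `#` comment line, a header row, and tab-separated count columns starting at column 7.

```shell
#!/bin/sh
# expressed_genes COUNTS_FILE   (hypothetical helper, not part of featureCounts)
# Print the header row plus every gene row that has a count above zero
# in at least one sample column (column 7 onward).
expressed_genes() {
    awk -F'\t' '
        /^#/ { next }                                  # skip the "# Program:..." line
        !seen_header { print; seen_header = 1; next }  # keep the header row
        {
            for (i = 7; i <= NF; i++)
                if ($i > 0) { print; next }            # first non-zero count is enough
        }
    ' "$1"
}
```

For example, `expressed_genes counts.txt | head` shows the first few such rows, which makes it easy to compare against whatever a filter in R returns.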

                      Comment
