
**Yuqia** · 02-27-2018, 11:07 AM

Missing output for multiple input bam files in featureCounts

Dear Wei Shi,

I ran featureCounts v1.5.0-p1 on 3 sorted BAM input files:

//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 3 BAM files 1 unknown file ||
|| ? CORE ||
|| P sorted1.bam ||
|| P sorted2.bam ||
|| P sorted3.bam ||
|| ||
|| Output file : /out/ALL.featureCounts.txt ||
|| Annotations : /ref/All_assembled.merged.gtf ||
|| Assignment details : <input_file>.featureCounts ||
|| ||
|| Threads : 6 ||
|| Level : meta-feature level ||
|| Paired-end : yes ||
|| Strand specific : inversed ||
|| Multimapping reads : not counted ||
|| Multi-overlapping reads : not counted ||
|| Read orientations : fr ||
|| ||
|| Chimeric reads : counted ||
|| Both ends mapped : not required ||
|| ||

Although the output summary file shows the statistics for all 3 files as expected, the output featureCounts.txt file shows the counts for only the first 2 files, and the names of the last 2 files are not separated by tab in the top row:

Geneid Chr Start End Strand Length sorted1.bam sorted2.bamsorted3.bam
ENSG00000278066 KI270731.1 26533 27138 - 606 0 0
ENSG00000277374 KI270750.1 148668 148843 + 176 0 0
ENSG00000273532 KI270721.1 51722 51792 + 71 0 0
ENSG00000276351 KI270721.1 52666 52734 + 69 0 0
ENSG00000275661 KI270721.1 52895 53010 + 116 0 0
ENSG00000277856 KI270726.1 26241 26534 + 294 0 0
ENSG00000275063 KI270726.1;KI270726.1 41444;41572 41489;41876 +;+ 351 0 0
ENSG00000275987 KI270713.1 30437 30580 - 144 0 0
ENSG00000277475 KI270713.1 31698 32528 - 831 1 0
ENSG00000268674 KI270713.1 35407 35916 + 510 0 0

Could you please let me know how I can get the featureCounts.txt summarizing all 3 input files?

Many thanks!
Best wishes,
Yuqia

Originally posted by shi

Dear All,

I would like to formally introduce to you our featureCounts program, a software program we developed for summarizing the next-gen sequencing reads to genomic features such as genes, exons and promoters.

featureCounts is a light-weight read counting program written entirely using the C programming language. It can be used to count both gDNA-seq and RNA-seq reads for genomic features. It has the following features:
(1) It carries out precise and accurate read assignments by taking care of indels, junctions and fusions in the reads.
(2) It takes less than 4 minutes to summarize 20 million pairs of reads to 26k RefSeq genes using one thread, and only uses 40MB of memory (you can run it on a Mac laptop).
(3) It supports multi-threaded running, making it extremely fast for summarizing large datasets.
(4) It supports GTF format annotation and SAM/BAM read data.
(5) It supports strand-specific read summarization.
(6) It can perform read summarization at both feature level (e.g. exons) and meta-feature level (e.g. genes).
(7) It allows users to specify whether reads overlapping with more than one feature should be counted or not.
(8) It gives users full control on the summarization of paired-end reads, including allowing them to check if both ends are mapped and/or if the paired-end distances satisfy the distance criteria.
(9) It distinguishes features overlapped by both ends of the same fragment from those overlapped by only one end, so that more fragments can be counted.
(10) It allows users to specify whether chimeric fragments should be counted.

For a quick start, have a look at our short tutorial - http://bioinf.wehi.edu.au/featureCounts/ . For more details, please refer to the users guide - http://bioinf.wehi.edu.au/featureCounts/usersguide.pdf (see Chapter 6).

We also compared featureCounts with other methods. The comparison results can be found in our manuscript - http://arxiv.org/abs/1305.3347.

The featureCounts program is part of the Subread package (http://subread.sourceforge.net), which includes a suite of programs for processing next-gen sequencing data such as read mapping and exon-exon junction detection. featureCounts can also be accessed from the development version of the Bioconductor R package Rsubread (http://bioconductor.org/packages/2.1.../Rsubread.html)

Please do not hesitate to contact me if you have any questions ([email protected]).

Best regards,
-------------------
Wei Shi, Ph.D
Bioinformatics Division
The Walter and Eliza Hall Institute of Medical Research
1G Royal Parade, Parkville, Victoria 3052
Australia

**Yuqia** · 02-28-2018, 03:19 AM

Missing output for multiple input bam files in featureCounts

Dear Wei Shi,

I'm using featureCounts v1.5.0-p1 to get a count summary of multiple input sorted bam files. First I tried with 3 files. I'm having trouble getting the correct output.

Here's my code:

featureCounts -T 6 -p -s 2 -t exon -g gene_id \
-a /ref/All_assembled.merged.gtf \
-o /out/ALL.featureCounts.txt \
-R CORE \
sorted1.bam sorted2.bam sorted3.bam

The setting is in the attached file "featureCounts_setting".

The summary (attached file "summary_correct") shows the statistics for all 3 files.
But ALL.featureCounts.txt (attached file "Count_error") shows the output for only 2 files, and the names of the last 2 input files (sorted2.bam and sorted3.bam) in this count file are not tab-delimited like the ones in "summary_correct".

Could this be a bug in this version? If not, could you please let me know how I could get the output for all input files?

Many thanks!
Best regards,
Yuqia
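
A quick way to confirm the symptom described above is to count tab-separated fields per line of the counts file. The sketch below is only an illustration: it writes a tiny hypothetical sample that mimics the fused header (it does not read the real ALL.featureCounts.txt), and the expected width of 6 annotation columns plus one column per BAM file is taken from the header shown in the post.

```shell
# Hypothetical two-line sample mimicking the reported output: the header has
# two file names fused into one field, and the data row has only 2 counts.
counts_file=$(mktemp)
printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\tsorted1.bam\tsorted2.bamsorted3.bam\n' > "$counts_file"
printf 'ENSG00000278066\tKI270731.1\t26533\t27138\t-\t606\t0\t0\n' >> "$counts_file"

# 6 annotation columns (Geneid..Length) + 3 input BAM files
expected_fields=$((6 + 3))

# Report every non-comment line whose field count differs from the expectation.
bad_lines=$(awk -F'\t' -v n="$expected_fields" \
    '!/^#/ && NF != n { print NR ": " NF " fields" }' "$counts_file")

echo "$bad_lines"
rm -f "$counts_file"
```

On a real run, point `counts_file` at the actual output instead of the temporary sample. Note also that the settings banner in the post lists `CORE` as a fourth, unknown input file; one possibility (an assumption, not confirmed anywhere in this thread) is that this release treated the word after `-R` as another input file rather than as a format name.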


**rookie_genomics** · 03-07-2019, 02:53 PM

Hi,

I am new to RNA sequencing analysis and just finished assigning my reads using featureCounts. I am trying to save the output of featureCounts into a txt file and am having trouble.

The command I used for featureCounts is as follows:

counts <- featureCounts("my.bam", annot.ext = "my.gtf", isGTFAnnotationFile = TRUE,
                        GTF.featureType = "exon", GTF.attrType = "gene_id", nthreads = 4,
                        isPairedEnd = TRUE, countMultiMappingReads = TRUE)

And I used this command to tabulate the results:

write.table(cbind(counts$annotation[,2:4], counts$counts),"sample_featureCounts.txt",quote=F,sep=" ",row.names=F)

My output file has only 4 columns:

Chromosome(Ensembl ID) Start End Counts

I would like to add gene name to the output to identify counts better.

How can I do that? What command should I use?

**yangliao** · 03-07-2019, 03:39 PM

Originally posted by rookie_genomics

Code:

write.table(cbind(counts$annotation[,2:4], counts$counts),"sample_featureCounts.txt",quote=F,sep=" ",row.names=F)

You saved the 2nd to 4th columns of the annotation data frame to the file. These three columns are the chromosome name, start and end locations, which is what you see in the output.

The gene identifier is in the first column of the annotation data frame, so you can use

Code:

write.table(cbind(counts$annotation[,1:4], counts$counts),"sample_featureCounts.txt",quote=F,sep=" ",row.names=F)

**Guillaume** · 04-09-2019, 06:34 AM

Count inconsistency

Hello

I have problems getting correct counts for a stranded RNA-seq study using hisat2 version 2.1.0 and featureCounts v1.6.4, and I would appreciate any help.

Below is an image of the HISAT2 mapping results (--rna-strandness RF) for a small test subset of my data.

For example, the second-to-last gene (AFBG1_15566.1) is covered only by reads of a single orientation (blue reads).

However, when I process the same BAM file with featureCounts, it finds
with the flag -s 1:

AFBG1_15566 scf7180000002748;scf7180000002748 13827;16161 16084;16780 +;+ 2878 2383

with the flag -s 2:
AFBG1_15566 scf7180000002748;scf7180000002748 13827;16161 16084;16780 +;+ 2878 2257

I can't understand why featureCounts finds about the same counts (2383 and 2257) in both orientations.

In case it helps, the script and data needed to reproduce these results are available at this address: https://nuage.osupytheas.fr/s/D5S4stT9aDLYEsD

Thank you!

**yangliao** · 04-09-2019, 02:12 PM

Hi Guillaume,

I found that you did not use the "-p" option in your script to run featureCounts. This means that each single read, not each read pair, was assigned to the genes. featureCounts only flips the strand of the second read when the "-p" option is specified; otherwise it simply checks the 0x10 FLAG of the alignment to match the strand of the alignment against the strand of the gene. Half of your single reads (the R2s) were from the positive strand, while the other half (the R1s) were from the negative strand, hence the AFBG1_15566 gene always has counts no matter whether you used "-s 1" or "-s 2".

I changed your script by adding the "-p" option to featureCounts, which flips all your R2s to the negative strand. Now "-s 1" gives zero counts for AFBG1_15566, while "-s 2" gives all the counts for AFBG1_15566.

Cheers,
Yang
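
The FLAG logic described above can be sketched with a few lines of awk. This is an illustration of the idea only, not featureCounts' actual code: the two FLAG values are hypothetical examples for one read pair from an RF library on a '+' gene, and the "flip the second read" rule is taken from the explanation in this post.

```shell
# Hypothetical FLAG values for one read pair:
#   83  = paired + proper pair + read reverse strand + first in pair (R1)
#   163 = paired + proper pair + mate reverse strand + second in pair (R2)
strand_table=$(printf '83\n163\n' | awk '
{
    flag = $1
    is_rev    = int(flag / 16) % 2     # bit 0x10: read aligned to reverse strand
    is_second = int(flag / 128) % 2    # bit 0x80: second read of the pair
    # Strand when every read is treated as an independent single read
    as_single = is_rev ? "-" : "+"
    # Strand when the second read is flipped, as described for paired-end mode
    as_pair = (is_second ? !is_rev : is_rev) ? "-" : "+"
    print flag, as_single, as_pair
}')

# Columns: FLAG, strand as single read, strand after flipping the second read
echo "$strand_table"
```

Treated as single reads, the two mates land on opposite strands, so one of them matches the gene under either "-s 1" or "-s 2". After flipping the second read, both mates report the same strand, which is consistent with the counts showing up under only one of the two settings.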

**Guillaume** · 04-10-2019, 04:38 AM

Thank you so much Yang
I should have read the manual more carefully...

**mazoilya** · 02-20-2020, 04:21 PM

We use featureCounts quite a bit. Great tool. I have noticed that running it with the options "-t exon -g gene_id" gives slightly different results from "-t gene -f -g gene_id", and different again from "-t gene -g gene_id", though all three should produce counts at the gene level. How does featureCounts behave differently under these options?

**biswasr0** · 04-21-2021, 07:32 PM

Error: the feature on the 21091-th line has zero coordinate or zero lengths

Hi all,
Can you please help me get past this error, which appears when I try to count reads for genes from my GTF file?
The error is:

Error: the feature on the 21091-th line has zero coordinate or zero lengths

No counts were generated.

I am running it as:

featureCounts(files= "myBAM.BAM",
isPairedEnd=TRUE,requireBothEndsMapped=TRUE,
annot.ext="GCF_000009705.1_ASM970v1_genomic.gtf.gz",
isGTFAnnotationFile=TRUE,GTF.featureType="gene",GTF.attrType="gene_id", nthreads = 20,
genome = "GCF_000009705.1_ASM970v1_genomic.fna" )

Please help.
With regards,
Thank you
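
One way to look for the kind of record this message complains about is to scan the GTF for features with a non-positive start or end, or with an end before the start. The sketch below runs on a small hypothetical GTF written to a temporary file; the exact condition featureCounts tests is an assumption here, inferred only from the wording of the error.

```shell
# Hypothetical 4-line GTF: one comment, one valid gene, two suspicious genes.
gtf=$(mktemp)
printf '#!genome-build hypothetical\n' > "$gtf"
printf 'chr1\tsrc\tgene\t100\t200\t.\t+\t.\tgene_id "g1";\n' >> "$gtf"
printf 'chr1\tsrc\tgene\t0\t50\t.\t+\t.\tgene_id "g2";\n' >> "$gtf"
printf 'chr1\tsrc\tgene\t300\t250\t.\t-\t.\tgene_id "g3";\n' >> "$gtf"

# Column 4 is the start and column 5 the end (both 1-based in GTF).
# Print the line number, coordinates and attributes of each suspicious record.
suspect=$(awk -F'\t' \
    '!/^#/ && ($4 < 1 || $5 < 1 || $5 < $4) { print NR ": " $4 "-" $5 " " $9 }' "$gtf")

echo "$suspect"
rm -f "$gtf"
```

For a compressed annotation like the one in the post, the same awk filter can read from `gzip -dc file.gtf.gz` instead of a plain file. The specific line named in the error can also be inspected directly with `sed -n '21091p'`, although whether that number counts comment lines is not stated in the thread.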
