Seqanswers Leaderboard Ad

**shi** · 02-09-2014, 06:01 PM

Originally posted by sindrle View Post

Thank you, you have been very helpful.

Im currently struggling with how to implement featureCounts to DEXseq, its much easier to use HTSeq, since its described in the manual.

I really want to be able to use featureCounts, since with HTSeq I had to "hack" the python script to work with my annotation file, whereas I don't need that with featureCounts.

Also, I want to be able, in the end, to present expression values of transcripts as counts/library size/gene length, which also is easier with featureCounts, since it outputs gene length!

Could you elaborate a bit the difficulty you encountered with using featureCounts output with dexseq? Basically, featureCounts just outputs a read count table and it can be readily used by downstream DE analysis programs such as limma/voom and edgeR. The analysis can be performed at both gene level and exon level with these programs.

**sindrle** · 02-09-2014, 06:21 PM

Its only because me, and maybe more, do not have a background in programming at all.

Its very simple to follow an explained guide, but with FeatureCounts I have to read and search for every step I take. Taking time which may be saved using HTSeq.

**roryk** · 03-18-2014, 11:12 AM

Hi Wei,

featureCounts is awesome-- thanks for writing such nice software. We use it in the bcbio-nextgen pipeline for generating counts and it is insanely fast.

We have encountered this error on some alignment files when running featureCounts, version 1.4.3-p:

*** glibc detected *** /n/hsphS10/hsphfs1/chb/local/bin/featureCounts: double free or corruption (!prev): 0x0d7312f0 ***
======= Backtrace: =========
[0x80cfb0c]
[0x80d2e3b]
[0x808e405]
======= Memory map: ========
08048000-08133000 r-xp 00000000 00:20 366245842982 /net/hsphfs1/srv/export/hsphfs1/share_root/chb/local/bin/featureCounts
08133000-08134000 rw-p 000eb000 00:20 366245842982 /net/hsphfs1/srv/export/hsphfs1/share_root/chb/local/bin/featureCounts
08134000-08138000 rw-p 00000000 00:00 0
084e6000-0d744000 rw-p 00000000 00:00 0 [heap]
55555000-55556000 r-xp 00000000 00:00 0 [vdso]
55556000-55855000 rw-p 00000000 00:00 0
55857000-5644c000 rw-p 00000000 00:00 0
5644c000-57333000 rw-p 00000000 00:00 0
57333000-58391000 rw-p 00000000 00:00 0
58400000-58421000 rw-p 00000000 00:00 0
58421000-58500000 ---p 00000000 00:00 0
589f7000-589f8000 ---p 00000000 00:00 0
589f8000-59f83000 rw-p 00000000 00:00 0
fff96000-fffac000 rw-p 00000000 00:00 0 [stack]

The command is:

/n/hsphS10/hsphfs1/chb/local/bin/featureCounts -a /net/hsphfs1/srv/export/hsphfs1/share_root/chb/biodata/genomes/Mmusculus/mm10/rnaseq/ref-transcripts.gtf -o /net/hsphfs1/srv/export/hsphfs1/share_root/chb/projects/ljg_cardiac_rnaseq/project/work/htseq-count/tx/tmpJu9kkp/CRUSH2673-active.counts -s 0 -p -B -C /net/hsphfs1/srv/export/hsphfs1/share_root/chb/projects/ljg_cardiac_rnaseq/project/work/align/CRUSH2673-active/5_2014-03-12_project_star/CRUSH2673-active.sorted.bam

The header looks fine and the BAM and GTF files are also all good. Any thoughts?

**shi** · 03-19-2014, 02:53 PM

We have just released a new version (1.4.4). Please try that one to see if the problem persists.

Cheers,
Wei

**also.casey** · 04-03-2014, 09:01 AM

Issues using ensembl (version 75) GTF

I'm attempting to evaluate featureCounts to summarize a collection of bam files from an RNA-Seq experiment aligned to ensembl sequences. FASTA and GTF files for ensemble transcripts were obtained from the ensembl FTP site, modified by appending ERCC sequences and descriptors, respectively, and aligned using bwa-mem.

featureCounts happily processes the resulting bam files when using the ERCC GTF info only (and count totals seem to match that obtained using summarizeOverlaps in R), however using the ensembl GTF alone results in 0 "successfully assigned reads", while the combined ERCC and ensembl transcript GTF file results in only ERCC mapping reads being quantified (i.e. all ensembl transcripts appear in the output file, but all 0 counts). I think the issue may be because of some sort of formatting issue in the ensembl GTF? Can anyone shed any light on this issue?

The head of ensembl gtf I'm using is attached.

Thanks!

Attached Files

ensembl.txt (2.3 KB, 37 views)

**roryk** · 04-03-2014, 09:09 AM

Originally posted by also.casey View Post

I'm attempting to evaluate featureCounts to summarize a collection of bam files from an RNA-Seq experiment aligned to ensembl sequences. FASTA and GTF files for ensemble transcripts were obtained from the ensembl FTP site, modified by appending ERCC sequences and descriptors, respectively, and aligned using bwa-mem.

featureCounts happily processes the resulting bam files when using the ERCC GTF info only (and count totals seem to match that obtained using summarizeOverlaps in R), however using the ensembl GTF alone results in 0 "successfully assigned reads", while the combined ERCC and ensembl transcript GTF file results in only ERCC mapping reads being quantified (i.e. all ensembl transcripts appear in the output file, but all 0 counts). I think the issue may be because of some sort of formatting issue in the ensembl GTF? Can anyone shed any light on this issue?

The head of ensembl gtf I'm using is attached.

Thanks!

Is your combined FASTA file or the index you made for bwa-mem messed up? You would get this behavior if the reads were not aligned to the genome at all, only the ERCC sequences.

**also.casey** · 04-03-2014, 10:32 AM

Thanks for your reply roryk. I checked that reads were aligned using:

samtools view foo.bam | awk '$3 != "*"' | head -1

which returns:

"HWI-ST913:209:C2V7JACXX:6:1101:1587:2173 83 ENST00000394020 4460 0 58M = 4417 -101 CCTTATAGTTCTCTGCACACAATAGGTGCTTCTTGGATGTTCTTGGGTTTGGAAATAA IGHGGCCFHHFCHIIHIGJIJIJIHHAEB3EFD<CGGIJIIIIIHGIGGIHHIHIGJH NM:i:0 MD:Z:58 AS:i:58 XS:i:58"

Counting reads mapped to reference sequence names such as the one above in the example bam files using:

samtools view foo.bam | awk '$3 != "*"' | wc -l

indicates ~58% of reads were mapped in total, with ~3.5% mapping to ERCC reference sequences. Reads were aligned, but aren't being counted by featureCounts it seems.

**roryk** · 04-03-2014, 10:40 AM

The ensemble file you posted are chromosome coordinates but you aligned to a FASTA file of transcript sequences, not the whole chromosomes.

**also.casey** · 04-03-2014, 11:03 AM

Can I modify the linked gtf in any way to allow for summarization of the mapped reads as aligned?

**roryk** · 04-03-2014, 11:08 AM

You could just count the third column in your SAM file. Aligning to the whole chromosomes is what you really want to do though.

**ndaniel** · 04-29-2014, 04:10 AM

If this is used for counting reads mapping on genes using Ensembl GTF file:

featureCounts \
-a organism.gtf \
-t exon \
-g gene_id \
-o counts_per_gene.txt \
accepted_hits.sorted.bam

then how should be counted the reads mapping on exons using the same Ensembl GTF file?

Is it something like this?
featureCounts \
-a organism.gtf \
-t exon \
-f exon \
-g exon_id \
-o counts_per_exons.txt \
accepted_hits.sorted.bam

???

**shi** · 04-29-2014, 03:07 PM

Replacing "-f exon" with just "-f" will instruct featureCounts to perform feature (eg exon) level summarization. The output will include number of reads assigned to each exon.

You may also want to turn on "-O" option, which will instruct featureCounts to assign a read to all its overlapping exons. This is often useful for exon-level read counting.

Wei

**ndaniel** · 04-29-2014, 10:48 PM

featureCounts for counting reads mapping on exons

Originally posted by shi View Post

Replacing "-f exon" with just "-f" will instruct featureCounts to perform feature (eg exon) level summarization. The output will include number of reads assigned to each exon.

You may also want to turn on "-O" option, which will instruct featureCounts to assign a read to all its overlapping exons. This is often useful for exon-level read counting.

Wei

Thanks!

Is this:

featureCounts \
-a organism.gtf \
-t exon \
-f \
-g exon_id \
-O \
-o counts_per_exons.txt \
accepted_hits.sorted.bam

or this (the difference is "-g gene_id" and "-g exon_id" ?

featureCounts \
-a organism.gtf \
-t exon \
-f \
-g gene_id \
-O \
-o counts_per_exons.txt \
accepted_hits.sorted.bam

Also, what are the command line parameters for running featureCounts for counting reads mapping on exons from command line in order to produce exactly the same output as this R version of featureCounts as shown below (taken from limma user guide which was updated last time on April 12, 2014, page 127):

fc_SE <- featureCounts(SE_bam_files, annot.ext="unique.gff",
+ isGTFAnnotationFile=TRUE,
+ GTF.featureType="exon", GTF.attrType="ID",
+ useMetaFeatures=FALSE, allowMultiOverlap=TRUE)

**shi** · 04-30-2014, 03:11 PM

Originally posted by ndaniel View Post

Thanks!

Is this:

featureCounts \
-a organism.gtf \
-t exon \
-f \
-g exon_id \
-O \
-o counts_per_exons.txt \
accepted_hits.sorted.bam

or this (the difference is "-g gene_id" and "-g exon_id" ?

featureCounts \
-a organism.gtf \
-t exon \
-f \
-g gene_id \
-O \
-o counts_per_exons.txt \
accepted_hits.sorted.bam

Typically this should be "gene_id", but you may check your GTF annotation file to make sure it is correct (the 9th column).

Also, what are the command line parameters for running featureCounts for counting reads mapping on exons from command line in order to produce exactly the same output as this R version of featureCounts as shown below (taken from limma user guide which was updated last time on April 12, 2014, page 127):

fc_SE <- featureCounts(SE_bam_files, annot.ext="unique.gff",
+ isGTFAnnotationFile=TRUE,
+ GTF.featureType="exon", GTF.attrType="ID",
+ useMetaFeatures=FALSE, allowMultiOverlap=TRUE)

This should give you the same results as you got from sourceforge featureCounts. You need to adapt this R command to make it work for your data though. Also, please make sure you are using Rsubread 1.14.0 and sourceforget Subread 1.4.4 (these two are matched versions), otherwise you may get slightly different results.

**ndaniel** · 05-15-2014, 03:52 AM

how exons/features are treated?

In case that one counts the reads mapping on exons, how does featureCounts treats the exons/feature which overlap (and they belong to the same gene)?

More exactly, if there are two exons which belong to the same gene such that:
(i) each exon belongs to different isoforms of the same gene, and
(ii) both exons have the same start position on the genome, and
(iii) second exon is longer than the first one
how the reads count for these exons is reported? Is reported the count of reads which (i) map on the intersection of these two exons, or (ii) map on the longest exon in this case (because the longest exons contains also the shortest one)? Are there reported two counts of reads for each exon?

Here is how the featureCounts is run for these questions (i.e. counting the reads mapping on exons)
# Program:featureCounts v1.4.4; Command:"featureCounts" "-T" "4" "-a" "/ensembl.gtf" "-t" "exon" "-f" "-g" "gene_id" "-O" "-o" "/output$
Geneid Chr Start End Strand Length /accepted_hits.sorted.bam
ENSG00000223972 1 11869 12227 + 359 0
ENSG00000223972 1 12613 12721 + 109 0
ENSG00000223972 1 13221 14409 + 1189 4
ENSG00000223972 1 11872 12227 + 356 0
ENSG00000223972 1 12613 12721 + 109 0
ENSG00000223972 1 13225 14412 + 1188 4
ENSG00000223972 1 11874 12227 + 354 0
ENSG00000223972 1 12595 12721 + 127 0
ENSG00000223972 1 13403 13655 + 253 3
ENSG00000223972 1 13661 14409 + 749 1
ENSG00000223972 1 12010 12057 + 48 0
ENSG00000223972 1 12179 12227 + 49 0
...

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 30 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 32 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 28 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News