Unconfigured Ad

**dietmar13** · 05-16-2013, 12:16 AM

dear wei,

using featurecount I
get:

Code:

./count_featurCount.sh: line 15: 29340 Segmentation fault      (core dumped) featureCounts -p -B -C -a $gtf -t exon -g gene_id -i $SAMdir/$name/RUM.sam -o $name.fcount

I use:

Code:

featureCounts -p -B -C -a $gtf -t exon -g gene_id -i $SAMdir/$name/RUM.sam -o $name.fcount

example sam file:

Code:

seq.1	77	*	0	0	*	=	0	0	CGGGGCGGGATCCCGCCGCCTCTCCGGCCCGCCGGNNNNCCGCCACCGGCCCACTNTNCNCNNCCCCCCCNCGCGG	############################################################################
seq.1	141	*	0	0	*	=	0	0	NNGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNTNNCNGNNTCNNNNNNNNNN	############################################################################
seq.2	73	chr2	89157089	25	76M	=	89157089	0	CCTCTCTGGGATAGAAGTTATTCAGCAGGCACACANNNNAGGCAGTTCCAGATTTNANCTGNNCATCAGANGGCGG	CCCCCCCDBCCCCBCAABBBCCBCCCCCCCBB@AA####/00000CCCCA##########################	XO:A:F	MD:Z:35ACAG16C1A3CT7T5	NM:i:9	IH:i:1	HI:i:1
seq.2	133	chr2	89157089	25	*	=	89157089	0	NNTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNCNCCNCNNTGCNNNNNNNNN	############################################################################	XO:A:F	IH:i:1	HI:i:1
seq.3	77	*	0	0	*	=	0	0	CGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGTNNNGCAGGAATGCCGAGACCGCNTGTANTCTCGTATGCGGG	BB>4B?>?3>@>@642::<7A#######################################################
seq.3	141	*	0	0	*	=	0	0	NNGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGNNNNNNNNNNNNNNNNNCNTGNCGNGCGNNNNNNNNN	############################################################################
seq.4	77	*	0	0	*	=	0	0	CGGTTCACCAGGAATGCCGAGACCGGAAGAGCGGTTCNGCAGGAATGCCGAGACCGCTCGTAATCTCGCATACCGG	<4=.+;**1*45052@@@@@########################################################
seq.4	141	*	0	0	*	=	0	0	CNGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNGNCGNGGNGAGNNNNNNNNN	############################################################################

and example of the gtf-file (from gencode):
##description: evidence-based annotation of the human genome (GRCh37), version 15 (Ensembl 70)

Code:

##provider: GENCODE
##contact: [email protected]
##format: gtf
##date: 2013-01-21
chr1	HAVANA	gene	11869	14412	.	+	.	gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
chr1	HAVANA	transcript	11869	14409	.	+	.	gene_id "ENSG00000223972.4"; transcript_id "ENST00000456328.2"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; level 2; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1	HAVANA	exon	11869	12227	.	+	.	gene_id "ENSG00000223972.4"; transcript_id "ENST00000456328.2"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 1;  level 2; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1	HAVANA	exon	12613	12721	.	+	.	gene_id "ENSG00000223972.4"; transcript_id "ENST00000456328.2"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 2;  level 2; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1	HAVANA	exon	13221	14409	.	+	.	gene_id "ENSG00000223972.4"; transcript_id "ENST00000456328.2"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 3;  level 2; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1	ENSEMBL	transcript	11872	14412	.	+	.	gene_id "ENSG00000223972.4"; transcript_id "ENST00000515242.2"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "unprocessed_pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1-201"; level 3; havana_gene "OTTHUMG00000000961.2";
chr1	ENSEMBL	exon	11872	12227	.	+	.	gene_id "ENSG00000223972.4"; transcript_id "ENST00000515242.2"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "unprocessed_pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1-201"; exon_number 1;  level 3; havana_gene "OTTHUMG00000000961.2";

do you have any clue what could be wrong?

best wishes,

dietmar

**shi** · 05-16-2013, 03:18 AM

Dear dietmar,

Thanks for providing the detailed information about the problem you encountered. This helped us to reproduce the problem.

The featureCounts program does not allow ## lines or comment lines to be included in the provided annotation file. The program crashed when such lines exist in the annotation file. Sorry about this. We are now working on this and will let you know once it is fixed.

In the meantime, you can get around this problem by removing those ## lines from your annotation file. I have tested this and it worked. Hope it will work for you. But please let me know if it doesn't. I will get back to you soon.

Best regards,
Wei

**shi** · 05-16-2013, 10:37 PM

Hi dietmar,

We have fixed this and uploaded the latest version (v1.3.3-p1) to sourceforge (http://subread.sourceforge.net).

Hope this works for you.

Best wishes,
Wei

**dietmar13** · 05-19-2013, 06:19 AM

dear wei,

thank you.

but now i get another error:

Code:

**********
**********
  WARNING
**********
No meta-feature id is found on the 6-th line. If it is a GTF file, you may need to check the name of the gene_id field and specify a correct field name using a '-g' option.
**********
**********
WARNING: the feature on the 31178-th line has zero coordinate or zero lengths
... many similar warnings ...
WARNING: the feature on the 2577002-th line has zero coordinate or zero lengths
There are 1201574 features loaded from the annotation file.
The 1201574 features are sorted.
Number of chromosomes included in the annotation is 25
./count_featurCount.sh: line 16: 24368 Segmentation fault      (core dumped) featureCounts -p -B -C -a $gtf -t exon -g gene_id -i $SAMdir/$name/RUM.sam -o $name.fcount

the 6th line has definitely a gene_id (probably a bug?) but the lines with zero length features are true.

the gtf-file:

Code:

##description: evidence-based annotation of the human genome (GRCh37), version 15 (Ensembl 70)
##provider: GENCODE
##contact: [email protected]
##format: gtf
##date: 2013-01-21
chr1	HAVANA	gene	11869	14412	.	+	.	gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
chr1	HAVANA	transcript	11869	14409	.	+	.	gene_id "ENSG00000223972.4"; transcript_id "ENST00000456328.2"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; level 2; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
...
31178-th line:
chr1	HAVANA	exon	19983359	19983359	.	+	.	gene_id "ENSG00000158747.9"; transcript_id "ENST00000439278.1"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "NBL1"; transcript_type "protein_coding"; transcript_status "NOVEL"; transcript_name "NBL1-007"; exon_number 4;  level 1; tag "mRNA_end_NF"; tag "cds_end_NF"; tag "exp_conf"; havana_gene "OTTHUMG00000002700.5"; havana_transcript "OTTHUMT00000313743.1";

do you think the zero-length features are the problem, or is it possible that if the BAM files has additional chromosomes (like chr1_gl000191_random) which are not present in the gtf-annotation file your program has a problem. i think this should be catched, because this will happen often...

best wishes,

dietmar

**shi** · 05-19-2013, 03:43 PM

Dear dietmar,

Although you got those warning messages, your annotation file was actually successfully processed according to the program output (sorry for the misleading warning message about gene id, we have fixed this and updated the program on sourceforge).

Did you provide a SAM or BAM file to featureCounts? If you provided a BAM file, you need to add an '-b' option to your command. Otherwise it will cause a segmentation fault.

We actually downloaded the ENCODE annotation file you used and tested it here. We found it worked with the version of featureCounts you are using for both SAM and BAM input.

Could you please check if you correctly specified the type of your read file? I think this was probably the reason why you got a segmentation fault.

Best regards,

Wei

**NateP** · 05-19-2013, 04:21 PM

I don't know if this an intentional limitation to the program, but here is an error I'm encountering.

The reference genome I am aligning reads to is a draft version, with 430k contigs.

The error I am getting when attempting to use featureCounts is:

"There are 290343 features loaded from the annotation file.
WARNING: THere are too many chromosomes in the annotations. The remainder of the annotation file is ignored.
The 8586 features are sorted.
Number of chromosomes inclued in the annotation is 499.

WARNING: There are too many reference sequences in the BAM file!
[repeated many times]
Segmentation fault"

**shi** · 05-19-2013, 10:38 PM

Dear @NateP,

featureCounts had hard-coded limits on the numbers of chromosome allowed to be included in the annotation file and in the BAM/SAM file. These limits were far less than the number of contigs you have here and that was the reason why featureCounts crashed.

We have now removed these limits and allowed any number of chromosomes (or contigs) to be included in the annotation file and in the SAM/BAM file. Please check out the latest version of the Subread package (1.3.3-p2) on sourceforge. The featureCounts program included in this version should work for your data now.

Please let me know if the problem persists.

Best wishes,

Wei

**NateP** · 05-20-2013, 06:53 AM

Wei,

I figured it was something along those lines.
The newest version works with my data sets, thanks for the great program!

**shi** · 05-20-2013, 02:41 PM

Thanks for letting me know, NateP.

Cheers,
Wei

**Nicolas Nalpas** · 05-21-2013, 01:43 AM

Dear Wei,
I was just wondering if you could give advise on a featureCounts job that I tried to perform, I have RNA-seq data aligned to bovine genome (UMD3.1.71 from ENSEMBL) and I used subread-1.3.3-p2 (which is the latest version I think).
The error I get is:

Code:

No meta-feature id is found on the 1-th line. If it is a GTF file, you may need to check the name of the gene_id field and specify a correct field name using a '-g' opt
ion.

My code is:

Code:

featureCounts -a /path/annotation.gtf -t exon -g gene_id -b -i /path/file.bam -o file.out -s -R -p -B

And if I understand correctly the outputs I obtained, the reads assignment was still performed against my feature (exon in this case, represented as LINE_*), however I thought the output should have been my counts per gene_id:

Code:

geneid  length  nreads
LINE_0000001    88      0
LINE_0000002    167     0
LINE_0000003    51      0
LINE_0000004    337     0
LINE_0000005    1399    0
LINE_0000006    107     0
LINE_0000007    110     0
LINE_0000008    110     0
LINE_0000009    133     0
LINE_0000010    169     0
LINE_0000011    204     0
LINE_0000012    211     0
LINE_0000013    229     0
LINE_0000014    45      0
LINE_0000015    98      0
LINE_0000016    183     0
LINE_0000017    139     0
LINE_0000018    1429    0
LINE_0000019    218     0
LINE_0000020    91      0
LINE_0000021    202     0
LINE_0000022    142     2
LINE_0000023    157     6

My gtf file looks like this:

Code:

4       protein_coding  exon    432160  432247  .       -       .        gene_id "ENSBTAG00000003625"; transcript_id "ENSBTAT00000004727"; exon_number "1"; gene_name "BT.66099"; gene_biotype "protein_coding"; transcript_name "BT.66099-201"; exon_id "ENSBTAE00000393952";
4       protein_coding  CDS     432160  432235  .       -       0        gene_id "ENSBTAG00000003625"; transcript_id "ENSBTAT00000004727"; exon_number "1"; gene_name "BT.66099"; gene_biotype "protein_coding"; transcript_name "BT.66099-201"; protein_id "ENSBTAP00000004727";
4       protein_coding  start_codon     432233  432235  .       -       0        gene_id "ENSBTAG00000003625"; transcript_id "ENSBTAT00000004727"; exon_number "1"; gene_name "BT.66099"; gene_biotype "protein_coding"; transcript_name "BT.66099-201";
4       protein_coding  exon    430223  430389  .       -       .        gene_id "ENSBTAG00000003625"; transcript_id "ENSBTAT00000004727"; exon_number "2"; gene_name "BT.66099"; gene_biotype "protein_coding"; transcript_name "BT.66099-201"; exon_id "ENSBTAE00000037981";
4       protein_coding  CDS     430223  430389  .       -       2        gene_id "ENSBTAG00000003625"; transcript_id "ENSBTAT00000004727"; exon_number "2"; gene_name "BT.66099"; gene_biotype "protein_coding"; transcript_name "BT.66099-201"; protein_id "ENSBTAP00000004727";
4       protein_coding  exon    429103  429153  .       -       .        gene_id "ENSBTAG00000003625"; transcript_id "ENSBTAT00000004727"; exon_number "3"; gene_name "BT.66099"; gene_biotype "protein_coding"; transcript_name "BT.66099-201"; exon_id "ENSBTAE00000369947";
4       protein_coding  CDS     429103  429153  .       -       0        gene_id "ENSBTAG00000003625"; transcript_id "ENSBTAT00000004727"; exon_number "3"; gene_name "BT.66099"; gene_biotype "protein_coding"; transcript_name "BT.66099-201"; protein_id "ENSBTAP00000004727";
4       protein_coding  exon    426030  426366  .       -       .        gene_id "ENSBTAG00000003625"; transcript_id "ENSBTAT00000004727"; exon_number "4"; gene_name "BT.66099"; gene_biotype "protein_coding"; transcript_name "BT.66099-201"; exon_id "ENSBTAE00000037983";

Any idea on how I can sort this out?
Thanks a lot for your help.
Regards,
Nicolas

**iris_aurelia** · 05-21-2013, 02:11 AM

I tried subread-align and featureCount as well and encountered the same 'problem'. I was hoping that it would give me a gene identifier in the output file instead of line numbers. I've also used the -t and the -g option.

**shi** · 05-21-2013, 03:07 AM

Dear Nicolas and iris_aurelia,

Please replace -g gene-id with -g ' gene_id' in your command (note that there is a space right before gene_id in the new argument) and try it again.

I suspect the problem you encountered was because there is a space right before the gene_id attribute in the 9th column of your annotation file. I have seen this for the yeast GTF annotation file generated by Ensembl. This is not compliant with the GTF-format specification which requires all the columns to be tab-delimited.

However, the human GTF annotation generated by Ensembl is fine. So it seems this problem only occurred for some species. I'm going to contact Ensembl to let them know this issue.

Please let me know if the new argument does not work.

Best wishes,

Wei

**iris_aurelia** · 05-21-2013, 03:56 AM

Hi Wei,

Thanks, this solution works! However, I was using the Human reference (version 66), which had this problem as well. Maybe newer versions of the GTF file do not have this problem..?

Another thing I bumped into was the lack of reading gzipped fastQ files. Is it possible to either read from standard in or to create an option to use gzipped files?

Iris

**Nicolas Nalpas** · 05-21-2013, 05:27 AM

Hi Wei,

Thanks a lot for all your help, indeed it works perfectly now and so fast.
This is great.
Thanks again,
Nicolas

Topics	Statistics	Last Post
New Analysis Splits Leukemia Into 16 Epigenomic Subgroups by SEQadmin2 Started by SEQadmin2, 07-09-2026, 10:04 AM	0 responses 19 views 0 reactions	Last Post by SEQadmin2 07-09-2026, 10:04 AM
Genome-Wide CRISPR Screen Uncovers Unlikely Psoriasis Target by SEQadmin2 Started by SEQadmin2, 07-08-2026, 10:08 AM	0 responses 11 views 0 reactions	Last Post by SEQadmin2 07-08-2026, 10:08 AM
Engineered Protein Motor Takes Its First Steps Along DNA Track by SEQadmin2 Started by SEQadmin2, 07-07-2026, 11:05 AM	0 responses 28 views 0 reactions	Last Post by SEQadmin2 07-07-2026, 11:05 AM
High-Resolution Sequencing Exposes Hidden Toxoplasma Diversity by SEQadmin2 Started by SEQadmin2, 07-02-2026, 11:08 AM	0 responses 31 views 0 reactions	Last Post by SEQadmin2 07-02-2026, 11:08 AM

Unconfigured Ad

featureCounts: a universal read summarization program

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News