Hi,
I have a problem with the GTF file I am passing to featureCounts.
I want to count miRNA reads using a sorted .bam file containing only mapped reads (generated with samtools from the Bowtie SAM output).
For mapping I used the H. sapiens, NCBI v37 indexes downloaded from the Bowtie homepage.
To get the gtf file for miRNA I used:
Code:
$ wget ftp://ftp.ensembl.org/pub/release-74/gtf/homo_sapiens/Homo_sapiens.GRCh37.74.gtf.gz
$ gunzip Homo_sapiens.GRCh37.74.gtf.gz
$ cat Homo_sapiens.GRCh37.74.gtf | grep "miRNA" > Homo_sapiens_miRNA.gtf
The generated file looks like this:
Code:
$ head Homo_sapiens_miRNA.gtf
1 miRNA exon 30366 30503 . + . gene_id "ENSG00000243485"; transcript_id "ENST00000607096"; exon_number "1"; gene_name "MIR1302-11"; gene_biotype "lincRNA"; transcript_name "MIR1302-11-201"; exon_id "ENSE00003695741";
1 miRNA exon 1102484 1102578 . + . gene_id "ENSG00000207730"; transcript_id "ENST00000384997"; exon_number "1"; gene_name "MIR200B"; gene_biotype "miRNA"; transcript_name "MIR200B-201"; exon_id "ENSE00001500004";
1 miRNA exon 1103243 1103332 . + . gene_id "ENSG00000207607"; transcript_id "ENST00000384875"; exon_number "1"; gene_name "MIR200A"; gene_biotype "miRNA"; transcript_name "MIR200A-201"; exon_id "ENSE00001499882";
1 miRNA exon 1104385 1104467 . + . gene_id "ENSG00000198976"; transcript_id "ENST00000362106"; exon_number "1"; gene_name "MIR429"; gene_biotype "miRNA"; transcript_name "MIR429-201"; exon_id "ENSE00001436869";
1 miRNA exon 3477259 3477354 . - . gene_id "ENSG00000207776"; transcript_id "ENST00000385042"; exon_number "1"; gene_name "MIR551A"; gene_biotype "miRNA"; transcript_name "MIR551A-201"; exon_id "ENSE00001500049";
1 miRNA exon 3800628 3800697 . + . gene_id "ENSG00000264428"; transcript_id "ENST00000579705"; exon_number "1"; gene_name "AL691523.1"; gene_biotype "miRNA"; transcript_name "AL691523.1-201"; exon_id "ENSE00002727084";
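A side note on the filter: a plain grep "miRNA" matches the string anywhere on the line, so the first record above (source column miRNA, but gene_biotype "lincRNA") gets through too. If only records with gene_biotype "miRNA" are wanted, a stricter filter might look like the sketch below. It runs on a two-record sample rebuilt from the listing above (assuming the real file is tab-separated, as GTF requires); the helper name filter_mirna_biotype and the temporary file are made up for illustration.

```shell
#!/bin/sh
# Throwaway sample file holding two records copied from the listing above.
sample_gtf=$(mktemp)
trap 'rm -f "$sample_gtf"' EXIT

# Record 1: source column is "miRNA", but gene_biotype is "lincRNA".
printf '1\tmiRNA\texon\t30366\t30503\t.\t+\t.\tgene_id "ENSG00000243485"; transcript_id "ENST00000607096"; exon_number "1"; gene_name "MIR1302-11"; gene_biotype "lincRNA"; transcript_name "MIR1302-11-201"; exon_id "ENSE00003695741";\n' > "$sample_gtf"
# Record 2: gene_biotype is "miRNA".
printf '1\tmiRNA\texon\t1102484\t1102578\t.\t+\t.\tgene_id "ENSG00000207730"; transcript_id "ENST00000384997"; exon_number "1"; gene_name "MIR200B"; gene_biotype "miRNA"; transcript_name "MIR200B-201"; exon_id "ENSE00001500004";\n' >> "$sample_gtf"

# Hypothetical helper: keep only lines whose gene_biotype attribute is "miRNA".
filter_mirna_biotype() {
    grep 'gene_biotype "miRNA";' "$1"
}

# Prints only the MIR200B record; the lincRNA record is dropped.
filter_mirna_biotype "$sample_gtf"
```

On the real annotation the same pattern could be applied to Homo_sapiens.GRCh37.74.gtf in place of the sample file.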
In featureCounts (Subread package) I use:
Code:
featureCounts -a Homo_sapiens_miRNA.gtf -F GTF -t transcript_name -M -o sample_1_counts.txt input_file sample_1_mapped.bam
It reads in the .bam file, but for some reason does not recognize the GTF.
Error message:
Code:
|| Load annotation file Homo_sapiens_miRNA.gtf ... ||
||    Features : 0 ||
|| WARNING no features were loaded in format GTF. ||
||         annotation format can be specified using '-F'. ||
Failed to open the annotation file Homo_sapiens_miRNA.gtf, or its format is incorrect, or it contains no 'transcript_name' features.
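Since the last line of the message talks about 'transcript_name' features, one quick check is to list which feature types (column 3) and which attribute keys (column 9) the GTF actually contains. As far as I understand the Subread documentation, -t is matched against the feature type in column 3, while -g names the attribute used to group features. The sketch below is only an illustration on a two-record sample rebuilt from the listing above (assuming tab-separated columns); the helper names gtf_feature_types and gtf_attribute_keys and the temporary file are invented for this example.

```shell
#!/bin/sh
# Throwaway sample file holding two records copied from the head output above.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT

printf '1\tmiRNA\texon\t1102484\t1102578\t.\t+\t.\tgene_id "ENSG00000207730"; transcript_id "ENST00000384997"; exon_number "1"; gene_name "MIR200B"; gene_biotype "miRNA"; transcript_name "MIR200B-201"; exon_id "ENSE00001500004";\n' > "$sample"
printf '1\tmiRNA\texon\t3800628\t3800697\t.\t+\t.\tgene_id "ENSG00000264428"; transcript_id "ENST00000579705"; exon_number "1"; gene_name "AL691523.1"; gene_biotype "miRNA"; transcript_name "AL691523.1-201"; exon_id "ENSE00002727084";\n' >> "$sample"

# Hypothetical helper: distinct feature types found in GTF column 3.
gtf_feature_types() {
    cut -f3 "$1" | sort -u
}

# Hypothetical helper: distinct attribute keys found in GTF column 9.
# Splits the attribute string on ';' and keeps the first word of each piece.
gtf_attribute_keys() {
    cut -f9 "$1" | tr ';' '\n' | awk 'NF {print $1}' | sort -u
}

echo "Feature types (column 3):"
gtf_feature_types "$sample"
echo "Attribute keys (column 9):"
gtf_attribute_keys "$sample"
```

For this sample the only feature type reported is exon, and transcript_name shows up only among the attribute keys, next to gene_id, gene_name and the others.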
So, my questions:
1) Is my featureCounts command OK for finding the miRNAs? In the .gtf file there are some transcripts with gene_biotype "miRNA" whose transcript_name is based on a gene name that does not start with "MIR" (see the last row of the GTF listing above). Can someone explain what these are?
2) Why doesn't featureCounts recognize the GTF? Any suggestions for solving the problem?
Thanks in advance for any replies.