Hi,
I have downloaded the UCSC hg19 GTF file from Galaxy.
The resulting file looks like this:
$ head hg19_genes.gtf
Code:
chr1 unknown exon 11874 12227 . + . gene_id "DDX11L1"; transcript_id "NR_046018_1"; gene_name "DDX11L1"; tss_id "TSS14523";
chr1 unknown exon 12613 12721 . + . gene_id "DDX11L1"; transcript_id "NR_046018_1"; gene_name "DDX11L1"; tss_id "TSS14523";
chr1 unknown exon 13221 14408 . + . gene_id "DDX11L1"; transcript_id "NR_046018_1"; gene_name "DDX11L1"; tss_id "TSS14523";
chr1 unknown exon 14362 14829 . - . gene_id "WASH7P"; transcript_id "NR_024540"; gene_name "WASH7P"; tss_id "TSS7359";
chr1 unknown exon 14970 15038 . - . gene_id "WASH7P"; transcript_id "NR_024540"; gene_name "WASH7P"; tss_id "TSS7359";
chr1 unknown exon 15796 15947 . - . gene_id "WASH7P"; transcript_id "NR_024540"; gene_name "WASH7P"; tss_id "TSS7359";
chr1 unknown exon 16607 16765 . - . gene_id "WASH7P"; transcript_id "NR_024540"; gene_name "WASH7P"; tss_id "TSS7359";
chr1 unknown exon 16858 17055 . - . gene_id "WASH7P"; transcript_id "NR_024540"; gene_name "WASH7P"; tss_id "TSS7359";
chr1 unknown exon 17233 17368 . - . gene_id "WASH7P"; transcript_id "NR_024540"; gene_name "WASH7P"; tss_id "TSS7359";
chr1 unknown exon 17606 17742 . - . gene_id "WASH7P"; transcript_id "NR_024540"; gene_name "WASH7P"; tss_id "TSS7359";
When I obtain the same file from UCSC Tables, it looks somewhat different. The first three rows are similar, but after that the records no longer match.
$ head hg19_ucsc_table.gtf
Code:
chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1 hg19_knownGene exon 12613 12721 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1 hg19_knownGene exon 13221 14409 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
chr1 hg19_knownGene exon 12646 12697 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
chr1 hg19_knownGene exon 13221 14409 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
chr1 hg19_knownGene start_codon 12190 12192 0.000000 + . gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
chr1 hg19_knownGene CDS 12190 12227 0.000000 + 0 gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
chr1 hg19_knownGene CDS 12595 12721 0.000000 + 1 gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
So, when I grep the miRNA genes out of the first GTF file, obtained through Galaxy (hg19_genes.gtf), I get a GTF file with only miRNA genes.
Command:
Code:
$ grep "MIR" hg19_genes.gtf > hg19_miRNA.gtf
This command, however, does not work for the file I got from UCSC Tables (hg19_ucsc_table.gtf). I guess that is because the gene_ids in the Galaxy file simply contain the miR/MIR symbol, whereas the UCSC Tables file uses knownGene accessions (e.g. uc001aaa.3) as gene_id.
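A slightly stricter variant of the grep step might look like the sketch below. It rests on two assumptions that are not verified here: that miRNA gene symbols start with MIR (human) or Mir (mouse), and that the symbol sits in the gene_id attribute, as in the Galaxy file. The function name filter_mirna_gtf is made up for illustration. It would also keep any other gene whose symbol happens to start with MIR, so the output is worth a quick look.

```shell
# Keep only GTF records whose gene_id looks like a miRNA symbol.
# Assumption: miRNA genes are named MIR... (human) or Mir... (mouse).
# filter_mirna_gtf is a hypothetical helper, not a UCSC or Galaxy tool.
filter_mirna_gtf() {
    # -i: case-insensitive match, -E: extended regular expression.
    # Anchoring on the gene_id attribute avoids hits on "MIR" that
    # appear elsewhere in the line (e.g. in the middle of another symbol).
    grep -iE 'gene_id "MIR[^"]*"' "$1"
}

# Example call (file names taken from the post):
# filter_mirna_gtf hg19_genes.gtf > hg19_miRNA.gtf
```

On a file like hg19_ucsc_table.gtf, where gene_id holds accessions such as uc001aaa.3, this filter would return nothing, which matches the behaviour described above.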
So, my question:
I would like to obtain a miRNA GTF file for mm10 in UCSC format (I used UCSC mm10 for mapping).
In Galaxy, there is only a UCSC mm9 GTF file (https://usegalaxy.org/library_common...9490a8b6c89961).
How can I obtain a UCSC mm10 GTF file in a format from which I can grep out the miRNA genes?