  • dGho
    replied
    Originally posted by thinkRNA
    You will have to convert the Ensembl IDs to the corresponding gene symbols. Check out BioMart.

    You can select the Ensembl gene ID and gene symbol attributes and download a file that will help you translate. This will require some programming.

    Thank you so much. This is very helpful.


  • jbrwn
    replied
    oh, i should have specified that my instructions applied to human, as i'm not familiar with anything associated with other organisms. my aligned reads come out of tophat as "chr1" and "chrX", which is why i treated my ensembl reference the way i did in my previous reply. i don't know what you'll need to do with NT_166433 or MT. take a look at your reads or wait till someone comes along who's worked with mice.
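
One way to "take a look" is to tally the sequence names in column 1 of the GTF and compare them with the names that appear in the alignments. A minimal Python sketch, assuming a tab-delimited GTF; the function name `seqname_counts` and the shortened example records are made up for illustration:

```python
from collections import Counter


def seqname_counts(gtf_lines):
    """Count how many records each sequence name (GTF column 1) has."""
    counts = Counter()
    for line in gtf_lines:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        counts[line.split("\t", 1)[0]] += 1
    return counts


# Tiny in-memory example; real use would pass an open file handle instead.
example = [
    'NT_166433\tprotein_coding\texon\t11955\t12166\t.\t+\t.\tgene_id "ENSMUSG00000000702";\n',
    '18\tprotein_coding\texon\t3122455\t3123465\t.\t-\t.\tgene_id "ENSMUSG00000091539";\n',
    '18\tprotein_coding\texon\t3327492\t3327589\t.\t-\t.\tgene_id "ENSMUSG00000063889";\n',
]
for name, n in sorted(seqname_counts(example).items()):
    print(name, n)
```

If the GTF reports names such as "18" while the reads are aligned to "chr18", the two naming schemes do not match and one of them has to be renamed.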


  • filippos
    replied
    Thank you jbrwn for your answer.
    The first lines of the ensembl GTF that I'm using are:

    NT_166433 protein_coding exon 11955 12166 . + . gene_id "ENSMUSG00000000702"; transcript_id "ENSMUST00000105216"; exon_number "1"; gene_name "AC007307.1"; transcript_name "AC007307.1-201";
    NT_166433 protein_coding CDS 12026 12166 . + 0 gene_id "ENSMUSG00000000702"; transcript_id "ENSMUST00000105216"; exon_number "1"; gene_name "AC007307.1"; transcript_name "AC007307.1-201"; protein_id "ENSMUSP00000100851";
    NT_166433 protein_coding start_codon 12026 12028 . + 0 gene_id "ENSMUSG00000000702"; transcript_id "ENSMUST00000105216"; exon_number "1"; gene_name "AC007307.1"; transcript_name "AC007307.1-201";
    NT_166433 protein_coding exon 16677 16841 . + . gene_id "ENSMUSG00000000702"; transcript_id "ENSMUST00000105216"; exon_number "2"; gene_name "AC007307.1"; transcript_name "AC007307.1-201";
    NT_166433 protein_coding CDS 16677 16841 . + 0 gene_id "ENSMUSG00000000702"; transcript_id "ENSMUST00000105216"; exon_number "2"; gene_name "AC007307.1"; transcript_name "AC007307.1-201"; protein_id "ENSMUSP00000100851";
    NT_166433 protein_coding exon 17745 17814 . + . gene_id "ENSMUSG00000000702"; transcript_id "ENSMUST00000105216"; exon_number "3"; gene_name "AC007307.1"; transcript_name "AC007307.1-201";

    At some point (around line 100) the thing changes to:

    18 protein_coding exon 3122455 3123465 . - . gene_id "ENSMUSG00000091539"; transcript_id "ENSMUST00000165255"; exon_number "1"; gene_name "AC125218.1"; transcript_name "AC125218.1-201";
    18 protein_coding CDS 3122495 3123412 . - 0 gene_id "ENSMUSG00000091539"; transcript_id "ENSMUST00000165255"; exon_number "1"; gene_name "AC125218.1"; transcript_name "AC125218.1-201"; protein_id "ENSMUSP00000129804";
    18 protein_coding start_codon 3123410 3123412 . - 0 gene_id "ENSMUSG00000091539"; transcript_id "ENSMUST00000165255"; exon_number "1"; gene_name "AC125218.1"; transcript_name "AC125218.1-201";
    18 protein_coding stop_codon 3122492 3122494 . - 0 gene_id "ENSMUSG00000091539"; transcript_id "ENSMUST00000165255"; exon_number "1"; gene_name "AC125218.1"; transcript_name "AC125218.1-201";
    18 protein_coding exon 3327492 3327589 . - . gene_id "ENSMUSG00000063889"; transcript_id "ENSMUST00000151311"; exon_number "1"; gene_name "Crem"; transcript_name "Crem-020";
    18 protein_coding CDS 3327492 3327535 . - 0 gene_id "ENSMUSG00000063889"; transcript_id "ENSMUST00000151311"; exon_number "1"; gene_name "Crem"; transcript_name "Crem-020"; protein_id "ENSMUSP00000118267";
    18 protein_coding start_codon 3327533 3327535 . - 0 gene_id "ENSMUSG00000063889"; transcript_id "ENSMUST00000151311"; exon_number "1"; gene_name "Crem"; transcript_name "Crem-020";
    18 protein_coding exon 3325359 3325476 . - . gene_id "ENSMUSG00000063889"; transcript_id "ENSMUST00000151311"; exon_number "2"; gene_name "Crem"; transcript_name "Crem-020";

    The file came from the UCSC Table browser.
    I guess that I should add "chr" before the "18" in the lines above and probably delete the first 100 lines? The first time I tried to use this file, TopHat rejected it because it had some kind of duplicate entries. Is it possible that the first lines are the problem? Is there an easy way to add "chr" to all the lines? I am really new to all this.
    Thanks again for the quick reply, and excuse me for asking such obvious questions.
    Filippos
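
Adding "chr" to every line can be scripted. The following is a minimal Python sketch, not a tested recipe for this exact file. It assumes the GTF is tab-delimited (the lines pasted above show spaces only because of the forum formatting), that the mitochondrial sequence "MT" should become "chrM" as in UCSC-style naming, and that unplaced contigs such as NT_166433 can simply be dropped. The function name `add_chr_prefix` and its parameters are made up for illustration.

```python
def add_chr_prefix(gtf_lines, renames=None, drop_prefixes=("NT_",)):
    """Yield GTF lines with 'chr' prepended to the sequence name (column 1).

    renames       -- explicit name changes; default maps MT to chrM (assumption)
    drop_prefixes -- records whose name starts with one of these are skipped
                     (assumption: unplaced contigs are not needed)
    """
    renames = {"MT": "chrM"} if renames is None else renames
    for line in gtf_lines:
        seqname, tab, rest = line.partition("\t")
        # Pass through comments, blank lines and anything without a tab.
        if line.startswith("#") or not tab:
            yield line
            continue
        if seqname.startswith(drop_prefixes):
            continue
        if seqname in renames:
            seqname = renames[seqname]
        elif not seqname.startswith("chr"):
            seqname = "chr" + seqname
        yield seqname + tab + rest


# Usage with files (file names are placeholders):
# with open("ensembl.gtf") as src, open("ensembl.chr.gtf", "w") as dst:
#     dst.writelines(add_chr_prefix(src))
```

Passing `drop_prefixes=()` keeps the NT_ records and prefixes them too, which is only useful if the genome index actually contains sequences with those names.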


  • jbrwn
    replied
    Originally posted by filippos
    Hi everyone,
    I've been scanning the forum for an answer to my question but could not find one, so I am posting here since this thread touches on my problem. I downloaded one GTF file with the ENSEMBL annotation and the one you propose here. I used the same GTF in the TopHat, Cufflinks and cuffcompare steps, but the final output from cuffdiff does not contain either of the two annotations. I thought that I had to do another step to match the statistical analysis with the annotation, but I cannot find what that step is. As they are now, the data mean nothing unless I manually match the Cufflinks names with the ENSEMBL ones.
    Could someone please explain what I am doing wrong?
    Thank you very much,
    Filippos
    you may want other people to verify anything i say, but this is what i think.

    make sure you add "chr" to column 1 of your ensembl reference. then use that reference to make your combined gtf in cuffcompare.
    Code:
    cuffcompare -r ensembl.gtf ensembl.gtf ensembl.gtf
    run cufflinks with resultant stdout.combined.gtf


  • filippos
    replied
    Hi everyone,
    I've been scanning the forum for an answer to my question but could not find one, so I am posting here since this thread touches on my problem. I downloaded one GTF file with the ENSEMBL annotation and the one you propose here. I used the same GTF in the TopHat, Cufflinks and cuffcompare steps, but the final output from cuffdiff does not contain either of the two annotations. I thought that I had to do another step to match the statistical analysis with the annotation, but I cannot find what that step is. As they are now, the data mean nothing unless I manually match the Cufflinks names with the ENSEMBL ones.
    Could someone please explain what I am doing wrong?
    Thank you very much,
    Filippos


  • jbrwn
    replied
    Originally posted by genbio64
    @RockChalkJayhawk or ChrisL,
    Can one of you elaborate on that workflow?
    ucsc table browser, choose refseq genes for the track then refflat table.


  • genbio64
    replied
    @RockChalkJayhawk or ChrisL,
    Can one of you elaborate on that workflow?


  • ChrisL
    replied
    Brilliant! That worked.

    Thanks RockChalkJayhawk.

    Chris


  • RockChalkJayhawk
    replied
    Originally posted by ChrisL
    Yes, I used UCSC to generate a GTF file based on RefSeq, but the RefSeq annotation is really no better. For example, the human gene "MYOG" is "NC_000001.10" in RefSeq.

    If the GTF file had the gene id in a separate delimited column it would be easy to replace with the HGNC gene symbol using the UNIX join command and a lookup table. Luckily I have access to programmers as it looks like a job for a script.
    Did you use the refGene table or the refFlat table?


  • ChrisL
    replied
    Yes, I used UCSC to generate a GTF file based on RefSeq, but the RefSeq annotation is really no better. For example, the human gene "MYOG" is "NC_000001.10" in RefSeq.

    If the GTF file had the gene id in a separate delimited column it would be easy to replace with the HGNC gene symbol using the UNIX join command and a lookup table. Luckily I have access to programmers as it looks like a job for a script.
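
The script described above could look roughly like the following Python sketch. It pulls the `gene_id` value out of the attribute column (column 9) with a regular expression and swaps it for a symbol from a lookup table, leaving the ID unchanged when no symbol is known. The function name `replace_gene_ids` is made up for illustration, and the example reuses the Crem record that appears in the mouse Ensembl GTF quoted elsewhere in this thread. Note that distinct gene IDs sharing one symbol would be merged by this approach.

```python
import re

# Matches the gene_id attribute but not, e.g., a hypothetical ref_gene_id.
GENE_ID_RE = re.compile(r'\bgene_id "([^"]+)"')


def replace_gene_ids(gtf_lines, symbol_for):
    """Yield GTF lines with gene_id values replaced by symbols where known."""

    def swap(match):
        gene_id = match.group(1)
        # Fall back to the original ID when the lookup has no entry.
        return 'gene_id "%s"' % symbol_for.get(gene_id, gene_id)

    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        # Leave comments and malformed lines untouched.
        if line.startswith("#") or len(fields) < 9:
            yield line
            continue
        fields[8] = GENE_ID_RE.sub(swap, fields[8])
        yield "\t".join(fields) + "\n"


record = ('18\tprotein_coding\texon\t3327492\t3327589\t.\t-\t.\t'
          'gene_id "ENSMUSG00000063889"; transcript_id "ENSMUST00000151311";\n')
lookup = {"ENSMUSG00000063889": "Crem"}
for out_line in replace_gene_ids([record], lookup):
    print(out_line, end="")
```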


  • RockChalkJayhawk
    replied
    GTF File for Cufflinks

    Have you tried to download the RefSeq refFlat file in GTF format from the UCSC table browser? That might also work (and be a lot easier).


  • Wei-HD
    replied
    I use the MGI Batch Query to convert Ensembl IDs to gene names:
    MGI: the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data for researching human health and disease.


  • thinkRNA
    replied
    Originally posted by ChrisL
    Yes, I have looked at Ensembl GTF files and they also lack the HGNC gene symbol attribute. Genes are identified by their Ensembl code, e.g., ENSG00000122180.
    You will have to convert the Ensembl IDs to the corresponding gene symbols. Check out BioMart.

    You can select the Ensembl gene ID and gene symbol attributes and download a file that will help you translate. This will require some programming.
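
The programming involved can be small. Below is a minimal Python sketch that assumes the BioMart export is a tab-separated file with one header row, the Ensembl gene ID in the first column and the gene symbol in the second. The exact header text and column order depend on what was selected in BioMart, so treat that layout as an assumption. The function names `load_lookup` and `translate` are made up for illustration, and the two ID/name pairs are taken from the GTF lines quoted in this thread.

```python
import csv
import io


def load_lookup(handle):
    """Build {ensembl_gene_id: symbol} from a two-column TSV with a header."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    lookup = {}
    for row in reader:
        # Ignore short rows and genes that have no symbol.
        if len(row) >= 2 and row[1]:
            lookup[row[0]] = row[1]
    return lookup


def translate(gene_ids, lookup):
    """Return symbols for the given IDs, keeping the ID when none is known."""
    return [lookup.get(gene_id, gene_id) for gene_id in gene_ids]


# In-memory stand-in for the downloaded file (header names are placeholders).
export = io.StringIO(
    "gene_id\tsymbol\n"
    "ENSMUSG00000063889\tCrem\n"
    "ENSMUSG00000091539\tAC125218.1\n"
)
lookup = load_lookup(export)
print(translate(["ENSMUSG00000063889", "ENSMUSG00000000702"], lookup))
```

With a real download, `export` would be replaced by an open file handle, and the list passed to `translate` by the ID column of the results table.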


  • ChrisL
    replied
    Yes, I have looked at Ensembl GTF files and they also lack the HGNC gene symbol attribute. Genes are identified by their Ensembl code, e.g., ENSG00000122180.


  • Wei-HD
    replied
    In the manual of Cufflinks:

    "Cuffcompare Input

    Cuffcompare takes Cufflinks' GTF output as input, and optionally can take a "reference" annotation (such as from Ensembl)"

    Just click the Ensembl link and you will get the GTF file for each species. Hope this helps.
