  • cuffcompare output

    Hi there

    I am running the Cufflinks tool 'cuffcompare' on transcripts.gtf (produced by cufflinks) and comparing it with a .gtf file downloaded from Ensembl,

    like this:

    ./cuffcompare -r /home/Homo_sapiens.GRCh37.55.gtf transcripts.gtf

    and I get no matches at all between the two files?!

    #= Summary for dataset: transcripts.gtf :
    # Total mRNAs : 17735 in 17537 loci (17696 multi-exon)
    # Reference mRNAs : 99330 in 43502 loci (82822 multi-exon)
    # Corresponding super-loci: 0
    #--------------------| Sn | Sp | fSn | fSp
    Base level: 0.0 0.0 - -
    Exon level: 0.0 0.0 0.0 0.0
    Intron level: 0.0 0.0 0.0 0.0
    Intron chain level: 0.0 0.0 0.0 0.0
    Transcript level: 0.0 0.0 0.0 0.0
    Locus level: 0.0 0.0 0.0 0.0
    Missed exons: 353318/353318 (100.0%)
    Wrong exons: 45831/45831 (100.0%)
    Missed introns: 272474/272474 (100.0%)
    Wrong introns: 28264/28264 (100.0%)
    Missed loci: 0/43502 ( 0.0%)
    Wrong loci: 0/17537 ( 0.0%)


    Has anyone else tried this? Where did you get your reference .gtf from? I've used this before, when TopHat itself calculated RPKM values, and it worked fine.

    Thanks

  • #2
    I think the chromosome names in the two files don't match.
    Xi Wang
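A quick way to check this diagnosis is to compare column 1 (the seqname) of the two GTF files. If they share no names at all, cuffcompare has nothing to pair up, which fits the "Corresponding super-loci: 0" line above. This is a minimal sketch; the script name in the usage line is hypothetical.

```python
"""Compare the seqname (column 1) sets of two GTF files."""
import sys


def seqnames(lines):
    """Return the set of column-1 values, skipping '#' comments and blank lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare(ref_lines, query_lines):
    """Split seqnames into shared / reference-only / query-only (each sorted)."""
    ref, query = seqnames(ref_lines), seqnames(query_lines)
    return {
        "shared": sorted(ref & query),
        "only_in_reference": sorted(ref - query),
        "only_in_query": sorted(query - ref),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as ref_file, open(sys.argv[2]) as query_file:
        result = compare(ref_file, query_file)
    for key, names in result.items():
        # Print the count plus the first few names as a sample.
        print(f"{key}: {len(names)} {names[:10]}")
```

Usage (hypothetical file name): `python compare_seqnames.py Homo_sapiens.GRCh37.55.gtf transcripts.gtf`. An Ensembl reference will typically show names like `1`, `2`, `MT`, while a Cufflinks assembly built against a UCSC-style genome shows `chr1`, `chr2`, `chrM`.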



    • #3
      As pointed out, the standard Ensembl GTF file contains chromosome numbers (1, 2, 3, ...) instead of chromosome identifiers (chr1, chr2, chr3, ...), so you need to convert these. You also need to convert the chromosome coordinates in the Ensembl GTF from the first base being 1 to the first base being 0 (simply subtract 1 from the start coordinate).

      After these two conversion steps you should be able to use the Cufflinks suite of programs.

      EDIT
      You do not need to edit the coordinates in the GTF file.
      Last edited by Thomas Doktor; 12-06-2010, 11:25 AM. Reason: Users should not edit the feature coordinates in GTF files



      • #4
        Originally posted by Thomas Doktor View Post
        You also need to convert the chromosome coordinates in the Ensembl GTF from the first base being 1 to first base being 0 (simply subtract 1 from the start coordinate).
        hi, Tom

        I think the Ensembl GTF is fine as it is, without subtracting 1 from the start coordinate, since GTF start coordinates are 1-based.



        • #5
          Hi,

          Just to confirm, Ensembl does use 1-based coordinates for the genome (in gtf and other files).



          • #6
            Yes, I agree. I was just correcting Tom's suggestion to subtract 1 from the start coordinates in Ensembl's GTF; it's not needed.



            • #7
              You're right, I was mixing it up with BED format. Apologies.
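The mix-up in posts #3 to #7 comes down to two conventions: GTF intervals are 1-based with both ends inclusive, while BED intervals are 0-based with an exclusive end. So only the start shifts when converting between them, and a GTF file needs no coordinate change before cuffcompare. A minimal sketch of the conversion:

```python
def gtf_to_bed(start, end):
    """GTF [start, end] (1-based, closed) -> BED [start, end) (0-based, half-open)."""
    return start - 1, end


def bed_to_gtf(start, end):
    """BED [start, end) (0-based, half-open) -> GTF [start, end] (1-based, closed)."""
    return start + 1, end


def gtf_length(start, end):
    # Both ends inclusive, so add 1.
    return end - start + 1


def bed_length(start, end):
    # End is exclusive, so the difference is already the length.
    return end - start


# The first 100 bases of a chromosome in each convention.
print(gtf_to_bed(1, 100))  # (0, 100)
```

Because the end coordinate is the same number in both formats, the interval length is preserved either way.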



              • #8
                You may also need to change MT to chrM for mito genes.
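Combining the renaming from post #3 with the MT-to-chrM mapping above, the conversion can be done line by line on the Ensembl GTF. This is a sketch under stated assumptions: `SPECIAL_NAMES` is a hypothetical exceptions table, and non-chromosomal contigs (for example the GL... scaffolds in GRCh37) have different names in UCSC's hg19 and are not handled correctly by simply prefixing "chr".

```python
import sys

# Hypothetical exceptions table: Ensembl names that are not just "chr" + name.
# MT -> chrM follows the post above; extend it for your own assembly.
SPECIAL_NAMES = {"MT": "chrM"}


def ucsc_style(seqname):
    """Map an Ensembl seqname (1, X, MT, ...) to a UCSC-style one (chr1, chrX, chrM)."""
    if seqname in SPECIAL_NAMES:
        return SPECIAL_NAMES[seqname]
    if seqname.startswith("chr"):
        return seqname  # already converted; leave unchanged
    return "chr" + seqname


def convert_line(line):
    """Rename column 1 of a GTF line; comments and malformed lines pass through."""
    if line.startswith("#"):
        return line
    seqname, sep, rest = line.partition("\t")
    if not sep:
        return line
    return ucsc_style(seqname) + sep + rest


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for line in src:
            dst.write(convert_line(line))
```

Only column 1 is touched; coordinates and attributes are written back unchanged.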



                • #9
                  Cuffcompare output

                  Hi folks, I need your help with this issue. I ran cuffcompare and got the following output:
                  #= Summary for dataset: cufflinks_6/transcripts.gtf :
                  # Query mRNAs : 41274 in 33119 loci (24486 multi-exon transcripts)
                  # (6071 multi-transcript loci, ~1.2 transcripts per locus)
                  # Reference mRNAs : 26679 in 24525 loci (20258 multi-exon)
                  # Corresponding super-loci: 14030
                  #--------------------| Sn | Sp | fSn | fSp
                  Base level: 66.3 49.7 - -
                  Exon level: 48.9 60.9 51.2 63.8
                  Intron level: 63.4 88.3 63.7 88.8
                  Intron chain level: 31.1 25.7 39.1 32.3
                  Transcript level: 0.0 0.0 0.4 0.2
                  Locus level: 25.4 18.8 29.7 21.9

                  Matching intron chains: 6303
                  Matching loci: 6236

                  Missed exons: 69277/207203 ( 33.4%)
                  Novel exons: 24838/166289 ( 14.9%)
                  Missed introns: 59718/181523 ( 32.9%)
                  Novel introns: 6896/130363 ( 5.3%)
                  Missed loci: 9351/24525 ( 38.1%)
                  Novel loci: 14555/33119 ( 43.9%)

                  Total union super-loci across all input datasets: 35358
                  (12030 multi-transcript, ~3.8 transcripts per locus)

                  Can someone help me with the interpretation of this result? I searched through the manual but found no clue.
                  Thanks.
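The Sn and Sp columns are percentages. Sn (sensitivity) is the fraction of reference features that the query reproduces, TP / (TP + FN). Sp, as cuffcompare uses the term, is the fraction of query features that match the reference, TP / (TP + FP), which is what is usually called precision. Assuming the intron-chain row compares multi-exon transcripts and the locus row compares loci, the counts printed in the summary above reproduce the reported values:

```python
def sensitivity(matches, reference_total):
    """Percent of reference features reproduced by the query: TP / (TP + FN)."""
    return round(100 * matches / reference_total, 1)


def specificity(matches, query_total):
    """cuffcompare's 'Sp': percent of query features matching the reference,
    TP / (TP + FP), usually called precision."""
    return round(100 * matches / query_total, 1)


# Counts copied from the cuffcompare summary above.
# Intron chains: 6303 matches, 20258 reference / 24486 query multi-exon transcripts.
intron_chain = (sensitivity(6303, 20258), specificity(6303, 24486))
# Loci: 6236 matches, 24525 reference / 33119 query loci.
locus = (sensitivity(6236, 24525), specificity(6236, 33119))

print("Intron chain Sn/Sp:", intron_chain)  # (31.1, 25.7)
print("Locus Sn/Sp:", locus)                # (25.4, 18.8)
```

As I understand it, the fSn/fSp columns are "fuzzy" variants that tolerate small differences at feature boundaries, and the Missed/Novel lines count features with no overlap at all on the other side. That would explain why "Missed exons: 33.4%" is not simply 100 minus the exon-level Sn, which requires exact exon boundaries.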



                  • #10
                    Can someone help me out with this cuffcompare output?
                    I'm imploring someone to please help, or to provide a link that will give me more information.



                    • #11
                      In addition to the above, can someone help me use the cuffcompare output posted above to answer these questions:

                      How to determine the total number of assembled transcripts (known, partial and novel) that are compatible with the existing annotation.
                      How to determine the total number of unannotated spliced isoforms of known genes.

                      How to determine the number of transcripts found in intergenic regions at certain distances, such as 1,000 bp, from known genes.

                      Thanks.
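These questions are easier to answer from the .tracking file than from the .stats summary, because each row of .tracking carries a one-letter class code. A minimal sketch, assuming the class code is column 4 of the .tracking file; the code meanings are a paraphrased subset from the Cufflinks manual. Which codes count as "compatible with the annotation" is a judgement call: one reasonable reading is "=" and "c", with "j" marking candidate novel isoforms of known genes and "u" marking intergenic transfrags. The hypothetical helper `distance_to_nearest()` addresses the 1,000 bp question for one transfrag against reference gene spans on the same chromosome. It scans linearly, so for a whole genome you would want sorted intervals or an interval tree.

```python
import sys
from collections import Counter

# Subset of cuffcompare class codes, paraphrased from the Cufflinks manual.
CLASS_CODES = {
    "=": "complete match of intron chain",
    "c": "contained in a reference transcript",
    "j": "potentially novel isoform (shares at least one splice junction)",
    "e": "single exon overlapping a reference exon and intron",
    "i": "falls entirely within a reference intron",
    "o": "generic exonic overlap with a reference transcript",
    "p": "possible polymerase run-on",
    "u": "unknown, intergenic",
    "x": "exonic overlap on the opposite strand",
}


def class_code_counts(lines):
    """Tally column 4 (assumed to hold the class code) of a .tracking file."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            counts[fields[3]] += 1
    return counts


def distance_to_nearest(start, end, ref_intervals):
    """Coordinate gap between [start, end] and the closest reference interval
    (same chromosome assumed); 0 if they overlap, None if there are none."""
    best = None
    for ref_start, ref_end in ref_intervals:
        if ref_end < start:
            gap = start - ref_end
        elif ref_start > end:
            gap = ref_start - end
        else:
            gap = 0  # overlapping intervals
        if best is None or gap < best:
            best = gap
    return best


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        for code, n in class_code_counts(handle).most_common():
            print(f"{code}\t{n}\t{CLASS_CODES.get(code, '')}")
```

For the third question, you would collect the "u" transfrags, look up their coordinates in the combined GTF, and keep those where `distance_to_nearest(...)` is at least 1000.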



                      • #12
                        Retrieving sequences

                        I ran cuffcompare and got the files "cuffcmp.combined.gtf", "cuffcmp.loci", "cuffcmp.stats" and "cuffcmp.tracking". I am interested in the unknown transcripts, so I grepped the .tracking file; the count came to 3940.
                        Now, how do I get the sequences of those 3940 transcripts?
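A plain grep for "u" can also match other columns, so it is safer to test the class-code column only. I believe gffread, which ships with Cufflinks, can write spliced transcript sequences directly (something like `gffread -w out.fa -g genome.fa transcripts.gtf`; check its help output). The sketch below does the same thing in plain Python to show what is involved. Assumptions: the class code is column 4 of the .tracking file; column 5 holds the first query's descriptor as `q1:<gene_id>|<transcript_id>|...`; the genome FASTA uses the same chromosome names as the GTF; unstranded (".") transfrags are written in forward orientation. It loads the whole genome into memory, roughly 3 GB for human.

```python
import re
import sys
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def unknown_transcript_ids(tracking_lines, code="u"):
    """Query transcript IDs whose class code (assumed column 4) equals `code`."""
    ids = set()
    for line in tracking_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[3] != code or fields[4] == "-":
            continue
        # Assumed descriptor layout: q1:<gene_id>|<transcript_id>|...
        parts = fields[4].split(":", 1)[-1].split("|")
        if len(parts) > 1:
            ids.add(parts[1])
    return ids


def exons_by_transcript(gtf_lines, wanted):
    """Collect (seqname, start, end) exons and the strand of each wanted transcript."""
    exons, strands = defaultdict(list), {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(f[8])
        if match and match.group(1) in wanted:
            exons[match.group(1)].append((f[0], int(f[3]), int(f[4])))
            strands[match.group(1)] = f[6]
    return exons, strands


def read_fasta(lines):
    """Parse FASTA into {name: sequence}; the name is the first word after '>'."""
    genome, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def spliced_sequence(genome, exons, strand):
    """Join exons (1-based, inclusive GTF coordinates) in genomic order and
    reverse-complement the result for '-' strand transcripts."""
    ordered = sorted(exons, key=lambda exon: exon[1])
    seq = "".join(genome[chrom][start - 1:end] for chrom, start, end in ordered)
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


if __name__ == "__main__" and len(sys.argv) == 4:
    tracking_path, gtf_path, fasta_path = sys.argv[1:]
    with open(tracking_path) as handle:
        wanted = unknown_transcript_ids(handle)
    with open(gtf_path) as handle:
        exons, strands = exons_by_transcript(handle, wanted)
    with open(fasta_path) as handle:
        genome = read_fasta(handle)
    for tid in sorted(exons):
        print(f">{tid}")
        print(spliced_sequence(genome, exons[tid], strands[tid]))
```

The GTF to pass is the one whose transcript IDs appear in the .tracking descriptors, here the Cufflinks transcripts.gtf.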
