  • Mark
    Member
    • Nov 2008
    • 54

    tophat .junc file

    Hi All

    I'm trying to use TopHat with the --GFF argument to get RPKM data for some yeast experiments. My concern is that the .junc file produced by TopHat does not seem to be consistent with the exon data supplied in the GFF file. For example, when the GFF specifies

    Scchr01 SGD gene 87287 87753 . + . ID=YAL030W
    Scchr01 SGD mRNA 87287 87753 . + . ID=YAL030WmRNA;Parent=YAL030W
    Scchr01 SGD exon 87287 87388 . + 0 ID=YAL030Wexon1;Parent=YAL030WmRNA
    Scchr01 SGD exon 87502 87753 . + 0 ID=YAL030Wexon2;Parent=YAL030WmRNA

    the .junc file specifies

    Scchr01 87387 87501 +

    The position 87387 appears incorrect if it is supposed to indicate the first base of the intron (since 87501 appears to indicate the last base of the intron), or even the last base of the exon. Am I misinterpreting this, or is there a problem here?

    Thanks for your help
  • henry
    Member
    • Sep 2009
    • 36

    #2
    I have no idea. I'm trying to install TopHat, but errors are occurring during installation. Maybe I will run into the same problem you have in the near future. I'm also hoping someone will fix it. ^ ^


    • sdriscoll
      I like code
      • Sep 2009
      • 436

      #3
      Don't know if you got this sorted out, but from what I have seen in my runs, TopHat isn't SUPER accurate when it comes to positions. Output tends to vary a little. From your post, the junction specified in your .junc file is the junction between those two exons (lines 3 and 4). I'm not surprised that TopHat has it a position or two off. I have sequencing data from several lanes, and when I compare the junctions.bed files in UCSC's browser I can easily see that a junction found in one lane is the same as one found in another lane. However, if I look at the numbers in the junctions.bed files, the start and end points of those junctions are not equal; they are sometimes up to 10 positions off from each other.
      /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
      Salk Institute for Biological Studies, La Jolla, CA, USA */


      • Cole Trapnell
        Senior Member
        • Nov 2008
        • 213

        #4
        A splice junction identified in two different runs may look slightly different in the bed file. This is not due to alignment accuracy; it's actually a feature of the output format.

        Each bed record in junctions.bed contains two blocks, one on the left side of the intron and one on the right side. The length of these blocks is determined by looking at all the alignments that span the junction and measuring how far the left and right "overhangs" extend for each read. That is, suppose a 75 bp read spans a junction such that the first 20 bp of the read fall on the left exon and the last 55 bp fall on the right exon. If there is only one alignment spanning this intron, then its bed record will have a first block of 20 bp and a second block of 55 bp, and the distance between the two blocks in genomic coordinates will be the length of the intron.

        If there are multiple alignments across the junction, then each block is as long as the largest overhang from any read on that side. Does this make sense?

        Thus, since the number of reads spanning a given junction will naturally vary from run to run, as will how they fall across it, the lengths of the blocks will vary. However, the actual intron coordinates reflected by a given bed record should be consistent from run to run, at least as long as any alignments at all span that intron.
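
The block construction described above can be sketched roughly as follows. This is a simplified illustration, not TopHat's actual code: it assumes each spanning read is summarised by a hypothetical (left_overhang, right_overhang) pair and that the intron is given in 0-based, half-open coordinates. For the YAL030W intron from Mark's GFF, the first intron base (1-based 87389) is 0-based 87388, and the first base of the right exon (1-based 87502) is 0-based 87501. Two runs with different reads give different blocks, but the same intron falls out of both.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class JunctionRecord:
    """Simplified two-block BED12-style junction (0-based, half-open)."""
    chrom: str
    chrom_start: int
    chrom_end: int
    block_sizes: Tuple[int, int]
    block_starts: Tuple[int, int]


def junction_record(chrom: str, intron_start: int, intron_end: int,
                    overhangs: List[Tuple[int, int]]) -> JunctionRecord:
    """Build a junction record from the (left, right) overhangs of spanning reads.

    intron_start is the 0-based first intron base; intron_end is the 0-based
    first base of the right exon. Each block is as long as the largest
    overhang observed on that side of the intron.
    """
    if not overhangs:
        raise ValueError("need at least one spanning alignment")
    left = max(lo for lo, _ in overhangs)
    right = max(ro for _, ro in overhangs)
    chrom_start = intron_start - left
    return JunctionRecord(
        chrom=chrom,
        chrom_start=chrom_start,
        chrom_end=intron_end + right,
        block_sizes=(left, right),
        block_starts=(0, intron_end - chrom_start),
    )


# The YAL030W intron seen in two hypothetical runs of 75 bp reads.
run1 = junction_record("Scchr01", 87388, 87501, [(20, 55)])
run2 = junction_record("Scchr01", 87388, 87501, [(20, 55), (60, 15)])
for run in (run1, run2):
    # Intron = end of the first block .. start of the second block.
    intron = (run.chrom_start + run.block_sizes[0],
              run.chrom_start + run.block_starts[1])
    print(run.block_sizes, intron)
# (20, 55) (87388, 87501)
# (60, 55) (87388, 87501)
```

The record's chromStart moves from 87368 to 87328 between the two runs, which is the kind of difference that shows up when comparing raw bed start/end values.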

        It's straightforward to extract the actual intron coordinates from the bed records after a run, and in the upcoming version of TopHat (1.0.11) I'll provide a script to do so.
        Last edited by Cole Trapnell; 09-23-2009, 08:32 PM.
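
Cole mentions a script for this in TopHat 1.0.11; the sketch below is not that script, only a minimal illustration assuming standard two-block BED12 fields (chromStart in column 2, strand in column 6, blockSizes and blockStarts in columns 11 and 12). The function name intron_from_bed12 and the two records are made up for the example. The intron starts where the first block ends and ends where the second block begins.

```python
from typing import Tuple


def intron_from_bed12(line: str) -> Tuple[str, int, int, str]:
    """Return (chrom, intron_start, intron_end, strand) from a two-block BED12 line.

    Coordinates are 0-based, half-open: intron_start is the first intron base,
    intron_end is the first base of the right-hand block (exon).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected a BED12 record")
    chrom, chrom_start, strand = fields[0], int(fields[1]), fields[5]
    # BED list fields may carry a trailing comma, e.g. "20,55,"
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if len(sizes) != 2 or len(starts) != 2:
        raise ValueError("expected exactly two blocks")
    intron_start = chrom_start + sizes[0]
    intron_end = chrom_start + starts[1]
    return chrom, intron_start, intron_end, strand


# Two made-up junctions.bed-style records for the same intron, with different
# block sizes (as if they came from two different runs).
records = [
    "Scchr01\t87368\t87556\tJUNC00000001\t1\t+\t87368\t87556\t255,0,0\t2\t20,55\t0,133",
    "Scchr01\t87328\t87556\tJUNC00000001\t2\t+\t87328\t87556\t255,0,0\t2\t60,55\t0,173",
]
for rec in records:
    print(intron_from_bed12(rec))  # ('Scchr01', 87388, 87501, '+') both times
```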


        • Cole Trapnell
          Senior Member
          • Nov 2008
          • 213

          #5
          I should have posted a reply to Mark's earlier question as well. The .juncs file format is zero-based (as opposed to the 1-based GTF file), and the left coordinate marks the rightmost base of the *left* exon. The right coordinate in each line marks the leftmost base of the *right* exon. Think of it as "each line says: concatenate the right base to the left base, leaving out everything in between".
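
Applying that convention to the GFF records in Mark's post: a small sketch that converts consecutive 1-based, inclusive exons into .juncs-style lines. The function name juncs_from_exons is hypothetical, not part of TopHat. Subtracting 1 from the left exon's end and from the right exon's start reproduces the line "Scchr01 87387 87501 +" from the original post, which suggests the .junc output matches the annotation.

```python
from typing import List, Tuple


def juncs_from_exons(chrom: str, strand: str,
                     exons: List[Tuple[int, int]]) -> List[Tuple[str, int, int, str]]:
    """Convert 1-based, inclusive exon coordinates (start, end) to .juncs tuples.

    Assumed convention: 0-based coordinates; the left value is the last base
    of the left exon and the right value is the first base of the right exon.
    """
    exons = sorted(exons)
    juncs = []
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        left = left_end - 1      # 1-based last base of left exon -> 0-based
        right = right_start - 1  # 1-based first base of right exon -> 0-based
        juncs.append((chrom, left, right, strand))
    return juncs


# YAL030W exons from the GFF in Mark's post.
for chrom, left, right, strand in juncs_from_exons(
        "Scchr01", "+", [(87287, 87388), (87502, 87753)]):
    print(f"{chrom}\t{left}\t{right}\t{strand}")  # Scchr01 87387 87501 + (tab-separated)
```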


          • sdriscoll
            I like code
            • Sep 2009
            • 436

            #6
            Thanks, Cole. Your responses are very helpful in understanding the outputs. I'm actually a programmer working for a lab, and they have charged me with learning how to use TopHat and Bowtie. From what you wrote here, it sounds like if I compare intron coordinates between two runs in the .bed files, I should be able to filter out matching junctions and reveal junctions from one run that did not show up in the other.
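
A minimal sketch of that comparison, assuming the intron coordinates have already been extracted from each run's junctions.bed (intron start = chromStart + first block size, intron end = chromStart + second block start). Junctions are keyed on exact (chrom, start, end, strand) tuples, so block-size differences between runs do not matter. The function names are hypothetical and the second junction's coordinates are made up.

```python
from typing import Iterable, Set, Tuple

Intron = Tuple[str, int, int, str]  # (chrom, intron_start, intron_end, strand)


def shared_junctions(run_a: Iterable[Intron], run_b: Iterable[Intron]) -> Set[Intron]:
    """Introns found in both runs (exact coordinate match)."""
    return set(run_a) & set(run_b)


def unique_junctions(run_a: Iterable[Intron], run_b: Iterable[Intron]) -> Set[Intron]:
    """Introns found in run_a but not in run_b (exact coordinate match)."""
    return set(run_a) - set(run_b)


lane1 = [("Scchr01", 87388, 87501, "+"), ("Scchr01", 140000, 140100, "+")]
lane2 = [("Scchr01", 87388, 87501, "+")]
print(sorted(shared_junctions(lane1, lane2)))  # [('Scchr01', 87388, 87501, '+')]
print(sorted(unique_junctions(lane1, lane2)))  # [('Scchr01', 140000, 140100, '+')]
```

Exact matching relies on Cole's point that intron coordinates are stable across runs; if that did not hold, a tolerance-based match would be needed instead.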

