  • upendra_35
    Senior Member
    • Apr 2010
    • 102

    tophat junctions.bed file

    I recently found that my junctions.bed file contains names that are not found in my reference GFF. How does this happen?

    Code:
    [upendra_35@vm142-14 tophat_out_3_7_8_lanes]$ head junctions.bed
    track name=junctions description="TopHat junctions"
    Scaffold006725    29    277    JUNC00000001    34    +    29    277    255,0,0    2    90,90    0,158
    Scaffold006725    31    254    JUNC00000002    2    +    31    254    255,0,0    2    88,55    0,168
    Scaffold007604    1    292    JUNC00000003    27    -    1    292    255,0,0    2    79,66    0,225
    Scaffold007614    50    255    JUNC00000004    54    +    50    255    255,0,0    2    90,35    0,170
    Scaffold006711    38    322    JUNC00000005    39    -    38    322    255,0,0    2    89,82    0,202
    Scaffold007629    81    293    JUNC00000006    8    -    81    293    255,0,0    2    70,56    0,156
    Scaffold006763    96    316    JUNC00000007    7    -    96    316    255,0,0    2    90,52    0,168
    Scaffold007639    84    292    JUNC00000008    7    -    84    292    255,0,0    2    82,56    0,152
    Scaffold007736    14    230    JUNC00000009    6    -    14    230    255,0,0    2    44,86    0,130
    Code:
    [upendra_35@vm142-14 tophat_out_3_7_8_lanes]$ tail /mydata/B.rapa_gene_model_0830.gff
    Scaffold004047    glean    CDS    11    33    .    +    0    Parent=Bra041170;
    Scaffold004047    glean    CDS    123    321    .    +    2    Parent=Bra041170;
    Scaffold004813    glean    mRNA    190    414    0.998901    -    .    ID=Bra041171;
    Scaffold004813    glean    CDS    190    414    .    -    0    Parent=Bra041171;
    Scaffold004894    glean    mRNA    3    410    1    +    .    ID=Bra041172;
    Scaffold004894    glean    CDS    3    410    .    +    0    Parent=Bra041172;
    Scaffold005112    blat    mRNA    131    295    1.0000    +    .    ID=Bra041173;
    Scaffold005112    blat    CDS    131    295    100    +    .    Parent=Bra041173;
    Scaffold008211    glean    mRNA    18    251    0.970334    +    .    ID=Bra041174;
    Scaffold008211    glean    CDS    18    251    .    +    0    Parent=Bra041174
  • Hobbe
    Member
    • Apr 2010
    • 29

    #2
    The names are taken from your genome FASTA file, not the reference GFF file. This is logical, since the junctions are the result of mapping your reads to the genome. It seems TopHat finds junctions on scaffolds that have no annotation in your reference GFF.

    Or did I not understand your question?
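
A quick way to check this is to compare the scaffold names in the first column of both files. The following is only a minimal sketch: the helper name `seqnames` is made up for illustration, and the inline sample lines are copied from the listings in the first post (in practice you would open the real junctions.bed and GFF files instead).

```python
import io

def seqnames(handle, skip_prefixes=("track", "browser", "#")):
    """Collect the first-column sequence names, skipping header/comment lines."""
    names = set()
    for line in handle:
        line = line.strip()
        if not line or line.startswith(skip_prefixes):
            continue
        names.add(line.split()[0])
    return names

# Sample lines taken from the listings above; replace with open("junctions.bed") etc.
bed = io.StringIO(
    'track name=junctions description="TopHat junctions"\n'
    "Scaffold006725\t29\t277\tJUNC00000001\t34\t+\t29\t277\t255,0,0\t2\t90,90\t0,158\n"
    "Scaffold007604\t1\t292\tJUNC00000003\t27\t-\t1\t292\t255,0,0\t2\t79,66\t0,225\n"
)
gff = io.StringIO(
    "Scaffold004047\tglean\tCDS\t123\t321\t.\t+\t2\tParent=Bra041170;\n"
    "Scaffold004813\tglean\tmRNA\t190\t414\t0.998901\t-\t.\tID=Bra041171;\n"
)

bed_names = seqnames(bed)
gff_names = seqnames(gff)

# Scaffolds that carry junctions but have no feature in the GFF sample
unannotated = sorted(bed_names - gff_names)
print(unannotated)
```

On the full files, the same set difference would list every scaffold with junctions but no annotated feature. Note that the sample above covers only a few lines of each file, so it says nothing about the complete annotation.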


    • upendra_35
      Senior Member
      • Apr 2010
      • 102

      #3
      Thanks, Hobbe, for the response. I just checked my FASTA file and I could find the names in there. So that means my GFF is not complete. Do you know whether there is a way to get a complete GFF (perhaps based on RNA-seq data)? I got this one from the Brassica genome annotation team.

      I have one other related question regarding the junctions.bed file. Can I use this file to tell whether a gene is fused or not compared to the GFF (assuming the GFF is complete)?

      After looking at the TopHat BAM file and transcript.gtf alongside the reference GFF in IGV, I found that some of the annotated genes are fused and some are split (i.e., a single gene in transcript.gtf is reported as two genes in the reference GFF, and sometimes two genes in transcript.gtf are reported as a single gene in the reference GFF). All I want to know is how many of these discrepancies exist in the reference annotation (GFF) compared to the Cufflinks transcripts.

      Any ideas?
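
One possible way to count such discrepancies is a plain interval-overlap comparison between the two annotations. This is a rough sketch, not an established method: the function names `overlaps` and `multi_hits`, the gene and transcript IDs, and all coordinates are hypothetical toy values, and it assumes one interval per gene with a known strand.

```python
from collections import defaultdict

def overlaps(a, b):
    """True if two (seqname, start, end, strand) intervals overlap on the same strand."""
    return a[0] == b[0] and a[3] == b[3] and a[1] <= b[2] and b[1] <= a[2]

def multi_hits(queries, targets):
    """Map each query ID to the target IDs it overlaps, keeping queries with 2+ hits."""
    hits = defaultdict(list)
    for qid, q in queries.items():
        for tid, t in targets.items():
            if overlaps(q, t):
                hits[qid].append(tid)
    return {qid: sorted(tids) for qid, tids in hits.items() if len(tids) >= 2}

# Hypothetical toy coordinates standing in for the reference GFF genes
reference = {
    "geneA": ("Scaffold1", 100, 500, "+"),
    "geneB": ("Scaffold1", 600, 900, "+"),
    "geneC": ("Scaffold2", 100, 1000, "-"),
}
# Hypothetical toy coordinates standing in for assembled transcripts
assembled = {
    "CUFF.1": ("Scaffold1", 120, 880, "+"),   # spans geneA and geneB
    "CUFF.2": ("Scaffold2", 100, 400, "-"),   # covers the start of geneC
    "CUFF.3": ("Scaffold2", 600, 1000, "-"),  # covers the end of geneC
}

# One assembled transcript covering several reference genes
fused_in_assembly = multi_hits(assembled, reference)
# One reference gene covered by several assembled transcripts
split_in_assembly = multi_hits(reference, assembled)

print(len(fused_in_assembly), fused_in_assembly)
print(len(split_in_assembly), split_in_assembly)
```

The all-against-all loop is fine for a sketch but would be slow on a whole genome, and transcripts assembled without strand information would need the strand check relaxed.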
