  • shirley0818
    Member
    • Apr 2013
    • 13

    How to combine junction.bed files with different rows into one table

    Hi All,

    I am working on ~100 samples to detect alternative splicing. TopHat generates a junction.bed file for each sample. However, these BED files have different numbers of rows, and the junction coordinates are not the same across samples. I think each junction.bed file includes both known and novel junctions.

    Since I am only interested in known junctions from the Ensembl annotation database, how can I map these 100 junction.bed files to the Ensembl GTF file and obtain a matrix with known junctions as rows and sample IDs as columns?

    Or do I need to create an exon-exon junction annotation BED file from Ensembl, then apply RSeQC to the mapped .bam files to obtain read counts for each junction?
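One possible way to get the table asked about above, sketched under assumptions. TopHat writes each junction as two BED blocks whose lengths are the maximal read overhangs, so chromStart and chromEnd can differ between samples even for the same intron. Converting each record to intron coordinates (chromStart plus the first block size, and chromStart plus the second block start) gives a key that is comparable across samples. The function names, the optional `known` filter, and the second sample `s2` are hypothetical; the `s1` records are the ones quoted in post #3 of this thread.

```python
from collections import defaultdict


def junction_key(bed_line):
    """Return ((chrom, intron_start, intron_end, strand), score) for one BED12 junction line."""
    f = bed_line.split()
    chrom, start, score, strand = f[0], int(f[1]), int(f[4]), f[5]
    block_sizes = [int(x) for x in f[10].split(",") if x]
    block_starts = [int(x) for x in f[11].split(",") if x]
    intron_start = start + block_sizes[0]   # first base after the left overhang
    intron_end = start + block_starts[1]    # first base of the right overhang
    return (chrom, intron_start, intron_end, strand), score


def build_matrix(samples, known=None):
    """samples maps sample_id -> iterable of BED lines.

    If `known` (a set of keys) is given, junctions outside it are dropped.
    Returns {key: {sample_id: read_count}}.
    """
    matrix = defaultdict(dict)
    for sample_id, lines in samples.items():
        for line in lines:
            if not line.strip() or line.startswith("track"):
                continue  # skip blank lines and the BED track header
            key, score = junction_key(line)
            if known is None or key in known:
                matrix[key][sample_id] = matrix[key].get(sample_id, 0) + score
    return matrix


def to_rows(matrix, sample_ids):
    """Flatten the matrix into rows: key fields followed by one count per sample."""
    return [list(key) + [matrix[key].get(s, 0) for s in sample_ids]
            for key in sorted(matrix)]


samples = {
    "s1": [
        "1 12197 12639 JUNC00000001 1 + 12197 12639 255,0,0 2 30,45 0,397",
        "1 12190 12686 JUNC00000002 7 + 12190 12686 255,0,0 2 37,74 0,422",
        "1 12633 13292 JUNC00000003 6 + 12633 13292 255,0,0 2 64,72 0,587",
    ],
    "s2": [  # hypothetical sample: same intron as JUNC00000002, different overhangs
        "1 12200 12650 JUNC00000001 3 + 12200 12650 255,0,0 2 27,38 0,412",
    ],
}
sample_ids = sorted(samples)
rows = to_rows(build_matrix(samples), sample_ids)
for row in rows:
    print("\t".join(map(str, row)))
```

Junctions missing from a sample are reported as 0 here; whether that is the right choice (as opposed to a missing value) depends on the downstream analysis.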

    Many thanks,

    Shirley

  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    If I understand your aim correctly, the following two threads should give you pointers for what you need.

    gtftobed: https://www.biostars.org/p/56280/
    bedops: https://www.biostars.org/p/119835/
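Once the annotation is in BED12 form (for example after a GTF-to-BED conversion), the known introns can be derived from the block columns. This is a minimal sketch, assuming each line is a standard BED12 transcript record whose columns 2 and 3 span the whole transcript. The function name is hypothetical, and the example line is an illustrative record for NR_046018; a file whose start and end columns have been modified would need different handling.

```python
def transcript_introns(bed12_line):
    """Return the set of (chrom, intron_start, intron_end, strand) for one BED12 transcript."""
    f = bed12_line.split()
    chrom, tx_start, strand = f[0], int(f[1]), f[5]
    sizes = [int(x) for x in f[10].split(",") if x]    # blockSizes (exon lengths)
    starts = [int(x) for x in f[11].split(",") if x]   # blockStarts (relative to tx_start)
    introns = set()
    for i in range(len(sizes) - 1):
        intron_start = tx_start + starts[i] + sizes[i]  # end of exon i
        intron_end = tx_start + starts[i + 1]           # start of exon i+1
        introns.add((chrom, intron_start, intron_end, strand))
    return introns


# Illustrative record, assumed to cover the full transcript span.
line = "1 11873 14409 NR_046018 0 + 14409 14409 0 3 354,109,1189, 0,739,1347,"
known_introns = transcript_introns(line)
for intron in sorted(known_introns):
    print(intron)
```

The resulting set uses 0-based, half-open coordinates, so it can be compared directly with intron coordinates derived from junction BED records in the same convention.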


    • shirley0818
      Member
      • Apr 2013
      • 13

      #3
      Thanks, GenoMax, for your quick response. I have tried bedtools as you suggested, but got very strange results. The example files and output are below.

      1. Here is the s1.junction.bed file generated by TopHat.
      Column #5 ("score") is the number of reads that contain the junction.
      1 12197 12639 JUNC00000001 1 + 12197 12639 255,0,0 2 30,45 0,397
      1 12190 12686 JUNC00000002 7 + 12190 12686 255,0,0 2 37,74 0,422
      1 12633 13292 JUNC00000003 6 + 12633 13292 255,0,0 2 64,72 0,587

      2. Here is hg19_RefSeq.mod.sorted.bed
      1 11873 12047 NR_046018 0 + 14409 14409 0 3 354,109,1189, 0,739,1347,
      1 12210 12684 NR_046018 0 + 14409 14409 0 3 354,109,1189, 0,739,1347,
      1 12644 13464 NR_046018 0 + 14409 14409 0 3 354,109,1189, 0,739,1347,
      1 12627 13527 NR_046018 0 + 14409 14409 0 3 354,109,1189, 0,739,1347,
      1 12661 13465 NR_046018 0 + 14409 14409 0 3 354,109,1189, 0,739,1347,
      1 12663 13469 NR_046018 0 + 14409 14409 0 3 354,109,1189, 0,739,1347,
      1 12657 13539 NR_046018 0 + 14409 14409 0 3 354,109,1189, 0,739,1347,

      3. Run bedtools
      bedtools intersect -a hg19_RefSeq.sorted.bed -b s1.junctions.bed > out.txt

      The results in out.txt are very strange: column #5 ("score", the read count) is 0 in every row.

      1 11873 12047 NR_046018 0 + 14409 14409 0 3 354,109,1189, 0,739,1347,
      1 12210 12684 NR_046018 0 + 14409 14409 0 3 354,109,1189, 0,739,1347,
      1 12644 13464 NR_046018 0 + 14409 14409 0 3 354,109,1189, 0,739,1347,
      1 12627 13527 NR_046018 0 + 14409 14409 0 3 354,109,1189, 0,739,1347,

      Could you let me know whether I have misused "bedtools intersect"?

      Many thanks,


      • shirley0818
        Member
        • Apr 2013
        • 13

        #4
        Hi GenoMax,

        I have figured it out by adding the -wa and -wb options:
        bedtools intersect -wa -wb -a A.bed -b B.bed

        Many thanks,
        Shirley
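Some background on why the extra options help: without -wa/-wb, bedtools intersect reports the overlapping portion of each A feature together with A's remaining columns, so the read count in B's score column never reaches the output. With -wa -wb, each output line carries the original A record followed by the original B record. Assuming both inputs are 12-column BED, a line then has 24 tab-separated fields and the junction read count is field 17. The sketch below parses one such line; the function name is hypothetical, and the example line is assembled by hand from two records in post #3 rather than taken from real bedtools output.

```python
def parse_wa_wb(line, a_cols=12):
    """Split one `bedtools intersect -wa -wb` line into its A and B records."""
    f = line.rstrip("\n").split("\t")
    a, b = f[:a_cols], f[a_cols:]
    return {
        "annotation": a[3],
        "annotation_interval": (a[0], int(a[1]), int(a[2])),
        "junction": b[3],
        "junction_interval": (b[0], int(b[1]), int(b[2])),
        "reads": int(b[4]),  # B's score column: reads spanning the junction
    }


# Hand-built example: an annotation record (A) followed by a junction record (B).
a_record = "1 12210 12684 NR_046018 0 + 14409 14409 0 3 354,109,1189, 0,739,1347,"
b_record = "1 12190 12686 JUNC00000002 7 + 12190 12686 255,0,0 2 37,74 0,422"
example = "\t".join(a_record.split() + b_record.split())

rec = parse_wa_wb(example)
print(rec["annotation"], rec["junction"], rec["reads"])
```

Note that an overlap reported this way only means the two intervals share at least one base; it does not by itself show that the junction matches an annotated intron exactly.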
