In the TopHat output file "junctions.bed", every line represents a single junction found when mapping short sequencing reads to the genome. Each junction consists of two connected BED blocks, where each block is as long as the maximal overhang of any read spanning the junction (from the TopHat manual).
For the following line,
chr20 257975 259040 JUNC00000002 6 - 257975 259040 255,0,0 2 49,70 0,995
it means 6 reads are mapped to this junction site. The limits of the two blocks, 257975 to 258024 (Block1) and 258970 to 259040 (Block2), are defined by whichever reads extend farthest on each side of the junction. Here is the problem: I want to know the start and end position of each of the 6 reads. So the one line above would need to be expanded to 6 lines, with each line giving the mapping position of one read.
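For reference, the block limits quoted above follow from the BED fields themselves: each block start is the line's start coordinate plus the block's offset (last column), and each block end is that start plus the block size (second-to-last column). A minimal Python sketch of that arithmetic, where the function name `bed12_blocks` is just an illustrative choice and the input is assumed to be a whitespace-separated 12-column BED line:

```python
def bed12_blocks(line):
    """Return (start, end) pairs for each block of a 12-column BED line."""
    fields = line.split()
    chrom_start = int(fields[1])
    # Columns 11 and 12 are comma-separated block sizes and block offsets;
    # some files carry a trailing comma, so strip it before splitting.
    sizes = [int(s) for s in fields[10].rstrip(",").split(",")]
    offsets = [int(s) for s in fields[11].rstrip(",").split(",")]
    return [(chrom_start + off, chrom_start + off + size)
            for off, size in zip(offsets, sizes)]


line = "chr20 257975 259040 JUNC00000002 6 - 257975 259040 255,0,0 2 49,70 0,995"
blocks = bed12_blocks(line)
print(blocks)  # [(257975, 258024), (258970, 259040)]
```

This only recovers the outer limits of the two blocks; it says nothing about where the individual reads start and end, which is the information I am missing.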
The current version of TopHat cannot generate the results I need. It seems that TopHat produces some temporary files while running. Is there any way I can generate the results I need from these temporary files? Does anybody have similar experience, or know of any other program that can give the results I want?
Any suggestions will be greatly appreciated!