Hi all,
Recently, I ran TopHat on 76 bp read data and got the results (SAM, BED, and WIG files).
Here are a few lines of my input (FASTA file):
>HWUSI-EAS366:4:1:4:624#0/1:
CTCNGGATGGAGTACAGTGGTGTGATCATGGCTCACTGTAGNNNNNANCNCNTGGGCGCAAGCNNNNNNNNNCTAN
>HWUSI-EAS366:4:1:4:243#0/1:
CGGNGCCGTTGCTGGTTCTCACACCTTTTAGGTCTGTTCTCNNNNNCNGNTNCGACTCTCTCTNNNNNANNNCCGN
>HWUSI-EAS366:4:1:4:1373#0/1:
GAAAAAACCACCCAGCGGTGATGGCAGCGCGCGTGGGTCCCNNNGNGNGNGGGGCGGGTCGCGCNNNNGNNNCGAN
>HWUSI-EAS366:4:1:4:1672#0/1:
GGGCAGGAAAAAAAGGGAAGANAAAATACTGGGGAAGAAAANNNANCNCNGTTTGGCAGCTCTTNNNNGNNNCAGN
And here are a few lines of the junctions.bed file:
track name=junctions description="TopHat junctions"
gi|29823169|ref|NT_025004.13|Hs18_25160 9690 19656 JUNC00000001 1 + 9690 19656 255,0,0 2 37,38 0,9928
gi|29823169|ref|NT_025004.13|Hs18_25160 14260 19654 JUNC00000002 2 + 14260 19654 255,0,0 2 57,36 0,5358
gi|29823169|ref|NT_025004.13|Hs18_25160 19701 160104 JUNC00000003 3 + 19701 160104 255,0,0 2 32,66 0,140337
Here are a few lines of the coverage.wig file:
track type=bedGraph name="TopHat - read coverage"
gi|29823169|ref|NT_025004.13|Hs18_25160 0 9580 0
gi|29823169|ref|NT_025004.13|Hs18_25160 9580 9655 1
gi|29823169|ref|NT_025004.13|Hs18_25160 9655 9690 0
Here is the problem.
When I copy and paste the results (either the BED file or the WIG file) into the UCSC Genome Browser, I always get an error. When I change the gi|29823169|ref|NT... part to something like a chromosome name, it works.
As you can see from my input file, the reads do not contain the gi|29823169|ref|NT... label, so I am not sure where TopHat finds this label or reference.
Can someone tell me what the gi|29823169|ref|NT... part means and how I can convert these files into a form that the UCSC Genome Browser understands? I think I need to get the actual chromosome names.
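To make clear what I mean by changing the label, here is a minimal Python sketch of the substitution. The function name `rename_reference` and the target name `chr18` are only placeholders I made up for illustration; I do not know the correct chromosome name. It also only swaps the label in the first column and does not adjust the coordinates, which may still be relative to whatever sequence the gi|... label refers to.

```python
def rename_reference(lines, old_name, new_name):
    """Swap the first column of BED/bedGraph lines from old_name to new_name.

    Lines that do not start with old_name (for example the "track" header)
    are returned unchanged. Coordinates are NOT modified.
    """
    renamed = []
    for line in lines:
        text = line.rstrip("\n")
        # Only touch lines whose first field is exactly old_name.
        if text.startswith(old_name):
            rest = text[len(old_name):]
            if rest == "" or rest[0].isspace():
                text = new_name + rest
        renamed.append(text)
    return renamed


if __name__ == "__main__":
    sample = [
        'track type=bedGraph name="TopHat - read coverage"',
        "gi|29823169|ref|NT_025004.13|Hs18_25160 0 9580 0",
        "gi|29823169|ref|NT_025004.13|Hs18_25160 9580 9655 1",
    ]
    # "chr18" is an assumed placeholder, not a verified chromosome name.
    for row in rename_reference(
        sample, "gi|29823169|ref|NT_025004.13|Hs18_25160", "chr18"
    ):
        print(row)
```

With this kind of renaming the browser accepts the file, but I am not sure the positions it then shows are correct.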
Thank you,
Statsteam