  • Using TopHat results in the UCSC Genome Browser

    Hi all,

    Recently, I ran TopHat on 76 bp reads and got the results (SAM, BED, and WIG files).

    Here are a few lines of my input (FASTA file):
    >HWUSI-EAS366:4:1:4:624#0/1:
    CTCNGGATGGAGTACAGTGGTGTGATCATGGCTCACTGTAGNNNNNANCNCNTGGGCGCAAGCNNNNNNNNNCTAN
    >HWUSI-EAS366:4:1:4:243#0/1:
    CGGNGCCGTTGCTGGTTCTCACACCTTTTAGGTCTGTTCTCNNNNNCNGNTNCGACTCTCTCTNNNNNANNNCCGN
    >HWUSI-EAS366:4:1:4:1373#0/1:
    GAAAAAACCACCCAGCGGTGATGGCAGCGCGCGTGGGTCCCNNNGNGNGNGGGGCGGGTCGCGCNNNNGNNNCGAN
    >HWUSI-EAS366:4:1:4:1672#0/1:
    GGGCAGGAAAAAAAGGGAAGANAAAATACTGGGGAAGAAAANNNANCNCNGTTTGGCAGCTCTTNNNNGNNNCAGN


    And here are a few lines of the junctions.bed file:

    track name=junctions description="TopHat junctions"
    gi|29823169|ref|NT_025004.13|Hs18_25160 9690 19656 JUNC00000001 1 + 9690 19656 255,0,0 2 37,38 0,9928
    gi|29823169|ref|NT_025004.13|Hs18_25160 14260 19654 JUNC00000002 2 + 14260 19654 255,0,0 2 57,36 0,5358
    gi|29823169|ref|NT_025004.13|Hs18_25160 19701 160104 JUNC00000003 3 + 19701 160104 255,0,0 2 32,66 0,140337


    And a few lines of the coverage.wig file:

    track type=bedGraph name="TopHat - read coverage"
    gi|29823169|ref|NT_025004.13|Hs18_25160 0 9580 0
    gi|29823169|ref|NT_025004.13|Hs18_25160 9580 9655 1
    gi|29823169|ref|NT_025004.13|Hs18_25160 9655 9690 0


    Here is the problem.

    When I copy and paste the results (either the BED file or the WIG file) into the browser, I always get an error. When I change the gi|29823169|ref|NT... part to something like a chromosome name, it works.

    As you can see from my input file, my reads don't contain the gi|29823169|ref|NT... part. I am not sure where TopHat finds this label.

    Can someone tell me what the gi|29823169|ref|NT... part means and how I can convert these files into a form the UCSC Genome Browser understands? I think I need the actual chromosome names.

    Thank you,
    Statsteam
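A minimal Python sketch of the renaming, under stated assumptions. The gi|...|ref|...| label is an NCBI-style sequence identifier (GenInfo number, database tag, RefSeq accession, description) that comes from the FASTA headers of the reference used to build the index, not from the reads. `parse_ncbi_id`, `rename_track_lines`, and the mapping format are hypothetical names chosen for illustration. Because NT_ accessions are contigs rather than whole chromosomes, their coordinates are contig-relative, so the sketch also adds a per-contig offset (the contig's 0-based start on the chromosome, assuming forward orientation) to the start/end columns and, for BED12 lines, to thickStart/thickEnd. The chromosome name and offset have to come from the assembly's contig placement data; the sketch does not look them up.

```python
import re

# NCBI-style FASTA identifier, e.g. "gi|29823169|ref|NT_025004.13|Hs18_25160":
# GenInfo number | database tag | accession | description
GI_PATTERN = re.compile(r"^gi\|(\d+)\|(\w+)\|([^|]+)\|(.*)$")


def parse_ncbi_id(seq_id):
    """Split a 'gi|...|ref|...|...' identifier into its parts, or return None."""
    m = GI_PATTERN.match(seq_id)
    if m is None:
        return None
    gi, db, accession, description = m.groups()
    return {"gi": gi, "db": db, "accession": accession, "description": description}


def rename_track_lines(lines, mapping):
    """Rewrite column 1 of BED/bedGraph lines to UCSC-style chromosome names.

    mapping (hypothetical format): accession -> (ucsc_chrom, offset), where
    offset is the 0-based start of the contig on that chromosome. Assumes the
    contig is placed in forward orientation.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        # Pass through blank, track, browser, and comment lines unchanged.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            out.append(line)
            continue
        fields = line.split()  # works for tab- or space-separated input
        parsed = parse_ncbi_id(fields[0])
        if parsed is None or parsed["accession"] not in mapping:
            out.append(line)  # unknown name: leave the line as it was
            continue
        chrom, offset = mapping[parsed["accession"]]
        fields[0] = chrom
        # chromStart/chromEnd, plus thickStart/thickEnd for BED12 lines.
        # blockStarts (column 12) are relative to chromStart, so they stay.
        coord_cols = [1, 2, 6, 7] if len(fields) >= 8 else [1, 2]
        for i in coord_cols:
            fields[i] = str(int(fields[i]) + offset)
        out.append("\t".join(fields))
    return out
```

For example, `rename_track_lines(open("junctions.bed"), {"NT_025004.13": ("chrN", OFFSET)})`, where `chrN` and `OFFSET` are placeholders for the real chromosome and contig start. Output lines are tab-separated, which is what UCSC custom tracks expect.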

  • #2
    You might give Galaxy a try: http://main.g2.bx.psu.edu/

    There you can upload the file and manipulate it into a format you can use. I'm pretty new to bioinformatics, so there might be a faster and easier way, but so far it has been a very useful tool. Since you want to upload the data to UCSC, it would probably be best to put your data into a GFF file. Here is a short walkthrough of using Galaxy to convert your BED file into GFF:

    1. Go to http://main.g2.bx.psu.edu/
    2. Click 'Get Data', then 'Upload File'. Select BED as the file type, browse for your file, select it, and click Execute.
    3. Selecting BED as the file type should split your data into separate columns automatically, converting the pipes into tabs. From here it is just simple text manipulation.
    4. Click 'Text Manipulation' to add the columns the GFF format needs, then use the 'Cut' tool to put them in the right order.

    Hope this helps.


    -Brandon
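The BED-to-GFF conversion described in the steps above can also be scripted. A minimal Python sketch, assuming the chromosome names have already been fixed (UCSC custom tracks also accept BED directly, so this step is optional). `bed_to_gff` and its `source`/`feature` defaults are hypothetical choices. The detail that matters is the coordinate shift: BED starts are 0-based with half-open intervals, while GFF is 1-based and inclusive, so the start gains 1 and the end stays the same. The BED name goes into the ninth (group) column, GFF version 2 style, and the block structure of junctions.bed (the two flanking segments of each junction) is not preserved; each feature becomes one span.

```python
def bed_to_gff(lines, source="TopHat", feature="junction"):
    """Convert BED lines to 9-column GFF (version 2 style) lines.

    source and feature are hypothetical defaults; change them as needed.
    Only the overall span of each BED feature is kept (no blocks).
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank, track, browser, and comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        f = line.split()
        chrom, start, end = f[0], int(f[1]), int(f[2])
        name = f[3] if len(f) > 3 else "."
        score = f[4] if len(f) > 4 else "."
        strand = f[5] if len(f) > 5 else "."
        # BED: 0-based, half-open. GFF: 1-based, inclusive.
        gff_start = start + 1
        gff_end = end
        out.append("\t".join([chrom, source, feature, str(gff_start),
                              str(gff_end), score, strand, ".", name]))
    return out
```

Write the result with a GFF `track` line of your choice in front of it if you want a custom track name in the browser.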



    • #3
      Alternatively, if you don't want to edit or run things yourself, you could download the hg18/hg19 indexes from the Bowtie website and use them from now on. What you see now is the result of using indexes built from NCBI assemblies; the Bowtie hg18/hg19 indexes use chromosome names consistent with the UCSC Genome Browser (chr1, chr2, etc.).

      Cheers,

      -- Leo
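To check which naming scheme an index uses, you can look at the reference names TopHat wrote into its SAM output. A minimal Python sketch, assuming the SAM file can be read as lines of text; `reference_names` and `non_ucsc_names` are hypothetical helpers. It collects the SN: tags from @SQ header lines and, in case the file has no header, the RNAME column (column 3) of alignment lines, skipping `*` (unmapped). Names that don't start with "chr" will not match UCSC's hg18/hg19 chromosome names.

```python
def reference_names(sam_lines):
    """Return reference names from a SAM file, in first-seen order."""
    names = []
    seen = set()

    def add(name):
        if name != "*" and name not in seen:
            seen.add(name)
            names.append(name)

    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@SQ"):
            # Header line: look for the SN:<name> tag.
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    add(field[3:])
        elif line and not line.startswith("@"):
            # Alignment line: column 3 (index 2) is RNAME.
            add(line.split("\t")[2])
    return names


def non_ucsc_names(names):
    """Return names that do not follow UCSC 'chr...' naming."""
    return [n for n in names if not n.startswith("chr")]
```

For example, `with open("hits.sam") as fh: print(non_ucsc_names(reference_names(fh)))`, where `hits.sam` is a placeholder for your TopHat SAM file. An empty list means the names already look UCSC-style.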
