Genomic coordinates to gene names


  • Genomic coordinates to gene names

    Hi All,

    I'm trying to get a list of gene names (preferably HUGO names) for 90,000 genomic coordinates (BED file). I'm very confused by BioMart's API, and Ensembl's interface is taking hours. I've spent hours on UCSC and can't see any option to retrieve this information. Any help with another method to achieve this would be appreciated.


  • #2
    Not sure if they are HUGO names, but it seems like the refFlat table in the Table Browser will get you there.

    Click "define regions", paste in your BED file, and get the output.

    Edit: it looks like this is limited to 1,000 entries.


    • #3
      You should be able to get the entire refFlat file from UCSC's table browser. That file will include the RefSeq IDs, start and end positions of the gene, and the gene name.

      I think their gene name is the HUGO name.


      • #4
        BED Tools for comparing genomic intervals

        I recently completed a new suite of BED Tools for addressing such questions.

        They are available for 64-bit LINUX and Intel Macs at:

        Specifically, in the case of your question, you would download RefSeq (not sure if they are HUGO names) from the UCSC Table browser.

        Then run intersectBed -a <yourfile> -b refSeqFromUCSC.bed -wb

        The -wb option will write the entire RefSeq entry so that you can track the name associated with each overlap.

        If you have further questions, just shout. Nicely.
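As a rough sketch of what to do with the intersectBed output described above: each overlap is written as one line, so an interval that hits several RefSeq entries appears several times. The helper below collapses those lines to one per input interval. It assumes a 3-column query BED and a 6-column gene BED with the name in its 4th column, which puts the gene name in column 7 of the joined output; it also assumes `-wa` is given alongside `-wb` so the full original interval is reported. The function name `collapse_gene_names` and the file names in the comments are hypothetical.

```shell
# Collapse `intersectBed -wa -wb` output to one line per query interval,
# listing each overlapping gene name once, comma-separated.
# Assumed layout: columns 1-3 = query interval, column 7 = gene name.
collapse_gene_names() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         key = $1 OFS $2 OFS $3
         if ((key, $7) in seen) next      # skip repeated gene for this interval
         seen[key, $7] = 1
         if (key in genes) {
           genes[key] = genes[key] "," $7
         } else {
           order[++n] = key               # remember first-seen order
           genes[key] = $7
         }
       }
       END { for (i = 1; i <= n; i++) print order[i], genes[order[i]] }' "$@"
}

# Hypothetical usage:
#   intersectBed -a mycoords.bed -b refSeqFromUCSC.bed -wa -wb | collapse_gene_names > annotated.bed
```

Intervals with no overlap do not appear in the intersectBed output at all, so they will be missing from the collapsed file as well.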


        • #5
          Thanks guys, the refFlat file is useful; I was not aware of it.

          Thanks ECO, but yes, it's limited to 1,000 coordinates, so it's not the best way to handle 90,000 coordinates.

          Quinlana, I downloaded BED Tools and ran it from the bin folder, but I got an error message:
          ./intersectBed -a mygenomiccoordinates.bed -b genome_ucsc.bed -wb
          bash: ./intersectBed: Bad CPU type in executable



          • #6
            OS Type?

            Hi Layla,
            Apologies for that. What OS and processor are you using? The Linux version should work on 64-bit Red Hat and Ubuntu. Regardless, I'll post the source later today so you can compile the programs on your system. Sorry for the trouble, I just finished testing all of these tools yesterday and they work on all of our systems. However, I haven't been diligent about trying them out for every Linux flavor.



            • #7
              Hi Aaron,

              No worries, thank you for the help!

              My machine runs Mac OS X version 10.4.11.
              Processor: 2.4 GHz Intel Core 2 Duo



              • #8
                Gotcha. I believe the Core Duo processors are 32-bit. Email me at aaronquinlan [at] gmail and I'll send you a pre-compiled version for your machine.


                • #9

                  I am still having problems using the refFlat file with BED Tools. I downloaded the refFlat.txt file for hg18, but this file is not in BED format. Is there a command-line tool that simply adds the gene symbol to my input file, which has the columns "chr", "start", "end" (i.e., BED format)? If this question is redundant, please excuse me and point me to the right page so I can follow step-by-step instructions to annotate my BED file with gene symbols.
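One possible way to handle the format problem raised here is to reorder the refFlat columns into BED with awk. This is a minimal sketch, assuming the standard tab-separated UCSC refFlat layout (geneName, name, chrom, strand, txStart, txEnd, followed by the CDS and exon columns), whose coordinates are already 0-based and half-open like BED. The function name `refflat_to_bed` and the file names in the comments are hypothetical.

```shell
# Convert UCSC refFlat rows to 6-column BED:
#   chrom, txStart, txEnd, geneName, score (fixed at 0), strand
# Assumes the tab-separated refFlat layout:
#   1=geneName 2=name 3=chrom 4=strand 5=txStart 6=txEnd ...
refflat_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" } { print $3, $5, $6, $1, 0, $4 }' "$@"
}

# Hypothetical usage:
#   refflat_to_bed refFlat.txt > refFlat.bed
#   intersectBed -a mycoords.bed -b refFlat.bed -wa -wb > annotated.txt
```

refFlat has one row per transcript, so a gene with several transcripts yields several BED lines, and the same gene symbol can be reported more than once for a given interval.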