  • memeri
    Junior Member
    • May 2011
    • 9

    refSeq gene tables -- all in one file?

    Does NCBI have a single, downloadable file containing all refSeq-annotated exons for all genes? This information is available on a browser, one gene at a time, for example:

    I'd like to maintain a local database with this information.
  • mastal
    Senior Member
    • Mar 2009
    • 666

    #2
    See the ftp downloads site:

    ftp://ftp.ncbi.nlm.nih.gov/refseq/release/

    Comment

    • memeri
      Junior Member
      • May 2011
      • 9

      #3
      Thank you, mastal.
      I was hoping for something a bit higher level than this, though. If I want to compile an up-to-date list of all annotated exons mapped to the current human reference genome assembly, I need to search through all 468 '...genomic.gbff' files in the 'vertebrate_mammalian' directory. In a somewhat random search of about 20 of these files, I didn't come across one with an entry from the current build (GRCh38.p2), and I don't see an index that may help me limit my search. I suspect that there may be a more direct route to this information; I'm sure NCBI does not search through all this every time they serve up a gene table.

      Comment

      • memeri
        Junior Member
        • May 2011
        • 9

        #4
        Thanks, blancha. These are, of course, browser-based methods -- one gene at a time, but nice to know!

        Comment

        • blancha
          Senior Member
          • May 2013
          • 367

          #5
          Sorry, @memeri, I already deleted my last post, since I wasn't sure it answered your question. I'm putting the information back, since it probably does answer your question.

          These are not one gene at a time methods.
          You will get the entire annotation for the whole genome in one file.
          You will have the location of each exon for all the genes.

          The refSeq annotation of the exons can easily be downloaded from the UCSC Table Browser, using the RefSeq Track.
          You can select the fields you want, if you don't want all the fields, as well as the format of the output file.

          An alternative is to use Ensembl. You can just download a GTF file, or use biomaRt.

          Here are the fields available in the refSeq track.

          Code:
          name
          chrom
          strand
          txStart
          txEnd
          cdsStart
          cdsEnd
          exonCount
          exonStarts
          exonEnds
          score
          name2
          cdsStartStat
          cdsEndStat
          exonFrames
          You may have to work a bit to get exactly the format you want, but all the information is available from the UCSC Table Browser, Ensembl, and probably GENCODE.
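As a rough illustration of the "work a bit" step, here is a minimal Python sketch that expands a Table Browser download of the refSeq track into one row per exon. It assumes a tab-separated file whose first line is a header (possibly starting with `#`) using the field names listed above; the function name `expand_exons` is made up for this example. UCSC gene tables store `exonStarts` and `exonEnds` as comma-separated lists with 0-based, half-open coordinates, so the values are passed through unchanged.

```python
import csv


def expand_exons(lines):
    """Yield (gene, transcript, chrom, strand, start, end) for every exon.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumes the header names match the refSeq track field names
    (name, chrom, strand, exonStarts, exonEnds, name2).
    """
    lines = iter(lines)
    # The header line may start with '#'; strip it before splitting.
    header = next(lines).lstrip("#").rstrip("\r\n").split("\t")
    reader = csv.DictReader(lines, fieldnames=header, delimiter="\t")
    for row in reader:
        # exonStarts/exonEnds usually end with a trailing comma, so skip empties.
        starts = [int(s) for s in row["exonStarts"].split(",") if s]
        ends = [int(e) for e in row["exonEnds"].split(",") if e]
        for start, end in zip(starts, ends):
            yield (row["name2"], row["name"], row["chrom"],
                   row["strand"], start, end)
```

Something like `for exon in expand_exons(open("refGene.txt")): ...` would then give one tuple per exon for the whole genome; the file name here is only a placeholder for whatever you saved from the Table Browser.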

          I think that answers your question.
          If it doesn't, I'll let someone else try.
          And, I won't delete the post this time.

          Comment

          • memeri
            Junior Member
            • May 2011
            • 9

            #6
            So, I've found the answer, for those who may be interested. The file 'GCF_000001405.28_knownrefseq_alignments.gff3' (or the most recent version) in the directory 'ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/alignments/' maps every refSeq exon to the current human genome build. Plus you need the file 'gene2accession.gz' in 'ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/' to map the accession numbers to the geneIDs.
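For anyone who wants to script the join between the two files, a minimal Python sketch follows. It rests on several assumptions that should be checked against the actual downloads: that 'gene2accession' has a tab-separated header line naming the columns `tax_id`, `GeneID`, and `RNA_nucleotide_accession.version`; that the alignment GFF3 carries the refSeq accession as the first token of a `Target=` attribute; and that each aligned block corresponds to an exon. The function names are invented for this example. GFF3 coordinates are 1-based and inclusive.

```python
def load_accession_to_gene(lines, tax_id="9606"):
    """Build {RNA accession.version: GeneID} from gene2accession lines.

    Assumes the first line is a tab-separated header (optionally
    prefixed with '#') containing the column names used below.
    """
    lines = iter(lines)
    header = next(lines).lstrip("#").strip().split("\t")
    col = {name: i for i, name in enumerate(header)}
    mapping = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if fields[col["tax_id"]] != tax_id:
            continue  # keep only the requested species (9606 = human)
        accession = fields[col["RNA_nucleotide_accession.version"]]
        if accession != "-":  # '-' marks a missing value
            mapping[accession] = fields[col["GeneID"]]
    return mapping


def exon_alignments(gff_lines, acc_to_gene):
    """Yield (gene_id, accession, seqid, start, end, strand) per GFF3 row.

    Rows without a Target attribute are skipped; gene_id is None when
    the accession is not present in the mapping.
    """
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 9:
            continue
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        target = attrs.get("Target")
        if target is None:
            continue
        accession = target.split()[0]
        yield (acc_to_gene.get(accession), accession, fields[0],
               int(fields[3]), int(fields[4]), fields[6])
```

Since 'gene2accession.gz' is compressed, it can be read with `gzip.open(path, "rt")` and the handle passed straight to `load_accession_to_gene`.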

            Thanks again.

            Comment

            • memeri
              Junior Member
              • May 2011
              • 9

              #7
              Thanks for your second note, blancha. I didn't see it until after I posted my most recent note. I'll look at it more closely to see if it's better than what I came up with. It looks like it returns less irrelevant information in a single download, which is better.
              Mark

              Comment
