  • memeri
    replied
    Thanks for your second note, blancha. I didn't see it until after I posted my most recent note. I'll look at it more closely to see if it's better than what I came up with. It looks like it returns less irrelevant information in a single download, which is better.
    Mark

  • memeri
    replied
    So, I've found the answer, for those who may be interested. The file 'GCF_000001405.28_knownrefseq_alignments.gff3' (or the most recent version) in the directory 'ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/alignments/' maps every refSeq exon to the current human genome build. Plus you need the file 'gene2accession.gz' in 'ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/' to map the accession numbers to the geneIDs.

    Thanks again.
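
A rough sketch of how the two files mentioned above could be joined, in case it helps. It is not NCBI's own procedure. It assumes the alignment GFF3 carries the RefSeq accession in a standard GFF3 `Target` attribute, and that `gene2accession` is tab-delimited with tax_id, GeneID and the RNA accession.version in columns 1, 2 and 4. Check both assumptions against the header lines of the files you actually download. All function names are made up for illustration.

```python
def load_accession_to_gene(lines, tax_id="9606"):
    """Map RNA accession.version -> GeneID from gene2accession-style rows.

    Assumed column layout (verify against the file's own header line):
    col 0 = tax_id, col 1 = GeneID, col 3 = RNA_nucleotide_accession.version.
    """
    mapping = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 4 or cols[0] != tax_id:
            continue  # keep only the requested organism (9606 = human)
        if cols[3] != "-":  # "-" marks a missing value
            mapping[cols[3]] = cols[1]
    return mapping


def parse_attributes(field):
    """Split a GFF3 column-9 string into a dict of key=value pairs."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def aligned_blocks(lines):
    """Yield (accession, seqid, start, end, strand) for rows with a Target."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        target = parse_attributes(cols[8]).get("Target")
        if target is None:
            continue
        accession = target.split()[0]  # Target = "id start end [strand]"
        yield accession, cols[0], int(cols[3]), int(cols[4]), cols[6]


def exons_by_gene(gff_lines, acc_to_gene):
    """Group aligned blocks by GeneID, dropping accessions with no mapping."""
    table = {}
    for acc, seqid, start, end, strand in aligned_blocks(gff_lines):
        gene_id = acc_to_gene.get(acc)
        if gene_id is not None:
            table.setdefault(gene_id, []).append((acc, seqid, start, end, strand))
    return table
```

Both loaders take any iterable of text lines, so an open file handle (or `gzip.open(path, "rt")` for the `.gz` file) can be passed directly.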

  • blancha
    replied
    Sorry, @memeri, I already deleted my last post, since I wasn't sure it answered your question. I'm putting the information back, since it probably does answer your question.

    These are not one-gene-at-a-time methods.
    You will get the annotation for the whole genome in a single file,
    including the location of every exon of every gene.

    The refSeq annotation of the exons can easily be downloaded from the UCSC Table Browser, using the RefSeq Track.
    You can select the fields you want, if you don't want all the fields, as well as the format of the output file.

    An alternative is to use Ensembl. You can just download a GTF file, or use biomaRt.
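
For the GTF route, a minimal sketch of pulling exon rows out of a downloaded file might look like the following. It assumes the usual nine tab-separated GTF columns and attributes written as `key "value";` pairs (`gene_id`, `transcript_id`, `exon_number`). The function name `gtf_exons` is made up for illustration and is not part of any Ensembl tool.

```python
import re

# Matches attribute pairs of the form: key "value";
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def gtf_exons(lines):
    """Yield one dict per exon feature found in GTF-formatted lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # keep only well-formed exon rows
        attrs = dict(_ATTR.findall(cols[8]))
        yield {
            "chrom": cols[0],
            "start": int(cols[3]),  # GTF coordinates are 1-based, inclusive
            "end": int(cols[4]),
            "strand": cols[6],
            "gene_id": attrs.get("gene_id"),
            "transcript_id": attrs.get("transcript_id"),
            "exon_number": attrs.get("exon_number"),
        }
```

Passing an open file handle to `gtf_exons` streams the file line by line, so the whole annotation never has to sit in memory at once.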

    Here are the fields available in the refSeq track.

    Code:
    name
    chrom
    strand
    txStart
    txEnd
    cdsStart
    cdsEnd
    exonCount
    exonStarts
    exonEnds
    score
    name2
    cdsStartStat
    cdsEndStat
    exonFrames

    You may have to work a bit to get exactly the format you want, but all the information is available from the UCSC Table Browser, Ensembl, and probably GENCODE.
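
As an illustration of the "work a bit" part, here is a small sketch that expands one Table Browser row into per-exon records. It assumes a tab-separated download whose first line is a header naming the fields listed above (possibly prefixed with `#`). UCSC stores `exonStarts` and `exonEnds` as comma-separated lists with a trailing comma, with 0-based starts and exclusive ends. The function names are invented for this example.

```python
def read_table(lines):
    """Yield each data row as a dict keyed by the header's field names."""
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if header is None:
            # Assumption: first non-empty line is the header, maybe with '#'
            header = line.lstrip("#").split("\t")
            continue
        yield dict(zip(header, line.split("\t")))


def row_to_exons(row):
    """Expand one transcript row into (chrom, start, end, strand, name, name2).

    Coordinates are returned as stored by UCSC: 0-based start, exclusive end.
    Add 1 to each start to get 1-based inclusive coordinates (GFF/GTF style).
    """
    starts = [int(x) for x in row["exonStarts"].split(",") if x]
    ends = [int(x) for x in row["exonEnds"].split(",") if x]
    if len(starts) != len(ends) or len(starts) != int(row["exonCount"]):
        raise ValueError("exon lists disagree with exonCount for " + row["name"])
    return [
        (row["chrom"], s, e, row["strand"], row["name"], row["name2"])
        for s, e in zip(starts, ends)
    ]
```

Looping `row_to_exons` over `read_table(open_file)` gives a flat exon list that can be loaded into a local database table.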

    I think that answers your question.
    If it doesn't, I'll let someone else try.
    And, I won't delete the post this time.

  • memeri
    replied
    Thanks, blancha. These are, of course, browser-based methods -- one gene at a time, but nice to know!

  • memeri
    replied
    Thank you, mastal.
    I was hoping for something a bit higher level than this, though. If I want to compile an up-to-date list of all annotated exons mapped to the current human reference genome assembly, I need to search through all 468 '...genomic.gbff' files in the 'vertebrate_mammalian' directory. In a somewhat random search of about 20 of these files, I didn't come across one with an entry from the current build (GRCh38.p2), and I don't see an index that may help me limit my search. I suspect that there may be a more direct route to this information; I'm sure NCBI does not search through all this every time they serve up a gene table.

  • mastal
    replied
    See the ftp downloads site:

    ftp://ftp.ncbi.nlm.nih.gov/refseq/release/

  • memeri
    started a topic refSeq gene tables -- all in one file?

    Does NCBI have a single, downloadable file containing all refSeq-annotated exons for all genes? This information is available on a browser, one gene at a time, for example:

    I'd like to maintain a local database with this information.
