Downloading data from ncbi
  • Downloading data from ncbi

    Hello
    I've been trying to get the hang of NCBI's Entrez Direct tools (esearch and related commands) because I want to download their gene summary paragraphs. Could anyone clarify the correct command to produce an output file of the form:

    Gene Summary
    PCSK9 'This gene encodes...'

    Either all genes or a list of genes that I provide would work. Thanks.
    Last edited by Cannon; 01-13-2022, 04:30 PM.

  • #2
    Using Entrez Direct (EDirect):

    Code:
    $ esearch -db gene -query "PCSK9 [GENE] AND human [ORGN]" | efetch -format acc
    
    1. PCSK9
    Official Symbol: PCSK9 and Name: proprotein convertase subtilisin/kexin type 9 [Homo sapiens (human)]
    Other Aliases: FH3, FHCL3, HCHOLA3, LDLCQ1, NARC-1, NARC1, PC9
    Other Designations: proprotein convertase subtilisin/kexin type 9; convertase subtilisin/kexin type 9 preproprotein; neural apoptosis regulated convertase 1; subtilisin/kexin-like protease PC9
    Chromosome: 1; Location: 1p32.3
    Annotation: Chromosome 1 NC_000001.11 (55039548..55064852)
    MIM: 607786
    ID: 255738
    
    2. PCSK9
    Official Symbol: PCSK9 and Name: proprotein convertase subtilisin/kexin type 9 [Homo sapiens (human)]
    Other Aliases: FH3, HCHOLA3, NARC-1, NARC1
    Other Designations: Hypercholesterolemia, familial, 3; hypercholesterolemia, autosomal dominant 3
    Chromosome: 1; Location: 1p34.1-p32
    This record was replaced with GeneID: 255738
    ID: 353175
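
The second record in the output above is marked "This record was replaced with GeneID: 255738", so it may be worth filtering such entries out before further processing. The following is only a sketch: `drop_replaced` is a hypothetical helper name, and it assumes that records in the text report are separated by empty lines, as they appear above. The demonstration input is an abbreviated stand-in, not live NCBI output.

```shell
# Hypothetical helper: drop records that NCBI marks as replaced.
# Assumption: records in the text report are separated by empty lines.
drop_replaced() {
  # RS="" puts awk in paragraph mode, so each blank-line-separated
  # record is tested as a whole against the "replaced" marker.
  awk 'BEGIN { RS = ""; ORS = "\n\n" } !/This record was replaced with GeneID/'
}

# Offline demonstration on abbreviated stand-in records:
printf '1. PCSK9\nID: 255738\n\n2. PCSK9\nThis record was replaced with GeneID: 255738\nID: 353175\n' | drop_replaced
```

In practice the function would sit at the end of the pipeline shown in the post, after the `efetch` step.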


    • #3
      Another variation:

      Code:
      $ esearch -db gene -query "PCSK9 [GENE] AND human [ORGN]" | esummary | xtract -pattern DocumentSummary -element Name,Summary
      PCSK9	This gene encodes a member of the subtilisin-like proprotein convertase family, which includes proteases that process protein and peptide precursors trafficking through regulated or constitutive branches of the secretory pathway. The encoded protein undergoes an autocatalytic processing event with its prosegment in the ER and is constitutively secreted as an inactive protease into the extracellular matrix and trans-Golgi network. It is expressed in liver, intestine and kidney tissues and escorts specific receptors for lysosomal degradation. It plays a role in cholesterol and fatty acid metabolism. Mutations in this gene have been associated with autosomal dominant familial hypercholesterolemia. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Feb 2014]
      For more than one gene, put the gene symbols in a file, one per line:

      Code:
      $ more id
      BRCA2
      TP53
      PCSK9
      
      $ for i in `cat id`; do esearch -db gene -query "${i} [GENE] AND human [ORGN]" | esummary | xtract -pattern DocumentSummary -element Name,Summary; done
      BRCA2	Inherited mutations in BRCA1 and this gene, BRCA2, confer increased lifetime risk of developing breast or ovarian cancer. Both BRCA1 and BRCA2 are involved in maintenance of genome stability, specifically the homologous recombination pathway for double-strand DNA repair. The largest exon in both genes is exon 11, which harbors the most important and frequent mutations in breast cancer patients. The BRCA2 gene was found on chromosome 13q12.3 in human. The BRCA2 protein contains several copies of a 70 aa motif called the BRC motif, and these motifs mediate binding to the RAD51 recombinase which functions in DNA repair. BRCA2 is considered a tumor suppressor gene, as tumors with BRCA2 mutations generally exhibit loss of heterozygosity (LOH) of the wild-type allele. [provided by RefSeq, May 2020]
      TP53	This gene encodes a tumor suppressor protein containing transcriptional activation, DNA binding, and oligomerization domains. The encoded protein responds to diverse cellular stresses to regulate expression of target genes, thereby inducing cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. Mutations in this gene are associated with a variety of human cancers, including hereditary cancers such as Li-Fraumeni syndrome. Alternative splicing of this gene and the use of alternate promoters result in multiple transcript variants and isoforms. Additional isoforms have also been shown to result from the use of alternate translation initiation codons from identical transcript variants (PMIDs: 12032546, 20937277). [provided by RefSeq, Dec 2016]
      PCSK9	This gene encodes a member of the subtilisin-like proprotein convertase family, which includes proteases that process protein and peptide precursors trafficking through regulated or constitutive branches of the secretory pathway. The encoded protein undergoes an autocatalytic processing event with its prosegment in the ER and is constitutively secreted as an inactive protease into the extracellular matrix and trans-Golgi network. It is expressed in liver, intestine and kidney tissues and escorts specific receptors for lysosomal degradation. It plays a role in cholesterol and fatty acid metabolism. Mutations in this gene have been associated with autosomal dominant familial hypercholesterolemia. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Feb 2014]
      Last edited by GenoMax; 01-14-2022, 06:47 PM.
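
To get from the tab-separated `Name`/`Summary` lines above to the layout asked for in the first post (a header row, then the summary in single quotes), a small post-processing step could be added. This is a minimal sketch: `format_summaries` is a hypothetical helper name, and the demonstration line is a stand-in rather than real NCBI output. Lines without a summary field are skipped.

```shell
# Hypothetical helper: reformat "Name<TAB>Summary" lines into the
# layout requested in the first post.
format_summaries() {
  awk -F'\t' '
    BEGIN   { print "Gene\tSummary" }              # header row
    NF >= 2 { printf "%s\t\047%s\047\n", $1, $2 }  # \047 is a single quote
  '
}

# Offline demonstration with a stand-in line:
printf 'PCSK9\tThis gene encodes...\n' | format_summaries
```

With EDirect installed, the output of the `for` loop above could be piped into `format_summaries` and redirected to a file of your choosing.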
