
**vkkodali** · 02-04-2019, 03:24 AM

In general, for downloading NCBI data from the Unix command line, I recommend using Entrez Direct.

Specifically, to download the runinfo table, you can use the following command:

Code:

esearch -db sra -query 'PRJNA308986' | efetch -format runinfo

This produces a comma-separated table with one row per run. Its 47 fields are listed below with their column numbers and the values from the first record:

Code:

                  Run [  1]: SRR3108728
          ReleaseDate [  2]: 2017-02-16 00:00:00
             LoadDate [  3]: 2016-01-21 03:15:18
                spots [  4]: 98100
                bases [  5]: 49246200
     spots_with_mates [  6]: 98100
            avgLength [  7]: 502
              size_MB [  8]: 28
         AssemblyName [  9]: 
        download_path [ 10]: https://sra-download.ncbi.nlm.nih.gov/traces/sra37/SRR/003035/SRR3108728
           Experiment [ 11]: SRX1537041
          LibraryName [ 12]: mdbk110
      LibraryStrategy [ 13]: AMPLICON
     LibrarySelection [ 14]: PCR
        LibrarySource [ 15]: METAGENOMIC
        LibraryLayout [ 16]: PAIRED
           InsertSize [ 17]: 0
            InsertDev [ 18]: 0
             Platform [ 19]: ILLUMINA
                Model [ 20]: Illumina MiSeq
             SRAStudy [ 21]: SRP068618
           BioProject [ 22]: PRJNA308986
      Study_Pubmed_id [ 23]: 
            ProjectID [ 24]: 308986
               Sample [ 25]: SRS1253892
            BioSample [ 26]: SAMN04419133
           SampleType [ 27]: simple
                TaxID [ 28]: 410658
       ScientificName [ 29]: soil metagenome
           SampleName [ 30]: mdbk110
         g1k_pop_code [ 31]: 
               source [ 32]: 
   g1k_analysis_group [ 33]: 
           Subject_ID [ 34]: 
                  Sex [ 35]: 
              Disease [ 36]: 
                Tumor [ 37]: no
     Affection_Status [ 38]: 
         Analyte_Type [ 39]: 
    Histological_Type [ 40]: 
            Body_Site [ 41]: 
           CenterName [ 42]: UNIVERSITY OF MINNESOTA
           Submission [ 43]: SRA336468
dbgap_study_accession [ 44]: 
              Consent [ 45]: public
              RunHash [ 46]: 4B63AAF2295927A2EAEB798FCF9FC7DA
             ReadHash [ 47]: FB1226CB8B5FEBC85B053718D4C1BBFA
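The bracketed numbers in the listing are column positions in the CSV, so you could pick columns with `cut -d, -f1,16,20`. Selecting by header name is a little more robust if the column order ever changes. The following is only a sketch: `select_columns` is a hypothetical helper name, the sample data is the first 20 columns of the record above, and it assumes no field contains an embedded comma.

```shell
# First 20 columns of the runinfo CSV, copied from the listing above.
sample_csv='Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model
SRR3108728,2017-02-16 00:00:00,2016-01-21 03:15:18,98100,49246200,98100,502,28,,https://sra-download.ncbi.nlm.nih.gov/traces/sra37/SRR/003035/SRR3108728,SRX1537041,mdbk110,AMPLICON,PCR,METAGENOMIC,PAIRED,0,0,ILLUMINA,Illumina MiSeq'

# select_columns NAME...: print the named columns of a CSV read from stdin.
# Hypothetical helper; assumes plain comma-separated fields without quoting.
select_columns() {
  awk -F, -v names="$*" '
    BEGIN { n = split(names, want, " ") }
    NR == 1 {
      # Map each header name to its column position.
      for (i = 1; i <= NF; i++) col[$i] = i
      for (j = 1; j <= n; j++)
        if (!(want[j] in col)) {
          print "unknown column: " want[j] > "/dev/stderr"
          exit 1
        }
    }
    {
      # Print the requested columns in the requested order.
      for (j = 1; j <= n; j++) {
        if (j > 1) printf ","
        printf "%s", $(col[want[j]])
      }
      print ""
    }'
}

selected=$(printf '%s\n' "$sample_csv" | select_columns Run LibraryLayout Model)
printf '%s\n' "$selected"
```

With real data you would pipe the efetch output straight into the helper instead of the sample variable.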

To download the same table in XML format, add `-mode xml` to the efetch call:

Code:

esearch -db sra -query 'PRJNA308986' | efetch -format runinfo -mode xml

You can then parse this XML with `xtract`, which ships with the Entrez Direct tools, to extract only the columns you need.
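As a rough illustration of the xtract step, here is a sketch that pulls three fields per run. It assumes the runinfo XML wraps each run in a `<Row>` element whose children are named after the CSV columns. The sample XML is abridged by hand from the values above, and `runinfo_fields` is a hypothetical wrapper name.

```shell
# Abridged sample of the runinfo XML: one <Row> per run, most fields omitted.
sample_xml='<SraRunInfo>
  <Row>
    <Run>SRR3108728</Run>
    <spots>98100</spots>
    <LibraryLayout>PAIRED</LibraryLayout>
  </Row>
</SraRunInfo>'

# runinfo_fields: read runinfo XML on stdin and print one tab-separated line
# per <Row> holding the Run, LibraryLayout and spots elements.
runinfo_fields() {
  xtract -pattern Row -element Run LibraryLayout spots
}

# Run the demo only when EDirect's xtract is on the PATH.
if command -v xtract >/dev/null 2>&1; then
  printf '%s\n' "$sample_xml" | runinfo_fields
else
  echo "xtract not found; install Entrez Direct to run this example" >&2
fi
```

Against live data the same function would sit at the end of the pipeline: `esearch -db sra -query 'PRJNA308986' | efetch -format runinfo -mode xml | runinfo_fields`.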

**tjtaylor** · 05-21-2020, 11:28 AM

Using wget to retrieve SRA RunInfo and AccList

Here's an example of using `wget` to retrieve the SRA RunInfo and AccList from the NCBI Sequence Read Archive.

Code:

# wget equivalent of:
#   esearch -db sra -query "${study_id}" | efetch -format runinfo
# Requires bash (here-strings) and GNU grep (-P).

study_id=PRJNA308986
db=sra

# base URL of the NCBI E-utilities
base='https://eutils.ncbi.nlm.nih.gov/entrez/eutils/'

# esearch for the project; usehistory=y returns the WebEnv/QueryKey pair that efetch needs
data=$(wget -qO- "${base}esearch.fcgi?db=${db}&term=${study_id}&usehistory=y")
web=$(grep -oPm1 "(?<=<WebEnv>)[^<]+" <<< "${data}")
key=$(grep -oPm1 "(?<=<QueryKey>)[^<]+" <<< "${data}")

# efetch SRA RunInfo (CSV)
wget -qO "SraRunInfo-${study_id}.csv" "${base}efetch.fcgi?db=${db}&query_key=${key}&WebEnv=${web}&retmode=text&rettype=runinfo"

# efetch SRA AccList (one run accession per line)
wget -qO "SraAccList-${study_id}.txt" "${base}efetch.fcgi?db=${db}&query_key=${key}&WebEnv=${web}&retmode=text&rettype=acclist"
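The `grep -oP` lookbehind used to pull out WebEnv and QueryKey needs GNU grep, which is not the default on macOS or BSD. A possible portable alternative uses plain `sed`, sketched below. `xml_field` is a hypothetical helper name, the sample response is hand-written with made-up WebEnv and Id values, and the helper assumes each element sits on a single line without attributes.

```shell
# Hand-written sample of an esearch response; WebEnv and Id values are made up.
sample_response='<?xml version="1.0" encoding="UTF-8" ?>
<eSearchResult><Count>1</Count><RetMax>1</RetMax><RetStart>0</RetStart><QueryKey>1</QueryKey><WebEnv>MCID_example0123456789</WebEnv><IdList>
<Id>111</Id>
</IdList></eSearchResult>'

# xml_field TAG: print the text of the first <TAG>...</TAG> found on stdin.
# Uses only POSIX basic regular expressions, so any sed should work.
xml_field() {
  sed -n "s:.*<$1>\([^<]*\)</$1>.*:\1:p" | head -n 1
}

web=$(printf '%s\n' "$sample_response" | xml_field WebEnv)
key=$(printf '%s\n' "$sample_response" | xml_field QueryKey)
printf 'WebEnv=%s\nQueryKey=%s\n' "$web" "$key"
```

In the script above, the two `grep` lines would become `web=$(printf '%s\n' "$data" | xml_field WebEnv)` and the matching call for `QueryKey`.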


Downloading 'RunInfo Table' from SRA Run Selector
