  • ftp NCBI nucleotide database

    Hi,

    I tried to retrieve 400+ bacterial 16S rDNA sequences. I found them on the website.

    How do I find the same information on the FTP site? I can't seem to find the so-called nucleotide (or nuccore) database there.

    I was hoping to download the 400+ files in batch and process them.

    thanks,
    John

  • #2
    Which 400 sequences are you referring to (perhaps linkout to SILVA SSU db)? The link you included is only for a single 16S sequence.



    • #3
      The GenBank accession numbers are JN713151-JN713566.

      They are not from the SILVA database.



      • #4
        Do you have access to the nt BLAST database and the BLAST+ programs? You can easily write a loop to extract the sequences using the following command (let me know if you need more details).
        Code:
        $ blastdbcmd -entry JN713152 -db /path_to/nt -outfmt '%f'
        Code:
        for i in {713151..713566}; do blastdbcmd -entry JN$i -db /path_to/nt -outfmt '%f' >> filename.fa; done
        You can also use the E-utilities (http://www.ncbi.nlm.nih.gov/books/NBK25500/), with something along the lines of the example here: http://www.ncbi.nlm.nih.gov/books/NB...trieving_large
        Last edited by GenoMax; 07-22-2015, 07:57 AM.
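
As a variation on the blastdbcmd loop above, the accession list can be written to a file once and passed to blastdbcmd in a single call. This is only a sketch: it assumes your BLAST+ build supports the -entry_batch option (check blastdbcmd -help), and make_accessions and ids.txt are hypothetical names, not part of BLAST+.

```shell
# Print accessions PREFIX<first> .. PREFIX<last>, one per line.
# make_accessions is a hypothetical helper, not a BLAST+ command.
make_accessions() {
    # $1 = prefix, $2 = first number, $3 = last number
    acc_i=$2
    while [ "$acc_i" -le "$3" ]; do
        printf '%s%06d\n' "$1" "$acc_i"
        acc_i=$((acc_i + 1))
    done
}

# Accession range quoted earlier in the thread: JN713151-JN713566.
make_accessions JN 713151 713566 > ids.txt

# Run the extraction only if blastdbcmd is installed and the placeholder
# database path has been replaced by a real one.
db=/path_to/nt
if command -v blastdbcmd >/dev/null 2>&1 && { [ -e "$db.nal" ] || [ -e "$db.nsq" ]; }; then
    blastdbcmd -entry_batch ids.txt -db "$db" -outfmt '%f' > filename.fa
fi
```

A single call avoids reopening the database once per accession, which may matter for a database as large as nt.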



        • #5
          Hi GenoMax,

          Thank you very much for sharing that. I am going to ask our admin to install the database and BLAST+ and give it a try.

          I will read up on the E-utilities too and get back to you.

          Best,
          John



          • #6
            Hi GenoMax,

            Thanks for the tip. I got the db and it works: I got the FASTA sequences! Now I have another question, on how to retrieve the information under the ORGANISM tag. For example, for JN713151 below, I would also like to get the bacterial lineage (the two lines under ORGANISM) for each query ID. I tried many of the format specifiers in the -help output, but none has worked so far. Any thoughts?

            John

            LOCUS       JN713151                1526 bp    DNA     linear   ENV 09-MAY-2012
            DEFINITION  Filifactor alocis canine oral taxon 001 clone OB017 16S ribosomal
                        RNA gene, partial sequence.
            ACCESSION   JN713151
            VERSION     JN713151.1  GI:373279114
            KEYWORDS    ENV.
            SOURCE      Filifactor alocis
              ORGANISM  Filifactor alocis
                        Bacteria; Firmicutes; Clostridia; Clostridiales;
                        Peptostreptococcaceae; Filifactor.



            • #7
              You won't find that information in the blast database. Here is one way (I am sure there are others):

              Save the following code in a file (e.g. retr.sh).

              Code:
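              # Loop over the numeric part of the accessions. Note that the bound
              # below includes JN713567, one past the requested JN713151-JN713566.
              j=713151
              while [ "$j" -le 713567 ]
              do
                 num=$(printf "JN%06d" "$j")
                 curl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=${num}&rettype=genbank"
                 j=$((j+1))
              done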
              Add execute permissions to that file.

              Code:
              $ chmod u+x retr.sh
              Run the file to download genbank records.

              Code:
              $ ./retr.sh > genbank_records
              Extract the ID information you need into a file (id_you_need).
              Code:
              $ grep -e JN -e OrgName_lineage genbank_records | sed -e 's/<Textseq-id_accession>//' -e 's/<\/Textseq-id_accession>//' -e 's/<OrgName_lineage>//' -e 's/<\/OrgName_lineage>//' > id_you_need
              Last edited by GenoMax; 07-28-2015, 04:57 PM.
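
An alternative to grepping XML tags is to request plain-text GenBank flat files and read the lineage from the ORGANISM block, as in the JN713151 record quoted earlier in the thread. The sketch below assumes efetch is called with rettype=gb&retmode=text (so that the output is a flat file rather than XML) and that the organism name fits on a single line; gb_lineage is a hypothetical helper name, not an NCBI tool.

```shell
# gb_lineage: read GenBank flat-file text on stdin and print one
# "accession<TAB>lineage" line per record. Hypothetical helper; assumes
# the organism name occupies a single line after the ORGANISM keyword.
gb_lineage() {
    awk '
        /^ACCESSION/  { acc = $2 }
        /^  ORGANISM/ { in_org = 1; lineage = ""; next }
        in_org && /^ / {
            # Indented lines after ORGANISM hold the lineage.
            line = $0
            sub(/^ +/, "", line)
            lineage = (lineage == "" ? line : lineage " " line)
            next
        }
        in_org {
            # A line starting in column 1 ends the ORGANISM block.
            sub(/\.$/, "", lineage)
            print acc "\t" lineage
            in_org = 0
        }
        END {
            # Flush a record that ends right after its lineage lines.
            if (in_org) { sub(/\.$/, "", lineage); print acc "\t" lineage }
        }
    '
}
```

For example, if retr.sh were changed to request flat files, something like `gb_lineage < genbank_records > id_you_need` might replace the grep/sed step.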



              • #8
                Hi GenoMax,

                Thank you very much for your help and code. It works very nicely!

                Just curious: when I extracted the locus IDs and lineages from the web download using grep, I found that the locus IDs went beyond 713567, all the way to 713709 (see the excerpt below). Interestingly, those beyond 713566 were all HIV lineages, not bacterial. I thought we only downloaded up to
                while [ $j -le 713567 ]? Anyway, it is easily cleaned up and not an issue; I am just wondering.

                John

                ***
                JN713566
                Lachnospiraceae bacterium canine oral taxon 399 clone 1K033 16S ribosomal RNA gene, partial sequence</Seqdesc_title>
                Bacteria; Firmicutes; Clostridia; Clostridiales; Lachnospiraceae
                Human immunodeficiency virus 1 pol protein (pol) gene, partial cds.</Seqdesc_title>
                Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Lentivirus; Primate lentivirus group
                JN713567
                HIV-1 isolate HIV_PRRT_PJ01967_1 from Dominican Republic pol protein (pol) gene, partial cds.</Seqdesc_title>
                pol protein [Human immunodeficiency virus 1]</Seqdesc_title>
                Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Lentivirus; Primate lentivirus group
                JN713568
                HIV-1 isolate HIV_PRRT_PJ01967_2 from Dominican Republic pol protein (pol) gene, partial cds.</Seqdesc_title>
                pol protein [Human immunodeficiency virus 1]</Seqdesc_title>
                Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Lentivirus; Primate lentivirus group
                JN713569
                HIV-1 isolate HIV_PRRT_PJ01967_3 from Dominican Republic pol protein (pol) gene, partial cds.</Seqdesc_title>
                pol protein [Human immunodeficiency virus 1]</Seqdesc_title>
                Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Lentivirus; Primate lentivirus group
                JN713570
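
The clean-up mentioned above could be done by keeping only accessions inside the requested range. This is a minimal sketch, assuming a tab-separated table of accession and lineage with one record per line, which is not the exact layout of the listing above; filter_range is a hypothetical helper name.

```shell
# filter_range PREFIX FIRST LAST: read "accession<TAB>lineage" lines on
# stdin and keep those whose accession is PREFIX followed by a number
# between FIRST and LAST inclusive. Hypothetical helper.
filter_range() {
    awk -F '\t' -v p="$1" -v lo="$2" -v hi="$3" '
        index($1, p) == 1 {
            # Numeric part of the accession; int() drops a ".1" version suffix.
            n = int(substr($1, length(p) + 1))
            if (n >= lo + 0 && n <= hi + 0) print
        }
    '
}
```

With the range from this thread, the call would be `filter_range JN 713151 713566 < table.tsv`, where table.tsv is a placeholder file name.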
