
  • ftp NCBI nucleotide database

    Hi,

    I am trying to retrieve 400+ bacterial 16S rDNA sequences. I found them on the website.

    How do I find the same information on the ftp site? I can't seem to find the so-called nucleotide (or nuccore) database there.

    I was hoping to download the 400+ files in batch and process them.

    Thanks,
    John

  • #2
    Which 400 sequences are you referring to (perhaps linkout to SILVA SSU db)? The link you included is only for a single 16S sequence.



    • #3
      The GenBank accession numbers are JN713151-JN713566.

      It is not the SILVA database.

      Originally posted by GenoMax:
      Which 400 sequences are you referring to (perhaps linkout to SILVA SSU db)? The link you included is only for a single 16S sequence.



      • #4
        Do you have access to the nt BLAST database and the BLAST+ programs? If so, the first command below extracts a single sequence, and the loop after it extracts the whole range (let me know if you need more details).
        Code:
        $ blastdbcmd -entry JN713152 -db /path_to/nt -outfmt '%f'
        Code:
        for i in {713151..713566}; do blastdbcmd -entry JN$i -db /path_to/nt -outfmt '%f' >> filename.fa; done
        You can also use the E-utilities (http://www.ncbi.nlm.nih.gov/books/NBK25500/), with something along the lines of the example here: http://www.ncbi.nlm.nih.gov/books/NB...trieving_large
        Last edited by GenoMax; 07-22-2015, 07:57 AM.
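
The loop above starts one blastdbcmd process per accession. As a possible alternative, blastdbcmd can also read a list of identifiers from a file through its -entry_batch option, so the whole range can be requested in a single call. The following is only a minimal sketch: the helper name make_accessions, the file names accessions.txt and filename.fa, and the BLAST_NT_DB environment variable are illustrative assumptions, and the blastdbcmd step is skipped unless BLAST+ is installed and BLAST_NT_DB points at a local nt database (for example /path_to/nt).

```shell
# Hypothetical helper: print one accession per line for JN<first>..JN<last>.
make_accessions() {
    i=$1
    while [ "$i" -le "$2" ]; do
        printf 'JN%06d\n' "$i"
        i=$((i + 1))
    done
}

# The range asked about in the thread: JN713151 through JN713566.
make_accessions 713151 713566 > accessions.txt

# Assumption: BLAST_NT_DB holds the path to the nt database, e.g. /path_to/nt.
# The call is skipped when the variable is unset or blastdbcmd is missing.
if [ -n "${BLAST_NT_DB:-}" ] && command -v blastdbcmd >/dev/null 2>&1; then
    blastdbcmd -db "$BLAST_NT_DB" -entry_batch accessions.txt -outfmt '%f' > filename.fa
fi
```

Keeping the accession list in its own file also makes it easy to reuse the same list with other tools later.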



        • #5
          Hi GenoMax,

          Thank you very much for sharing that. I am going to ask our admin to install the database and BLAST+ and give it a try.

          I will read up on the E-utilities too and get back to you.

          Best,
          John



          • #6
            Hi GenoMax,

            Thanks for the tip. I got the db and it works; I got the FASTA sequences! Now I have another question: how do I retrieve the information under the ORGANISM tag? For example,
            for JN713151 below, I would also like to get the bacterial lineage (the lines under ORGANISM) for each query ID. I tried many of the specifiers listed in the -help output, but none has worked so far. Any thoughts?

            John

            LOCUS       JN713151     1526 bp    DNA     linear   ENV 09-MAY-2012
            DEFINITION  Filifactor alocis canine oral taxon 001 clone OB017 16S ribosomal
                        RNA gene, partial sequence.
            ACCESSION   JN713151
            VERSION     JN713151.1  GI:373279114
            KEYWORDS    ENV.
            SOURCE      Filifactor alocis
              ORGANISM  Filifactor alocis
                        Bacteria; Firmicutes; Clostridia; Clostridiales;
                        Peptostreptococcaceae; Filifactor.
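
Once records like this one are saved locally in GenBank flat-file format, the lineage can be collected with awk. The sketch below is only an illustration under stated assumptions: the function name lineage_table and the file names sample.gb and lineage.tsv are made up for the example, the sample is an abridged copy of the record above laid out in the usual flat-file columns, and the script assumes the organism name fits on the ORGANISM line so that every indented line after it belongs to the lineage.

```shell
# Abridged sample record in flat-file layout; "//" marks the end of a record.
cat > sample.gb <<'EOF'
LOCUS       JN713151                1526 bp    DNA     linear   ENV 09-MAY-2012
ACCESSION   JN713151
SOURCE      Filifactor alocis
  ORGANISM  Filifactor alocis
            Bacteria; Firmicutes; Clostridia; Clostridiales;
            Peptostreptococcaceae; Filifactor.
//
EOF

# Hypothetical helper: print "accession<TAB>lineage" for each record.
lineage_table() {
    awk '
        /^ACCESSION/  { acc = $2 }
        /^  ORGANISM/ { in_org = 1; lineage = ""; next }
        # Indented continuation lines after ORGANISM carry the lineage.
        in_org && /^            / {
            line = $0
            sub(/^ +/, "", line)
            lineage = (lineage == "" ? line : lineage " " line)
            next
        }
        # Any other line ends the lineage block.
        in_org { emit() }
        END    { if (in_org) emit() }
        function emit() {
            sub(/\.$/, "", lineage)   # drop the trailing period
            print acc "\t" lineage
            in_org = 0
        }
    ' "$@"
}

lineage_table sample.gb > lineage.tsv
cat lineage.tsv
```

The same function can be pointed at a file holding many concatenated records, since each ACCESSION line resets the identifier that is paired with the next lineage.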



            • #7
              You won't find that information in the blast database. Here is one way (I am sure there are others):

              Save the following code in a file (e.g. retr.sh).

              Code:
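              # Loop over the numeric part of the accessions. The upper bound is
              # inclusive, so this also requests JN713567, one past JN713566.
              j=713151
              while [ "$j" -le 713567 ]
              do
                 # Build the accession, e.g. JN713151
                 num=$(printf "JN%06d" "$j")
                 # Fetch one record per request from the E-utilities efetch service
                 curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=${num}&rettype=genbank"
                 j=$((j+1))
              done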
              Add execute permissions to that file.

              Code:
              $ chmod u+x retr.sh
              Run the file to download genbank records.

              Code:
              $ ./retr.sh > genbank_records
              Extract the ID information you need into a file (id_you_need).
              Code:
              $ grep -e JN -e OrgName_lineage genbank_records | sed -e 's/<Textseq-id_accession>//' -e 's/<\/Textseq-id_accession>//' -e 's/<OrgName_lineage>//' -e 's/<\/OrgName_lineage>//' > id_you_need
              Last edited by GenoMax; 07-28-2015, 04:57 PM.
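
The script above sends one request per accession. efetch also accepts a comma-separated list of identifiers in its id parameter, so the same range can be covered with a handful of requests. The following is a rough sketch rather than a tested replacement for the script: the helper name efetch_urls, the batch size of 100, the file names urls.txt and genbank_records.gb, and the RUN_FETCH switch are all assumptions made for the example. It asks for flat-file output (rettype=gb, retmode=text) instead of the XML that the grep step above expects, and nothing is downloaded unless RUN_FETCH=1 is set.

```shell
base="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Hypothetical helper: print one efetch URL per batch of accessions
# JN<first>..JN<last>, with at most <size> accessions per URL.
efetch_urls() {
    first=$1; last=$2; size=$3
    start=$first
    while [ "$start" -le "$last" ]; do
        end=$((start + size - 1))
        if [ "$end" -gt "$last" ]; then end=$last; fi
        ids=""
        i=$start
        while [ "$i" -le "$end" ]; do
            # Append the accession, comma-separated.
            ids="${ids:+$ids,}$(printf 'JN%06d' "$i")"
            i=$((i + 1))
        done
        printf '%s?db=nucleotide&id=%s&rettype=gb&retmode=text\n' "$base" "$ids"
        start=$((end + 1))
    done
}

efetch_urls 713151 713566 100 > urls.txt

# Download only when explicitly requested; the pause keeps the request
# rate low, which NCBI asks of clients that do not use an API key.
if [ "${RUN_FETCH:-0}" = 1 ]; then
    while read -r url; do
        curl -s "$url"
        sleep 1
    done < urls.txt > genbank_records.gb
fi
```

Because the upper bound passed to the helper is 713566, the generated URLs stop at the last accession in the requested range.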



              • #8
                Hi GenoMax,

                Thank you very much for your help and code. It works very nicely!

                Just curious: when I extracted the locus IDs and lineages from the downloaded records using grep, I found that the locus IDs went beyond 713567, all the way to 713709 (the start of that run is shown below). Interestingly, those beyond 713566 were all human HIV lineages, not bacterial. I thought we only downloaded
                while [ $j -le 713567 ]? Anyway, it is easily cleaned up and not an issue; I am just wondering.

                John

                ***
                JN713566
                Lachnospiraceae bacterium canine oral taxon 399 clone 1K033 16S ribosomal RNA gene, partial sequence</Seqdesc_title>
                Bacteria; Firmicutes; Clostridia; Clostridiales; Lachnospiraceae
                Human immunodeficiency virus 1 pol protein (pol) gene, partial cds.</Seqdesc_title>
                Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Lentivirus; Primate lentivirus group
                JN713567
                HIV-1 isolate HIV_PRRT_PJ01967_1 from Dominican Republic pol protein (pol) gene, partial cds.</Seqdesc_title>
                pol protein [Human immunodeficiency virus 1]</Seqdesc_title>
                Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Lentivirus; Primate lentivirus group
                JN713568
                HIV-1 isolate HIV_PRRT_PJ01967_2 from Dominican Republic pol protein (pol) gene, partial cds.</Seqdesc_title>
                pol protein [Human immunodeficiency virus 1]</Seqdesc_title>
                Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Lentivirus; Primate lentivirus group
                JN713569
                HIV-1 isolate HIV_PRRT_PJ01967_3 from Dominican Republic pol protein (pol) gene, partial cds.</Seqdesc_title>
                pol protein [Human immunodeficiency virus 1]</Seqdesc_title>
                Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Lentivirus; Primate lentivirus group
                JN713570
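
The clean-up mentioned above could look something like the following. This is a hedged sketch that only trims the extracted file and does not try to explain where the extra records come from. It assumes the layout shown in the listing: an accession on its own line, followed by title lines and then a lineage line containing "; ". The file names id_sample and id_cleaned are illustrative, and the sample is a short excerpt of the listing above.

```shell
# Excerpt of the extracted output shown above, used as test input.
cat > id_sample <<'EOF'
JN713566
Lachnospiraceae bacterium canine oral taxon 399 clone 1K033 16S ribosomal RNA gene, partial sequence</Seqdesc_title>
Bacteria; Firmicutes; Clostridia; Clostridiales; Lachnospiraceae
Human immunodeficiency virus 1 pol protein (pol) gene, partial cds.</Seqdesc_title>
Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Lentivirus; Primate lentivirus group
JN713567
HIV-1 isolate HIV_PRRT_PJ01967_1 from Dominican Republic pol protein (pol) gene, partial cds.</Seqdesc_title>
pol protein [Human immunodeficiency virus 1]</Seqdesc_title>
Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Lentivirus; Primate lentivirus group
EOF

# Keep each accession in JN713151..JN713566 together with the lines that
# follow it, up to and including its first lineage line (a line with "; ").
awk -v lo=713151 -v hi=713566 '
    /^[ \t]*JN[0-9]+[ \t]*$/ {
        n = substr($1, 3) + 0          # numeric part of the accession
        keep = (n >= lo && n <= hi)
        if (keep) print $1
        next
    }
    keep          { print }
    keep && /; /  { keep = 0 }         # first lineage line closes the record
' id_sample > id_cleaned

cat id_cleaned
```

Closing each record at its first lineage line is a design choice that drops the stray title and lineage lines appearing between JN713566 and JN713567 in the listing.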
