
  • How to get FASTA sequences from GI numbers

    Hey guys, I need help.

    I need to download a large number of FASTA sequences for a set of GI numbers.

    Is there any script to do this?

    I know I could do it with Batch Entrez (http://www.ncbi.nlm.nih.gov/sites/batchentrez), but I have too many sequences (it says to split the list if there are too many), and I really don't want to do it via the browser.


    Thank you

  • #2
    Did you mean to cross post this in Biostars? Maybe remove the question from this list or from that list.



    • #3
      Originally posted by bt27uk View Post
      Did you mean to cross post this in Biostars? Maybe remove the question from this list or from that list.
      Since I've been stuck on this problem for a couple of days, I tried asking in different forums, with different people, in order to get an answer as soon as possible.

      I can't see the problem.

      If it is against any rules of SEQanswers, I'll delete it.



      • #4
        cross posted on biostars: https://www.biostars.org/p/112410/



        • #5
          One way to do this would be to use the blastdbcmd command, which is part of the "blast" suite. You will need to have access to (or download) the nt BLAST database indexes.

          You can put a list of your GI numbers in a file like so (one per line):

          Code:
          $ more gi_list.txt 
          4
          7
          78
          324
          
          $ blastdbcmd -entry_batch gi_list.txt -db /path_to/nt -outfmt "%f" -out seq_fasta_filename.fa
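
For a long list, it may help to tidy the ID file before running the command above and to compare record counts afterwards. The following is only a rough sketch: the helper names `clean_gi_list` and `count_fasta_records` and the file `gi_list.clean.txt` are made up for illustration, and it assumes the list should contain nothing but numeric GI numbers, one per line.

```shell
#!/usr/bin/env bash
# Sketch only: clean_gi_list and count_fasta_records are hypothetical helpers.

# Print the numeric IDs from the file "$1", one per line, with surrounding
# whitespace removed, non-numeric and blank lines dropped, and repeated
# IDs reduced to their first occurrence.
clean_gi_list() {
    tr -d ' \t\r' < "$1" | grep -E '^[0-9]+$' | awk '!seen[$0]++'
}

# Print the number of FASTA records (lines starting with ">") in the file "$1".
count_fasta_records() {
    grep -c '^>' "$1"
}

# Possible usage around the blastdbcmd call (not executed here):
#   clean_gi_list gi_list.txt > gi_list.clean.txt
#   blastdbcmd -entry_batch gi_list.clean.txt -db /path_to/nt -outfmt "%f" -out seq_fasta_filename.fa
#   echo "requested: $(wc -l < gi_list.clean.txt)"
#   echo "retrieved: $(count_fasta_records seq_fasta_filename.fa)"
```

If the retrieved count is lower than the requested count, that hints that some IDs were not found in the database, although this is only a quick sanity check.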



          • #6
            Originally posted by GenoMax View Post
            One way to do this would be to use the blastdbcmd command, which is part of the "blast" suite. You will need to have access to (or download) the nt BLAST database indexes.

            You can put a list of your GI numbers in a file like so (one per line):

            Code:
            $ more gi_list.txt 
            4
            7
            78
            324
            
            $ blastdbcmd -entry_batch gi_list.txt -db /path_to/nt -outfmt "%f" -out seq_fasta_filename.fa
            Thank you very much.

            It is exactly what I was looking for!!!



            • #7
              Originally posted by fefe89 View Post
              Since I've been stuck on this problem for a couple of days, I tried asking in different forums, with different people, in order to get an answer as soon as possible.

              I can't see the problem.

              If it is against any rules of SEQanswers, I'll delete it.
              As has been said before, it creates more work for the folks answering the questions.

              It is OK to cross-post, but please close out your post on all forums (cross-referencing the solution once you find one that you like).



              • #8
                Originally posted by GenoMax View Post
                As has been said before, it creates more work for the folks answering the questions.

                It is OK to cross-post, but please close out your post on all forums (cross-referencing the solution once you find one that you like).
                OK. The other post has already been closed.



                • #9
                  Originally posted by fefe89 View Post
                  I need to download a large number of FASTA sequences for a set of GI numbers.

                  Is there any script to do this??
                  The recommended way to do this is with Eutils, a web service offered by NCBI.

                  There are already several threads about using Eutils, both in this forum and on Biostars.



                  • #10
                    Originally posted by GenoMax View Post
                    One way to do this would be to use the blastdbcmd command, which is part of the "blast" suite. You will need to have access to (or download) the nt BLAST database indexes.

                    You can put a list of your GI numbers in a file like so (one per line):

                    Code:
                    $ more gi_list.txt 
                    4
                    7
                    78
                    324
                    
                    $ blastdbcmd -entry_batch gi_list.txt -db /path_to/nt -outfmt "%f" -out seq_fasta_filename.fa
                    Hi!

                    I'm a beginner in bioinformatics (and new to the forum) and I have the same problem as fefe89. Your answer above seems entirely appropriate for my problem, but I have a very naive question (it seems simple, but I can't find an adequate answer on Google): how can I use the blastdbcmd command if I don't want to download the (heavy) nt database to my own computer? Or am I forced to download nt locally before running the command?

                    Thank you in advance for your understanding (certainly a newbie question...)



                    • #11
                      Originally posted by ericaf View Post
                      Hi!
                      how can I use the blastdbcmd command if I don't want to download the (heavy) nt database to my own computer? Or am I forced to download nt locally before running the command?

                      Thank you in advance for your understanding (certainly a newbie question...)
                      If you don't want to download the BLAST database locally, take a look at the NCBI E-utilities option (referred to in one of the posts above). You will need to do some additional work to create the right query URLs.
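
As a rough illustration of what building those query URLs might look like, here is a minimal sketch that turns a GI list (one ID per line) into E-utilities `efetch` URLs. The function name `build_efetch_urls` and the `BATCH_SIZE` value are made up for this example, and `db=nuccore` assumes the GIs are nucleotide records (a protein list would need `db=protein`).

```shell
#!/usr/bin/env bash
# Sketch only: build_efetch_urls and BATCH_SIZE are hypothetical names.
# db=nuccore assumes nucleotide GI numbers.

EFETCH_BASE="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
BATCH_SIZE=200   # IDs per request; an arbitrary choice for this sketch

# Read numeric IDs (one per line) from the file "$1" and print one efetch
# URL per batch of up to BATCH_SIZE IDs, requesting FASTA as plain text.
build_efetch_urls() {
    grep -E '^[0-9]+$' "$1" | xargs -n "$BATCH_SIZE" | while read -r ids; do
        [ -n "$ids" ] || continue   # skip the empty line xargs may emit on empty input
        printf '%s?db=nuccore&id=%s&rettype=fasta&retmode=text\n' \
            "$EFETCH_BASE" "${ids// /,}"
    done
}

# Possible usage (not executed here; requires network access and curl):
#   build_efetch_urls gi_list.txt | while read -r url; do
#       curl -s "$url" >> seq_fasta_filename.fa
#       sleep 1   # pause between requests to stay under NCBI's rate limit
#   done
```

NCBI asks clients without an API key to stay at or below three requests per second, so the pause in the usage loop is deliberately conservative.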

