  • How to get FASTA sequences from GI number

    Hey guys I need help

    I need to download a large number of FASTA sequences for a set of GI numbers.

    Is there any script to do this?

    I know I could do it with Batch Entrez (http://www.ncbi.nlm.nih.gov/sites/batchentrez), but I have too many sequences (it says to split the list if there are too many) and I really don't want to do it via the browser.


    Thank you

  • #2
    Did you mean to cross post this in Biostars? Maybe remove the question from this list or from that list.

    • #3
      Originally posted by bt27uk View Post
      Did you mean to cross post this in Biostars? Maybe remove the question from this list or from that list.
      Since I've been stuck on this problem for a couple of days, I tried asking in different forums with different people in order to get an answer as soon as possible.

      I can't see the problem.

      If it is against any SEQanswers rules, I'll delete it.

      • #4
        Cross-posted on Biostars: https://www.biostars.org/p/112410/

        • #5
          One way to do this would be to use the blastdbcmd command, which is part of the BLAST+ suite. You will need to have access to (or download) the nt blast database indexes.

          You can put a list of your GI numbers in a file like so (one per line):

          Code:
          $ more gi_list.txt 
          4
          7
          78
          324
          
          $ blastdbcmd -entry_batch gi_list.txt -db /path_to/nt -outfmt "%f" -out seq_fasta_filename.fa
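
A minimal Python sketch of the same steps, for anyone who wants to script them. It is not from the original post: the function names are hypothetical, and it assumes `blastdbcmd` is on the PATH and that a local nt database exists at the path you pass in. It only writes the GI list file and runs the command shown above.

```python
import subprocess
from pathlib import Path


def write_gi_list(gi_numbers, path):
    """Write one GI number per line, the layout used for -entry_batch above."""
    path = Path(path)
    path.write_text("".join(f"{gi}\n" for gi in gi_numbers))
    return path


def blastdbcmd_args(gi_list, db, out_fasta):
    """Build the same blastdbcmd command line as in the post, as an argv list."""
    return [
        "blastdbcmd",
        "-entry_batch", str(gi_list),
        "-db", str(db),
        "-outfmt", "%f",
        "-out", str(out_fasta),
    ]


def fetch_fasta(gi_numbers, db, out_fasta, gi_list="gi_list.txt"):
    """Write the GI list, then run blastdbcmd (raises if the command fails)."""
    write_gi_list(gi_numbers, gi_list)
    subprocess.run(blastdbcmd_args(gi_list, db, out_fasta), check=True)
```

Calling `fetch_fasta([4, 7, 78, 324], "/path_to/nt", "seq_fasta_filename.fa")` would run the command from the post. Passing the arguments as a list avoids shell quoting issues with the `%f` format string.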

          • #6
            Originally posted by GenoMax View Post
            One way to do this would be to use the blastdbcmd command, which is part of the BLAST+ suite. You will need to have access to (or download) the nt blast database indexes.

            You can put a list of your GI numbers in a file like so (one per line):

            Code:
            $ more gi_list.txt 
            4
            7
            78
            324
            
            $ blastdbcmd -entry_batch gi_list.txt -db /path_to/nt -outfmt "%f" -out seq_fasta_filename.fa
            Thank you very much.

            It is exactly what I was looking for!!!

            • #7
              Originally posted by fefe89 View Post
              Since I've been stuck on this problem for a couple of days, I tried asking in different forums with different people in order to get an answer as soon as possible.

              I can't see the problem.

              If it is against any SEQanswers rules, I'll delete it.
              As has been said before, it creates more work for the folks who are answering the questions.

              It is OK to cross-post, but please close out your post on all forums (cross-referencing the solution once you find one that you like).

              • #8
                Originally posted by GenoMax View Post
                As has been said before, it creates more work for the folks who are answering the questions.

                It is OK to cross-post, but please close out your post on all forums (cross-referencing the solution once you find one that you like).
                OK. The other post has already been closed.

                • #9
                  Originally posted by fefe89 View Post
                  I need to download a large number of FASTA sequences for a set of GI numbers.

                  Is there any script to do this?
                  The recommended way to do this is with Eutils (E-utilities), a web service offered by NCBI.

                  Several threads about using Eutils already exist, both in this forum and on Biostars.

                  • #10
                    Originally posted by GenoMax View Post
                    One way to do this would be to use the blastdbcmd command, which is part of the BLAST+ suite. You will need to have access to (or download) the nt blast database indexes.

                    You can put a list of your GI numbers in a file like so (one per line):

                    Code:
                    $ more gi_list.txt 
                    4
                    7
                    78
                    324
                    
                    $ blastdbcmd -entry_batch gi_list.txt -db /path_to/nt -outfmt "%f" -out seq_fasta_filename.fa
                    Hi!

                    I'm a beginner in bioinformatics (and new to the forum) and I have the same problem as fefe89. Your answer above seems entirely appropriate for my problem, but I have a very naive question (it seems simple, but I can't find an adequate answer on Google): how can I use the blastdbcmd command line if I don't want to download the (heavy) nt database to my own computer? Or am I forced to download nt locally before running the command?

                    Thank you in advance for your understanding (certainly a newbie question...)

                    • #11
                      Originally posted by ericaf View Post
                      Hi!
                      How can I use the blastdbcmd command line if I don't want to download the (heavy) nt database to my own computer? Or am I forced to download nt locally before running the command?

                      Thank you in advance for your understanding (certainly a newbie question...)
                      If you don't want to download the blast database locally, take a look at the NCBI E-utilities option (referred to in one of the posts above). You will need to do some additional work to create the right query URLs.
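
As a rough illustration of what building those query URLs can look like, here is a minimal Python sketch against the EFetch endpoint. It is not from the original posts, and several things in it are assumptions: the function names are hypothetical, `db="nuccore"` assumes the GI numbers are nucleotide records (protein GIs would need `db="protein"`), and the batch size of 200 and the half-second pause are conservative guesses meant to stay within NCBI's usage guidelines.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_params(gi_numbers, db="nuccore"):
    """Query parameters asking EFetch for plain-text FASTA records."""
    return {
        "db": db,  # assumption: nucleotide GIs
        "id": ",".join(str(gi) for gi in gi_numbers),
        "rettype": "fasta",
        "retmode": "text",
    }


def efetch_url(gi_numbers, db="nuccore"):
    """Build a GET URL; handy for checking a small batch in a browser."""
    return EFETCH + "?" + urlencode(efetch_params(gi_numbers, db), safe=",")


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def download_fasta(gi_numbers, out_path, db="nuccore", batch_size=200, pause=0.5):
    """Fetch FASTA records batch by batch and append them to out_path."""
    with open(out_path, "w") as out:
        for batch in batches(list(gi_numbers), batch_size):
            # POST keeps long ID lists out of the URL.
            data = urlencode(efetch_params(batch, db)).encode("ascii")
            with urlopen(EFETCH, data=data) as response:
                out.write(response.read().decode("utf-8"))
            time.sleep(pause)  # stay well under the request rate limit
```

For example, `efetch_url([4, 7, 78, 324])` gives a URL you can paste into a browser to check the output, and `download_fasta(gi_list, "seqs.fa")` would write all batches to one file. Splitting into batches is the scripted equivalent of the "split if there are too many" advice from Batch Entrez.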
