  • Biopython: getting a batch of amino acid FASTA sequences from a list of Entrez Gene IDs

    I have a list of about 100 Entrez Gene IDs, and I would like to obtain the amino acid sequence of each in FASTA format and combine them into a multi-FASTA file.

    I'm trying to do this with the Entrez.efetch function in Biopython, but I'm not sure how to retrieve the amino acid sequence from the gene record.

    Any ideas?

  • #2
    An easy way to get the sequence is to ask Entrez.efetch() to return a FASTA-formatted sequence, as described in the Biopython tutorial at http://biopython.org/DIST/docs/tutor...al.html#htoc55; note the rettype="fasta" argument. You can then treat the result like any other FASTA stream (i.e. as if it were a file).
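Under the hood, Entrez.efetch sends its arguments to NCBI's E-utilities EFetch service. The sketch below only builds such a request URL with the standard library (it does not contact NCBI), to show how db, id, rettype and retmode end up in the query string. The helper name efetch_url is hypothetical, and the IDs used with it are placeholders.

```python
from urllib.parse import urlencode

# NCBI E-utilities EFetch endpoint (the service Bio.Entrez.efetch talks to)
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(ids, db="protein", rettype="fasta", retmode="text"):
    """Build (but do not send) an EFetch request URL for a batch of IDs.

    Several IDs go into one comma-separated "id" parameter; surrounding
    whitespace (e.g. newlines read from a file) is stripped from each ID.
    """
    params = {
        "db": db,
        "id": ",".join(str(i).strip() for i in ids),
        "rettype": rettype,
        "retmode": retmode,
    }
    return EFETCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder IDs, for illustration only
    print(efetch_url(["111", "222"]))
```

Opening such a URL should return the same FASTA text that Entrez.efetch hands back as a file-like handle. In real scripts it is still simpler to call Entrez.efetch, which also sends your Entrez.email and spaces out requests to respect NCBI's usage limits.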



    • #3
      That should work perfectly.

      Can Biopython convert IDs, for example from Entrez Gene IDs to protein accession numbers?



      • #4
        NCBI can convert the gene IDs to protein IDs; try Entrez link (ELink). See also:
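As a rough sketch of what ELink gives back: a gene-to-protein request (dbfrom=gene, db=protein) returns XML in which each LinkSetDb block names a LinkName and lists the linked IDs under Link/Id. The code below parses such a response with the standard library's xml.etree.ElementTree. The sample XML and all its IDs are made up for illustration, the helper name linked_ids is hypothetical, and the default link name gene_protein is an assumption; you may prefer a narrower link such as gene_protein_refseq if you only want RefSeq proteins.

```python
import xml.etree.ElementTree as ET

# Made-up, trimmed-down ELink response (gene -> protein); IDs are placeholders
SAMPLE_ELINK_XML = """<?xml version="1.0"?>
<eLinkResult>
  <LinkSet>
    <DbFrom>gene</DbFrom>
    <IdList><Id>1</Id></IdList>
    <LinkSetDb>
      <DbTo>protein</DbTo>
      <LinkName>gene_protein</LinkName>
      <Link><Id>111</Id></Link>
      <Link><Id>222</Id></Link>
    </LinkSetDb>
  </LinkSet>
</eLinkResult>"""


def linked_ids(elink_xml, link_name="gene_protein"):
    """Return linked IDs from an ELink XML response, in document order.

    Only LinkSetDb blocks whose LinkName equals link_name are used.
    """
    root = ET.fromstring(elink_xml)
    ids = []
    for link_set_db in root.iter("LinkSetDb"):
        if link_set_db.findtext("LinkName") != link_name:
            continue
        ids.extend(link.findtext("Id") for link in link_set_db.findall("Link"))
    return ids


if __name__ == "__main__":
    print(linked_ids(SAMPLE_ELINK_XML))
```

With Biopython the equivalent is roughly record = Entrez.read(Entrez.elink(dbfrom="gene", db="protein", id=gene_ids)), after which each entry of record[0]["LinkSetDb"] has a "LinkName" and a "Link" list of dictionaries with an "Id" key. Those protein IDs can then go straight into Entrez.efetch(db="protein", rettype="fasta", retmode="text", id=...).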



        • #5
          OK, using the tutorial (and some trial and error), I wrote the following code:

          from Bio import Entrez
          from Bio import SeqIO

          Entrez.email = "my_name@my_website.com"
          # One ID per line: strip newlines, skip blanks, de-duplicate keeping file order
          with open("pids_test.csv") as id_file:
              id_list = list(dict.fromkeys(line.strip() for line in id_file if line.strip()))

          handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text", id=id_list)
          for seq_record in SeqIO.parse(handle, "fasta"):
              print(">" + seq_record.id, seq_record.description)
              print(seq_record.seq)
          handle.close()

          This prints exactly what I want. I have two questions:

          1) How can I write the results to a text file, rather than printing them to the screen?

          2) How can I let the user specify the input file (the command line is fine)?

          K



          • #6
            To save the NCBI FASTA formatted data to a file, try something like this:

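            from Bio import Entrez

            Entrez.email = "my_name@my_website.com"
            # One ID per line: strip newlines, skip blanks, de-duplicate keeping file order
            with open("pids_test.csv") as id_file:
                id_list = list(dict.fromkeys(line.strip() for line in id_file if line.strip()))

            handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text", id=id_list)
            # Copy the FASTA text returned by NCBI straight into a file
            with open("saved.fasta", "w") as out_handle:
                for line in handle:
                    out_handle.write(line)
            handle.close()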
            P.S. There is a very similar example in the Biopython Tutorial, in the section "EFetch: Downloading full records from Entrez".
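The line-by-line loop simply copies the text stream to disk. If you don't need to look at individual lines, the standard library's shutil.copyfileobj performs the same copy. In the sketch below, io.StringIO stands in for the efetch handle so it runs offline, and save_stream is a hypothetical helper name; it assumes the handle yields text, as the loop in the post does.

```python
import io
import os
import shutil
import tempfile


def save_stream(in_handle, out_path):
    """Copy all remaining text from in_handle (e.g. an efetch handle) to out_path."""
    with open(out_path, "w") as out_handle:
        shutil.copyfileobj(in_handle, out_handle)


if __name__ == "__main__":
    # io.StringIO stands in for the handle returned by Entrez.efetch
    fake_handle = io.StringIO(">seq1 example\nMKV\n")
    out_path = os.path.join(tempfile.mkdtemp(), "saved.fasta")
    save_stream(fake_handle, out_path)
    with open(out_path) as saved:
        print(saved.read(), end="")
```

Either approach writes exactly what NCBI sent; the explicit loop is only needed if you want to filter or modify lines on the way through.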


            If you want to take the filename from the command line, read about sys.argv; to prompt the user instead, try the input function or similar. Any good introduction to Python should cover both.
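A minimal sketch combining both suggestions: take the input and output filenames from sys.argv when given, and fall back to input() for any that are missing. The function name get_paths and the prompt texts are hypothetical; the prompt parameter exists only so the function can be exercised without a keyboard.

```python
import sys


def get_paths(argv=None, prompt=input):
    """Return (input_path, output_path), prompting for any not given on the command line.

    argv follows sys.argv conventions: argv[0] is the script name,
    argv[1] the ID file and argv[2] the output FASTA file.
    """
    if argv is None:
        argv = sys.argv
    in_path = argv[1] if len(argv) > 1 else prompt("File of IDs: ")
    out_path = argv[2] if len(argv) > 2 else prompt("Output FASTA file: ")
    return in_path, out_path
```

In a script you would call in_path, out_path = get_paths() and then open those files in place of the hard-coded names.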
            Last edited by maubp; 01-08-2013, 09:34 AM. Reason: Added link



            • #7
              That worked great. I added sys.argv so the user can specify the input and output files:


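              import sys
              from Bio import Entrez

              Entrez.email = "xxxxxXXXXXxxxxx"
              # Arguments: sys.argv[1] is the ID file, sys.argv[2] the output FASTA file
              with open(sys.argv[1]) as id_file:
                  # One ID per line: strip newlines, skip blanks, de-duplicate keeping file order
                  id_list = list(dict.fromkeys(line.strip() for line in id_file if line.strip()))

              handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text", id=id_list)
              with open(sys.argv[2], "w") as out_handle:
                  for line in handle:
                      out_handle.write(line)
              handle.close()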
