  • .gb to .ffn by using R Bioconductor

    Hi everyone,

    As an experimentalist trying to get introduced to some basic bioinformatics, I've recently started to use R Bioconductor, but I'm still quite lost.
    I would like to use these new skills to convert a GenBank file (.gbk) downloaded from NCBI into a multifasta file containing the nucleotide sequences of all genes (.ffn), so it would be great if anyone could give me the exact script I have to use to:

    1) Import the .gb file
    2) Transform .gb to .ffn
    3) Export .ffn to a tab file (so I can open it using TextEdit)

    I know this is very basic, but thank you in advance.

    Cristina

  • #2
    This is easier in Biopython:

    Code:
    #!/usr/bin/env python
    from Bio import SeqIO

    # SeqIO.parse yields every record in the file (SeqIO.read accepts exactly one record).
    with open("output.fa", "w") as outfile:  # Change the output name
        for record in SeqIO.parse("somefile.gb", "genbank"):  # Change the input name
            SeqIO.write(record, outfile, "fasta")
    Something like that should work.


    • #3
      Hi Ryan, thanks for your answer… but I don't know how to work in Python.
      Should I type it in the Linux terminal?


      Anyway, I think I found this script somewhere else, but it is for converting to "fasta" and I need "ffn".


      • #4
        Yes, though you'll need to install the Biopython package on your computer in addition to Python itself, which should already be there. If you're typing things in manually, you can skip the first line (#!/usr/bin/env python). You'll find basic fluency in Python to be rather important in bioinformatics (knowing Python and R is sufficient for most things).


        • #5
          BTW, ffn is fasta.


          • #6
            I know "ffn" is fasta… :-) but multifasta.
            As far as I understand, the FASTA output from the Python script is a single sequence (the whole genome from the GenBank file), whereas an .ffn file contains multiple sequences, one for each gene in the genome.
            Am I right?


            • #7
              Python will happily write a multifasta file
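
To illustrate the point about multifasta: a multi-FASTA file is just several ">header" plus sequence records written one after another in the same file. The sketch below uses plain Python with no Biopython; the function name `to_multifasta`, the gene names, and the 60-character line width are assumptions made up for this example.

```python
def to_multifasta(named_sequences, width=60):
    """Format (name, sequence) pairs as multi-FASTA text.

    Hypothetical helper for illustration only; it is not part of Biopython.
    """
    lines = []
    for name, sequence in named_sequences:
        # Each record starts with a ">" header line.
        lines.append(f">{name}")
        # The sequence follows, wrapped to a fixed line width.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Two made-up genes end up as two records in one block of text.
    print(to_multifasta([("geneA", "ATGAAATAG"), ("geneB", "ATGCCCTAA")]), end="")
```

Biopython's `SeqIO.write` produces the same layout when it is given more than one record.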


              • #8
                Really? Oh, that's good then.
                I will look into using Python.
                Thanks a lot Ryan!


                • #9
                  See also http://www.warwick.ac.uk/go/peter_co...genbank2fasta/ which discusses converting GenBank to FASTA with Biopython in more detail - including how to pull out the gene sequences.
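
As a rough sketch of what pulling out the gene sequences can look like with Biopython (not necessarily the approach taken on the linked page): loop over the annotated features of each GenBank record, keep one feature type, and write each extracted sequence as its own FASTA record. The function names `gene_records` and `genbank_to_ffn`, the choice of "CDS" features, and the file names are assumptions for this example; it also assumes Biopython is installed.

```python
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord


def gene_records(genbank_record, feature_type="CDS"):
    """Yield one SeqRecord per feature of the given type (hypothetical helper)."""
    for index, feature in enumerate(genbank_record.features):
        if feature.type != feature_type:
            continue
        qualifiers = feature.qualifiers
        # Prefer the locus_tag, then the gene name, then a positional fallback ID.
        name = (qualifiers.get("locus_tag")
                or qualifiers.get("gene")
                or [f"{genbank_record.id}_feature{index}"])[0]
        product = qualifiers.get("product", [""])[0]
        # extract() takes care of reverse-strand and joined locations.
        yield SeqRecord(feature.extract(genbank_record.seq),
                        id=name, description=product)


def genbank_to_ffn(genbank_path, ffn_path, feature_type="CDS"):
    """Write all matching features from a GenBank file to one multi-FASTA file."""
    genes = (gene
             for genbank_record in SeqIO.parse(genbank_path, "genbank")
             for gene in gene_records(genbank_record, feature_type))
    # SeqIO.write returns the number of records written.
    return SeqIO.write(genes, ffn_path, "fasta")


# Example call (file names are placeholders):
# genbank_to_ffn("somefile.gb", "output.ffn")
```

Passing `feature_type="gene"` instead would select the gene features rather than the coding sequences, if the file annotates them.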
