.gb to .ffa by using R Bioconductor

  • #1

    Hi everyone,

    As an experimentalist trying to get into some basic bioinformatics, I've recently started using R Bioconductor, but I'm still quite lost.
    I would like to use these new skills to convert a GenBank file (.gbk) downloaded from NCBI into a multi-FASTA file containing the nucleotide sequences of all genes (.ffn), so it would be great if anyone could give me the exact script I need to:

    1) Import the .gb file
    2) Transform the .gb to .ffn
    3) Export the .ffn to a tab file (so I can open it in TextEdit)

    I know this is very basic, but thank you in advance.

    Cristina

  • #2
    This is easier in Biopython:

    Code:
    #!/usr/bin/env python
    from Bio import SeqIO

    # SeqIO.parse() loops over every record in the file (SeqIO.read() expects exactly one)
    records = SeqIO.parse("somefile.gb", "genbank")  # Change the input name
    with open("output.fa", "w") as outfile:  # Change the output name
        SeqIO.write(records, outfile, "fasta")
    Something like that should work.
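    For step 3 of the question (a tab file that opens in TextEdit), note that the FASTA output is already plain text. If you want one line per sequence instead, here is a minimal sketch using only the standard library; read_fasta and tab_lines are hypothetical helper names, not Biopython functions. Biopython's SeqIO also lists a "tab" format, so SeqIO.convert("output.fa", "fasta", "output.tab", "tab") should do a similar job.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA or multi-FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def tab_lines(fasta_lines):
    """Turn each FASTA record into one 'ID<TAB>sequence' line (ID = first word of the header)."""
    for header, seq in read_fasta(fasta_lines):
        seq_id = (header.split() or [""])[0]
        yield f"{seq_id}\t{seq}\n"


# Usage, with the file names from the script above (adjust as needed):
# with open("output.fa") as fa, open("output.tab", "w") as tab:
#     tab.writelines(tab_lines(fa))
```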

    • #3
      Hi Ryan, thanks for your answer… but I don't know how to work in Python.
      Should I type it in the Linux terminal?


      Anyway, I think I found this script somewhere else, but it converts to "fasta" and I need "ffn".

      • #4
        Yes, though you'll need to install the Biopython package on your computer as well as Python itself, which should already be there. If you're typing the commands in manually at the Python prompt, you can skip the first line (#!/usr/bin/env python). You'll find basic fluency in Python rather important in bioinformatics (knowing Python and R is sufficient for most things).
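        To check whether Biopython is visible to the Python you are running, a small sketch like the one below may help. It only uses the standard library; biopython_installed and the file name check_bio.py are hypothetical. Biopython is usually installed with pip install biopython, though the exact command may differ on your system.

```python
import importlib.util
import sys


def biopython_installed():
    """Return True if the Biopython package (imported as 'Bio') can be found."""
    return importlib.util.find_spec("Bio") is not None


if __name__ == "__main__":
    # Save as e.g. check_bio.py and run "python check_bio.py" in the terminal
    print("Python version:", sys.version.split()[0])
    print("Biopython installed:", biopython_installed())
```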

        • #5
          BTW, ffn is fasta.

          • #6
            I know "ffn" is FASTA… :-) but multi-FASTA.
            As far as I understood, the FASTA output from Python is a single sequence (the whole genome) from the GenBank file, whereas an .ffn contains multiple sequences, one for each gene in the genome.
            Am I right?

            • #7
              Python will happily write a multi-FASTA file.
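              To make the point concrete: FASTA and multi-FASTA are the same format, and a multi-FASTA file (such as an .ffn) is just several ">" records one after another. Below is a minimal standard-library sketch of writing one, with made-up gene names and sequences; format_fasta is a hypothetical helper, and Biopython's SeqIO.write does the equivalent when you pass it several records.

```python
def format_fasta(records, width=60):
    """Return FASTA text for (name, sequence) pairs: one '>' header per record,
    with the sequence wrapped at `width` characters per line."""
    parts = []
    for name, seq in records:
        parts.append(f">{name}\n")
        for i in range(0, len(seq), width):
            parts.append(seq[i:i + width] + "\n")
    return "".join(parts)


# Two records in one string = a multi-FASTA file (gene names are hypothetical)
print(format_fasta([("geneA", "ATGAAATAA"), ("geneB", "ATGCCCGGGTGA")], width=6), end="")
```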

              • #8
                Really? Oh, that's good then.
                I'll look into using Python.
                Thanks a lot, Ryan!

                • #9
                  See also http://www.warwick.ac.uk/go/peter_co...genbank2fasta/ which discusses converting GenBank to FASTA with Biopython in more detail - including how to pull out the gene sequences.
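                  A minimal sketch of that gene-level extraction, assuming Biopython is installed and that the genes you want are annotated as CDS features (pass feature_type="gene" otherwise). gene_records and genbank_to_ffn are hypothetical helpers; FASTA IDs come from the locus_tag qualifier when present, which not every GenBank file has, and otherwise fall back to the record ID plus 1-based coordinates.

```python
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord


def gene_records(genbank_records, feature_type="CDS"):
    """Yield one nucleotide SeqRecord per annotated feature of `feature_type`."""
    for record in genbank_records:
        for feature in record.features:
            if feature.type != feature_type:
                continue
            # extract() applies the feature location to the genome sequence,
            # joining multi-part locations and reverse-complementing minus-strand genes
            seq = feature.extract(record.seq)
            loc = feature.location
            fallback_id = f"{record.id}_{int(loc.start) + 1}_{int(loc.end)}"
            seq_id = feature.qualifiers.get("locus_tag", [fallback_id])[0]
            product = feature.qualifiers.get("product", [""])[0]
            yield SeqRecord(seq, id=seq_id, description=product)


def genbank_to_ffn(genbank_path, ffn_path, feature_type="CDS"):
    """Write a multi-FASTA (.ffn-style) file of gene sequences; return the record count."""
    records = SeqIO.parse(genbank_path, "genbank")
    return SeqIO.write(gene_records(records, feature_type), ffn_path, "fasta")


# Usage (file names are placeholders):
# genbank_to_ffn("somefile.gb", "genes.ffn")
```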
