  • asperjelly
    Member
    • Jan 2013
    • 11

    Help with formatting FASTA headers

    Hi all,

    I'm fairly new to all of this (trained as a bench scientist and learning the bioinformatics on the fly), so sorry if this is a relatively naive question. I need some help reformatting FASTA headers in a multi-FASTA file. I would like to change headers that look like this:

    >ref|O74455|ID=6PGL_SCHPO|MODRES=|NCBITAXID=284812|Probable 6-phosphogluconolactonase (6PGL)
    MSVYSFSDVSLVAKALGAFVKEKSEASIKRHGVFTLALSGGSLPKVLAEGLAQQRGIEFS

    into the format for Blast2GO, which is >ref|UniqueID|SeqDescription. So I want the above to look like this:

    >ref|O74455|Probable 6-phosphogluconolactonase (6PGL)
    MSVYSFSDVSLVAKALGAFVKEKSEASIKRHGVFTLALSGGSLPKVLAEGLAQQRGIEFS

    I'm assuming something like sed can do this relatively quickly; I just can't seem to figure out how to remove the fields in between. Thanks in advance for any help you can offer.

    - Jon
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    See if this works.

    Code:
    $ cut -f1,2,6 -d"|" inputfile > outputfile
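The cut command above leaves sequence lines untouched because they contain no "|", but it would also drop anything after a further "|" inside the description. Since the question mentioned sed, here is a minimal sketch of a sed variant that edits only header lines and keeps everything from the sixth field onward. The file names input.fasta and output.fasta are placeholders, the sample record is the one from the question, and -E assumes a sed with extended regex support (GNU and BSD sed both have it).

```shell
# Write the sample record from the question to a placeholder input file.
printf '%s\n' \
  '>ref|O74455|ID=6PGL_SCHPO|MODRES=|NCBITAXID=284812|Probable 6-phosphogluconolactonase (6PGL)' \
  'MSVYSFSDVSLVAKALGAFVKEKSEASIKRHGVFTLALSGGSLPKVLAEGLAQQRGIEFS' \
  > input.fasta

# On lines starting with ">": keep the first two "|"-delimited fields,
# drop the next three, and leave the rest of the line as it is.
# Sequence lines (no leading ">") are passed through unchanged.
sed -E '/^>/ s/^([^|]*\|[^|]*\|)([^|]*\|){3}/\1/' input.fasta > output.fasta

cat output.fasta
```

Headers with fewer than six fields do not match the pattern and are written out unchanged, so it is worth comparing a few headers in the output against the input before loading the file into Blast2GO.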


    • asperjelly
      Member
      • Jan 2013
      • 11

      #3
      fantastic - thank you. I knew there had to be a simple solution.
