Hi all,
I'm fairly new to all of this (trained as a benchtop scientist and learning the bioinformatics on the fly), so sorry if this is a relatively naive question. I need some help reformatting FASTA headers in a multi-FASTA file. I would like to change headers that look like this:
>ref|O74455|ID=6PGL_SCHPO|MODRES=|NCBITAXID=284812|Probable 6-phosphogluconolactonase (6PGL)
MSVYSFSDVSLVAKALGAFVKEKSEASIKRHGVFTLALSGGSLPKVLAEGLAQQRGIEFS
I need them in the format Blast2GO expects, which is >ref|UniqueID|SeqDescription. So I want the above to look like this:
>ref|O74455|Probable 6-phosphogluconolactonase (6PGL)
MSVYSFSDVSLVAKALGAFVKEKSEASIKRHGVFTLALSGGSLPKVLAEGLAQQRGIEFS
I'm assuming something like sed can do this relatively quickly, but I can't figure out how to remove the fields in between the ID and the description. Thanks in advance for any help you can offer.
- Jon
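One way this might be done with sed is sketched below. It assumes every header to be changed starts with ">", has the unique ID as its second "|"-separated field and the description as its last field, and that the description itself contains no "|". The file names are placeholders, and the sample file is written out only so the command can be tried as-is.

```shell
# Hypothetical file names; substitute your own.
input=proteins.fasta
output=proteins.blast2go.fasta

# Sample record copied from the post, so the command can be tested directly.
printf '%s\n' \
  '>ref|O74455|ID=6PGL_SCHPO|MODRES=|NCBITAXID=284812|Probable 6-phosphogluconolactonase (6PGL)' \
  'MSVYSFSDVSLVAKALGAFVKEKSEASIKRHGVFTLALSGGSLPKVLAEGLAQQRGIEFS' > "$input"

# On header lines only (/^>/): capture ">ref|UniqueID" as group 1 and the
# last "|"-separated field as group 2, then drop everything between them.
# Sequence lines do not start with ">" and pass through unchanged.
sed -E '/^>/ s/^(>[^|]*\|[^|]*)\|.*\|([^|]*)$/\1|\2/' "$input" > "$output"

cat "$output"
```

Headers that already have only three fields do not match the pattern and are left as they are. If a description can contain "|", this sketch would keep only the part after the last "|", so it would be worth checking a few converted headers by eye before running Blast2GO.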