I have a 47 GB file to parse. The sequences are in the following format:
>TSCS_00041 gene0EA_12345_rframe2_ORF
MLAATHYYKFAIRRLFPLLKDTICASYSISIKHHENFMALSNMPKIWEDVEVDGNNMQWTRFQTTPVMPVYFIAAGVFNLSFITNWNTKLLYRKDILPYMTFAYNVAKNIAWFLSHIRKTKITNHI
>TSCS_00044 gene0EA_12341_rframe2_ORF
MTICASYSISIKHHENFMAIKHHENFMALSNMPKIWEDV
I simply want to reformat the file so that each header keeps only the sequence ID:
>TSCS_00041
MLAATHYYKFAIRRLFPLLKDTICASYSISIKHHENFMALSNMPKIWEDVEVDGNNMQWTRFQTTPVMPVYFIAAGVFNLSFITNWNTKLLYRKDILPYMTFAYNVAKNIAWFLSHIRKTKITNHI
>TSCS_00044
MTICASYSISIKHHENFMAIKHHENFMALSNMPKIWEDV
Could anyone share a script that does this?
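For reference, here is a minimal Python sketch of the transformation described above. It assumes the ID is everything on a header line before the first whitespace, and that the file should be streamed line by line because of its size. The function name `strip_header_descriptions` and the command-line arguments are hypothetical, chosen only for illustration.

```python
import sys


def strip_header_descriptions(lines):
    """Yield FASTA lines, keeping only the ID on header lines.

    Assumption: the ID is the text before the first whitespace
    on any line that starts with '>'.
    """
    for line in lines:
        if line.startswith(">"):
            # Split once on the first run of whitespace and keep the first token.
            parts = line.split(None, 1)
            yield (parts[0] if parts else ">") + "\n"
        else:
            # Sequence lines are passed through unchanged.
            yield line


def main(in_path, out_path):
    # Iterating over the file object reads one line at a time,
    # so memory use stays small even for a very large input.
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(strip_header_descriptions(src))


if __name__ == "__main__":
    # Hypothetical usage: python strip_headers.py input.fasta output.fasta
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
```

The output is written to a new file rather than in place, so the original 47 GB file is left untouched until the result has been checked.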