Hi everybody,
This is my first post. I want to thank you, from the Italian scientific community, for your wonderful work in this forum.
I have a question. I'm a 454 user and I have a file of singleton sequences generated with 'sfffile' and converted to "singleton.fna". I tried to load this file into "mirTools", but it doesn't accept it. So I modified the file like this:
>sample_1_x58
ACAGGCGGACACACACACACACACACACACACACACACACACACACTACAACACAGTA
>sample_2_x160
CTCACACAGTTATACACAGATTTACACACACAATACACCTACACACACATGCTTATACAC
ACCACTCAACACAAGTTACCTAACTACAATAATTATC
>GIK1EHM01D82G8_x145
GTTTGAGAGGGTGATGATAAGAAGCTTGCGCAGTGGCTCACGCCTGTAATCCCAGCACTT
TGGGAGGCCAAGGCAGGTGGATTGCCTGAGCTCAGGAGTTTGAGACCAGCCTGAGCAACA
TGGAAAATCCCATCTCTAAAAATAC
>GIK1EHM01A3PID_x74
GGTAACTTTGTGTTGATGTGGGCGAGTGTGGGCAGATGGGAAGCTGTGGTGTGGGGCGAG
TGTGGGCAGATGGG......etc.etc.
and it seems to accept this format. The problem is that the original file contains more than 400,000 sequences, so modifying it manually is impossible. Is there a script or a command like sed or grep that deletes all the words following ">name" on the header line, without deleting the whole line? For example:
from this unaccepted format:
>GIK1EHM01A0TLT length=84 xy=0302_1103 region=1 run=R_2010_06_08_08_48_07_
GTGTTTCTGTGTGGAGGTGTGTCTCTGTGGTGTGTGTGTCTGTGTGGGTGTACGTGTGTC
TCTGTCTGTGGTGTGTGTGTCTGT
to this accepted format:
>GIK1EHM01A0TLT_x84
GTGTTTCTGTGTGGAGGTGTGTCTCT...................
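The header change shown in this example could be sketched in Python rather than sed or grep. This is only a minimal sketch under assumptions: every header to convert starts with ">", has the read name as its first word, and contains a "length=N" field, and the number after "_x" is taken from that field as in the example above. The names `reformat_header` and `convert` are hypothetical helpers, not part of mirTools or sfffile.

```python
import re

# Matches a 454-style FASTA header such as
# ">GIK1EHM01A0TLT length=84 xy=0302_1103 region=1 run=..."
# group 1 = read name, group 2 = value of the length= field
HEADER = re.compile(r"^>(\S+).*?\blength=(\d+)")


def reformat_header(line):
    """Return '>NAME_xLENGTH' for a matching header line.

    Sequence lines and headers without a length= field are
    returned unchanged (minus the trailing newline).
    """
    line = line.rstrip("\n")
    match = HEADER.match(line)
    if match:
        return ">{}_x{}".format(match.group(1), match.group(2))
    return line


def convert(in_path, out_path):
    """Stream the input file line by line and write the converted copy."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(reformat_header(line) + "\n")
```

A call such as `convert("singleton.fna", "singleton_mirtools.fna")` would write a converted copy and leave the original file untouched; the output file name here is only an example. Reading line by line keeps memory use low even with several hundred thousand sequences.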
I hope you can help me; I'm willing to try any approach. Waiting for your answer.
Thank you very much,
Giorgio