I've recently started doing some sequencing analysis, and as a result I've built a Linux box running Ubuntu 12.04 with a CUDA-capable card. Long story short, I've managed to successfully align sequences using both meme and cuda-meme on a practice data set. This data set was supplied by a colleague who is currently away; otherwise I'd be going to him for answers.
Because I've grown accustomed to using meme/cuda-meme, I thought I'd take a whack at analysing the raw data, which is in a FASTA file. The file consists of long sequences which contain primers at either end with a random region in the centre. I've removed the primers, and out of this mess I've now identified a collection of 37278 sequences; however, within this list several sequences occur multiple times.
The problem is that the PC I'm currently testing this process on isn't powerful. As a result, I want to filter the data so that the >seq_example which occurs the most times is at the top of the list, with an additional identifier in its title telling me how many times it occurs, and then delete the duplicates.
Is there any way, using a mixture of programs on Linux/Windows, that I can sort and collapse the data like this?
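To make the goal concrete, here is a rough sketch in Python of the kind of processing I mean. It is only an illustration under a few assumptions of mine: two sequences count as duplicates when they are identical after upper-casing, the first header seen for a sequence is the one kept, and the count is appended to that header as `_count=N` (that naming is made up, as are the function names `parse_fasta` and `collapse_duplicates`).

```python
from collections import Counter


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def collapse_duplicates(lines):
    """Return FASTA lines with one record per unique sequence,
    most frequent first, with the count appended to the header."""
    counts = Counter()
    first_header = {}
    for header, seq in parse_fasta(lines):
        seq = seq.upper()  # assumption: case differences are not meaningful
        counts[seq] += 1
        first_header.setdefault(seq, header)  # keep the first header seen

    out = []
    for seq, n in counts.most_common():  # sorted by count, highest first
        # "_count=N" is a hypothetical naming scheme for the extra identifier
        out.append(">{0}_count={1}".format(first_header[seq], n))
        out.append(seq)
    return out
```

Something like `collapse_duplicates(open("input.fasta"))` (the file name is a placeholder) would then give the lines to write back out as a smaller FASTA file. With 37278 short sequences this should fit comfortably in memory even on a modest machine, but I'd welcome a more standard tool if one exists.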