Does anyone know a good way to analyze the repeat sequences, i.e., the reads that don't align uniquely with ELAND in the Illumina pipeline? I think there are some interesting biological aspects to the non-unique sequences in my dataset, and I would like to learn about them.
Here is what I have in mind: I'd like to upload the export.txt file to Galaxy, group and count the most common sequence tags in it, and then BLAT/BLAST the most common tags to see what they are (e.g., satellite, LINE, SINE). My other problem is that I am unable to upload the export.txt file to Galaxy. I assume it must be compressed first. Does anyone know whether there is an upper limit on file size, or the best way to compress it? Any other suggestions for dealing with the repeat sequences are welcome too.
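In case it helps make the question concrete, the group-and-count step could also be done locally before uploading anything. This is only a minimal sketch under assumptions I have not verified against my own files: that export.txt is tab-separated, that the read sequence is in column 9 and the match field in column 11, and that repeat hits carry a colon-separated count such as `0:3:9` in the match field. The function names and the column defaults are my own placeholders, so adjust them to whatever the file actually contains.

```python
import gzip
from collections import Counter


def open_maybe_gzip(path):
    """Open a plain-text or gzip-compressed file for reading as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_tags(lines, seq_col=8, match_col=10, repeats_only=True):
    """Count how often each read sequence occurs.

    seq_col and match_col are 0-based column indexes (assumed layout).
    With repeats_only=True, only rows whose match field contains ':'
    are counted (assumed marker for multiply-aligned reads).
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(seq_col, match_col):
            continue  # skip malformed or truncated rows
        if repeats_only and ":" not in fields[match_col]:
            continue
        counts[fields[seq_col]] += 1
    return counts


def write_fasta(counts, n, handle):
    """Write the n most common tags as FASTA records for BLAT/BLAST."""
    for i, (seq, count) in enumerate(counts.most_common(n), 1):
        handle.write(f">tag{i}_count{count}\n{seq}\n")
```

Something like `with open_maybe_gzip("export.txt.gz") as fh: counts = count_tags(fh)` followed by `write_fasta(counts, 100, out)` would then give a small FASTA file of the top tags, which is much easier to upload or paste into a BLAT/BLAST search than the full export file.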
Thanks, Keith.