Hi there,
I am fairly new to bioinformatics, so please excuse me if I am asking any silly questions.
I have several dozen files containing millions of processed Illumina reads. The sequences have already been converted to FASTA format.
Now I want to merge all my files into a single file in order to get a uniform OTU nomenclature (I want to feed my OTU picker with a single file).
I will use a plain cat command to merge all my files.
However, I can't think of commands that
a) add an extra identifier (the sample number) to every FASTA header in the file
b) later on, after OTU picking, sort all sequences carrying the same identifier into a separate file.
I can't use the barcode information in the header, because several barcodes have been used multiple times.
Any ideas? Thank you very much!
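
To make a) and b) concrete, here is a minimal Python sketch of what the two steps could look like. It is only an illustration under assumptions: the function names `tag_fasta` and `split_by_sample`, the `|` separator, and the sample labels `S1` and `S2` are all made up for the example, and it assumes the OTU picker leaves the start of each header unchanged.

```python
import io
from collections import defaultdict

# Assumed separator: choose a character that does not occur in the sample labels.
SEP = "|"

def tag_fasta(lines, sample_id, sep=SEP):
    """Yield FASTA lines with sample_id prepended to every header line."""
    for line in lines:
        if line.startswith(">"):
            # Keep the original header text (barcode etc.) after the sample label.
            yield f">{sample_id}{sep}{line[1:]}"
        else:
            yield line

def split_by_sample(lines, sep=SEP):
    """Group FASTA records by the sample label found before sep in each header."""
    groups = defaultdict(list)
    current = None
    for line in lines:
        if line.startswith(">"):
            current = line[1:].split(sep, 1)[0]
        if current is not None:
            groups[current].append(line)
    return dict(groups)

# Two tiny in-memory "files" that share the same read name and barcode.
sample1 = io.StringIO(">read1 barcode=ACGT\nACGTACGT\n")
sample2 = io.StringIO(">read1 barcode=ACGT\nTTGGCCAA\n")

# Step a): tag each sample while merging (this replaces the plain cat).
merged = list(tag_fasta(sample1, "S1")) + list(tag_fasta(sample2, "S2"))

# Step b): split the merged records back out by sample label.
by_sample = split_by_sample(merged)
```

With real data, the `io.StringIO` objects would be replaced by opened files, and each list in `by_sample` would be written to its own output file. Because the label is taken from the file a read came from rather than from the barcode, reused barcodes should not cause collisions.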