Hi,
I just learned that most genome assemblers rely on the order of sequences within paired read files to keep mates together. Unfortunately, I had already processed each of my paired-end read files separately with fastx-toolkit, so the two files no longer contain the same reads in the same order. I've looked for a way to compare the two files, write the matching reads to two new files, and put all the orphaned reads into a third file, but nothing I've found so far has worked. Does anyone have a simple workaround (e.g. a Perl script) other than starting over with different filtering software (e.g. Trimmomatic)? I'm working with RADseq data (FASTQ, Phred+33 quality encoding).
Any help would be great!
Thanks,
Bryan
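
For reference, the re-pairing described above (matched reads into two new files, orphans into a third) could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a tested pipeline tool: it assumes each FASTQ record is exactly four lines (no wrapped sequences), that mates share the same read name once a trailing `/1` or `/2` and anything after the first space are removed, and that one of the two files fits in memory. The function names (`read_fastq`, `pair_key`, `repair_pairs`) are made up for illustration.

```python
def read_fastq(handle):
    """Yield (header, seq, plus, qual) from a FASTQ with 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line): stop
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def pair_key(header):
    """Name shared by both mates: first token, without '@' or a /1, /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def write_record(handle, record):
    handle.write("\n".join(record) + "\n")


def repair_pairs(r1_path, r2_path, out1_path, out2_path, orphan_path):
    """Write matched mates to out1/out2 (same order) and the rest to orphans."""
    # Assumption: all of R2 fits in memory, keyed by read name.
    with open(r2_path) as r2:
        mates = {pair_key(rec[0]): rec for rec in read_fastq(r2)}

    n_pairs = n_orphans = 0
    with open(r1_path) as r1, open(out1_path, "w") as out1, \
            open(out2_path, "w") as out2, open(orphan_path, "w") as orphans:
        for rec in read_fastq(r1):
            mate = mates.pop(pair_key(rec[0]), None)
            if mate is None:
                write_record(orphans, rec)  # R1 read whose mate was filtered out
                n_orphans += 1
            else:
                write_record(out1, rec)
                write_record(out2, mate)
                n_pairs += 1
        # Whatever is left in the dict are R2 reads with no surviving R1 mate.
        for mate in mates.values():
            write_record(orphans, mate)
            n_orphans += 1
    return n_pairs, n_orphans
```

A call such as `repair_pairs("R1.fastq", "R2.fastq", "R1.paired.fastq", "R2.paired.fastq", "orphans.fastq")` (file names are placeholders) would return the number of pairs and orphans written. The paired output follows the order of the first file, which is what matters for tools that match mates by position.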