I want to simulate shorter reads from a particular dataset. Say I want 50 bp paired-end reads from a 100 bp paired-end dataset while keeping the same insert size. Would I take the first 50 characters of the sequence and quality strings from each end? Or would I take the first 50 from the first end and the last 50 from the second end?
Extracting the characters is easy with a simple awk command. I'm just curious about the order.
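For reference, here is a minimal Python sketch of the first option (the first 50 characters of the sequence and quality strings from each mate), standing in for the awk one-liner. The function name `trim_fastq` is made up for illustration, and it assumes plain 4-line FASTQ records with no wrapped sequence lines. If each mate is reported starting from its own end of the fragment, as is usual for Illumina paired-end data, this should leave the outer fragment coordinates unchanged, but that is an assumption about the library rather than something I have confirmed.

```python
def trim_fastq(lines, length=50):
    """Yield FASTQ lines with sequence and quality cut to the first `length` characters.

    Assumes strict 4-line records: header, sequence, '+' line, quality.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Lines 2 and 4 of each record (index 1 and 3 modulo 4) hold
        # the bases and the quality scores; headers and '+' pass through.
        if i % 4 in (1, 3):
            line = line[:length]
        yield line


# Toy example: trim one 12 bp record down to 5 bp.
record = ["@read1/1", "ACGTACGTACGT", "+", "IIIIIHHHHHGG"]
trimmed = list(trim_fastq(record, length=5))
```

The same function would be run separately on the R1 and R2 files, so the two outputs stay in the same read order.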
Thank you very much.