Hi,
I know there have already been a few discussions about this topic, but I am not sure I understood them.
I have a fastq file containing Illumina paired-end reads. Below are the first four headers of the fastq file.
@HWUSI-EAS1599:82:64H78AAXX:7:100:10000:10533 1:N:0:CGATGT
@HWUSI-EAS1599:82:64H78AAXX:7:100:10000:10533 2:N:0:CGATGT
@HWUSI-EAS1599:82:64H78AAXX:7:100:10000:10642 1:N:0:CGATGT
@HWUSI-EAS1599:82:64H78AAXX:7:100:10000:10642 2:N:0:CGATGT
It looks like the first and second headers form one pair, and the third and fourth form another. If I want to make a subset containing 1,000 reads, can I simply extract the first 1,000 reads in order using the 'head' command? I do not understand why this might cause bias. If 'head' is not a good way to subsample, is there a very simple way to do it? Thank you in advance for your comments.
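For comparison with 'head', here is a minimal sketch of random subsampling that keeps mates together. It assumes the file is interleaved (each read 1 is immediately followed by its read 2, as the headers above suggest) and that every record is exactly four lines. The function names, file names, and seed are hypothetical, chosen only for illustration. Taking the first records with 'head' would select reads that sit next to each other in the file (in the headers above, all from lane 7, tile 100), whereas reservoir sampling gives every pair in the file the same chance of being chosen.

```python
import random
from itertools import islice


def sample_pairs(lines, n_pairs, seed=0):
    """Reservoir-sample n_pairs read pairs from interleaved FASTQ lines.

    Assumes 4 lines per record and read 1 directly followed by read 2,
    so one pair is always 8 consecutive lines.
    """
    rng = random.Random(seed)
    reservoir = []
    it = iter(lines)
    seen = 0
    while True:
        chunk = list(islice(it, 8))  # one pair = 2 records x 4 lines
        if not chunk:
            break
        if len(chunk) != 8:
            raise ValueError("input ended in the middle of a read pair")
        if seen < n_pairs:
            # Fill the reservoir with the first n_pairs pairs.
            reservoir.append(chunk)
        else:
            # Replace a stored pair with probability n_pairs / (seen + 1).
            j = rng.randint(0, seen)
            if j < n_pairs:
                reservoir[j] = chunk
        seen += 1
    return reservoir


def subsample_file(in_path, out_path, n_pairs, seed=0):
    """Write a random subset of pairs from in_path to out_path (hypothetical helper)."""
    with open(in_path) as src:
        pairs = sample_pairs(src, n_pairs, seed)
    with open(out_path, "w") as dst:
        for pair in pairs:
            dst.writelines(pair)
```

A call such as `subsample_file("reads.fastq", "subset.fastq", 500)` would write 500 pairs, that is 1,000 reads. The file is read once and only the selected pairs are held in memory, so the approach also works for large files.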