Hi all,
I am preprocessing a dataset from a human sample sequenced on an Illumina HiSeq 2500 (paired-end reads, 100 bp each). I first trim each read based on quality. If the trimmed read is too short, I discard it.
My question is: how do you pick the minimum length below which a read is discarded? Would you discard reads shorter than 50, 40, or 30 bp? What is the right approach to picking a threshold?
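To make the setup concrete, here is a minimal Python sketch of the kind of filtering step I mean. It is only an illustration, not the exact tool I run: the function names `trim_3prime` and `keep_pair` are made up, the trimming rule (drop bases from the 3' end while their Phred score is below a cutoff) is a simplified stand-in for a real quality trimmer, and the defaults `min_qual=20` and `min_len=30` are placeholders rather than recommendations. I am also assuming that a pair is dropped when either mate ends up too short, so that the two files stay in sync.

```python
def trim_3prime(seq, quals, min_qual=20):
    """Drop bases from the 3' end while their Phred score is below min_qual."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_qual:
        end -= 1
    return seq[:end], quals[:end]


def keep_pair(read1, read2, min_qual=20, min_len=30):
    """Trim both mates of a pair.

    Each read is a (sequence, list_of_phred_scores) tuple.
    Returns the trimmed pair, or None if either mate is shorter than
    min_len after trimming (assumption: discard the whole pair).
    """
    t1 = trim_3prime(*read1, min_qual=min_qual)
    t2 = trim_3prime(*read2, min_qual=min_qual)
    if len(t1[0]) < min_len or len(t2[0]) < min_len:
        return None
    return t1, t2


if __name__ == "__main__":
    # Toy 8 bp read whose last three bases are low quality.
    read = ("ACGTACGT", [30, 30, 30, 30, 30, 10, 5, 2])
    print(keep_pair(read, read, min_len=4))
    print(keep_pair(read, read, min_len=6))
```

The `min_len` argument is the threshold I am asking about.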
I haven't been able to find any information on this on the web. (By the way, I am using BWA for alignment.)
Thanks in advance.