I am new to the field, but from what I have gathered, it is essential to filter out sequences with low quality (Phred) scores.
However, I generally lose ~70-80% of the reads from some publicly available datasets when I apply a filter to remove reads with a mean quality score < 20 (Phred+33, Illumina 1.8+) and an individual nucleotide score < 10.
I am not sure whether this is too stringent a criterion.
When I asked one of the data submitters which filtering criteria they had used, he said that they had not used any filters.
Can someone please tell me what the right cut-off is, one that would both minimize data loss and preserve reliability?
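
To make the filter concrete, here is a minimal Python sketch of the criteria described above. It assumes Phred+33 encoding and assumes the two thresholds act as independent criteria, so a read is dropped if its mean score is below 20 or if any single base scores below 10. The function names are illustrative only and do not come from any particular tool.

```python
def phred33_scores(quality_string):
    """Decode a Phred+33 quality string into a list of integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def passes_filter(quality_string, min_mean=20, min_base=10):
    """Return True if the read passes both thresholds.

    Assumption: a read is rejected if its arithmetic mean score is below
    min_mean OR if any single base has a score below min_base.
    """
    scores = phred33_scores(quality_string)
    if not scores:
        return False  # treat an empty read as failing
    mean_q = sum(scores) / len(scores)
    return mean_q >= min_mean and min(scores) >= min_base


def fraction_retained(quality_strings, min_mean=20, min_base=10):
    """Fraction of reads that pass the filter (0.0 for empty input)."""
    reads = list(quality_strings)
    if not reads:
        return 0.0
    kept = sum(passes_filter(q, min_mean, min_base) for q in reads)
    return kept / len(reads)
```

Under this reading, a single base below 10 is enough to discard an otherwise high-quality read, so running `fraction_retained` with each threshold relaxed in turn shows which of the two criteria accounts for most of the loss.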