Hi everyone,
I've got a couple of questions about handling the quality scores in Illumina data. What's the general opinion on these two things I've heard:
1) Once a "B" quality score shows up in a read (flagging a bad call), that base and everything downstream of it can't be relied upon and should be trimmed off. (http://brianknaus.com/software/srtoolbox/shortread.html)
2) The Illumina pipeline's filtering throws out lots of useful data, so you should work directly from the _qseq files rather than the filtered _sequence.txt files.
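To make 1) concrete, here's a rough Python sketch of the trimming as I understand it. It assumes the qualities are still in the Illumina encoding where "B" is the bad-call marker, and the function name and the example read are made up by me for illustration, not taken from the linked tool.

```python
def trim_at_first_b(seq, qual, bad_char="B"):
    """Cut a read at the first bad_char in its quality string.

    Assumption: qualities are in the Illumina encoding where "B"
    marks a bad call. Returns the (possibly shortened) seq and qual.
    """
    cut = qual.find(bad_char)
    if cut == -1:
        # No "B" anywhere, so the read is left untouched.
        return seq, qual
    return seq[:cut], qual[:cut]


# Made-up read: the last four bases carry "B" qualities.
seq, qual = trim_at_first_b("ACGTACGTAC", "hhhhgfBBBB")
print(seq, qual)  # ACGTAC hhhhgf
```

A read whose quality string starts with "B" comes back empty, so you'd probably want a minimum-length cutoff after trimming.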
Thoughts?? Opinions?? Is quality filtering like what's described in 1) overkill if I'm using MAQ, which is a quality-aware aligner, downstream?
I'll eventually be aligning all of these reads (with MAQ) to a reference genome for ChIP-seq on microbial samples. Our coverage is super deep, so I haven't been throwing away repeat reads: identical reads are to be expected at our depth of sequence coverage. I'm looking for good ways to clean up our data!
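For 2), this is roughly how I'd pull reads straight out of a _qseq file, as a sketch only. It assumes the layout I believe _qseq uses: 11 tab-separated columns (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag), with "." for uncalled bases and a filter flag of 1 for pass and 0 for fail. Please check that against your pipeline version. The function name and the read-name format are my own choices.

```python
def qseq_to_fastq(line, keep_failed=True):
    """Turn one _qseq line into a FASTQ record (a string).

    Assumed layout: 11 tab-separated fields ending in sequence,
    quality and a filter flag ("1" = passed, "0" = failed).
    Returns None when the read failed and keep_failed is False.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError("expected 11 columns, got %d" % len(fields))
    machine, run, lane, tile, x, y, index, read_no, seq, qual, flt = fields
    if flt != "1" and not keep_failed:
        return None
    # Read name format is my own choice, not an official one.
    name = "%s_%s:%s:%s:%s:%s/%s" % (machine, run, lane, tile, x, y, read_no)
    # _qseq writes uncalled bases as "."; most aligners expect "N".
    return "@%s\n%s\n+\n%s\n" % (name, seq.replace(".", "N"), qual)


# Made-up line for a read that failed the pipeline filter.
line = "\t".join(["M1", "7", "1", "1", "100", "200", "0", "1",
                  "AC.T", "hhBB", "0"])
print(qseq_to_fastq(line), end="")
```

Quality characters are passed through unchanged, so any conversion to the encoding the aligner expects would still be a separate step.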
Thanks!!!!
Lizzy