Hi,
I have something of a newbie question about filtering RNA-seq data. I've downloaded some RNA-seq data from SRA in FASTQ format and plan to align it to the genome using TopHat.
Is it necessary, or standard practice, to filter the reads before generating the mappings? I guess the mapping itself acts as a kind of filter, since poor-quality reads presumably won't align. Is that sufficient? It would be easy to filter out sequences with stretches of Ns; is that typically done? And what about the quality information in the FASTQ file: should I be using it for some filtering prior to mapping?
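To make the question concrete, this is roughly the kind of pre-filter I have in mind. It is only a minimal Python sketch, assuming plain 4-line FASTQ records and Phred+33 quality encoding; the function names and the thresholds (`max_n_run`, `min_mean_q`) are made up for illustration and are not recommended values.

```python
import io


def parse_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops parsing
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (offset 33 is an assumption)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def passes_filter(seq, qual, max_n_run=3, min_mean_q=20):
    """Reject reads with a run of Ns longer than max_n_run or a low mean quality."""
    if "N" * (max_n_run + 1) in seq.upper():
        return False
    if not qual:
        return False
    return mean_phred(qual) >= min_mean_q


def filter_fastq(in_handle, out_handle, **thresholds):
    """Copy passing reads to out_handle; return (kept, total) read counts."""
    kept = total = 0
    for header, seq, qual in parse_fastq(in_handle):
        total += 1
        if passes_filter(seq, qual, **thresholds):
            out_handle.write(f"{header}\n{seq}\n+\n{qual}\n")
            kept += 1
    return kept, total


if __name__ == "__main__":
    # Tiny in-memory example: one good read, one with an N stretch.
    reads = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACNNNNGT\n+\nIIIIIIII\n")
    out = io.StringIO()
    print(filter_fastq(reads, out))
```

On real files I would open the input and output FASTQ with `open()` and pass the handles to `filter_fastq`. Whether something like this is worth running before TopHat, or whether the aligner already makes it redundant, is exactly what I'm unsure about.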
I assume the data deposited in SRA generally represent some form of quality-filtered set rather than totally raw data. Is that correct?
Any pointers or links to info greatly appreciated.
dav