Hi,
To show a class of biologists why Illumina RNA-Seq reads need pre-processing, I would like to find a good example of dirty data ;-). I would apply quality filtering, adapter clipping, and other pre-processing cleanup and QC steps to it. Then I would compare TopHat2 mapping of the raw and the cleaned data against a reasonably small reference transcriptome/genome. Synthetic data is not really an option here; I would prefer real dirty data.
I am having trouble finding 'bad' data out there, and I am sure some of you have a nice dirty dataset to use for this.
Thanks for sharing any reference to a public dataset I could use for training.
Cheers,