I am looking at some bacterial Illumina RNA-seq data and seeing very high levels of sequence duplication. Inspecting the alignments in IGV showed some odd-looking patterns and confirmed the high duplication levels. I have two questions:
1) Has anyone seen data like this before, and do you know what factors might have caused it (e.g. the library prep stage)?
2) Is it OK to use count data from samples like this in a differential expression pipeline?
Here is an IGV screenshot, with the lower sample being a 'strange' one: