Hi,
I've been trying to combine one single-end (SE) and one paired-end (PE) data set (both Illumina HiSeq) in an RNA-seq experiment. The PE reads were trimmed to the same length as the SE reads, and I used only R1. The same 12 libraries were sequenced in both SE and PE mode. The read depth of the PE data was substantially lower than that of the SE data.
An MDS plot of the TMM-normalized CPMs of the replicates shows a batch effect between SE and PE. The Pearson correlations of the normalized CPMs between replicates are also quite poor, ranging from 0.95 to 0.99.
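For anyone who wants to reproduce the kind of per-library comparison I mean, here is a rough sketch in plain Python. It is not my actual pipeline: it uses simple library-size CPM rather than TMM, the counts are made-up toy numbers, and the helper names (`cpm`, `log_cpm`, `pearson`) and the prior count of 1 are just assumptions for illustration. It computes the Pearson correlation between the SE and PE run of one library on both the raw CPM scale and the log2 scale, since the two can differ quite a bit when a few highly expressed genes dominate.

```python
import math


def cpm(counts):
    """Scale the raw counts of one library to counts per million."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def log_cpm(counts, prior=1.0):
    """log2 of CPM plus a prior, so that zero counts stay finite."""
    return [math.log2(v + prior) for v in cpm(counts)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy per-gene counts for ONE library sequenced twice (hypothetical values).
se_counts = [1200, 30, 450, 0, 9800, 75]  # single-end run, deeper
pe_counts = [310, 5, 130, 1, 2400, 30]    # paired-end run (R1 only), shallower

r_raw = pearson(cpm(se_counts), cpm(pe_counts))
r_log = pearson(log_cpm(se_counts), log_cpm(pe_counts))

print(f"Pearson on CPM:      {r_raw:.4f}")
print(f"Pearson on log2-CPM: {r_log:.4f}")
```

With real data this would be run once per library pair across all genes that pass an expression filter.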
So, is it even possible to combine SE and PE data for RNA-seq? Or could the difference I'm seeing be due to a difference in sequencing chemistry?
The ComBat function in the sva package removes the batch effect, and the replicates cluster perfectly afterwards. However, I've seen threads saying that batch-corrected data should only be used for clustering and visualization, not for further downstream analysis.
The RNA-seq data is meant for pattern recognition, not DEG analysis...