Hi all,
I am working on an RNA-seq dataset.
I have 2 biological replicates for each of 6 samples.
Case a) When I remove duplicates with Picard tools, I see a 99% correlation between the 2 replicates.
Case b) When I do not remove duplicates, I also see a 99% correlation between the 2 replicates. In this case, however, one outlier gene has a very high FPKM in both replicates, and it is driving the R-square to 0.99. Without that outlier gene, the R-square is 0.40.
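To illustrate what I mean by one gene driving the R-square, here is a minimal Python sketch. The FPKM values are synthetic (not my data), and the helper `pearson_r2` and all variable names are made up for the illustration. It only shows the general effect of a single extreme point on a Pearson R-square; it says nothing about whether duplicates are the cause.

```python
import math
import random


def pearson_r2(x, y):
    """Squared Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


random.seed(0)

# Synthetic FPKM values for 1000 genes (assumption: moderately noisy replicates).
rep1 = [random.uniform(0, 50) for _ in range(1000)]
rep2 = [0.6 * v + random.uniform(0, 40) for v in rep1]

# R-square on the bulk of the genes, without any outlier.
r2_without = pearson_r2(rep1, rep2)

# Add one hypothetical gene with a very high FPKM in both replicates.
rep1_out = rep1 + [100000.0]
rep2_out = rep2 + [95000.0]
r2_with = pearson_r2(rep1_out, rep2_out)

# The same comparison on log2(FPKM + 1), which reduces the outlier's leverage.
log1 = [math.log2(v + 1) for v in rep1_out]
log2 = [math.log2(v + 1) for v in rep2_out]
r2_log = pearson_r2(log1, log2)

print(f"R-square without outlier: {r2_without:.2f}")
print(f"R-square with outlier:    {r2_with:.2f}")
print(f"R-square on log2(FPKM+1): {r2_log:.2f}")
```

In this toy example the single high-FPKM gene pushes the R-square close to 1 even though the remaining genes agree only moderately, which is the pattern I see in case b).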
I am trying to understand whether this difference (R-square of 0.99 in case a) versus 0.40 in case b) once the outlier is excluded) is solely due to duplicated reads. If so, is it wise to remove these duplicated reads using Picard MarkDuplicates?
I am using Cufflinks v2.1.1 for this analysis.
Please comment.