Hi all,
We are analyzing a set of RNA-seq libraries and noticed that they have very different levels of PCR duplicates, and therefore very different levels of complexity: some libraries show "many shades" of transcription, while in others most genes have either low or zero expression or very high expression, with relatively few at intermediate levels. This is likely due to a combination of technical issues during library construction and the fact that some libraries were made from much lower amounts of RNA.
We cannot redo the experiment or the library sequencing, since we are dealing with material that is hard to come by, so we are looking for a statistical or computational method to deal with the PCR duplicates and the different levels of library complexity. Any ideas, or recommendations for papers published on this subject? I assume we are not the first to have encountered this issue...
Thanks
Daniel