Hi folks,
I am new to SEQanswers. A brief introduction: I am a computational person (a PhD student in Genetic Epidemiology). I have used SAS and R extensively to handle GWAS data, but I recently changed my thesis project and started working on NGS data. It is a lot of fun, but it comes with challenges…
I am characterizing DNA methylomes using MeDIP-seq and MRE-seq. The problem is that, after working on my data for more than half a year, I still don’t understand it very well. For example, in the processed MRE-seq data (.CpG.bedGraph file), some CpG sites have extremely large values. The median CpG density is 0.9137, but about 30 CpG sites have a CpG density greater than 100, and about a dozen have a CpG density of several thousand. Below are the quantiles of the values:
5%      25%     50%     75%     99%      100%
0.0722  0.1444  0.9137  3.1771  21.0161  1857.6816
I wonder whether these huge values really mean something, whether they are just outliers, or whether they are simply due to PCR artifacts or something similar. Should I choose a threshold and exclude the CpG sites with extremely large values, or leave them in?
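To make the question concrete, here is a minimal sketch of the kind of threshold check I have in mind. It assumes a plain four-column bedGraph (chrom, start, end, value); the function names, the toy data, and the choice of quantile cutoff are all hypothetical and only for illustration. It flags sites for inspection and does not by itself justify excluding them.

```python
import io


def read_bedgraph(handle):
    """Yield (chrom, start, end, value) from a 4-column bedGraph stream."""
    for line in handle:
        line = line.strip()
        # Skip blank lines and optional header/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield chrom, int(start), int(end), float(value)


def quantile(sorted_values, q):
    """Quantile by linear interpolation between the two nearest order statistics."""
    if not sorted_values:
        raise ValueError("no values to summarize")
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def flag_extreme_sites(records, q=0.99):
    """Return (cutoff, sites whose value is strictly above the q-th quantile)."""
    records = list(records)
    values = sorted(r[3] for r in records)
    cutoff = quantile(values, q)
    flagged = [r for r in records if r[3] > cutoff]
    return cutoff, flagged


# Toy data standing in for a real .CpG.bedGraph file (values are made up).
toy = io.StringIO(
    "track type=bedGraph\n"
    "chr1\t100\t102\t0.07\n"
    "chr1\t250\t252\t0.14\n"
    "chr1\t900\t902\t0.91\n"
    "chr2\t40\t42\t3.18\n"
    "chr2\t75\t77\t1857.68\n"
)

# q=0.75 is used here only because the toy data set is tiny.
cutoff, flagged = flag_extreme_sites(read_bedgraph(toy), q=0.75)
print("cutoff:", cutoff)
for site in flagged:
    print("flagged:", site)
```

With a real file, the same functions could be applied to `open("sample.CpG.bedGraph")` (a hypothetical file name), and the flagged coordinates could then be checked by hand, for example to see whether they cluster in repeats or other problematic regions.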
Can anyone also recommend references, websites, or books for people who are new to the NGS field? Are there basic ideas that I should know by heart when processing and analyzing NGS data?
Thank you very much,
DNAmethylome