Hi everybody
I started analyzing my first ChIP-seq data set, which contains one ChIP sample and one input sample. After mapping the reads to a reference genome with Bowtie and additionally MAQ, around 70% of all reads mapped uniquely to the reference, which should be quite a good rate (I guess).
For the input sample ~20 million reads were left, and for the ChIP sample ~17 million reads.
However, I found ~2 million duplicated reads (reads mapping to the same chromosomal location) in the input sample and ~16 million duplicated reads in the ChIP sample, which might be caused by amplification artifacts or by the library preparation.
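To make "duplicated reads" concrete, here is a minimal sketch of one common way to count them: every read after the first one at the same (chromosome, position, strand) counts as a duplicate. It assumes the reads have already been parsed from the alignment output into hypothetical `(chromosome, 5' position, strand)` tuples; reading SAM/BAM files is not shown, and `count_duplicates` is an illustrative name, not a function from any tool.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical read representation: (chromosome, 5' start position, strand)
Read = Tuple[str, int, str]


def count_duplicates(reads: Iterable[Read]) -> int:
    """Count reads that repeat an already-seen (chromosome, position, strand)."""
    counts = Counter(reads)
    # One read per position/strand is kept as "unique"; the rest are duplicates.
    return sum(n - 1 for n in counts.values())


reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 100, "-"),  # same position, other strand: not a duplicate
    ("chr2", 50, "+"),
    ("chr1", 100, "+"),
]
print(count_duplicates(reads))  # 2
```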
In the literature I read that the expected number of reads mapping to the same position and strand can be modeled with a Poisson distribution. Is this assumption also valid for ChIP samples, where we enrich specific chromosomal locations and deplete those where the TF doesn't bind? Wouldn't we expect to find more duplicated reads in ChIP samples than in input samples?
To identify peaks I used MACS, which removes duplicated reads before calling peaks. Does anyone know a good peak caller in which the parameters concerning duplicated reads can be adjusted by the user? I want to set a custom threshold for the number of duplicated reads based on my duplicated-read distribution and then check the sequences of my peak regions. Fortunately, the TFBS motif of my TF is already known, so I can verify my results.
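The custom-threshold idea described above can be sketched as a simple filter that keeps at most `max_per_position` reads at each (chromosome, position, strand) and drops the rest. It reuses the same hypothetical tuple representation of already-parsed reads; `cap_duplicates` is an illustrative helper, not part of MACS or any other peak caller, and how the filtered reads are then passed to a peak caller is left open.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical read representation: (chromosome, 5' start position, strand)
Read = Tuple[str, int, str]


def cap_duplicates(reads: Iterable[Read], max_per_position: int) -> List[Read]:
    """Keep at most max_per_position reads per (chromosome, position, strand),
    preserving the input order of the reads that are kept."""
    if max_per_position < 1:
        raise ValueError("max_per_position must be at least 1")
    seen: Dict[Read, int] = defaultdict(int)
    kept: List[Read] = []
    for read in reads:
        if seen[read] < max_per_position:
            kept.append(read)
        seen[read] += 1
    return kept


reads = [("chr1", 100, "+")] * 5 + [("chr1", 200, "-")]
print(len(cap_duplicates(reads, 2)))  # 3
```

Varying `max_per_position` and comparing motif occurrence in the resulting peaks is one way to check which threshold suits a given duplicate distribution.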
It would be great to get some comments or ideas, as I am an absolute beginner in NGS analyses...
Besides that, thank you for this forum; it's a great help.
Thanks a lot in advance
Kathrin