Hello Community!
I'm finishing my PhD on epigenetics and histone PTMs in Germany, and a colleague here has a rather big problem. Replicate 2 from a transcription factor ChIP-seq (PE 150, NovaSeq 6000) was contaminated with what we can only assume is cDNA (reads over exons only, in very high numbers). It's in both the inputs and the pulldowns. The colleague sent a new batch (replicate 3), but its quality is very poor. Replicate 2 is much better, but it is unusable without removing the cDNA reads.
Does anyone know of a program or algorithm that can identify reads by their poly-T tails and remove them from the FASTQ/BAM file? I'm rather stuck, and the only solution seems to be writing a new script to do this. I find it an interesting problem from a bioinformatics point of view; I suppose it is similar in concept to separating multiplexed reads based on adaptor sequences.
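To make the idea concrete, here is a rough sketch of the kind of script I have in mind. It is only an illustration under several assumptions, not a tested solution: the function names (`read_fastq`, `terminal_run`, `looks_poly_tailed`, `filter_pairs`) and the 20-base threshold are made up for this example, it only looks at runs of A or T at the very start or end of a read, and it can only catch read pairs that actually contain the tail.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, separator, quality) tuples from FASTQ lines."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return
        yield tuple(record)


def terminal_run(seq, base):
    """Length of the longest run of `base` at the start or end of `seq`."""
    leading = len(seq) - len(seq.lstrip(base))
    trailing = len(seq) - len(seq.rstrip(base))
    return max(leading, trailing)


def looks_poly_tailed(seq, min_len=20):
    """True if the read starts or ends with at least `min_len` A's or T's.

    The threshold of 20 is an arbitrary assumption and would need tuning.
    """
    seq = seq.upper()
    return max(terminal_run(seq, "A"), terminal_run(seq, "T")) >= min_len


def filter_pairs(r1_lines, r2_lines, min_len=20):
    """Yield (mate1, mate2) records, dropping the pair if either mate is flagged.

    Assumes both inputs list the mates in the same order.
    """
    for mate1, mate2 in zip(read_fastq(r1_lines), read_fastq(r2_lines)):
        if looks_poly_tailed(mate1[1], min_len) or looks_poly_tailed(mate2[1], min_len):
            continue
        yield mate1, mate2
```

For real data the two inputs would be open file handles, for example from `gzip.open(path, "rt")`, and each kept record would be written back out as four lines. Genuine genomic A/T-rich reads would also be flagged by this check, so the threshold matters.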
Thanks for any ideas!
Cheers,
Michel