Hi all,
I'm having some trouble with the analysis of my ChIP-seq data. From a ChIP-seq experiment of mouse pancreas, I get a reasonable number of reads that map to the mouse genome (good!), some map to the human genome (contamination), and around 35% don't map anywhere.
To start with, the FastQC analysis doesn't reveal any overrepresented sequences (I thought adaptors might be contaminating the library, but that doesn't seem to be the case).
I've checked whether the orphan reads match different microorganism genomes: no hits. I've also checked them against a database of adaptor sequences: no hits.
When I BLAST a read against mouse/human, I get a perfect match for half of the sequence, but no matches for the rest of the read.
If I BLAST the non-matching sequence against everything, I get a list of matches against different microorganisms. But they are always the same, so my guess is that these are conserved regions, not specific to a single microorganism.
I would appreciate any advice on what I can do to find out what these reads are.
Many thanks,
Francesc
P.S. I'm attaching some of the orphan reads mentioned above in case you wish to check anything:
CCACTGAAGGTGAATTTGTCTTTTACGAAGGTCCACCAAC
CGACCACGGGAGCATCGTTCGCGTCCAGCGCGAAACGGCG
CCAATTCCTTCCGCGCCTTGGCTGCGCTAATATCTCCCGT
CAATAATTCTTGGCAATGGTTCAATCGTACTGGTCGAGCT
TGATAAGAAATAATTGTAAGTAGCTAACAATATTCCAAGT
GCATTCTCTCGCCGCGACTGTCCTCGATAGACACCAACTC
GATGCTGGTCCACTCGCCGACGAGGATCTGATCGTGAGCG
GTGTTATTTATTTACTCACATCGATAACAGTGATAAACTC
CTCATCGACGGCGTGCGCGCGCTGCGGGCCCGGCAGATGG
GGTACTCTCTCAGCAAGGAGAGATGAAGGAGGAAGAAGTT
CCATCTTCATTTTCGATGAATGAGTATGCTTGGATTTCAA
CTTTGCAAGGCGTCTGCCAATTGTTGGTTCGCCTCTTCGA
CCAGGATTGAAAAGTTTGTCAAAAAGGCGGTTATTCAGGA
ATTATTTAGTGGTTTTAACTAACGATTTCGTCTAGAAATG
ATCTATATCGTCTTCACGCAGAAGGTGACCGATTGGCGCA
CGCCGCTTCTATCGAAAGGAGCTCTAAGATGGTCAAATTG
AGAAAAATGAAATGCGTTGCGTGGCTAAAAGCATATAACG
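
A minimal sketch of how the half-read check described above could be scripted, in case anyone wants to repeat it on these reads. It assumes the split point is the exact midpoint of each read, which is only a guess (the real breakpoint between the matching and non-matching parts may sit elsewhere). The names `split_read`, `to_fasta` and the `orphanN` headers are hypothetical, made up for this illustration.

```python
def split_read(read, name):
    """Return (header, sequence) pairs for the left and right halves of a read.

    Assumption: the read is cut at its midpoint; adjust if the actual
    breakpoint is known from the alignment.
    """
    mid = len(read) // 2
    return [(f"{name}_left", read[:mid]), (f"{name}_right", read[mid:])]


def to_fasta(records):
    """Format (header, sequence) pairs as FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


# Two of the orphan reads listed above, used here only as sample input.
reads = [
    "CCACTGAAGGTGAATTTGTCTTTTACGAAGGTCCACCAAC",
    "CGACCACGGGAGCATCGTTCGCGTCCAGCGCGAAACGGCG",
]

records = []
for i, read in enumerate(reads, start=1):
    records.extend(split_read(read, f"orphan{i}"))

fasta_text = to_fasta(records)
print(fasta_text)
```

The resulting FASTA text can be pasted into a BLAST query so that each half is searched on its own, which makes it easier to see whether the two halves of a read hit different organisms.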