We've just had a set of three samples sequenced on an Illumina GA with 76 bp single-end reads, one sample per lane. Each lane contains around 15 million reads after chastity filtering.
I aligned the samples with bowtie using the "--solexa1.3-quals" option. Very few of the reads align against the source organism for the samples (mouse): 13%, 3% and 3% respectively. I also tried MAQ, which was no better. Our conclusion is that there is something wrong with the samples; now we just need to identify the problem, and in particular the source of the large number of remaining high-quality reads that are not from mouse.
One possibility is that the samples are contaminated with another genome. Is it possible to check a read against a variety of genomes to see where it came from? A quick Google search reveals that GenomeMapper has this feature: is this a good approach?
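As a first look before committing to a multi-genome aligner, one option is to pull a small random sample of the unaligned reads and BLAST them by hand, and to count the most frequent read prefixes, since a single dominant sequence often points at adapter or primer rather than a foreign genome. The Python below is only a rough sketch of that idea: the file name `unaligned.fastq` and all function names are made up for illustration, and it assumes plain four-line FASTQ records (for example, reads written out with bowtie's `--un` option).

```python
import random
from collections import Counter
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence) pairs from a 4-line-per-record FASTQ stream."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            break
        # Line 0 is '@name', line 1 is the sequence; qualities are ignored here.
        yield record[0][1:].strip(), record[1].strip()


def sample_reads(records, k, seed=0):
    """Reservoir-sample up to k records without holding the whole lane in memory."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text, ready to paste into BLAST."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def top_prefixes(records, length=30, n=10):
    """Count the most common read prefixes; a dominant one suggests adapter or primer."""
    return Counter(seq[:length] for _, seq in records).most_common(n)


# Hypothetical usage, assuming the unaligned reads are in 'unaligned.fastq':
#
# with open("unaligned.fastq") as fh:
#     reads = sample_reads(read_fastq(fh), k=200)
# print(to_fasta(reads[:20]))
# for prefix, count in top_prefixes(reads):
#     print(count, prefix)
```

If the sampled reads mostly hit one organism, that genome can then be used as an alignment reference for the full lane; if they mostly share one prefix, the problem is more likely in library preparation than contamination.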
Can anybody suggest any other techniques for tackling this problem?
Thanks,
Peter