Greetings everyone,
I have recently been looking into aligners for Illumina mate-pair data (insert size about 2000 bp). After the alignment, I want to detect structural variants with breakdancer or gasv. I have run into some behaviour I don't quite understand, and I would very much appreciate any help and tips concerning good alignment tools.
Here is what I have done and observed so far:
(1) I really like bwa, but I have now read multiple times that bwa does not work with mate-pairs (RF orientation). What exactly does that mean? Should it give an error message, or does it give wrong results? I tried bwa (version 0.5.7) with a few Illumina mate-pair datasets, and if I hadn't read here that it should not work, I would not have noticed anything. The results seemed reasonable and comprehensible to me. That contradicts what I have read, so I'm a little confused.
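To make sure I am using the terms the way others do, this is how I understand FR versus RF orientation. It is only a minimal Python sketch: the function name and arguments are my own (hypothetical), and it assumes both reads map to the same chromosome, with the position and strand taken from the alignment.

```python
def pair_orientation(pos1, is_reverse1, pos2, is_reverse2):
    """Classify a read pair mapped to the same chromosome.

    pos1, pos2: leftmost mapping positions of the two reads.
    is_reverse1, is_reverse2: True if the read maps to the reverse strand.
    Returns "FR" (reads point towards each other), "RF" (reads point
    away from each other), or "FF"/"RR" (both on the same strand).
    """
    if is_reverse1 == is_reverse2:
        return "RR" if is_reverse1 else "FF"
    # Look at the strand of whichever read lies further left.
    left_is_reverse = is_reverse1 if pos1 <= pos2 else is_reverse2
    return "RF" if left_is_reverse else "FR"


# Typical paired-end pair: left read forward, right read reverse.
print(pair_orientation(100, False, 400, True))
# Typical mate-pair pair as I understand it: left read reverse, right read forward.
print(pair_orientation(100, True, 2100, False))
```

If this understanding is wrong, that may already explain part of my confusion.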
I read that it should work if one computes the reverse complement of both ends to "simulate" paired-end data in FR orientation. I would rather not do that if there are other ways. With mate-pair data, you have to expect contamination with some paired-end reads (maybe 10% of all pairs). Hence, after reverse-complementing, those contaminating pairs would end up in RF orientation, so you might not get around having RF reads either way. I still don't really understand what is problematic about the orientation when mapping paired-end or mate-pair data.
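For clarity, this is roughly what I mean by reverse-complementing both ends. It is a minimal Python sketch, not a tested pipeline: the function names are my own (hypothetical), and it assumes plain four-line FASTQ records. The quality string has to be reversed together with the sequence so that each quality value stays with its base.

```python
# Translation table for complementing bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def revcomp_fastq_record(header, seq, plus, qual):
    """Reverse-complement one FASTQ record (four lines, no newlines).

    The quality string is reversed (not complemented) so that it
    still lines up with the reversed sequence.
    """
    return header, reverse_complement(seq), plus, qual[::-1]


record = ("@read1/1", "AACGT", "+", "IIII#")
print(revcomp_fastq_record(*record))
```

This would be applied to every record in both read files before mapping.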
(2) In my opinion, MAQ performs well but takes far too much time for this amount of data (about 8 lanes with more than 200,000,000 pairs), even if I run MAQ for every lane separately in parallel.
(3) I have never worked with Mosaik, but I fear that MosaikSort will not work, because I read that it fails when the mate-pair data is contaminated with too many paired-end reads. Is it necessary to run MosaikSort, or can I just use MosaikAligner and then move on to other tools?
(4) I also have experience with Bowtie, but it seems to me that it discards any pairs whose insert size falls outside the given range (-I and -X parameters). They appear as unmapped in the SAM output file. I want to run structural variation tools like breakdancer and gasv on the alignment, so bowtie's behaviour of reporting only valid pairs is not very helpful. I could not find a way (or parameter) to get around that. Did I miss something in the manual?
Thanks in advance to everyone who would like to comment on this, correct me, or give some advice.