First, I'd like to say Ion Torrent really sucks!
We used to use Illumina data generated by a third party for our research. Last year, my boss decided to buy our own sequencing machine. Although everyone advised him to go with Illumina, we switched to the Ion Torrent platform because of cost. That's where the nightmare began.
We have a pipeline on our cluster for Illumina data that works fine. It reads the vcf files and then outputs a filtered, annotated variant list. The filter options are really simple, based only on MAF and inheritance type. Since we switched to the Ion Torrent platform, we need a pipeline like this for it as well.
Basically, the vcf files are the same, so the pipeline should also work for Ion Torrent vcf files. The difference is that GATK can call multiple samples together, so you get the genotype of every sample for each variant. Ion Torrent can only call variants for one sample at a time, so if a sample doesn't have a variant, its vcf file won't list it. But without looking into the bam files, you don't really know whether the genotype at that position was not observed or is the same as the reference. I asked the Ion Torrent support people, and they said that if the variant is not listed in the vcf, you can consider it the same as the reference.
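To make the ambiguity concrete, here is a minimal sketch of combining per-sample calls into one table. It assumes each sample's calls have already been read into a plain dict; the function name `merge_single_sample_calls` and the `assume_ref_if_absent` switch are made up for illustration and are not part of any Ion Torrent or GATK tool. By default, a site missing from a sample is marked `./.` (unknown); the switch applies the "absent means reference" rule that support suggested.

```python
from typing import Dict, Tuple

# A site is identified by (chrom, pos, ref, alt), exactly as written in the vcf.
Site = Tuple[str, int, str, str]


def merge_single_sample_calls(
    calls: Dict[str, Dict[Site, str]],
    assume_ref_if_absent: bool = False,
) -> Dict[Site, Dict[str, str]]:
    """Build a site -> {sample: genotype} table from single-sample call sets.

    A site that is absent from a sample is reported as "./." (unknown)
    unless assume_ref_if_absent is True, in which case it becomes "0/0".
    """
    missing = "0/0" if assume_ref_if_absent else "./."
    # Collect every site seen in any sample, in a stable order.
    sites = sorted({site for sample_calls in calls.values() for site in sample_calls})
    return {
        site: {
            sample: sample_calls.get(site, missing)
            for sample, sample_calls in calls.items()
        }
        for site in sites
    }


if __name__ == "__main__":
    example = {
        "proband": {("chr2", 202006095, "AG", "A,AA"): "1/1"},
        "unaffected1": {("chr2", 202006096, "G", "A"): "0/1"},
    }
    for site, genotypes in merge_single_sample_calls(example).items():
        print(site, genotypes)
```

Keeping `./.` distinct from `0/0` means the downstream inheritance filter can at least tell "no call" apart from "called reference"; deciding between the two for real would still need coverage information from the bam files.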
Fine. Let's go with this. Then we found lots of cases like the following one:
In a proband, there is a variant like this:
chr2 202006095 . AG A,AA 1/1
If you look for this variant in the unaffected samples, they don't have it. OK, this looks promising: a deletion that the proband has but none of the unaffected samples have.
Then I noticed that the vcf files of all the unaffected samples contain the following variant:
chr2 202006096 . G A 0/1
OK, now I believe the proband should have the same G->A variant as the other samples at position chr2 202006096, but Ion Torrent identified it as an INDEL. We found lots of cases like this. I believe that in the vcf files from Illumina data, both variants would at least be listed, one as an indel and one as a SNP, with the genotypes of the other samples, so you could easily recognize it. But with Ion Torrent, if we don't look into it very carefully, we won't notice this.
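As a rough illustration of the kind of automated check that could flag these cases, the sketch below compares the reference span of each proband variant with the variants from the other samples and reports pairs that overlap but are written differently. It is only a sketch under assumptions: the records are in the abbreviated "CHROM POS ID REF ALT GT" form shown above (not full vcf lines), and `parse_record`, `ref_span`, `overlaps` and `find_conflicts` are hypothetical helper names.

```python
from typing import List, NamedTuple, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in the vcf
    ref: str
    alts: Tuple[str, ...]
    gt: str


def parse_record(line: str) -> Variant:
    """Parse an abbreviated 'CHROM POS ID REF ALT GT' line (not a full vcf record)."""
    chrom, pos, _id, ref, alt, gt = line.split()
    return Variant(chrom, int(pos), ref, tuple(alt.split(",")), gt)


def ref_span(v: Variant) -> Tuple[int, int]:
    """Return the inclusive range of reference positions covered by REF."""
    return v.pos, v.pos + len(v.ref) - 1


def overlaps(a: Variant, b: Variant) -> bool:
    """True if the two variants cover at least one common reference position."""
    if a.chrom != b.chrom:
        return False
    a_start, a_end = ref_span(a)
    b_start, b_end = ref_span(b)
    return a_start <= b_end and b_start <= a_end


def find_conflicts(
    proband: List[Variant], others: List[Variant]
) -> List[Tuple[Variant, Variant]]:
    """List (proband, other) pairs that overlap but are represented differently."""
    conflicts = []
    for p in proband:
        for o in others:
            if overlaps(p, o) and (p.pos, p.ref, p.alts) != (o.pos, o.ref, o.alts):
                conflicts.append((p, o))
    return conflicts


if __name__ == "__main__":
    proband = [parse_record("chr2 202006095 . AG A,AA 1/1")]
    unaffected = [parse_record("chr2 202006096 . G A 0/1")]
    for p, o in find_conflicts(proband, unaffected):
        print("check manually:", p, "vs", o)
```

The nested loop is quadratic, which is fine for a quick test on a handful of candidate variants but would need sorting or indexing by position for whole call sets. It only flags suspicious pairs; it does not decide which representation is correct.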
I believe everyone here is using Ion Torrent. How do you overcome this problem? Is there a way to do it without checking the variants manually?