Seqanswers Leaderboard Ad

**jhalpin** · 02-01-2017, 11:41 AM

I don't know the solution, but I can tell you Ion Torrent has a lot of deletion errors that are sequencing artifacts when sequencing through a long (>5bp) homopolymer region. If this is where your variants fall, you're going to have a bad time.

**mlariviere** · 02-05-2017, 06:39 AM

Hi Jiayi Zhou,

Using AmpliSeq Exome on Proton must not be that bad. It allowed one of researchers at your institution to find mutation causing hearing loss in Newfoundland population:

403

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5215284/

Have you contacted your local Ion Torrent support regarding your issue?

**Brian Bushnell** · 02-05-2017, 09:30 AM

Hi zhoujiayi,

I don't really like the practice of combining indels and SNPs (or, generally, different variants) on the same line in VCF, as it makes the format even more confusing and difficult to read. So, BBMap's CallVariants tool puts each variant on a unique line, which might be what you're looking for. You use it like this:

callvariants.sh in=sample1.sam,sample2.sam ref=ref.fa out=vars.vcf multisample ploidy=2 prefilter

I'd be interested in hearing how well it performs on Ion Torrent data; I've only been testing it on Illumina so far.

**zhoujiayi** · 02-06-2017, 06:37 AM

Originally posted by mlariviere View Post

Hi Jiayi Zhou,

Using AmpliSeq Exome on Proton must not be that bad. It allowed one of researchers at your institution to find mutation causing hearing loss in Newfoundland population:

403

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5215284/

Have you contacted your local Ion Torrent support regarding your issue?

Yes. I am one of the author of this paper. The performance of detecting SNPs seems good for Ion Torrrent. But we are having troubles with INDELs. We are now working on improving the performance of detecting INDELs.

**zhoujiayi** · 02-06-2017, 06:45 AM

Originally posted by Brian Bushnell View Post

Hi zhoujiayi,

I don't really like the practice of combining indels and SNPs (or, generally, different variants) on the same line in VCF, as it makes the format even more confusing and difficult to read. So, BBMap's CallVariants tool puts each variant on a unique line, which might be what you're looking for. You use it like this:

callvariants.sh in=sample1.sam,sample2.sam ref=ref.fa out=vars.vcf multisample ploidy=2 prefilter

I'd be interested in hearing how well it performs on Ion Torrent data; I've only been testing it on Illumina so far.

Thank you for your suggestion. But this software cannot help for my case. Because Ion Torrent detected the variants as a INDEL chr2 202006095 . AG A,AA 1/1, even I use the software to split it into 2 line. I got chr2 202006095 . AG A and chr2 202006095 . AG AA. Then, if I use software to check if the parents have these 2 variants, the parents won't have them. But in fact, chr2 202006095 . AG AA should be chr2 202006096 G A. The parents have this variant. So it is not the variant we are looking for. For sure, we can check this kind of variants manually. But we have lots of variants like this. It is impossible to filter them manually.

**Brian Bushnell** · 02-06-2017, 10:59 AM

Actually, in this case, BBMap would produce two lines that look like this:

Code:

chr2 202006095 . AG A
chr2 202006096 . G A

**zhoujiayi** · 02-07-2017, 06:10 AM

Originally posted by Brian Bushnell View Post

Actually, in this case, BBMap would produce two lines that look like this:

Code:

chr2 202006095 . AG A
chr2 202006096 . G A

Thank you so much. I'll try it.

**r.rosati** · 02-07-2017, 08:26 AM

Premise: I don't hold a grudge towards semiconductor sequencing. It's pretty good for some applications. It's great for germline data on gene panels. It's ok for investigating by exome analysis a case where there are a few candidate genes and you're not going to order a tailored panel for that specific case.
But it's a nightmare when you want to perform a broader analysis.

I agree with you that Ion Torrent default algorithms are very far from perfect. I've been in your situation too, and it's so unnerving.

Originally posted by zhoujiayi View Post

Ion Torrent can only call variant for one sample. So if the sample doesn't have this variant, the vcf file won't have it. But you don't really know if the genotype at that position is not observed or same as the reference without looking into the bam files. I asked the Ion Torrent support people, they said if the variant is listed in the vcf, you can consider it same as reference.

Here you probably mean "if the variant is not listed in the vcf, then it's equal to the reference".
If this is what the support people told you, I wholeheartedly disagree with them. It's just plainly not true.
Just take a look at the VariantCaller parameters and you'll see a list of things that will result in a variant not being called, which are there in the first place to avoid the sequencer calling too many of the errors it makes.
For example with default parameters, a region with homozygous SNV with < 5 total reads is not called; for indels, you need 10+ reads, and at least 5 of them on either strand. This might work ok in optimal conditions, i.e. high coverage and high quality, end-to-end reads; but in ordinary conditions, where some amplicons have consistently low coverage, you do get some trouble from it.
One example is trying to compare a trio: you'll see a lot of de novo mutations in the child and when you look it up on the parents, you'll see that the child has 4 of 6 reads with the variant (was called), and one parent shows it in 2 of 4 total reads (not called); and that's where the "de novo" variant came from. If you're only interested on a small set of genes, this is hardly a problem because you can verify manually. But broaden up your analysis, and you have to sort 200 de novo variants. Yeah sure. Either they're not de novo, or they're so much "de novo" that they started existing today in the sequencer.

Another example of bad behavior that results in a variant not being listed consistently, which happened in some exomes I've ran, is the fact that the Torrent Suite sometimes aligns a deletion on the 5' side, and sometimes on the 3' side of a repeated element. This results in (a) two possible variants describing the same deletion on a series of patients, and (b) the frequency of the deletion being split between 2 positions, possibly resulting in the variant not being called at all.

What I did once, was to take all "de novo" variants in a trio, make them into a hotspot .bed file to be fed to VariantCaller, and then run the VariantCaller plugin again. Every entry of a hotspot must be listed in the final vcf; if it was not called, then the reason must be stated, so you'd spot the risky calls.
A warning though: the default parameters for hotspot regions in Variant Caller are more stringent, so if you keep the default parameters, you will end up with some variants that will not be called anymore even in the sample that had them in the first place.
Another thing I did in that case was to write a simple Python script to go check every "de novo" variant (from the VCF) on the parents' BED files, and see if they actually, honestly did not have it.
Also if you have trios, IonReporter has a specific analysis option for that, and it might be helpful. If it wasn't so, so slow.

Anyways. I feel your pain too. If BBmap helps, let us know.

EDIT: why wait. I'll try BBMap on that data of mine.

**Brian Bushnell** · 02-07-2017, 01:27 PM

Originally posted by r.rosati View Post

EDIT: why wait. I'll try BBMap on that data of mine.

Thanks, I'd like to see the results! Probably I need to do some calibration to get rid of the most obvious IonTorrent-specific error modes.

**zhoujiayi** · 02-08-2017, 07:31 AM

Originally posted by r.rosati View Post

Premise: I don't hold a grudge towards semiconductor sequencing. It's pretty good for some applications. It's great for germline data on gene panels. It's ok for investigating by exome analysis a case where there are a few candidate genes and you're not going to order a tailored panel for that specific case.
But it's a nightmare when you want to perform a broader analysis.

I agree with you that Ion Torrent default algorithms are very far from perfect. I've been in your situation too, and it's so unnerving.

Here you probably mean "if the variant is not listed in the vcf, then it's equal to the reference".
If this is what the support people told you, I wholeheartedly disagree with them. It's just plainly not true.
Just take a look at the VariantCaller parameters and you'll see a list of things that will result in a variant not being called, which are there in the first place to avoid the sequencer calling too many of the errors it makes.
For example with default parameters, a region with homozygous SNV with < 5 total reads is not called; for indels, you need 10+ reads, and at least 5 of them on either strand. This might work ok in optimal conditions, i.e. high coverage and high quality, end-to-end reads; but in ordinary conditions, where some amplicons have consistently low coverage, you do get some trouble from it.
One example is trying to compare a trio: you'll see a lot of de novo mutations in the child and when you look it up on the parents, you'll see that the child has 4 of 6 reads with the variant (was called), and one parent shows it in 2 of 4 total reads (not called); and that's where the "de novo" variant came from. If you're only interested on a small set of genes, this is hardly a problem because you can verify manually. But broaden up your analysis, and you have to sort 200 de novo variants. Yeah sure. Either they're not de novo, or they're so much "de novo" that they started existing today in the sequencer.

Another example of bad behavior that results in a variant not being listed consistently, which happened in some exomes I've ran, is the fact that the Torrent Suite sometimes aligns a deletion on the 5' side, and sometimes on the 3' side of a repeated element. This results in (a) two possible variants describing the same deletion on a series of patients, and (b) the frequency of the deletion being split between 2 positions, possibly resulting in the variant not being called at all.

What I did once, was to take all "de novo" variants in a trio, make them into a hotspot .bed file to be fed to VariantCaller, and then run the VariantCaller plugin again. Every entry of a hotspot must be listed in the final vcf; if it was not called, then the reason must be stated, so you'd spot the risky calls.
A warning though: the default parameters for hotspot regions in Variant Caller are more stringent, so if you keep the default parameters, you will end up with some variants that will not be called anymore even in the sample that had them in the first place.
Another thing I did in that case was to write a simple Python script to go check every "de novo" variant (from the VCF) on the parents' BED files, and see if they actually, honestly did not have it.
Also if you have trios, IonReporter has a specific analysis option for that, and it might be helpful. If it wasn't so, so slow.

Anyways. I feel your pain too. If BBmap helps, let us know.

EDIT: why wait. I'll try BBMap on that data of mine.

I installed bbmap on my cluster. Follow the instructions here: http://jgi.doe.gov/data-and-tools/bb...llation-guide/ . However, if I run $ (installation directory)/stats.sh in=(installation directory)/resources/phix174_ill.ref.fa.gz, I got error as command not found. But I can run bash $ (installation directory)/stats.sh in=(installation directory)/resources/phix174_ill.ref.fa.gz to get the same result as mentioned. So I think I have installed the bbmap correctly.

Then I tried to run your command, this is the command I ran:
bash bbmap/callvariants.sh in=/home/aiden/data/IonXpress_016.sam,/home/aiden/data/IonXpress_015.sam,/home/aiden/data/IonXpress_016.sam ref=/IonTorrent/HG19/hg19.fasta out=vars.vcf multisample ploidy=2 prefilter 1 > /home/aiden/result.txt 2> /home/aiden/errorbb.txt &
But I cannot get the command run. The error message in the errrorbb.txt is as following:
java -ea -Xmx46006m -Xms46006m -cp /home/aiden/bbmap/current/ var2.CallVariants in=/home/aiden/data/IonXpress_016.sam,/home/aiden/data/IonXpress_015.sam,/home/aiden/data/IonXpress_016.sam ref=/IonTorrent/HG19/hg19.fasta out=vars.vcf multisample ploidy=2 prefilter 1
Executing var2.CallVariants2 [in=/home/aiden/data/IonXpress_016.sam,/home/aiden/data/IonXpress_015.sam,/home/aiden/data/IonXpress_016.sam, ref=/IonTorrent/HG19/hg19.fasta, out=vars.vcf, multisample, ploidy=2, prefilter, 1]

Unknown parameter 1
Exception in thread "main" java.lang.AssertionError: Unknown parameter 1
at var2.CallVariants2.<init>(CallVariants2.java:199)
at var2.CallVariants2.main(CallVariants2.java:49)
at var2.CallVariants.main(CallVariants.java:48)

Do you know what I have done wrong?
Thank you.

**r.rosati** · 02-08-2017, 07:35 AM

Originally posted by zhoujiayi View Post

bash bbmap/callvariants.sh in=/home/aiden/data/IonXpress_016.sam,/home/aiden/data/IonXpress_015.sam,/home/aiden/data/IonXpress_016.sam ref=/IonTorrent/HG19/hg19.fasta out=vars.vcf multisample ploidy=2 prefilter 1 > /home/aiden/result.txt 2> /home/aiden/errorbb.txt &

I reckon the "1" that I underlined above might be the "parameter 1" that is giving the error. If you look at Brian Bushnell's message, there's no "1" in that position.

**zhoujiayi** · 02-08-2017, 07:42 AM

Originally posted by r.rosati View Post

I reckon the "1" that I underlined above might be the "parameter 1" that is giving the error. If you look at Brian Bushnell's message, there's no "1" in that position.

If you want to save the stdout and stderr of a command, 1 is for output the stdout to file result.txt and 2 is for output the stderr to errorbbmap.txt.This is just Linux command.

**r.rosati** · 02-08-2017, 08:06 AM

Originally posted by zhoujiayi View Post

If you want to save the stdout and stderr of a command, 1 is for output the stdout to file result.txt and 2 is for output the stderr to errorbbmap.txt.This is just Linux command.

Oh, my bad. Then the space must be the culprit, i.e. "1>" and not "1 >"

**zhoujiayi** · 02-09-2017, 06:49 AM

I have tried bbmap. I got 123057 snps and 51208 indels called by bbmap. Just based on the figures, it seems too many indels has been called. Ion Torrent called 46821 SNPs, 372 MNPs and 2187 indels. So I guess Ion Torrent did have some tricks in their algorithm to optimize the result of their sequencing data.
I also tried to use GATK or samtools or freebayes to call the Ion Torrent data before, all of them have the same problem as bbmap. Too many false positives have been called.

I tried to use GATK to find the concordance between these 2 results. It seems due to the format of bbmap's result(<ID=FAIL,Description="Fail">), GATK cannot get through.

I think we will keep using the TVC to call the variants, otherwise we need to start from the beginning to find a way to filter the results of other software.

Topics	Statistics	Last Post
ASHG 2024 Highlights – Part Two by seqadmin Started by seqadmin, 11-08-2024, 11:09 AM	0 responses 35 views 0 likes	Last Post by seqadmin 11-08-2024, 11:09 AM
ASHG 2024 Highlights – Part One by seqadmin Started by seqadmin, 11-08-2024, 06:13 AM	0 responses 28 views 0 likes	Last Post by seqadmin 11-08-2024, 06:13 AM
Seq-Scope Expands Possibilities for High-Resolution Gene Expression Analysis by seqadmin Started by seqadmin, 11-01-2024, 06:09 AM	0 responses 32 views 0 likes	Last Post by seqadmin 11-01-2024, 06:09 AM
New Model Aims to Explain Polygenic Diseases by Connecting Genomic Mutations and Regulatory Networks by seqadmin Started by seqadmin, 10-30-2024, 05:31 AM	0 responses 23 views 0 likes	Last Post by seqadmin 10-30-2024, 05:31 AM

Seqanswers Leaderboard Ad

Announcement

A problem about Ion Torrent Whole Exome Sequencing

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News