Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Thanks for the feedback! Looks like Ion Torrent variant calling needs custom settings. I'm not sure what's wrong with ##FORMAT=<ID=FAIL,Number=1,Type=String,Description="Fail">, though... it's a perfectly valid header line.Last edited by Brian Bushnell; 02-09-2017, 09:53 AM.
-
I have tried bbmap. I got 123057 snps and 51208 indels called by bbmap. Just based on the figures, it seems too many indels has been called. Ion Torrent called 46821 SNPs, 372 MNPs and 2187 indels. So I guess Ion Torrent did have some tricks in their algorithm to optimize the result of their sequencing data.
I also tried to use GATK or samtools or freebayes to call the Ion Torrent data before, all of them have the same problem as bbmap. Too many false positives have been called.
I tried to use GATK to find the concordance between these 2 results. It seems due to the format of bbmap's result(<ID=FAIL,Description="Fail">), GATK cannot get through.
I think we will keep using the TVC to call the variants, otherwise we need to start from the beginning to find a way to filter the results of other software.
Leave a comment:
-
Originally posted by zhoujiayi View PostIf you want to save the stdout and stderr of a command, 1 is for output the stdout to file result.txt and 2 is for output the stderr to errorbbmap.txt.This is just Linux command.
Leave a comment:
-
Originally posted by r.rosati View PostI reckon the "1" that I underlined above might be the "parameter 1" that is giving the error. If you look at Brian Bushnell's message, there's no "1" in that position.
Leave a comment:
-
Originally posted by zhoujiayi View Postbash bbmap/callvariants.sh in=/home/aiden/data/IonXpress_016.sam,/home/aiden/data/IonXpress_015.sam,/home/aiden/data/IonXpress_016.sam ref=/IonTorrent/HG19/hg19.fasta out=vars.vcf multisample ploidy=2 prefilter 1 > /home/aiden/result.txt 2> /home/aiden/errorbb.txt &
Leave a comment:
-
Originally posted by r.rosati View PostPremise: I don't hold a grudge towards semiconductor sequencing. It's pretty good for some applications. It's great for germline data on gene panels. It's ok for investigating by exome analysis a case where there are a few candidate genes and you're not going to order a tailored panel for that specific case.
But it's a nightmare when you want to perform a broader analysis.
I agree with you that Ion Torrent default algorithms are very far from perfect. I've been in your situation too, and it's so unnerving.
Here you probably mean "if the variant is not listed in the vcf, then it's equal to the reference".
If this is what the support people told you, I wholeheartedly disagree with them. It's just plainly not true.
Just take a look at the VariantCaller parameters and you'll see a list of things that will result in a variant not being called, which are there in the first place to avoid the sequencer calling too many of the errors it makes.
For example with default parameters, a region with homozygous SNV with < 5 total reads is not called; for indels, you need 10+ reads, and at least 5 of them on either strand. This might work ok in optimal conditions, i.e. high coverage and high quality, end-to-end reads; but in ordinary conditions, where some amplicons have consistently low coverage, you do get some trouble from it.
One example is trying to compare a trio: you'll see a lot of de novo mutations in the child and when you look it up on the parents, you'll see that the child has 4 of 6 reads with the variant (was called), and one parent shows it in 2 of 4 total reads (not called); and that's where the "de novo" variant came from. If you're only interested on a small set of genes, this is hardly a problem because you can verify manually. But broaden up your analysis, and you have to sort 200 de novo variants. Yeah sure. Either they're not de novo, or they're so much "de novo" that they started existing today in the sequencer.
Another example of bad behavior that results in a variant not being listed consistently, which happened in some exomes I've ran, is the fact that the Torrent Suite sometimes aligns a deletion on the 5' side, and sometimes on the 3' side of a repeated element. This results in (a) two possible variants describing the same deletion on a series of patients, and (b) the frequency of the deletion being split between 2 positions, possibly resulting in the variant not being called at all.
What I did once, was to take all "de novo" variants in a trio, make them into a hotspot .bed file to be fed to VariantCaller, and then run the VariantCaller plugin again. Every entry of a hotspot must be listed in the final vcf; if it was not called, then the reason must be stated, so you'd spot the risky calls.
A warning though: the default parameters for hotspot regions in Variant Caller are more stringent, so if you keep the default parameters, you will end up with some variants that will not be called anymore even in the sample that had them in the first place.
Another thing I did in that case was to write a simple Python script to go check every "de novo" variant (from the VCF) on the parents' BED files, and see if they actually, honestly did not have it.
Also if you have trios, IonReporter has a specific analysis option for that, and it might be helpful. If it wasn't so, so slow.
Anyways. I feel your pain too. If BBmap helps, let us know.
EDIT: why wait. I'll try BBMap on that data of mine.
Then I tried to run your command, this is the command I ran:
bash bbmap/callvariants.sh in=/home/aiden/data/IonXpress_016.sam,/home/aiden/data/IonXpress_015.sam,/home/aiden/data/IonXpress_016.sam ref=/IonTorrent/HG19/hg19.fasta out=vars.vcf multisample ploidy=2 prefilter 1 > /home/aiden/result.txt 2> /home/aiden/errorbb.txt &
But I cannot get the command run. The error message in the errrorbb.txt is as following:
java -ea -Xmx46006m -Xms46006m -cp /home/aiden/bbmap/current/ var2.CallVariants in=/home/aiden/data/IonXpress_016.sam,/home/aiden/data/IonXpress_015.sam,/home/aiden/data/IonXpress_016.sam ref=/IonTorrent/HG19/hg19.fasta out=vars.vcf multisample ploidy=2 prefilter 1
Executing var2.CallVariants2 [in=/home/aiden/data/IonXpress_016.sam,/home/aiden/data/IonXpress_015.sam,/home/aiden/data/IonXpress_016.sam, ref=/IonTorrent/HG19/hg19.fasta, out=vars.vcf, multisample, ploidy=2, prefilter, 1]
Unknown parameter 1
Exception in thread "main" java.lang.AssertionError: Unknown parameter 1
at var2.CallVariants2.<init>(CallVariants2.java:199)
at var2.CallVariants2.main(CallVariants2.java:49)
at var2.CallVariants.main(CallVariants.java:48)
Do you know what I have done wrong?
Thank you.
Leave a comment:
-
Originally posted by r.rosati View PostEDIT: why wait. I'll try BBMap on that data of mine.
Leave a comment:
-
Premise: I don't hold a grudge towards semiconductor sequencing. It's pretty good for some applications. It's great for germline data on gene panels. It's ok for investigating by exome analysis a case where there are a few candidate genes and you're not going to order a tailored panel for that specific case.
But it's a nightmare when you want to perform a broader analysis.
I agree with you that Ion Torrent default algorithms are very far from perfect. I've been in your situation too, and it's so unnerving.
Originally posted by zhoujiayi View PostIon Torrent can only call variant for one sample. So if the sample doesn't have this variant, the vcf file won't have it. But you don't really know if the genotype at that position is not observed or same as the reference without looking into the bam files. I asked the Ion Torrent support people, they said if the variant is listed in the vcf, you can consider it same as reference.
If this is what the support people told you, I wholeheartedly disagree with them. It's just plainly not true.
Just take a look at the VariantCaller parameters and you'll see a list of things that will result in a variant not being called, which are there in the first place to avoid the sequencer calling too many of the errors it makes.
For example with default parameters, a region with homozygous SNV with < 5 total reads is not called; for indels, you need 10+ reads, and at least 5 of them on either strand. This might work ok in optimal conditions, i.e. high coverage and high quality, end-to-end reads; but in ordinary conditions, where some amplicons have consistently low coverage, you do get some trouble from it.
One example is trying to compare a trio: you'll see a lot of de novo mutations in the child and when you look it up on the parents, you'll see that the child has 4 of 6 reads with the variant (was called), and one parent shows it in 2 of 4 total reads (not called); and that's where the "de novo" variant came from. If you're only interested on a small set of genes, this is hardly a problem because you can verify manually. But broaden up your analysis, and you have to sort 200 de novo variants. Yeah sure. Either they're not de novo, or they're so much "de novo" that they started existing today in the sequencer.
Another example of bad behavior that results in a variant not being listed consistently, which happened in some exomes I've ran, is the fact that the Torrent Suite sometimes aligns a deletion on the 5' side, and sometimes on the 3' side of a repeated element. This results in (a) two possible variants describing the same deletion on a series of patients, and (b) the frequency of the deletion being split between 2 positions, possibly resulting in the variant not being called at all.
What I did once, was to take all "de novo" variants in a trio, make them into a hotspot .bed file to be fed to VariantCaller, and then run the VariantCaller plugin again. Every entry of a hotspot must be listed in the final vcf; if it was not called, then the reason must be stated, so you'd spot the risky calls.
A warning though: the default parameters for hotspot regions in Variant Caller are more stringent, so if you keep the default parameters, you will end up with some variants that will not be called anymore even in the sample that had them in the first place.
Another thing I did in that case was to write a simple Python script to go check every "de novo" variant (from the VCF) on the parents' BED files, and see if they actually, honestly did not have it.
Also if you have trios, IonReporter has a specific analysis option for that, and it might be helpful. If it wasn't so, so slow.
Anyways. I feel your pain too. If BBmap helps, let us know.
EDIT: why wait. I'll try BBMap on that data of mine.Last edited by r.rosati; 02-07-2017, 08:46 AM.
Leave a comment:
-
Actually, in this case, BBMap would produce two lines that look like this:
Code:chr2 202006095 . AG A chr2 202006096 . G A
Leave a comment:
-
Originally posted by Brian Bushnell View PostHi zhoujiayi,
I don't really like the practice of combining indels and SNPs (or, generally, different variants) on the same line in VCF, as it makes the format even more confusing and difficult to read. So, BBMap's CallVariants tool puts each variant on a unique line, which might be what you're looking for. You use it like this:
callvariants.sh in=sample1.sam,sample2.sam ref=ref.fa out=vars.vcf multisample ploidy=2 prefilter
I'd be interested in hearing how well it performs on Ion Torrent data; I've only been testing it on Illumina so far.
Leave a comment:
-
Originally posted by mlariviere View PostHi Jiayi Zhou,
Using AmpliSeq Exome on Proton must not be that bad. It allowed one of researchers at your institution to find mutation causing hearing loss in Newfoundland population:
Have you contacted your local Ion Torrent support regarding your issue?
Leave a comment:
-
Hi zhoujiayi,
I don't really like the practice of combining indels and SNPs (or, generally, different variants) on the same line in VCF, as it makes the format even more confusing and difficult to read. So, BBMap's CallVariants tool puts each variant on a unique line, which might be what you're looking for. You use it like this:
callvariants.sh in=sample1.sam,sample2.sam ref=ref.fa out=vars.vcf multisample ploidy=2 prefilter
I'd be interested in hearing how well it performs on Ion Torrent data; I've only been testing it on Illumina so far.
Leave a comment:
-
Hi Jiayi Zhou,
Using AmpliSeq Exome on Proton must not be that bad. It allowed one of researchers at your institution to find mutation causing hearing loss in Newfoundland population:
Have you contacted your local Ion Torrent support regarding your issue?
Leave a comment:
-
I don't know the solution, but I can tell you Ion Torrent has a lot of deletion errors that are sequencing artifacts when sequencing through a long (>5bp) homopolymer region. If this is where your variants fall, you're going to have a bad time.
Leave a comment:
Latest Articles
Collapse
-
by seqadmin
Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...-
Channel: Articles
09-23-2024, 06:35 AM -
-
by seqadmin
During the COVID-19 pandemic, scientists observed that while some individuals experienced severe illness when infected with SARS-CoV-2, others were barely affected. These disparities left researchers and clinicians wondering what causes the wide variations in response to viral infections and what role genetics plays.
Jean-Laurent Casanova, M.D., Ph.D., Professor at Rockefeller University, is a leading expert in this crossover between genetics and infectious...-
Channel: Articles
09-09-2024, 10:59 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 10-02-2024, 04:51 AM
|
0 responses
8 views
0 likes
|
Last Post
by seqadmin
10-02-2024, 04:51 AM
|
||
Started by seqadmin, 10-01-2024, 07:10 AM
|
0 responses
13 views
0 likes
|
Last Post
by seqadmin
10-01-2024, 07:10 AM
|
||
Started by seqadmin, 09-30-2024, 08:33 AM
|
0 responses
18 views
0 likes
|
Last Post
by seqadmin
09-30-2024, 08:33 AM
|
||
Started by seqadmin, 09-26-2024, 12:57 PM
|
0 responses
16 views
0 likes
|
Last Post
by seqadmin
09-26-2024, 12:57 PM
|
Leave a comment: