Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Brian Bushnell
    replied
    Thanks for the feedback! Looks like Ion Torrent variant calling needs custom settings. I'm not sure what's wrong with ##FORMAT=<ID=FAIL,Number=1,Type=String,Description="Fail">, though... it's a perfectly valid header line.
    Last edited by Brian Bushnell; 02-09-2017, 09:53 AM.

    Leave a comment:


  • zhoujiayi
    replied
    I have tried bbmap. I got 123057 snps and 51208 indels called by bbmap. Just based on the figures, it seems too many indels has been called. Ion Torrent called 46821 SNPs, 372 MNPs and 2187 indels. So I guess Ion Torrent did have some tricks in their algorithm to optimize the result of their sequencing data.
    I also tried to use GATK or samtools or freebayes to call the Ion Torrent data before, all of them have the same problem as bbmap. Too many false positives have been called.

    I tried to use GATK to find the concordance between these 2 results. It seems due to the format of bbmap's result(<ID=FAIL,Description="Fail">), GATK cannot get through.

    I think we will keep using the TVC to call the variants, otherwise we need to start from the beginning to find a way to filter the results of other software.

    Leave a comment:


  • r.rosati
    replied
    Originally posted by zhoujiayi View Post
    If you want to save the stdout and stderr of a command, 1 is for output the stdout to file result.txt and 2 is for output the stderr to errorbbmap.txt.This is just Linux command.
    Oh, my bad. Then the space must be the culprit, i.e. "1>" and not "1 >"

    Leave a comment:


  • zhoujiayi
    replied
    Originally posted by r.rosati View Post
    I reckon the "1" that I underlined above might be the "parameter 1" that is giving the error. If you look at Brian Bushnell's message, there's no "1" in that position.
    If you want to save the stdout and stderr of a command, 1 is for output the stdout to file result.txt and 2 is for output the stderr to errorbbmap.txt.This is just Linux command.

    Leave a comment:


  • r.rosati
    replied
    Originally posted by zhoujiayi View Post
    bash bbmap/callvariants.sh in=/home/aiden/data/IonXpress_016.sam,/home/aiden/data/IonXpress_015.sam,/home/aiden/data/IonXpress_016.sam ref=/IonTorrent/HG19/hg19.fasta out=vars.vcf multisample ploidy=2 prefilter 1 > /home/aiden/result.txt 2> /home/aiden/errorbb.txt &
    I reckon the "1" that I underlined above might be the "parameter 1" that is giving the error. If you look at Brian Bushnell's message, there's no "1" in that position.

    Leave a comment:


  • zhoujiayi
    replied
    Originally posted by r.rosati View Post
    Premise: I don't hold a grudge towards semiconductor sequencing. It's pretty good for some applications. It's great for germline data on gene panels. It's ok for investigating by exome analysis a case where there are a few candidate genes and you're not going to order a tailored panel for that specific case.
    But it's a nightmare when you want to perform a broader analysis.

    I agree with you that Ion Torrent default algorithms are very far from perfect. I've been in your situation too, and it's so unnerving.



    Here you probably mean "if the variant is not listed in the vcf, then it's equal to the reference".
    If this is what the support people told you, I wholeheartedly disagree with them. It's just plainly not true.
    Just take a look at the VariantCaller parameters and you'll see a list of things that will result in a variant not being called, which are there in the first place to avoid the sequencer calling too many of the errors it makes.
    For example with default parameters, a region with homozygous SNV with < 5 total reads is not called; for indels, you need 10+ reads, and at least 5 of them on either strand. This might work ok in optimal conditions, i.e. high coverage and high quality, end-to-end reads; but in ordinary conditions, where some amplicons have consistently low coverage, you do get some trouble from it.
    One example is trying to compare a trio: you'll see a lot of de novo mutations in the child and when you look it up on the parents, you'll see that the child has 4 of 6 reads with the variant (was called), and one parent shows it in 2 of 4 total reads (not called); and that's where the "de novo" variant came from. If you're only interested on a small set of genes, this is hardly a problem because you can verify manually. But broaden up your analysis, and you have to sort 200 de novo variants. Yeah sure. Either they're not de novo, or they're so much "de novo" that they started existing today in the sequencer.

    Another example of bad behavior that results in a variant not being listed consistently, which happened in some exomes I've ran, is the fact that the Torrent Suite sometimes aligns a deletion on the 5' side, and sometimes on the 3' side of a repeated element. This results in (a) two possible variants describing the same deletion on a series of patients, and (b) the frequency of the deletion being split between 2 positions, possibly resulting in the variant not being called at all.

    What I did once, was to take all "de novo" variants in a trio, make them into a hotspot .bed file to be fed to VariantCaller, and then run the VariantCaller plugin again. Every entry of a hotspot must be listed in the final vcf; if it was not called, then the reason must be stated, so you'd spot the risky calls.
    A warning though: the default parameters for hotspot regions in Variant Caller are more stringent, so if you keep the default parameters, you will end up with some variants that will not be called anymore even in the sample that had them in the first place.
    Another thing I did in that case was to write a simple Python script to go check every "de novo" variant (from the VCF) on the parents' BED files, and see if they actually, honestly did not have it.
    Also if you have trios, IonReporter has a specific analysis option for that, and it might be helpful. If it wasn't so, so slow.

    Anyways. I feel your pain too. If BBmap helps, let us know.

    EDIT: why wait. I'll try BBMap on that data of mine.
    I installed bbmap on my cluster. Follow the instructions here: http://jgi.doe.gov/data-and-tools/bb...llation-guide/ . However, if I run $ (installation directory)/stats.sh in=(installation directory)/resources/phix174_ill.ref.fa.gz, I got error as command not found. But I can run bash $ (installation directory)/stats.sh in=(installation directory)/resources/phix174_ill.ref.fa.gz to get the same result as mentioned. So I think I have installed the bbmap correctly.

    Then I tried to run your command, this is the command I ran:
    bash bbmap/callvariants.sh in=/home/aiden/data/IonXpress_016.sam,/home/aiden/data/IonXpress_015.sam,/home/aiden/data/IonXpress_016.sam ref=/IonTorrent/HG19/hg19.fasta out=vars.vcf multisample ploidy=2 prefilter 1 > /home/aiden/result.txt 2> /home/aiden/errorbb.txt &
    But I cannot get the command run. The error message in the errrorbb.txt is as following:
    java -ea -Xmx46006m -Xms46006m -cp /home/aiden/bbmap/current/ var2.CallVariants in=/home/aiden/data/IonXpress_016.sam,/home/aiden/data/IonXpress_015.sam,/home/aiden/data/IonXpress_016.sam ref=/IonTorrent/HG19/hg19.fasta out=vars.vcf multisample ploidy=2 prefilter 1
    Executing var2.CallVariants2 [in=/home/aiden/data/IonXpress_016.sam,/home/aiden/data/IonXpress_015.sam,/home/aiden/data/IonXpress_016.sam, ref=/IonTorrent/HG19/hg19.fasta, out=vars.vcf, multisample, ploidy=2, prefilter, 1]

    Unknown parameter 1
    Exception in thread "main" java.lang.AssertionError: Unknown parameter 1
    at var2.CallVariants2.<init>(CallVariants2.java:199)
    at var2.CallVariants2.main(CallVariants2.java:49)
    at var2.CallVariants.main(CallVariants.java:48)

    Do you know what I have done wrong?
    Thank you.

    Leave a comment:


  • Brian Bushnell
    replied
    Originally posted by r.rosati View Post
    EDIT: why wait. I'll try BBMap on that data of mine.
    Thanks, I'd like to see the results! Probably I need to do some calibration to get rid of the most obvious IonTorrent-specific error modes.

    Leave a comment:


  • r.rosati
    replied
    Premise: I don't hold a grudge towards semiconductor sequencing. It's pretty good for some applications. It's great for germline data on gene panels. It's ok for investigating by exome analysis a case where there are a few candidate genes and you're not going to order a tailored panel for that specific case.
    But it's a nightmare when you want to perform a broader analysis.

    I agree with you that Ion Torrent default algorithms are very far from perfect. I've been in your situation too, and it's so unnerving.

    Originally posted by zhoujiayi View Post
    Ion Torrent can only call variant for one sample. So if the sample doesn't have this variant, the vcf file won't have it. But you don't really know if the genotype at that position is not observed or same as the reference without looking into the bam files. I asked the Ion Torrent support people, they said if the variant is listed in the vcf, you can consider it same as reference.
    Here you probably mean "if the variant is not listed in the vcf, then it's equal to the reference".
    If this is what the support people told you, I wholeheartedly disagree with them. It's just plainly not true.
    Just take a look at the VariantCaller parameters and you'll see a list of things that will result in a variant not being called, which are there in the first place to avoid the sequencer calling too many of the errors it makes.
    For example with default parameters, a region with homozygous SNV with < 5 total reads is not called; for indels, you need 10+ reads, and at least 5 of them on either strand. This might work ok in optimal conditions, i.e. high coverage and high quality, end-to-end reads; but in ordinary conditions, where some amplicons have consistently low coverage, you do get some trouble from it.
    One example is trying to compare a trio: you'll see a lot of de novo mutations in the child and when you look it up on the parents, you'll see that the child has 4 of 6 reads with the variant (was called), and one parent shows it in 2 of 4 total reads (not called); and that's where the "de novo" variant came from. If you're only interested on a small set of genes, this is hardly a problem because you can verify manually. But broaden up your analysis, and you have to sort 200 de novo variants. Yeah sure. Either they're not de novo, or they're so much "de novo" that they started existing today in the sequencer.

    Another example of bad behavior that results in a variant not being listed consistently, which happened in some exomes I've ran, is the fact that the Torrent Suite sometimes aligns a deletion on the 5' side, and sometimes on the 3' side of a repeated element. This results in (a) two possible variants describing the same deletion on a series of patients, and (b) the frequency of the deletion being split between 2 positions, possibly resulting in the variant not being called at all.

    What I did once, was to take all "de novo" variants in a trio, make them into a hotspot .bed file to be fed to VariantCaller, and then run the VariantCaller plugin again. Every entry of a hotspot must be listed in the final vcf; if it was not called, then the reason must be stated, so you'd spot the risky calls.
    A warning though: the default parameters for hotspot regions in Variant Caller are more stringent, so if you keep the default parameters, you will end up with some variants that will not be called anymore even in the sample that had them in the first place.
    Another thing I did in that case was to write a simple Python script to go check every "de novo" variant (from the VCF) on the parents' BED files, and see if they actually, honestly did not have it.
    Also if you have trios, IonReporter has a specific analysis option for that, and it might be helpful. If it wasn't so, so slow.

    Anyways. I feel your pain too. If BBmap helps, let us know.

    EDIT: why wait. I'll try BBMap on that data of mine.
    Last edited by r.rosati; 02-07-2017, 08:46 AM.

    Leave a comment:


  • zhoujiayi
    replied
    Originally posted by Brian Bushnell View Post
    Actually, in this case, BBMap would produce two lines that look like this:

    Code:
    chr2 202006095 . AG A
    chr2 202006096 . G A
    Thank you so much. I'll try it.

    Leave a comment:


  • Brian Bushnell
    replied
    Actually, in this case, BBMap would produce two lines that look like this:

    Code:
    chr2 202006095 . AG A
    chr2 202006096 . G A

    Leave a comment:


  • zhoujiayi
    replied
    Originally posted by Brian Bushnell View Post
    Hi zhoujiayi,

    I don't really like the practice of combining indels and SNPs (or, generally, different variants) on the same line in VCF, as it makes the format even more confusing and difficult to read. So, BBMap's CallVariants tool puts each variant on a unique line, which might be what you're looking for. You use it like this:

    callvariants.sh in=sample1.sam,sample2.sam ref=ref.fa out=vars.vcf multisample ploidy=2 prefilter

    I'd be interested in hearing how well it performs on Ion Torrent data; I've only been testing it on Illumina so far.
    Thank you for your suggestion. But this software cannot help for my case. Because Ion Torrent detected the variants as a INDEL chr2 202006095 . AG A,AA 1/1, even I use the software to split it into 2 line. I got chr2 202006095 . AG A and chr2 202006095 . AG AA. Then, if I use software to check if the parents have these 2 variants, the parents won't have them. But in fact, chr2 202006095 . AG AA should be chr2 202006096 G A. The parents have this variant. So it is not the variant we are looking for. For sure, we can check this kind of variants manually. But we have lots of variants like this. It is impossible to filter them manually.

    Leave a comment:


  • zhoujiayi
    replied
    Originally posted by mlariviere View Post
    Hi Jiayi Zhou,

    Using AmpliSeq Exome on Proton must not be that bad. It allowed one of researchers at your institution to find mutation causing hearing loss in Newfoundland population:


    Have you contacted your local Ion Torrent support regarding your issue?
    Yes. I am one of the author of this paper. The performance of detecting SNPs seems good for Ion Torrrent. But we are having troubles with INDELs. We are now working on improving the performance of detecting INDELs.

    Leave a comment:


  • Brian Bushnell
    replied
    Hi zhoujiayi,

    I don't really like the practice of combining indels and SNPs (or, generally, different variants) on the same line in VCF, as it makes the format even more confusing and difficult to read. So, BBMap's CallVariants tool puts each variant on a unique line, which might be what you're looking for. You use it like this:

    callvariants.sh in=sample1.sam,sample2.sam ref=ref.fa out=vars.vcf multisample ploidy=2 prefilter

    I'd be interested in hearing how well it performs on Ion Torrent data; I've only been testing it on Illumina so far.

    Leave a comment:


  • mlariviere
    replied
    Hi Jiayi Zhou,

    Using AmpliSeq Exome on Proton must not be that bad. It allowed one of researchers at your institution to find mutation causing hearing loss in Newfoundland population:


    Have you contacted your local Ion Torrent support regarding your issue?

    Leave a comment:


  • jhalpin
    replied
    I don't know the solution, but I can tell you Ion Torrent has a lot of deletion errors that are sequencing artifacts when sequencing through a long (>5bp) homopolymer region. If this is where your variants fall, you're going to have a bad time.

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Recent Developments in Metagenomics
    by seqadmin





    Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...
    09-23-2024, 06:35 AM
  • seqadmin
    Understanding Genetic Influence on Infectious Disease
    by seqadmin




    During the COVID-19 pandemic, scientists observed that while some individuals experienced severe illness when infected with SARS-CoV-2, others were barely affected. These disparities left researchers and clinicians wondering what causes the wide variations in response to viral infections and what role genetics plays.

    Jean-Laurent Casanova, M.D., Ph.D., Professor at Rockefeller University, is a leading expert in this crossover between genetics and infectious...
    09-09-2024, 10:59 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 10-02-2024, 04:51 AM
0 responses
8 views
0 likes
Last Post seqadmin  
Started by seqadmin, 10-01-2024, 07:10 AM
0 responses
13 views
0 likes
Last Post seqadmin  
Started by seqadmin, 09-30-2024, 08:33 AM
0 responses
18 views
0 likes
Last Post seqadmin  
Started by seqadmin, 09-26-2024, 12:57 PM
0 responses
16 views
0 likes
Last Post seqadmin  
Working...
X