**TiborNagy** · 03-25-2014, 06:00 AM

If you analyse human data, GATK is a very good tool.

**gmarco** · 03-25-2014, 06:01 AM

I'm using GATK the same way I do with Illumina data, but I'm experiencing a very slow variant calling process with UnifiedGenotyper.

**snetmcom** · 03-29-2014, 10:02 PM

Could you ask your service provider to analyze the data on their Torrent Suite? It is extremely simple on the server.

**c_ro87** · 04-04-2014, 06:47 AM

Hello, I'm analyzing AmpliSeq runs on custom gene panels, using a 316 chip.

I also wanted to try something different, so I tried GATK. But after the MarkDuplicates step, I found that because my amplicons start and end at the same locations in the alignment, they are marked as PCR duplicates and removed from the next steps.

So I can't do this step. If I follow the best practices guidelines and apply hard filtering, I get a lot more variants than with the Ion variantCaller pipeline.

I mean approx. 30 variants per barcode with the Ion pipeline, and approx. 150 with GATK after the hard filtering step.

I don't know what to make of this disagreement.

What experience do you have?

**snetmcom** · 04-04-2014, 12:44 PM

An AmpliSeq library will be mostly duplicates by design. You should not be filtering these reads out.

GATK is likely calling false positives because it does not have any Ion-specific rules.
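To make the duplicate point concrete, here is a minimal sketch of coordinate-based duplicate detection. It is a simplification, not the actual MarkDuplicates logic: it assumes each read is reduced to a hypothetical `(chrom, start, strand)` tuple and treats every read after the first in a group as a duplicate. The function name and the toy read lists are made up for illustration.

```python
from collections import Counter

def coordinate_duplicate_fraction(reads):
    """Fraction of reads that share (chrom, start, strand) with an earlier read.

    `reads` is an iterable of (chrom, start, strand) tuples. This is a
    simplified stand-in for coordinate-based duplicate marking.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # One read per distinct coordinate group is kept; the rest are flagged.
    return (total - len(counts)) / total

# Toy amplicon data: every read of an amplicon starts at the same position.
amplicon_reads = [("chr1", 1000, "+")] * 5 + [("chr1", 2000, "-")] * 5
# Toy shotgun-like data: start positions are scattered.
shotgun_reads = [("chr1", 1000 + 7 * i, "+") for i in range(10)]

print(coordinate_duplicate_fraction(amplicon_reads))  # 0.8
print(coordinate_duplicate_fraction(shotgun_reads))   # 0.0
```

With only two amplicons, 8 of the 10 toy reads are flagged, which is why the step throws away most of an amplicon library.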

**arnaud83** · 04-10-2014, 01:10 AM

I'm glad to see people who think like me about the Torrent Server Suite and its toolbox.
I'm used to dealing with PGM data, and here is my pipeline:

-Alignment is done with bwasw. I tested several alignment programs (tmap, novoalign), and even though the percentage of mapped reads is 4-5% lower than with tmap, the results for indels and mismatches are better.

-For an exome I use MarkDuplicates.jar from Picard. For targeted sequencing it's not recommended, because 80-90% of reads will be marked.

-Then I use FreeBayes to call SNPs and UnifiedGenotyper to call indels. I prefer FreeBayes for SNPs because I can set the minimum variant frequency and be more sensitive than with UnifiedGenotyper. But it requires additional filtering (strand bias, quality, etc.).
For more specificity without too much work, I suggest UnifiedGenotyper for both SNPs and indels.
But to be honest, you have a better chance of calling a true indel by flipping a coin. There are too many false positives, and for an exome it's a misery.

**c_ro87** · 04-11-2014, 07:00 AM

@arnaud83: how do you do the strand bias filtering?

**arnaud83** · 04-11-2014, 10:26 AM

Originally posted by c_ro87

@arnaud83: how do you do the strand bias filtering?

For strand bias based on Fisher's exact test (the Phred-scaled value reported by UnifiedGenotyper), I use a threshold of 60 (p = 0.000001). Variants below this threshold are kept.
FreeBayes doesn't include a strand bias value in the VCF output, but you can easily compute it with some programming skills.
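As a rough sketch of that computation (an illustration, not anyone's actual script): take the four per-strand counts for a variant, run a two-sided Fisher's exact test on the 2x2 table, and Phred-scale the p-value so that a score of 60 corresponds to p = 0.000001. Where the counts come from is an assumption; FreeBayes VCFs usually carry per-strand observation counts in INFO fields such as SRF, SRR, SAF and SAR, but check the header of your own output. The function names are made up.

```python
import math

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = math.comb(n, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at least as unlikely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)

def phred_strand_bias(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Phred-scaled strand bias: -10 * log10(p). Higher means more bias."""
    p = fisher_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return max(0.0, -10 * math.log10(max(p, 1e-300)))

def passes_strand_filter(ref_fwd, ref_rev, alt_fwd, alt_rev, threshold=60.0):
    """Keep variants whose score is below the threshold."""
    return phred_strand_bias(ref_fwd, ref_rev, alt_fwd, alt_rev) < threshold
```

For example, `passes_strand_filter(10, 10, 10, 10)` keeps a perfectly balanced variant, since its score is essentially 0.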

**IonTom** · 04-13-2014, 03:29 AM

I also kind of gave up on the IonTorrent Suite.

Currently I am using NextGenMap for alignment.

For variant calling I use Platypus. The principle is similar to FreeBayes,
but the QC statistics and filters are much more complete: just about everything you can think of.

One additional filter I use is implemented in the Bioconductor VariantTools package.
It tells you at how many different in-read positions a variant was found.
This is important because removing PCR duplicates is not really possible for
amplicon data.
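The idea behind that filter can be sketched in a few lines. This is not the VariantTools implementation (which is in R); it is a simplified Python illustration that assumes ungapped alignments and a hypothetical input of `(read_start, read_length)` pairs for the reads supporting the alternate allele.

```python
def distinct_read_positions(variant_pos, supporting_reads):
    """Count distinct offsets within reads at which a variant is observed.

    `supporting_reads` is an iterable of (read_start, read_length) pairs for
    reads carrying the alternate allele. Alignments are assumed ungapped, so
    the in-read offset is simply variant_pos - read_start.
    """
    offsets = set()
    for start, length in supporting_reads:
        offset = variant_pos - start
        if 0 <= offset < length:  # ignore reads that do not cover the site
            offsets.add(offset)
    return len(offsets)

# Two reads start at 100, one at 110, and one does not cover position 120.
reads = [(100, 50), (100, 50), (110, 50), (200, 50)]
print(distinct_read_positions(120, reads))  # 2
```

A variant seen at only one in-read position is supported by reads that all look alike, so a low count is a reason to look at it more closely.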

**arnaud83** · 04-13-2014, 10:57 PM

I did not know these tools. Thank you.
I will test them.

**IonTom** · 04-24-2014, 12:00 PM

@arnaud83: How did they work for you?

There is a nice paper discussing the topic of using aligners on ion torrent data:

Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data - BMC Genomics

http://www.biomedcentral.com/1471-2164/15/264/

Background: The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms.

Results: In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator, that is able to generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data for which such a comparison has not yet been established.

Conclusions: A benchmark procedure to compare HTS data mappers is introduced with a new definition for the mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used. This benchmark procedure can be used to evaluate existing or in-development mappers as well as to optimize parameters of a chosen mapper for any application and any sequencing platform.

**wolfpack14** · 04-26-2014, 07:45 AM

The homopolymer issue in IonTorrent can be partially mitigated by setting frequency thresholds based on mixture fractions. The solution can be applied through post-processing or integrated into one of these variant caller applications. We're working on a paper right now that demonstrates the methodology in a production lab environment (vs. the academic environment you see in most papers).

**arnaud83** · 04-29-2014, 04:54 AM

Originally posted by IonTom

@arnaud83: How did they work for you?

There is a nice paper discussing the topic of using aligners on ion torrent data:
http://www.biomedcentral.com/1471-2164/15/264/

Well, to be honest, I'm a little disappointed by MOSAIK. The paper mentioned shows promising results, but I obtained worse results than with bwa or tmap.

**gmarco** · 06-09-2014, 12:32 AM

I'm very happy to see that this topic has received many answers. I'm willing to try all these tools.

Originally posted by wolfpack14

The homopolymer issue in IonTorrent can be partially mitigated by setting frequency thresholds based on mixture fractions. The solution can be applied through post-processing or integrated into one of these variant caller applications. We're working on a paper right now that demonstrates the methodology in a production lab environment (vs. the academic environment you see in most papers).

Hello wolfpack, do you have an ETA?

I experienced very, very slow GATK UnifiedGenotyper variant calling with Ion Torrent exome data. Has anyone else had this issue?

Ion Torrent data has 2 major issues:
1 - Dealing with the homopolymer problem (how the hell are we supposed to filter those reads, or deal with them?)
2 - Setting up correct variant calling settings.

Variant Calling outside Torrent Suite and TVC