**sarvidsson** · 02-18-2015, 12:11 AM

I have the same experience as you - all popular variant calling tools out there are optimized for shotgun libraries. For targeted resequencing with PCR amplicons I do a pileup with samtools and parse the output, calling variants at suitable thresholds.
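The pileup-and-threshold approach described above can be sketched roughly as follows. This is a minimal illustration, not sarvidsson's actual caller: it assumes the tab-separated text output of `samtools mpileup`, and the function names and the `min_depth` and `min_fraction` thresholds are hypothetical values to adjust for your own amplicons.

```python
import re
from collections import Counter

def indel_counts(bases):
    """Count indel strings in the read-bases column of one mpileup line."""
    counts = Counter()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Read-start marker; the next character is a mapping quality.
            i += 2
        elif ch in "+-":
            m = re.match(r"\d+", bases[i + 1:])
            if m is None:
                i += 1
                continue
            length = int(m.group())
            start = i + 1 + len(m.group())
            # "+" is an insertion, "-" a deletion, followed by its sequence.
            counts[ch + bases[start:start + length].upper()] += 1
            i = start + length
        else:
            i += 1
    return counts

def call_indels(pileup_line, min_depth=20, min_fraction=0.2):
    """Return (chrom, pos, indel, reads, fraction) tuples passing thresholds."""
    chrom, pos, _ref, depth, bases = pileup_line.rstrip("\n").split("\t")[:5]
    depth = int(depth)
    if depth < min_depth:
        return []
    calls = []
    for indel, n in indel_counts(bases).items():
        fraction = n / depth
        if fraction >= min_fraction:
            calls.append((chrom, int(pos), indel, n, round(fraction, 3)))
    return calls
```

Note that mpileup reports an indel on the line for the base immediately before it, so the position returned here is that anchor base.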

**KaiYe** · 02-18-2015, 08:38 AM

Pindel is able to call indels from many samples at the same time and gives you an indication of which samples are the carriers. The new pindel2vcf gives you a clean way to view the result, with the numbers of reads supporting the ref and alt alleles.

You could put a list of BAM files in the config file and run all samples together.
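As a rough sketch of what such a config file could look like: to my understanding Pindel expects one line per BAM file giving the path, the expected insert size, and a sample label. The helper name, the paths, and the insert size below are hypothetical, so check the format against the Pindel documentation for your version.

```python
from pathlib import Path

def pindel_config(bam_paths, insert_size):
    """Build Pindel config text: one 'path insert_size label' line per BAM.

    Assumes paths contain no spaces; the label is the file name without
    its final extension.
    """
    lines = []
    for bam in sorted(bam_paths):
        label = Path(bam).stem
        lines.append(f"{bam} {insert_size} {label}")
    return "\n".join(lines) + "\n"

# Hypothetical usage: write the config, then pass it to Pindel with -i, e.g.
#   pindel -f reference.fa -i samples.cfg -o output_prefix
# Path("samples.cfg").write_text(pindel_config(bams, insert_size=250))
```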

**Matt Kearse** · 02-18-2015, 12:21 PM

Just wondering if you've tried Geneious? It's commercial software, so maybe that doesn't meet your requirements, and I'm biased since I wrote the variant caller in it, but I've never seen it fail to call an obvious variant.

If you're willing to share your data along with the locations of some obvious indels that aren't getting called, then I can run it through Geneious for you and let you know how it goes.

**alexholman** · 02-24-2015, 06:59 AM

Well, after much frustration, I stumbled onto the package freebayes "Bayesian haplotype-based polymorphism discovery and genotyping"

https://github.com/ekg/freebayes

The caller seems to work well on amplicon data and ended up being the cleanest and most complete VCF file (with ref and alt allele frequencies).
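For anyone wanting to pull those ref and alt allele counts out of the VCF, here is a small sketch. It assumes a freebayes-style VCF whose INFO column carries `RO` (reference observations) and `AO` (alternate observations, one per ALT allele); the function name is hypothetical. In a multi-sample VCF these INFO values are totals across samples, so per-clone fractions would need the per-sample fields instead.

```python
def alt_fractions(vcf_line):
    """Return one row per ALT allele with its read count and read fraction."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]
    alts = fields[4].split(",")
    # Parse key=value INFO entries; flag-only entries are skipped.
    info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
    ref_obs = int(info["RO"])
    alt_obs = [int(n) for n in info["AO"].split(",")]
    total = ref_obs + sum(alt_obs)
    rows = []
    for alt, n in zip(alts, alt_obs):
        fraction = round(n / total, 3) if total else 0.0
        # A length difference between REF and ALT suggests an indel
        # (or a complex allele that changes length).
        kind = "length_change" if len(alt) != len(ref) else "same_length"
        rows.append((chrom, pos, ref, alt, kind, n, fraction))
    return rows
```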

**alexholman** · 02-24-2015, 07:02 AM

sorry, double post

**Matt Kearse** · 02-24-2015, 10:32 PM

Alex shared some of his data with me to run through Geneious and we corresponded a bit by email, so I thought I'd share the results with anyone else who's interested.

Geneious called the indels, although in some situations it split them into multiple adjacent indels, which isn't ideal. I hope to improve this soon.

I also ran his data through FreeBayes, which also found the obvious indels he expected, but it didn't find other 'obvious' indels he wasn't aware of.

The main problem was that the data was poorly aligned. For example, in one sample, one allele had a 29bp deletion and the other allele a 44bp deletion in the same region. The alignment created using BWA mem had failed to span the 44bp deletion, so neither Geneious nor FreeBayes would call this indel from this alignment. I generated a better alignment using Geneious, and then Geneious called both indels, although it split them in a way that made it difficult to infer the two alleles. FreeBayes still failed to identify the two alleles in this case, even when provided with the improved alignment.
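One quick way to check whether an alignment actually spans a large deletion is to look at the CIGAR strings. A read that spans it carries a `D` operation of the expected length, whereas a read that fails to span it is typically soft-clipped (`S`) at the breakpoint instead. The sketch below assumes 1-based SAM positions and standard CIGAR strings; the function name and the `min_len` cutoff are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def deletions(pos, cigar, min_len=1):
    """List (reference_start, length) for each deletion of at least min_len."""
    ref_pos = pos
    found = []
    for count, op in CIGAR_OP.findall(cigar):
        count = int(count)
        if op == "D" and count >= min_len:
            found.append((ref_pos, count))
        # Only these operations consume reference bases.
        if op in "MDN=X":
            ref_pos += count
    return found

# A read spanning a 44bp deletion versus one that was soft-clipped instead.
print(deletions(100, "60M44D56M", min_len=20))
print(deletions(100, "60M90S", min_len=20))
```

Counting how many reads per sample report each deletion gives a fast sanity check on the aligner before any variant caller is involved.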

For indels like this, I recommend aligning using either Geneious or BBMap, both of which successfully span large indels. Other aligners may also have settings that can be tweaked to improve results around indels.

And for variant calling on this type of data, both Geneious and FreeBayes do OK, although neither works perfectly on the data Alex provided, even when I generated a better alignment.

**sarvidsson** · 02-25-2015, 02:43 AM

Originally posted by Matt Kearse

I also ran his data through FreeBayes, which also found the obvious indels he expected, but it didn't find other 'obvious' indels he wasn't aware of.

That's my experience with FreeBayes as well; I haven't found a good set of parameters that works with the amplicon data I have. I typically use GMAP (when I have Sanger data) and GSNAP (for Illumina data) to align this kind of data, and with my own custom caller on pileups from samtools I am quite happy...

**sarvidsson** · 02-25-2015, 02:46 AM

Originally posted by Matt Kearse

For indels like this, I recommend aligning using either Geneious or BBMap, both of which successfully span large indels. Other aligners may also have settings that can be tweaked to improve results around indels.

One of my colleagues has a Geneious license, so I might try it. However, I'm a command line guy - is there a way to automate Geneious for amplicon data? (I typically have thousands of samples with custom inline barcoding schemes)

**Matt Kearse** · 02-25-2015, 02:03 PM

Originally posted by sarvidsson

One of my colleagues has a Geneious license, so I might try it. However, I'm a command line guy - is there a way to automate Geneious for amplicon data? (I typically have thousands of samples with custom inline barcoding schemes)

Unfortunately no, there isn't a Geneious command line interface. You can align or variant call in bulk by selecting all the data sets and choosing the options once.

Or if that's not sufficient you can put together workflows with optional custom code fragments. See https://www.youtube.com/watch?v=uvgB2_YBmD4 for a short demo of workflows.

Also, one limitation of Geneious is that you can't yet export to VCF format so you'll have to settle for CSV export for now.

**sarvidsson** · 02-26-2015, 12:20 AM

Originally posted by Matt Kearse

Unfortunately no, there isn't a Geneious command line interface. You can align or variant call in bulk by selecting all the data sets and choosing the options once.
Or if that's not sufficient you can put together workflows with optional custom code fragments. See https://www.youtube.com/watch?v=uvgB2_YBmD4 for a short demo of workflows.

I'll have a look at the custom workflows. Would it be possible to bulk import thousands of (typically paired) FASTQ files and assign sample IDs to them?

Originally posted by Matt Kearse

Also, one limitation of Geneious is that you can't yet export to VCF format so you'll have to settle for CSV export for now.

VCF would be nice but is not a must. If the CSV format contains enough data, I could generate VCF from it where needed. First I'd like to compare the aligner/caller to our current amplicon re-sequencing pipeline.

**Matt Kearse** · 02-26-2015, 01:58 PM

Originally posted by sarvidsson

Would it be possible to bulk import thousands of (typically paired) FASTQ files and assign sample IDs to them?

If, prior to import, you give the FASTQ files names that match their sample IDs, then the file name becomes the effective sample ID. Paired files should have a suffix (e.g. 1 or 2), which gets stripped from the name when you pair them within Geneious; the pairing can be done in bulk.
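Before importing thousands of files, it may help to verify that the names follow such a scheme and that every sample has both mates. The sketch below is a plain pre-import check in Python and does not use any Geneious API. The naming convention (`<sample>_1.fastq`, `<sample>_R2.fastq.gz`, and so on) and the function name are assumptions for illustration.

```python
import re
from collections import defaultdict

# Hypothetical convention: sample ID, then _1/_2 or _R1/_R2, then .fastq(.gz)
MATE = re.compile(r"^(?P<sample>.+)_R?(?P<mate>[12])\.fastq(\.gz)?$")

def pair_fastq(names):
    """Group FASTQ file names by sample ID, keyed by mate number."""
    pairs = defaultdict(dict)
    for name in names:
        m = MATE.match(name)
        if m is None:
            raise ValueError(f"unexpected file name: {name}")
        pairs[m["sample"]][m["mate"]] = name
    return dict(pairs)

def unpaired(pairs):
    """Return sample IDs that are missing one of the two mates."""
    return sorted(s for s, mates in pairs.items() if set(mates) != {"1", "2"})
```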

It's probably best you just try it with a sample or two to start with to see if Geneious gives acceptable results on your data.

**maxsalm** · 02-27-2015, 05:50 AM

You may also find Scalpel useful (http://scalpel.sourceforge.net/) which uses an assembly step during indel calling (http://www.ncbi.nlm.nih.gov/pubmed/25128977 ) that may help with some of the alignment-derived false negatives.

**gerbarinov** · 03-09-2015, 03:57 PM

Originally posted by alexholman

I am attempting to detect indels from a panel of clones resulting from CRISPR targeted deletion. Regions around the target were PCR amplified to produce a roughly 160bp amplicon, which was then sequenced as a PE150 run.
I've been banging my head against finding a tool that can:

1) Detect which clones have indels
2) Identify the location of these indels, ideally in a VCF or similar file such that the full panel of 96 clones can be visualized (e.g. in IGV).
3) Provide annotation details about the read depth at the indel position and percentage of the sequences that contain an indel.
4) Is a stand-alone tool that I can install on my Unix bo

Thank you for this very informative thread with a happy ending.

**Robin** · 07-24-2015, 12:02 PM

Originally posted by alexholman

Well, after much frustration, I stumbled onto the package freebayes "Bayesian haplotype-based polymorphism discovery and genotyping"

https://github.com/ekg/freebayes

The caller seems to work well on amplicon data and ended up being the cleanest and most complete VCF file (with ref and alt allele frequencies).

I have a CRISPR dataset sequenced as PE 250bp, and I tried freebayes, but I see very few indels in the resulting VCF file, even though I can see the indels in IGV when the BAM file is loaded. Could you share some details with me?
1) I used bwa-mem as the aligner; which one did you use?
2) My freebayes command line:
freebayes -f /home/db/chr5.fa --region chr5:112818960-112819204 my_sorted.bam > results.vcf

Indel detection in NGS high coverage amplicons
