Unconfigured Ad

**NGSfan** · 01-31-2011, 04:54 AM

Hi ulz_peter,

We are also interested in this question. I'm surprised others have not asked this and are only looking for SNPs/indels in exome data

I have not found a published package yet that does this.

Hard Cutoffs for reads/exon are not safe since you'll get too many False positives for deletions if the capture did not go well or when going from one sample to another (unless you normalize reads across samples).

However, I am in collaboration with someone who is working on this problem using a statistical approach

**Yilong Li** · 01-31-2011, 06:35 AM

I haven't specifically looked into this problem but I have a feeling that BEDTools would contain functions (for example genomeCoverage) for calculating what you need.

So something like calculate base-wise genome coverage of the exome data -> filter through the capture regions by discarding bases not in those regions -> filter out bases having coverage <20x.

**Michael.James.Clark** · 01-31-2011, 01:35 PM

Originally posted by ulz_peter View Post

Is there already a published way of looking at sites which should have been captured but didn't be sequenced?

I'm not sure about a published way. I am working on a paper that will include this type of analysis (among many other things), so I can't go into detail yet, but I can suggest some things more generally.

I personally wrote a perl script to do this myself. It basically calculates mean coverage across each targeted interval and generates a set of "% targets above mean coverage" metrics. It will be easy to tweak this script to instead output all the intervals below a specific coverage value (whatever you think equates to "too low" for variant calling--4X coverage or whatever).

Just for your information, I compared multiple different exome pull-down platforms and saw various amounts of un-covered targets in each (ranging from ~1%-~7% missed).

It appeared that even with additional sequencing, there are 1-2% of the targets in each platform that simply didn't pull down adequately and thereby failed.

**swbarnes2** · 01-31-2011, 03:47 PM

BEDTools does indeed have something. Below an excerpt from their histogram output. Column 5 is the coverage, and column 6 is how many bases were covered that well. Column 7 is the total length of the feature, column 8 is the % of the feature covered at that exact depth.

So for the exon below, 8 bases were totally uncovered, and 122 bases were covered at 5x or greater.

I made it with this command line, courtesy of another poster on Seqanswers

coverageBed -abam reads.bam b exons.bed -hist >result.txt

chr1 176160618 176161057 Spna1-41 0 8 439 0.0182232
chr1 176160618 176161057 Spna1-41 1 59 439 0.1343964
chr1 176160618 176161057 Spna1-41 2 92 439 0.2095672
chr1 176160618 176161057 Spna1-41 3 103 439 0.2346241
chr1 176160618 176161057 Spna1-41 4 55 439 0.1252847
chr1 176160618 176161057 Spna1-41 5 21 439 0.0478360
chr1 176160618 176161057 Spna1-41 6 49 439 0.1116173
chr1 176160618 176161057 Spna1-41 7 27 439 0.0615034
chr1 176160618 176161057 Spna1-41 8 25 439 0.0569476

**NGSfan** · 02-03-2011, 10:11 AM

Originally posted by Michael.James.Clark View Post

I'm not sure about a published way. I am working on a paper that will include this type of analysis (among many other things), so I can't go into detail yet, but I can suggest some things more generally.

I personally wrote a perl script to do this myself. It basically calculates mean coverage across each targeted interval and generates a set of "% targets above mean coverage" metrics. It will be easy to tweak this script to instead output all the intervals below a specific coverage value (whatever you think equates to "too low" for variant calling--4X coverage or whatever).

Just for your information, I compared multiple different exome pull-down platforms and saw various amounts of un-covered targets in each (ranging from ~1%-~7% missed).

It appeared that even with additional sequencing, there are 1-2% of the targets in each platform that simply didn't pull down adequately and thereby failed.

Yes, this is precisely the problem - you will always have 1-2% of target regions (eg. exons) that are not captured. If you have , say, 15000 target regions, then up to 300 regions you would false call as deletions. So you would have to have a way to figure out the "bad baits" and remove them from the analysis. The question is - across how many samples does a bait not have to work to call it bad? 5? 10 ? 20?

This is a trickier problem in that I don't think thresholds like #reads/exon would work robustly.

Variability in capture rates, sampling depths, etc would make it hard to use one cutoff consistently.. no?

Just brainstorming here...

**NGSfan** · 02-16-2011, 09:21 AM

Just to give you a taste on what we have working (I cannot give the details since we have not published yet):

We've got ~60 cell lines sequenced with targeted exonic capture.

Our methods takes the bait locations, calculates the coverage and determines deleted regions. It does not use simple cutoffs, but looks at the distribution of normalized reads across all cell lines for each bait and determines what read count represents "copy number 2" (diploid) and what is less than (deletion) or greater than diploid (amplification or copy number). For now we can get the deletions reliably - but CNV counts are trickier. It's very preliminary though !

So far CNV/deletion algorithms I know are only for whole genome sequencing.

IGV tracks:

Sendspace

http://www.sendspace.com/file/dzh753

Copy number/deletion prediction:

Sendspace

http://www.sendspace.com/file/tj4xii

**nilshomer** · 02-16-2011, 07:12 PM

Is it the NCI 60 set?

**NGSfan** · 02-17-2011, 01:55 AM

No, it's our own sequenced exome capture from 60-80 different cell lines - it probably has considerable overlap though.

Although I didn't know there is exome capture data available for the NCI 60? Where would I find this?

Would be interesting to run it on that data.

**csoong** · 02-17-2011, 09:25 AM

I don't know if others have run into problems of not being able to get a hand on commercialized exome baits bed files. I spoke with a sales rep and he told me it is proprietary info. yikes.

**swbarnes2** · 02-17-2011, 10:02 AM

Originally posted by csoong View Post

I don't know if others have run into problems of not being able to get a hand on commercialized exome baits bed files. I spoke with a sales rep and he told me it is proprietary info. yikes.

That's terrible. How can you validate if its working up to specs if you don't know what you are supposed to be capturing?

We used the Agilient mouse exome set, and they gave us a .bed file

**NGSfan** · 02-17-2011, 10:10 AM

Originally posted by csoong View Post

I don't know if others have run into problems of not being able to get a hand on commercialized exome baits bed files. I spoke with a sales rep and he told me it is proprietary info. yikes.

What company was this? Nimblegen? Illumina? Agilent?

That doesn't sound good - it is crucial to have the bait bed files - how else will you know what was probed and not probed?

Maybe the sales rep meant it was only for paying customers?

**csoong** · 02-17-2011, 10:20 AM

thats good to hear, at least now they can't say they arent giving out the bed files anymore

im using human sureselect all exon kit 50 mb

**csoong** · 02-17-2011, 12:01 PM

thanks to all you people, i finally will have my hands on the files after speaking with the company

**swbarnes2** · 02-17-2011, 03:23 PM

Hooray, happy ending.

Topics	Statistics	Last Post
Sequencing the Two-Toed Sloth Genome Reveals Jumping Genes Tied to Its Extreme Metabolism by SEQadmin2 Started by SEQadmin2, 06-09-2026, 11:58 AM	0 responses 24 views 0 reactions	Last Post by SEQadmin2 06-09-2026, 11:58 AM
A New Method Makes Hantavirus Genome Analysis Faster and More Accessible by SEQadmin2 Started by SEQadmin2, 06-05-2026, 10:09 AM	0 responses 29 views 0 reactions	Last Post by SEQadmin2 06-05-2026, 10:09 AM
A New Single-Cell Method Maps DNA-Protein Interactions by SEQadmin2 Started by SEQadmin2, 06-04-2026, 08:59 AM	0 responses 39 views 0 reactions	Last Post by SEQadmin2 06-04-2026, 08:59 AM
Long-Read RNA Sequencing Uncovers a Hidden Layer of Immune Cell Regulation by SEQadmin2 Started by SEQadmin2, 06-02-2026, 12:03 PM	0 responses 61 views 0 reactions	Last Post by SEQadmin2 06-02-2026, 12:03 PM

Unconfigured Ad

Missing Converage in Exome seq

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News