Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • ulz_peter
    Senior Member
    • Feb 2010
    • 219

    Missing Converage in Exome seq

    Hi guys,

    I am working on some exome datasets and ran into a problem:
    Is there already a published way of looking at sites which should have been captured but didn't be sequenced?
    E.g. we did exome sequencing of a patient and I did alignment, SNP calling (GATK) and afterwards variant annotation using Annovar. We are now down to 3 SNPs. While we are now trying to validate that experimentally I was wondering how to determine parts of the exome which are not covered by let's assume 20+ reads. Moreover, how could one determine homozygote exon deletions (they should not be covered by sequences at all, but that can happen due to chance as well)?

    Any help appreciated,
    Peter
  • NGSfan
    Senior Member
    • Apr 2009
    • 181

    #2
    Hi ulz_peter,

    We are also interested in this question. I'm surprised others have not asked this and are only looking for SNPs/indels in exome data

    I have not found a published package yet that does this.

    Hard Cutoffs for reads/exon are not safe since you'll get too many False positives for deletions if the capture did not go well or when going from one sample to another (unless you normalize reads across samples).

    However, I am in collaboration with someone who is working on this problem using a statistical approach
    Last edited by NGSfan; 01-31-2011, 06:30 AM.

    Comment

    • Yilong Li
      Member
      • Dec 2010
      • 41

      #3
      I haven't specifically looked into this problem but I have a feeling that BEDTools would contain functions (for example genomeCoverage) for calculating what you need.

      So something like calculate base-wise genome coverage of the exome data -> filter through the capture regions by discarding bases not in those regions -> filter out bases having coverage <20x.

      Comment

      • Michael.James.Clark
        Senior Member
        • Apr 2009
        • 207

        #4
        Originally posted by ulz_peter View Post
        Is there already a published way of looking at sites which should have been captured but didn't be sequenced?
        I'm not sure about a published way. I am working on a paper that will include this type of analysis (among many other things), so I can't go into detail yet, but I can suggest some things more generally.

        I personally wrote a perl script to do this myself. It basically calculates mean coverage across each targeted interval and generates a set of "% targets above mean coverage" metrics. It will be easy to tweak this script to instead output all the intervals below a specific coverage value (whatever you think equates to "too low" for variant calling--4X coverage or whatever).

        Just for your information, I compared multiple different exome pull-down platforms and saw various amounts of un-covered targets in each (ranging from ~1%-~7% missed).

        It appeared that even with additional sequencing, there are 1-2% of the targets in each platform that simply didn't pull down adequately and thereby failed.
        Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
        Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
        Projects: U87MG whole genome sequence [Website] [Paper]

        Comment

        • swbarnes2
          Senior Member
          • May 2008
          • 910

          #5
          BEDTools does indeed have something. Below an excerpt from their histogram output. Column 5 is the coverage, and column 6 is how many bases were covered that well. Column 7 is the total length of the feature, column 8 is the % of the feature covered at that exact depth.

          So for the exon below, 8 bases were totally uncovered, and 122 bases were covered at 5x or greater.

          I made it with this command line, courtesy of another poster on Seqanswers

          coverageBed -abam reads.bam b exons.bed -hist >result.txt


          chr1 176160618 176161057 Spna1-41 0 8 439 0.0182232
          chr1 176160618 176161057 Spna1-41 1 59 439 0.1343964
          chr1 176160618 176161057 Spna1-41 2 92 439 0.2095672
          chr1 176160618 176161057 Spna1-41 3 103 439 0.2346241
          chr1 176160618 176161057 Spna1-41 4 55 439 0.1252847
          chr1 176160618 176161057 Spna1-41 5 21 439 0.0478360
          chr1 176160618 176161057 Spna1-41 6 49 439 0.1116173
          chr1 176160618 176161057 Spna1-41 7 27 439 0.0615034
          chr1 176160618 176161057 Spna1-41 8 25 439 0.0569476

          Comment

          • NGSfan
            Senior Member
            • Apr 2009
            • 181

            #6
            Originally posted by Michael.James.Clark View Post
            I'm not sure about a published way. I am working on a paper that will include this type of analysis (among many other things), so I can't go into detail yet, but I can suggest some things more generally.

            I personally wrote a perl script to do this myself. It basically calculates mean coverage across each targeted interval and generates a set of "% targets above mean coverage" metrics. It will be easy to tweak this script to instead output all the intervals below a specific coverage value (whatever you think equates to "too low" for variant calling--4X coverage or whatever).

            Just for your information, I compared multiple different exome pull-down platforms and saw various amounts of un-covered targets in each (ranging from ~1%-~7% missed).

            It appeared that even with additional sequencing, there are 1-2% of the targets in each platform that simply didn't pull down adequately and thereby failed.

            Yes, this is precisely the problem - you will always have 1-2% of target regions (eg. exons) that are not captured. If you have , say, 15000 target regions, then up to 300 regions you would false call as deletions. So you would have to have a way to figure out the "bad baits" and remove them from the analysis. The question is - across how many samples does a bait not have to work to call it bad? 5? 10 ? 20?

            This is a trickier problem in that I don't think thresholds like #reads/exon would work robustly.

            Variability in capture rates, sampling depths, etc would make it hard to use one cutoff consistently.. no?

            Just brainstorming here...
            Last edited by NGSfan; 02-03-2011, 10:16 AM.

            Comment

            • NGSfan
              Senior Member
              • Apr 2009
              • 181

              #7
              Just to give you a taste on what we have working (I cannot give the details since we have not published yet):

              We've got ~60 cell lines sequenced with targeted exonic capture.

              Our methods takes the bait locations, calculates the coverage and determines deleted regions. It does not use simple cutoffs, but looks at the distribution of normalized reads across all cell lines for each bait and determines what read count represents "copy number 2" (diploid) and what is less than (deletion) or greater than diploid (amplification or copy number). For now we can get the deletions reliably - but CNV counts are trickier. It's very preliminary though !

              So far CNV/deletion algorithms I know are only for whole genome sequencing.

              IGV tracks:

              Copy number/deletion prediction:
              Last edited by NGSfan; 02-16-2011, 09:25 AM.

              Comment

              • nilshomer
                Nils Homer
                • Nov 2008
                • 1283

                #8
                Is it the NCI 60 set?

                Comment

                • NGSfan
                  Senior Member
                  • Apr 2009
                  • 181

                  #9
                  No, it's our own sequenced exome capture from 60-80 different cell lines - it probably has considerable overlap though.

                  Although I didn't know there is exome capture data available for the NCI 60? Where would I find this?

                  Would be interesting to run it on that data.

                  Comment

                  • csoong
                    Member
                    • Jun 2009
                    • 74

                    #10
                    I don't know if others have run into problems of not being able to get a hand on commercialized exome baits bed files. I spoke with a sales rep and he told me it is proprietary info. yikes.

                    Comment

                    • swbarnes2
                      Senior Member
                      • May 2008
                      • 910

                      #11
                      Originally posted by csoong View Post
                      I don't know if others have run into problems of not being able to get a hand on commercialized exome baits bed files. I spoke with a sales rep and he told me it is proprietary info. yikes.
                      That's terrible. How can you validate if its working up to specs if you don't know what you are supposed to be capturing?

                      We used the Agilient mouse exome set, and they gave us a .bed file

                      Comment

                      • NGSfan
                        Senior Member
                        • Apr 2009
                        • 181

                        #12
                        Originally posted by csoong View Post
                        I don't know if others have run into problems of not being able to get a hand on commercialized exome baits bed files. I spoke with a sales rep and he told me it is proprietary info. yikes.
                        What company was this? Nimblegen? Illumina? Agilent?

                        That doesn't sound good - it is crucial to have the bait bed files - how else will you know what was probed and not probed?

                        Maybe the sales rep meant it was only for paying customers?

                        Comment

                        • csoong
                          Member
                          • Jun 2009
                          • 74

                          #13
                          thats good to hear, at least now they can't say they arent giving out the bed files anymore

                          im using human sureselect all exon kit 50 mb

                          Comment

                          • csoong
                            Member
                            • Jun 2009
                            • 74

                            #14
                            thanks to all you people, i finally will have my hands on the files after speaking with the company

                            Comment

                            • swbarnes2
                              Senior Member
                              • May 2008
                              • 910

                              #15
                              Hooray, happy ending.

                              Comment

                              Latest Articles

                              Collapse

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, 06-09-2026, 11:58 AM
                              0 responses
                              24 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-05-2026, 10:09 AM
                              0 responses
                              29 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-04-2026, 08:59 AM
                              0 responses
                              39 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-02-2026, 12:03 PM
                              0 responses
                              61 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...