  • Coverage "standards" for SNP detection in tumor samples

    Dear all,

    I was wondering if there is a standard "coverage" for exome SNP calling in tumor-vs-healthy sample pairs from the same patient. As we know, tumor samples have an intrinsically higher mutability (Parsons et al., 1993). I was thinking of applying a threshold of at least 20X for the healthy sample and 50X for the tumor sample. Do these look sufficient to you?

    Also, there appears to be no standard definition of coverage, so by "50X" I mean exome-wide coverage from 100 bp uniquely mapping Illumina paired-end reads, after duplicate removal.

    Thanks!

    Federico
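
Since the post pins down what "50X" means (usable 100 bp reads over the exome target), the arithmetic behind that definition can be sketched as below. This is only a rough illustration: the 50 Mb target size is a hypothetical figure, not one from the thread, and the calculation assumes every usable base lands on target, which real captures do not achieve.

```python
import math


def mean_coverage(n_reads: int, read_length: int, target_size: int) -> float:
    """Mean depth = total usable bases / target size in bases.

    n_reads is assumed to count only uniquely mapped, de-duplicated reads.
    """
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return n_reads * read_length / target_size


def reads_needed(depth: float, read_length: int, target_size: int) -> int:
    """Number of usable reads required to reach a given mean depth."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(depth * target_size / read_length)


# Hypothetical exome target of 50 Mb (an assumption for illustration only).
TARGET_BP = 50_000_000

for wanted_depth in (20, 50):
    n = reads_needed(wanted_depth, 100, TARGET_BP)
    print(f"{wanted_depth}X needs about {n:,} usable 100 bp reads")
```

Because off-target reads, multi-mappers and duplicates are discarded before counting, the number of raw reads to sequence would be higher than these figures.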

  • #2
    No, there is no standard. It depends on how many calls you want to make accurately. Something like SomaticSniper will happily call things in low-coverage areas, but you will have little confidence in the genotypes, even with 40x coverage for an exome sample.

    I'm doing some development work on cancer panels, and we've been advised (this is not exome sequencing, but targeted resequencing) to aim for 500x to 1000x coverage. I was a little iffy about these figures until I started actually doing the analysis on exomes myself just to test things out.

    This is prohibitively expensive for exomes, I imagine, so in terms of depth I think 'as much as you can afford'. Remember that you will also want to be confident about the genotype calls in your normal samples.
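
One way to get a feel for why depth affects confidence is a simple binomial model: if a variant is present in a fraction f of the reads at a site, how likely is it that at least k reads carry it at a given depth? The sketch below is a back-of-the-envelope illustration, not how SomaticSniper or any other caller works. It ignores sequencing errors and mapping bias, and the allele fraction and minimum read count are hypothetical values chosen only for the example.

```python
from math import comb


def prob_at_least_k_alt_reads(depth: int, allele_fraction: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(depth, allele_fraction).

    Computed as 1 - P(X < k); intended for small k, where the sum is short.
    """
    if not 0.0 <= allele_fraction <= 1.0:
        raise ValueError("allele_fraction must be between 0 and 1")
    p_fewer = sum(
        comb(depth, i) * allele_fraction**i * (1.0 - allele_fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


# Hypothetical scenario: a variant at 5% allele fraction, and a rule that
# at least 5 supporting reads are required before it is considered.
for depth in (20, 40, 100, 500, 1000):
    p = prob_at_least_k_alt_reads(depth, allele_fraction=0.05, k=5)
    print(f"depth {depth:>4}x: P(>= 5 supporting reads) = {p:.3f}")
```

Under these assumptions the probability rises with depth, which is consistent with the advice above to sequence as deeply as the budget allows. The same function can be pointed at the normal sample with a 50% allele fraction to see how depth affects heterozygous germline sites.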


    • #3
      Thank you for your answer, Bukowski. So far we are aiming for around 40x coverage. That seems to be the minimum coverage needed to stabilize the significance of the somatic mutations found.


      • #4
        Finding the depth of coverage with more confidence

        This is an extension of the original question in this thread. Does anybody know how I can calculate sequencing accuracy at various depths of coverage? I would like to use that to choose a coverage level with more confidence. Thanks in advance to all.


        • #5
          A couple ideas: http://genome.sph.umich.edu/wiki/SNP...Set_Properties

          Also, for any metric, you can tentatively assume your higher coverage/higher quality score calls will be more "correct" than the lower coverage/lower quality score calls. Thus, for any metric, compare different coverage thresholds to your highest quality sets. One caveat is it's possible for mapping artifacts or other things to lead to super high coverage, so make sure your "high quality set" looks real.
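
The comparison described above can be sketched with one concrete call-set metric. The transition/transversion (Ti/Tv) ratio is used here purely as an example of such a metric; choosing it is an assumption, not something the post specifies. The input format (a list of `(ref, alt, depth)` tuples for biallelic single-base SNPs) and the function name are likewise hypothetical.

```python
# Transitions are purine<->purine or pyrimidine<->pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_by_min_depth(calls, thresholds):
    """Ti/Tv ratio of the calls that pass each minimum-depth threshold.

    calls: iterable of (ref, alt, depth) for biallelic single-base SNPs.
    Returns {threshold: ratio}, with None where no transversions remain.
    """
    calls = list(calls)  # allow several passes over a generator
    result = {}
    for threshold in thresholds:
        ti = tv = 0
        for ref, alt, depth in calls:
            if depth < threshold:
                continue
            if (ref, alt) in TRANSITIONS:
                ti += 1
            else:
                tv += 1
        result[threshold] = ti / tv if tv else None
    return result


# Tiny made-up call set, for illustration only.
example_calls = [
    ("A", "G", 100), ("C", "T", 100), ("A", "C", 100),
    ("A", "G", 10), ("A", "T", 10), ("G", "C", 10),
]
print(titv_by_min_depth(example_calls, thresholds=[0, 50]))
```

The idea is to treat the value at the strictest threshold as the reference and see how far the metric drifts as the threshold is relaxed. As the post cautions, the strictest set should first be checked for artifactually high coverage before it is trusted as a reference.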


          • #6
            Thanks a bunch for the pointer.
            Once we have identified a "high quality set", is there a way to compute metrics at different coverage thresholds? I am not sure how to do it: do I have to randomly subset the sequence reads and check the variant calls, or just compare with the consensus?


            • #7
              Global Alliance White Paper on Clinical Data

              There is a consortium on clinical data as described in the White Paper linked here:



              Page 30 lists the names of the organizers and their institutions, from whom you may be able to obtain follow-up information on "standards" questions about clinical data at this time.

              If you obtain any statements on standards from them, please post them here in this forum and/or in the Wiki so that others are kept informed and consensus parameters spread more quickly.


              • #8
                Originally posted by rama
                Thanks a bunch for the pointer.
                Once we have identified a "high quality set", is there a way to compute metrics at different coverage thresholds? I am not sure how to do it: do I have to randomly subset the sequence reads and check the variant calls, or just compare with the consensus?
                I was thinking you could just separate calls by coverage, i.e., make a set of calls at >100x coverage, a set at 90-100x, a set at 80-90x, etc., and compare them. Or use quality score instead of coverage if you like that metric better. Your idea is interesting though: you could take a set of high-quality calls, randomly take smaller and smaller sets of reads for the same positions, redo the calling, and see how low the coverage can get before your "subset calls" deviate too much from the legitimate set. The problem is that if your high-quality calls are in "easy" sites, this strategy won't necessarily apply to the rest of the genome.
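
The random-subsetting idea can be sketched roughly as follows. Everything here is a toy under stated assumptions: `naive_call` is a made-up threshold caller standing in for a real variant caller, the pileup format (`{site: (ref_base, [observed bases])}`) is hypothetical, and the thresholds are arbitrary. It only illustrates the shape of the experiment, namely subsample reads per site, recall, and compare with the full-depth call.

```python
import random


def naive_call(bases, ref, min_alt_reads=2, min_alt_fraction=0.2):
    """Toy caller: return the most common non-reference base if it has
    enough support, otherwise return the reference base."""
    alt_counts = {}
    for base in bases:
        if base != ref:
            alt_counts[base] = alt_counts.get(base, 0) + 1
    if not alt_counts:
        return ref
    alt, count = max(alt_counts.items(), key=lambda item: item[1])
    if count >= min_alt_reads and count / len(bases) >= min_alt_fraction:
        return alt
    return ref


def downsample_concordance(pileups, depths, seed=0):
    """For each target depth, randomly subsample the reads at every site,
    redo the toy call, and report the fraction of sites whose call still
    matches the full-depth call.

    pileups: {site: (ref_base, [observed bases])}
    """
    if not pileups:
        raise ValueError("pileups must not be empty")
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    full_calls = {site: naive_call(bases, ref) for site, (ref, bases) in pileups.items()}
    concordance = {}
    for depth in depths:
        matches = 0
        for site, (ref, bases) in pileups.items():
            subset = rng.sample(bases, min(depth, len(bases)))
            if naive_call(subset, ref) == full_calls[site]:
                matches += 1
        concordance[depth] = matches / len(pileups)
    return concordance


# Two made-up sites: one heterozygous-looking, one homozygous reference.
example_pileups = {
    "site1": ("A", ["A"] * 50 + ["G"] * 50),
    "site2": ("C", ["C"] * 100),
}
print(downsample_concordance(example_pileups, depths=[100, 40, 20, 10, 5]))
```

In practice the subsampling would be done on the alignments and the real caller rerun at each depth. The caveat from the post still holds: concordance measured on "easy" high-quality sites may overstate how well low coverage works elsewhere in the genome.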
