  • Coverage "standards" for SNP detection in tumor samples

    Dear all,

    I was wondering if there is a standard "coverage" for exome SNP calling in tumor-vs-healthy samples from the same patient. As we know, tumor samples have an intrinsically higher mutability (Parsons et al., 1993). I was thinking of applying a threshold of at least 20X for the healthy sample and 50X for the tumor sample. Do these look sufficient to you?

    Also, there appears to be no standard definition of coverage, so by "50X" I mean exome-wide coverage of 100 bp uniquely mapping Illumina paired-end reads, after duplicate removal.
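    Under that definition, mean depth over the exome target is roughly (number of reads × read length) / target size. A minimal sketch of that arithmetic; the 50 Mb target size, read counts and on-target fraction below are hypothetical, and a mean hides uneven capture, so the fraction of target bases that actually reach 20X or 50X is worth checking as well.

```python
import math


def mean_target_depth(n_reads: int, read_length: int, target_bp: int,
                      on_target_fraction: float = 1.0) -> float:
    """Approximate mean depth over a capture target such as an exome.

    n_reads: uniquely mapping reads left after duplicate removal, counting
             each mate of a pair separately.
    on_target_fraction: share of those reads that overlap the target
             (hypothetical default: all of them).
    """
    if target_bp <= 0:
        raise ValueError("target_bp must be positive")
    return n_reads * on_target_fraction * read_length / target_bp


def reads_needed(depth: float, read_length: int, target_bp: int,
                 on_target_fraction: float = 1.0) -> int:
    """Inverse calculation: reads required to reach a desired mean depth."""
    return math.ceil(depth * target_bp / (read_length * on_target_fraction))


# Hypothetical 50 Mb target sequenced with 100 bp reads
print(mean_target_depth(25_000_000, 100, 50_000_000))  # 50.0
print(reads_needed(50, 100, 50_000_000, on_target_fraction=0.7))
```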

    Thanks!

    Federico

  • #2
    No, there is no standard. It depends on how many calls you want to make accurately. A caller like SomaticSniper will happily call variants in low-coverage regions, but you will have little confidence in the genotypes, even with 40x coverage for an exome sample.

    I'm doing some development work on cancer panels, and we've been advised (this is not exome sequencing but targeted resequencing) to aim for 500x to 1000x coverage. I was a little iffy about these figures until I actually started doing the analysis on exomes myself, just to test things out.

    I imagine this is prohibitively expensive for exomes, so I think of depth as "as much as you can afford". Remember that you will also want to be confident about the genotype calls in your normal samples.
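    A back-of-the-envelope way to see why recommended depths climb so steeply is to treat the number of variant-supporting reads at a site as Binomial(depth, VAF). A minimal sketch under loud simplifying assumptions: reads are independent, sequencing and mapping errors are ignored, and a variant counts as "detected" at a hypothetical minimum of 3 supporting reads (not the rule of SomaticSniper or any other specific caller). VAF 0.5 stands for a germline heterozygous site in the normal; lower VAFs stand for somatic variants diluted by normal contamination or subclonality (illustrative values only).

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(X >= min_alt_reads) for X ~ Binomial(depth, vaf).

    Assumes each read independently carries the variant allele with
    probability `vaf`; sequencing and mapping errors are ignored.
    """
    p_below = sum(comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
                  for k in range(min_alt_reads))
    return max(0.0, 1.0 - p_below)


MIN_ALT_READS = 3  # hypothetical calling rule

print("depth  VAF=0.50  VAF=0.20  VAF=0.05")
for depth in (20, 40, 50, 100, 500, 1000):
    row = [f"{detection_probability(depth, v, MIN_ALT_READS):8.3f}"
           for v in (0.5, 0.2, 0.05)]
    print(f"{depth:>5}x " + "  ".join(row))
```

    The low-VAF column is where the 500x to 1000x panel recommendations start to make sense; real callers also model base quality and error, so treat these numbers as an optimistic upper bound.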

    • #3
      Thank you for your answer, Bukowski. So far we are aiming for around 40x coverage, which seems to be the minimum needed for the significance of the somatic mutations we find to stabilize.

      • #4
        Finding the depth of coverage with more confidence

        This is an extension to the original question in this thread. Does anybody know how I can calculate sequencing accuracy at various depths of coverage? Based on this, I want to choose the coverage that gives me more confidence. Thanks in advance to all.

        • #5
          A couple of ideas: http://genome.sph.umich.edu/wiki/SNP...Set_Properties

          Also, for any metric, you can tentatively assume that your higher-coverage/higher-quality-score calls will be more "correct" than the lower-coverage/lower-quality-score calls. So, for any metric, compare the calls at different coverage thresholds against your highest-quality set. One caveat is that mapping artifacts or other problems can produce extremely high coverage, so make sure your "high quality set" looks real.
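          One commonly used call-set property is the transition/transversion (Ti/Tv) ratio. A minimal sketch of comparing minimum-depth thresholds against a high-depth reference set, assuming calls are available as hypothetical (REF, ALT, depth) tuples, for example parsed from a VCF's REF, ALT and DP fields; the depth cut-offs and example calls are made up for illustration.

```python
# Transitions: purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transition(ref: str, alt: str) -> bool:
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS


def titv(calls) -> float:
    """Ti/Tv ratio for an iterable of (ref, alt, depth) single-base calls."""
    ti = tv = 0
    for ref, alt, _depth in calls:
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")


def titv_by_min_depth(calls, thresholds, high_quality_depth):
    """Return (reference Ti/Tv, {threshold: (n_calls, Ti/Tv)}).

    The reference set is every call with depth >= high_quality_depth;
    inspect it for mapping artifacts before trusting it.
    """
    calls = list(calls)
    reference = titv(c for c in calls if c[2] >= high_quality_depth)
    per_threshold = {}
    for t in sorted(thresholds):
        subset = [c for c in calls if c[2] >= t]
        per_threshold[t] = (len(subset), titv(subset))
    return reference, per_threshold


# Hypothetical calls: (REF, ALT, depth)
calls = [("A", "G", 120), ("C", "T", 110), ("A", "C", 105),
         ("G", "A", 30), ("A", "T", 25), ("C", "A", 15)]
ref_titv, table = titv_by_min_depth(calls, [10, 20, 50], high_quality_depth=100)
print(ref_titv, table)
```

          With only a handful of calls per threshold the ratio is very noisy; in practice each threshold needs many calls before its Ti/Tv can be compared meaningfully with the reference.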

          • #6
            Thanks a bunch for the pointer.
            Once we have identified a "high quality set", is there a way to compute metrics at different coverage thresholds? I am not sure how to do it: do I have to randomly subsample the sequence reads and check the variant calls, or just compare with the consensus?

            • #7
              Global Alliance White Paper on Clinical Data

              There is a consortium on clinical data as described in the White Paper linked here:



              Page 30 lists the names of the organizers and their institutions; you may be able to obtain further information from them on current "standards" questions about clinical data.

              Please post any standards statements you obtain from them here on this forum and/or in the Wiki, so that others are kept informed and consensus parameters can be disseminated more rapidly.

              • #8
                I was thinking of simply separating calls by coverage, i.e., make a set of calls at >100x coverage, a set at 90-100x, a set at 80-90x, and so on, and compare them. Or use quality score instead of coverage if you prefer that metric. Your idea is interesting, though: you could take a set of high-quality calls, randomly take smaller and smaller sets of reads at the same positions, redo the calling, and see how low the coverage can go before your "subset calls" deviate too much from the legitimate set. The problem is that if your high-quality calls are at "easy" sites, this strategy won't necessarily apply to the rest of the genome.
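                The subsampling idea can be sketched without re-aligning anything by thinning per-site read counts. A minimal sketch, assuming each high-quality site is summarized as hypothetical (ref_reads, alt_reads) counts, and using a deliberately crude, hypothetical allele-fraction genotyper in place of a real caller. For a faithful test you would instead subsample the BAM (for example with `samtools view -s`) and rerun your actual caller.

```python
import random


def toy_genotype(ref_reads: int, alt_reads: int) -> str:
    """Hypothetical stand-in for a real caller: genotype from allele fraction."""
    depth = ref_reads + alt_reads
    if depth == 0:
        return "./."
    frac = alt_reads / depth
    if frac < 0.2:
        return "0/0"
    if frac > 0.8:
        return "1/1"
    return "0/1"


def downsample_site(ref_reads, alt_reads, target_depth, rng):
    """Draw target_depth reads without replacement from one site's pileup."""
    pool = ["ref"] * ref_reads + ["alt"] * alt_reads
    picked = rng.sample(pool, min(target_depth, len(pool)))
    alt = picked.count("alt")
    return len(picked) - alt, alt


def concordance_at_depth(sites, target_depth, replicates=100, seed=0):
    """Fraction of (site, replicate) draws whose genotype matches the
    genotype called from the full-depth counts."""
    rng = random.Random(seed)
    matches = total = 0
    for ref_reads, alt_reads in sites:
        truth = toy_genotype(ref_reads, alt_reads)
        for _ in range(replicates):
            r, a = downsample_site(ref_reads, alt_reads, target_depth, rng)
            matches += toy_genotype(r, a) == truth
            total += 1
    return matches / total if total else float("nan")


# Hypothetical high-depth sites: (ref_reads, alt_reads)
sites = [(60, 58), (0, 120), (110, 2), (70, 45)]
for depth in (100, 40, 20, 10, 5):
    print(depth, round(concordance_at_depth(sites, depth), 3))
```

                Because it only resamples sites that were already confident, this inherits the caveat above: if those sites are "easy", the concordance curve may be optimistic for the rest of the genome.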
