  • Exome Analysis.. Annotation - unusual observation; need explanation..

    I have a bioinformatics query on the exome project we are running. We are using a NimbleGenV2 exome capture kit for target capture.

    It's an unusual sort of question that has been nagging me for more than a week now, and nobody has been able to give a good answer yet:

    Let's say I have processed raw reads from a tumor-normal paired exome experiment and made them fit for mutation calling. I have two BAM files (one each for tumor and normal) that I feed into a mutation caller. Since it's an exome experiment, I restrict the calls in one of two ways:

    Case 1: I limit the variant calls to the target regions only, using the .bed file from the NimbleGen website as the interval parameter.

    Now, theoretically all the mutation calls made by the caller are exonic or splicing. I have 2100 SNVs.

    I run these calls through annotation software and annotate them against a refGene set (Annovar, which uses the refGene set downloaded directly from UCSC). More than 92% of the SNVs are annotated as "exonic" or "splicing", as expected.


    Case 2: I limit the variant calls to exons + 10 bases only, by generating a .bed file of refGene exons from the UCSC Table Browser and using it as the interval parameter.

    Now, once again theoretically all the mutation calls made by the caller are exonic or splicing. I have 2700 SNVs.

    But when I run these calls through annotation software and annotate them against a refGene set (Annovar again), only approximately 65%-75% of the calls are exonic or splicing. The rest are annotated as intronic, upstream, downstream and a zillion other things.

    (1) My understanding is that the 2100 vs 2700 difference arises from misalignment of a fraction of the reads into non-target regions, so the extra 600 SNVs are, for the most part, false-positive mutation calls (correct me if I am wrong).
    (2) The 92% vs 65-75%, on the other hand, is quite inexplicable. In both cases the caller was asked to call variants in exonic regions only: in the former case these were the capture target regions, and in the latter case the refGene exons obtained from the Table Browser. I would have expected >90% exonic variants in Case 2 as well.


    Have you noticed this before? Is there an explanation as to why (2) is happening?

  • #2
    Hi shyam_la,

    1) Try comparing the two bed files (NimbleGen and refGene) to see how different they are.
    2) Extending by 10 bp does not seem like much, but a large share of human exons are <200 bp, so the chance of picking up non-exonic/splicing variants is quite high.
    3) If you are curious, try the NimbleGen bed file extended by 10 bp, and try the refGene one without the 10 bp extension. I am quite interested in what you get.

    Best regards,
    Douglas
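One way to put numbers on suggestions 1) and 3) is to measure how many bases each bed file covers and how many they share, with and without 10 bp of padding. The sketch below is only an illustration: the function names and the toy intervals are made up for this example, and it assumes plain three-column BED lines with 0-based, half-open coordinates.

```python
from collections import defaultdict


def read_bed(lines, pad=0):
    """Parse BED lines into {chrom: [(start, end), ...]}, padding each interval by `pad` bases."""
    intervals = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals[chrom].append((max(0, int(start) - pad), int(end) + pad))
    return intervals


def merge(ivs):
    """Merge overlapping or touching half-open intervals so no base is counted twice."""
    merged = []
    for start, end in sorted(ivs):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bases(intervals):
    """Total number of bases covered by an interval set."""
    return sum(end - start for ivs in intervals.values() for start, end in merge(ivs))


def shared_bases(a, b):
    """Number of bases covered by both interval sets."""
    total = 0
    for chrom in a.keys() & b.keys():
        xs, ys = merge(a[chrom]), merge(b[chrom])
        i = j = 0
        while i < len(xs) and j < len(ys):
            lo = max(xs[i][0], ys[j][0])
            hi = min(xs[i][1], ys[j][1])
            if lo < hi:
                total += hi - lo
            # Advance whichever interval ends first.
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
    return total


# Toy intervals (hypothetical, not taken from either real bed file).
capture_lines = ["chr1\t100\t200", "chr1\t300\t400"]
refgene_lines = ["chr1\t150\t250", "chr2\t0\t50"]

capture = read_bed(capture_lines)
for pad in (0, 10):
    refgene = read_bed(refgene_lines, pad=pad)
    print(pad, covered_bases(capture), covered_bases(refgene), shared_bases(capture, refgene))
```

`read_bed` accepts any iterable of lines, so an open file handle can be passed in place of the toy lists. The bases that are in the refGene set but not shared with the capture targets are where calls made only in Case 2 would be expected to fall.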

    • #3
      Yeah, the bed files can vary, which will ultimately affect the statistics. One more thing I want to ask: are the 2100 included in the 2700 you get in Case 2?


      • #4
        Yes, the 2100 are included in the 2700. Of course the bed files vary - but that is not an explanation for my observation..
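A quick way to confirm that one call set is contained in the other, and to pull out the calls unique to the larger set, is a set comparison on (chrom, pos, ref, alt) keys. This is a minimal sketch that assumes the caller writes VCF-style tab-delimited lines (CHROM, POS, ID, REF, ALT, ...); the thread does not say which caller or output format was used, and the records below are invented.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF-style lines, one key per ALT allele."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            keys.add((chrom, pos, ref, alt))
    return keys


# Hypothetical records standing in for the Case 1 and Case 2 call sets.
case1_lines = ["chr1\t150\t.\tA\tG", "chr1\t320\t.\tC\tT"]
case2_lines = case1_lines + ["chr1\t245\t.\tG\tA"]

case1 = variant_keys(case1_lines)
case2 = variant_keys(case2_lines)

print("Case 1 fully contained in Case 2:", case1 <= case2)
print("Calls unique to Case 2:", sorted(case2 - case1))
```

The calls in `case2 - case1` are the ones worth annotating separately, since they are the only ones that can change the exonic/splicing proportion between the two cases.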


        • #5
          Hi,

          1) In IGV, they are not very different at the genomic level. If I zoom in to look at finer details, the NimbleGen one is missing a lot of exons that are present in the RefSeq one (which is expected).
          I will try out mutation calling without the +10 bp, though I doubt that's going to reduce the numbers very much.
          Will update with results.


          • #6
            Did it on 1 sample.
            Got 2492 SNVs (exons only) vs 2768 (exons + 10 bp).
            78% of the exons-only calls were annotated as exonic/splicing vs 70% for exons + 10 bp.

            So, 8 percentage points of the difference are due to the extra 10 bp that I had used. But 78% is still a low proportion. Expected: at least 90%.


            • #7
              2100 × 0.92 = 1932
              2492 × 0.78 ≈ 1944

              So the absolute exonic/splicing numbers are quite close. Without examining carefully the difference in the two bed files and the actual SNV variants unique to the refgene bed file, it is hard to explain why.

              Douglas
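The same arithmetic can be laid out for all three runs reported in the thread, together with the non-exonic remainder. This is just a worked calculation on the figures quoted above; the labels are descriptive names chosen here, and the Case 1 figures ("more than 92%" of 2100) are not necessarily from the same sample as the single-sample rerun.

```python
# (total SNVs, fraction annotated exonic/splicing) as quoted in the thread.
runs = {
    "capture targets (Case 1)": (2100, 0.92),
    "refGene exons only (1 sample)": (2492, 0.78),
    "refGene exons + 10 bp (1 sample)": (2768, 0.70),
}


def exonic_count(total, fraction):
    """Approximate number of exonic/splicing calls, rounded to the nearest SNV."""
    return round(total * fraction)


for label, (total, fraction) in runs.items():
    exonic = exonic_count(total, fraction)
    print(f"{label}: {exonic} exonic/splicing, {total - exonic} other")
```

Read this way, the exonic/splicing count stays roughly constant across the runs while the non-exonic count grows, which is why the proportion drops.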


              • #8
                Yeah, exactly my thoughts..
                Thank you.
