  • willishf
    Junior Member
    • Feb 2013
    • 1

    Homozygous calling from Exome DNA-seq

    I am working with matched tumor/normal samples from the TCGA breast cancer cohort to determine the percentage of samples with a germline homozygous deletion.

    From the 1000 Genomes Project and other publications, some genes are known to carry a homozygous deletion in the germline.

    For example, the gene I am looking at has been reported, based on PCR, to be homozygously deleted in a large percentage of Caucasians. In the 1000 Genomes data the deletion is a very precise removal of the gene with minimal impact on neighboring genes, and it is called as a homozygous deletion in a large percentage of the 1000 Genomes samples. The view is that this deletion conferred an advantage at some point in our ancestry.

    I am trying to determine whether this deletion is protective against cancer. Using the matched tumor/normal TCGA breast data, I want to find the percentage of samples that have the homozygous deletion.

    Using samtools, I counted sequence reads for the gene of interest and for its closest neighboring genes; this region is gene-dense. Given what is known about this germline deletion from 1000 Genomes, a sample with a homozygous deletion should yield 0 reads from the gene, so I expected some number of samples with 0 reads. If the percentage of samples with 0 reads matched the percentage expected by chance, that would indicate that the deletion is not protective against cancer.

    Here is the problem, and I need help finding ways to keep challenging the finding. All 60 matched tumor/normal samples analyzed so far contain reads from the gene. Taken at face value, this gives strong support to the hypothesis that people missing both copies of this gene will not get cancer. That is a bold statement that needs challenging.
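One way to put a number on "expected by chance" is a simple binomial calculation: if a fraction of the general population is homozygous for the deletion, how likely is it to see zero such people among 60 unrelated samples? The sketch below assumes independent samples and uses placeholder frequencies, since the post does not give the actual homozygous-deletion frequency; the function name is hypothetical.

```python
from math import comb


def prob_at_most_k_carriers(n_samples: int, carrier_freq: float, k: int = 0) -> float:
    """Return P(X <= k) for X ~ Binomial(n_samples, carrier_freq).

    carrier_freq is the assumed population frequency of homozygous-deletion
    carriers. With k = 0 this is simply (1 - carrier_freq) ** n_samples.
    """
    if not 0.0 <= carrier_freq <= 1.0:
        raise ValueError("carrier_freq must be between 0 and 1")
    return sum(
        comb(n_samples, i) * carrier_freq**i * (1.0 - carrier_freq) ** (n_samples - i)
        for i in range(k + 1)
    )


if __name__ == "__main__":
    # Placeholder frequencies only; substitute the published frequency
    # for the population that matches the cohort.
    for freq in (0.01, 0.05, 0.10, 0.20):
        p_zero = prob_at_most_k_carriers(60, freq)
        print(f"assumed frequency {freq:.2f}: P(0 carriers in 60) = {p_zero:.3g}")
```

A very small probability under the published frequency would say that zero carriers in 60 is surprising, but it would not by itself separate a real protective effect from a counting artifact such as mis-mapped reads.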

    As a control I use a neighboring gene that is roughly the same size and has a similar exon/intron pattern, and I normalize the read count returned by samtools for the gene of interest by the read count of that neighboring gene. I still need to compute a more formal RPKM and filter on Phred score, but the quick comparison is that 31 of the samples have a read ratio of about 20%, 7+ samples are at about 50%, and 20+ samples are above 80% relative to the neighboring gene. It is tempting to call the ~20% group a heterozygous germline deletion, but I would feel much better if the ratio were 50%. I suspect that using RPKM will raise the percentage closer to the expected 50%. The average number of reads across all samples for the region known to be deleted is 5800. For the 20% ratio group, the average number of reads in that region is 2000 and the minimum is 658. For the neighboring gene used for normalization, the average number of reads is 12,000.
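The normalization described above can be sketched as follows. RPKM is reads per kilobase of region per million mapped reads; when two regions from the same sample are compared, the library size cancels and only the lengths matter. The region lengths, the thresholds, and the function names below are hypothetical placeholders, not values from the TCGA data.

```python
def rpkm(read_count: int, region_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of region per million mapped reads."""
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


def length_normalized_ratio(target_reads: int, target_len_bp: int,
                            control_reads: int, control_len_bp: int) -> float:
    """Ratio of target to control read density within one sample.

    Equivalent to rpkm(target) / rpkm(control), because the total number
    of mapped reads is the same for both regions and cancels out.
    """
    return (target_reads / target_len_bp) / (control_reads / control_len_bp)


def copy_state(ratio: float, hom_max: float = 0.05,
               het_window: tuple = (0.35, 0.65), normal_min: float = 0.80) -> str:
    """Bin a ratio into a rough copy-number label.

    The cutoffs are illustrative assumptions and would need calibrating
    against samples of known genotype.
    """
    if ratio <= hom_max:
        return "homozygous deletion candidate"
    if het_window[0] <= ratio <= het_window[1]:
        return "heterozygous deletion candidate"
    if ratio >= normal_min:
        return "two copies candidate"
    return "ambiguous"


if __name__ == "__main__":
    # Averages quoted in the post (2000 vs 12,000 reads); equal captured
    # lengths of 3000 bp are assumed purely for illustration.
    ratio = length_normalized_ratio(2000, 3000, 12000, 3000)
    print(f"ratio = {ratio:.2f}, call = {copy_state(ratio)}")
```

With exome data the lengths should probably be the captured exonic bases rather than the full gene span, since introns contribute few reads.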

    Is it reasonable to assume that a homozygous deletion in germline should result in 0 sequences read for that region?

    Contamination is an issue, but I would not expect it to reach 20% in almost half the samples.

    The reads could originate from a pseudogene or another gene with sequence homology, if the exon capture library is not specific enough.

    I took a couple of the reads from the deleted region of interest and ran a BLAST search; they hit the expected gene at 100% identity. This tells me the coordinates are correct.

    The mapping quality of some of the reads is not very good, but they have Phred base-quality scores of 30+, so given that they map to a known sequence, the reads themselves are probably valid.
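Because reads that also fit a homologous sequence tend to receive low mapping quality, one check is to recount the region with a mapping-quality floor and see whether the reads persist. The sketch below wraps `samtools view -c` with `-q` (minimum MAPQ) and `-F` (exclude flags); the BAM path, the region string, and the MAPQ threshold of 20 are placeholders, and a region query assumes the BAM file is indexed.

```python
import subprocess

# SAM flag bits to exclude: unmapped (0x4), secondary (0x100),
# PCR/optical duplicate (0x400), supplementary (0x800).
EXCLUDE_FLAGS = 0x4 | 0x100 | 0x400 | 0x800


def build_count_command(bam_path: str, region: str, min_mapq: int = 20,
                        exclude_flags: int = EXCLUDE_FLAGS) -> list:
    """Build a samtools command that counts reads passing the filters."""
    return [
        "samtools", "view",
        "-c",                       # print only the number of matching reads
        "-q", str(min_mapq),        # skip reads with MAPQ below this value
        "-F", str(exclude_flags),   # skip reads with any of these flag bits
        bam_path,
        region,
    ]


def count_reads(bam_path: str, region: str, min_mapq: int = 20) -> int:
    """Run samtools and return the filtered read count for one region."""
    result = subprocess.run(
        build_count_command(bam_path, region, min_mapq),
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    # Hypothetical file name and coordinates; replace with the real ones.
    print(" ".join(build_count_command("sample_normal.bam", "chr1:1000-2000")))
```

Comparing the count at `min_mapq=0` with the count at a stricter threshold, for both the gene of interest and the control gene, would show whether the residual reads are mostly ambiguous placements.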

    The BAM files were mapped by TCGA.

    Looking at the region in IGV, the reads are well distributed across the exons, with coverage peaking in the middle of each exon.

    I welcome any feedback or advice on what else I can do to validate that having X number of reads in a region means the region is not homozygously deleted.

    If you have specific expertise in this area and can contribute to the data analysis, I am looking for co-authors.
  • AJERYC
    Member
    • Jan 2012
    • 26

    #2
    The best way to confirm your hypothesis would be to have several tumor samples that are known to be homozygous or heterozygous for the deletion, which you can use as controls in your experiments. Such samples may well exist, since the deletion can reduce cancer risk but the reduction can't be 100%. Your data point to the region not being deleted, but you can't be sure because some sequences may be misaligned. For example, if you look for chromosome Y genes in a female exome you always find reads aligned there, but usually at lower coverage than in a male.
