Homozygous calling from Exome DNA-seq

willishf

Junior Member

Join Date: Feb 2013

Posts: 1
#1


02-27-2013, 08:24 AM

I am working with matched tumor/normal samples from the TCGA breast cancer cohort to determine the percentage of samples that carry a germline homozygous deletion.

From the 1000 Genomes Project and other publications, there are genes known to be homozygously deleted in the germline.

For example, the gene I am looking at has been reported, based on PCR, to be homozygously deleted in a large percentage of Caucasians. Based on 1000 Genomes data, the deletion is a very precise removal of the gene with minimal impact on neighboring genes, and it is called as a homozygous deletion in a large percentage of the 1000 Genomes samples. The view is that this deletion conferred an advantage at some point in our ancestry.

I am trying to determine whether this deletion is protective against cancer. Using the matched tumor/normal TCGA breast data, I want to find the percentage of samples that have the homozygous deletion.

Using samtools, I counted sequence reads for the gene of interest as well as for very close neighboring genes. This region is gene-dense. Given what is known about this germline deletion from 1000 Genomes, you would expect a sample with a homozygous deletion of the gene to return 0 reads from this region, so some number of samples should show 0 reads. If X samples had 0 reads and that percentage matched what is expected by chance, it would indicate that the deletion is not protective against cancer.
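The counting step described above could be scripted roughly as follows. This is a minimal sketch, not the poster's actual command: the BAM path and region string are hypothetical placeholders, samtools is assumed to be on the PATH, and the BAM is assumed to be indexed. It uses `samtools view -c` with `-q` (minimum mapping quality) and `-F` (exclude flags), so the same region can be counted with and without a mapping-quality filter.

```python
import subprocess

# Exclude unmapped (0x4), secondary (0x100), duplicate (0x400) and
# supplementary (0x800) alignments from the count.
EXCLUDE_FLAGS = 0x4 | 0x100 | 0x400 | 0x800


def build_count_command(bam_path, region, min_mapq=0):
    """Return the samtools command that counts reads overlapping `region`."""
    return [
        "samtools", "view", "-c",
        "-q", str(min_mapq),
        "-F", str(EXCLUDE_FLAGS),
        bam_path, region,
    ]


def count_reads(bam_path, region, min_mapq=0):
    """Run samtools on an indexed BAM and return the read count as an int."""
    result = subprocess.run(
        build_count_command(bam_path, region, min_mapq),
        check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip())
```

For example, comparing `count_reads("sample.bam", "chr1:1000-5000", min_mapq=0)` with the same call at `min_mapq=20` (file name and coordinates are made up) would show how many of the reads in the region are confidently placed there.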

Here is the problem, and I need help finding ways to keep challenging the findings. Of the 60 matched tumor/normal samples analyzed so far, all contain reads from the gene. This gives strong support for the hypothesis that people who are missing both copies of this gene will not get cancer. That is a bold statement that needs challenging.
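One way to put a number on how surprising "0 of 60" is: if the deletion had no effect on cancer risk, the chance of seeing no homozygous-deleted samples among n independent samples is (1 - f)^n, where f is the population frequency of the homozygous deletion. The sketch below assumes independent samples, and the frequencies in the loop are hypothetical, since the thread does not state the actual value.

```python
def prob_no_carriers(n_samples, hom_del_freq):
    """Probability that none of n independent samples is homozygous deleted,
    assuming each sample is homozygous deleted with probability hom_del_freq."""
    return (1.0 - hom_del_freq) ** n_samples


# Hypothetical population frequencies; substitute the published value.
for freq in (0.05, 0.10, 0.20):
    print(freq, prob_no_carriers(60, freq))
```

A small probability here only argues against "no effect" if the read-based calls are trustworthy, which is exactly what is being questioned in this thread.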

Using a neighboring gene of roughly the same size and with a similar exon/intron structure as the control, I normalize the read count returned by samtools by the read count of the neighboring gene. I still need to compute a more formal RPKM and filter on Phred score, but a quick comparison shows that 31 of the samples have a read ratio of about 20%, 7+ samples are at about 50%, and 20+ samples are above 80% relative to the neighboring gene. It is tempting to call the 20% group heterozygous germline deletions, but I would feel much better if the ratio were 50%. I suspect that doing RPKM will raise the percentage closer to the expected 50%.

The average number of reads across all samples for the region known to be deleted is 5,800. For the 20% ratio group, the average number of reads in that region is 2,000 and the minimum is 658. For the neighboring gene used for normalization, the average number of reads is 12,000.
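The normalization described above can be sketched as follows. RPKM here is the usual reads per kilobase per million mapped reads. Note that when the target and control come from the same BAM, the total mapped read count cancels in the ratio, so RPKM only changes the ratio through the two region lengths. The region lengths below are hypothetical placeholders, and the idealized ratios of 0, 0.5 and 1.0 for 0, 1 and 2 copies are an assumption that ignores capture bias and mismapped reads.

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


def length_normalized_ratio(target_reads, target_length_bp,
                            control_reads, control_length_bp):
    """Ratio of per-base read density, target over control, within one sample."""
    return (target_reads / target_length_bp) / (control_reads / control_length_bp)


# Idealized target/control ratio for each germline copy number (assumption).
EXPECTED_RATIO = {0: 0.0, 1: 0.5, 2: 1.0}


def nearest_copy_state(ratio):
    """Return the copy number whose idealized ratio is closest to `ratio`."""
    return min(EXPECTED_RATIO, key=lambda copies: abs(EXPECTED_RATIO[copies] - ratio))


# Averages quoted in the post; the 3000 bp lengths are hypothetical placeholders.
ratio = length_normalized_ratio(2000, 3000, 12000, 3000)
print(round(ratio, 3), nearest_copy_state(ratio))
```

With equal lengths, the quoted averages (2,000 versus 12,000 reads) give a ratio of about 0.17, which is nearer to the idealized 0 than to 0.5. Real cutoffs would need samples of known genotype as controls.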

Is it reasonable to assume that a germline homozygous deletion should result in 0 reads for that region?

Contamination is an issue, but I would not expect it to be 20% in almost half the samples.

The reads could be originating from a pseudogene or another gene with sequence homology, if the exon capture library is not precise enough.

I took a couple of the reads from the deleted region of interest and ran a BLAST search; they hit the expected gene at 100%. This tells me the coordinates are correct.

The mapping quality for some of the reads is not very good, but they have Phred scores of 30+, so given that they actually map to a known sequence, the reads are probably valid.
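Base quality and mapping quality are both Phred-scaled but answer different questions: base quality is about whether a base call is wrong, while mapping quality (MAPQ in the SAM format) is about whether the read is placed at the wrong position. A small sketch of the conversion, assuming the "Phred scores of 30+" above refer to base quality:

```python
def phred_to_error_prob(q):
    """Convert a Phred-scaled score Q to an error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Base quality 30: probability that the base call is wrong.
print(phred_to_error_prob(30))

# MAPQ 0 versus MAPQ 20: probability that the mapping position is wrong.
print(phred_to_error_prob(0), phred_to_error_prob(20))
```

So a read can have high base qualities and still carry little evidence that it came from this gene rather than a homologous sequence if its mapping quality is near 0.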

The BAM files were mapped by TCGA.

Looking at the region in IGV, the reads have a good exon distribution, with peaks in the middle of each exon.

I welcome any feedback or advice on what else I can do to validate that having X reads in a region means the region is not homozygously deleted.

If you have specific expertise in this area and can contribute to the data analysis, I am looking for co-authors.
AJERYC

Member

Join Date: Jan 2012

Posts: 26
#2

03-10-2013, 12:43 AM

The best way to confirm your hypothesis would be to have several tumor samples known to be homozygous or heterozygous for the deletion that you can use as controls in your experiments. That should be possible, since the deletion may reduce cancer risk but the reduction can't be 100%. Your data suggest that the region is not deleted, but you can't be sure, because some reads may be misaligned. For example, if you look for chromosome Y genes in a female exome, you always find reads aligned there, but usually at lower coverage than in a male.