-
Nice, helpful answers. I was about to ask a similar question and found it already answered here.
-
Here's a nice uncomplicated summary of hg19... https://grch37.ensembl.org/Homo_sapiens/Info/Annotation
-
Sorry to revive this thread, but exons or CDSs are not the only thing transcribed, so how can we account for non-coding RNAs? My understanding is that they are also part of the transcriptome...
-
Originally posted by ulz_peter:
Just to throw my 2 cents in: as far as I know, most exome-enrichment kits use the CCDS database to design the exome library. Because this database is less comprehensive than the RefSeq or knownGene annotations in UCSC, some exons are missed. Others are of course discarded because of hybridization difficulties (repetitive regions, etc.).
-
Originally posted by ssully:
I keep seeing a figure of 30-33Mb for the human exome e.g.
This 2009 Nature paper
"Protein-coding regions constitute ~1% of the human genome or ~30 megabases (Mb), split across ~180,000 exons."
30-33Mb is also the figure cited in Illumina's "Sequencing Output Calculator", sent to me by tech support.
Anyone know why the number is so much higher on this thread?
I think it is important to know what we are talking about:
1. number of genomic positions that are annotated as coding (included in CDS)
2. number of genomic positions that are annotated as exonic (included in exons)
As frozenlyse and Richard Finney indicated, values for 2. range between roughly 60 and 80 Mb, depending on the annotation source.
Ssully, the 30 Mb figure in the citation you mention refers to 1. ("protein-coding regions").
Rstarke, what does this number of 1 billion refer to? "Annotated bases" can be anything: on a genome you can annotate introns, promoters, repeat regions, and so on. A link to this information would help.
Now, is there a precise definition of "exome" or is it a loose term? Is it supposed to include coding regions only, or can anyone put in there some UTR, promoters, intronic flanks, etc?
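To make the difference between 1. and 2. concrete, here is a minimal sketch for a single transcript. It assumes UCSC-style 0-based, half-open coordinates and a cdsStart/cdsEnd pair per transcript (with cdsStart equal to cdsEnd for non-coding transcripts); the function name and the toy coordinates are made up for illustration.

```python
from typing import List, Tuple

def exonic_and_coding_bases(
    exons: List[Tuple[int, int]], cds_start: int, cds_end: int
) -> Tuple[int, int]:
    """Return (exonic bases, coding bases) for one transcript.

    exons: (start, end) pairs, 0-based half-open (assumed convention).
    Coding bases are the parts of exons that fall inside [cds_start, cds_end).
    """
    exonic = 0
    coding = 0
    for start, end in exons:
        exonic += end - start
        # Clip the exon to the CDS interval; a non-positive overlap means UTR only.
        overlap = min(end, cds_end) - max(start, cds_start)
        if overlap > 0:
            coding += overlap
    return exonic, coding

# Hypothetical transcript: three exons, CDS starts in exon 1 and ends in exon 3.
toy_exons = [(100, 200), (300, 400), (500, 700)]
exonic, coding = exonic_and_coding_bases(toy_exons, cds_start=150, cds_end=550)
print(exonic, coding)
```

In this toy case half of the exonic bases are UTR, which is the same kind of gap as the one between the ~30 Mb "coding" and 60-80 Mb "exonic" figures discussed here.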
-
Just to throw my 2 cents in: as far as I know, most exome-enrichment kits use the CCDS database to design the exome library. Because this database is less comprehensive than the RefSeq or knownGene annotations in UCSC, some exons are missed. Others are of course discarded because of hybridization difficulties (repetitive regions, etc.).
-
Our friend Mr. Ref Seq says ...
Back of the envelope calculations:
The sum of the exon base coverage for the hg19/UCSCknown table (posted above) is 81,105,734.
The RefSeq table from UCSC for hg19 (Jan 2011 version) gives 63,995,498.
[Method: load the table into a data structure, sort by name, traverse; if (currentname == previousname) don't count, else calculate the sum of the exons and add it to the total.] Nota bene: this won't eliminate some overlapping situations, e.g. differently named transcripts that share exons.
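The bracketed method can be sketched roughly as below. This is only an illustration of the description, not the original script: the record layout (name, exon starts, exon ends) and the toy transcripts are assumptions, with coordinates taken as 0-based half-open.

```python
from typing import Iterable, List, Tuple

# One record per transcript: (name, exon_starts, exon_ends). Assumed layout.
Transcript = Tuple[str, List[int], List[int]]

def sum_exon_bases_by_name(transcripts: Iterable[Transcript]) -> int:
    """Sum exon lengths, counting only the first record seen for each name."""
    total = 0
    previous_name = None
    for name, starts, ends in sorted(transcripts, key=lambda t: t[0]):
        if name == previous_name:
            continue  # same name as the previous record: don't count it again
        total += sum(end - start for start, end in zip(starts, ends))
        previous_name = name
    return total

# Hypothetical data: NM_A appears twice; NM_B overlaps NM_A but has another name.
toy = [
    ("NM_A", [100, 300], [200, 400]),
    ("NM_B", [150, 500], [250, 600]),
    ("NM_A", [100, 300], [200, 400]),
]
print(sum_exon_bases_by_name(toy))
```

The duplicate NM_A record is skipped, but the 50 bases shared between NM_A and NM_B are still counted twice, which is the overlap caveat mentioned above.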
RefSeq is more conservative than UCSCknown and relies more on hand curation and less on computation.
I don't know about GENCODE, but if that figure is for human only and is right, then it probably includes any transcript ever measured. I could only speculate on what that extra bonus coverage is. A free trip to Sweden goes to the guy that can explain and prove it (if it's functionally real).
-
I would also like to know why the huge discrepancy between what's in the literature (~30-40Mb) and the numbers cited in this thread. I just checked the GENCODE v6 annotations and the total annotated base count is over a billion, supporting the estimates in this thread. I'm confused. Can anyone clear up the discrepancy?
-
I keep seeing a figure of 30-33Mb for the human exome e.g.
This 2009 Nature paper
"Protein-coding regions constitute ~1% of the human genome or ~30 megabases (Mb), split across ~180,000 exons."
30-33Mb is also the figure cited in Illumina's "Sequencing Output Calculator", sent to me by tech support.
Anyone know why the number is so much higher on this thread?
-
By comparing the genes listed in the bed file to the UCSC annotation. I tried attaching the bed file but it's too large for this site to allow it.
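For anyone wanting to repeat that kind of comparison, a rough sketch follows. It assumes the vendor BED file carries a gene symbol in its 4th (name) column, which is not guaranteed for every kit, and that the annotation gene names are already available as a set; the function names and toy data are hypothetical.

```python
import io
from typing import Iterable, Set

def gene_names_from_bed(lines: Iterable[str]) -> Set[str]:
    """Collect the 4th-column names from BED lines (assumed to be gene symbols)."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank lines and BED header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            names.add(fields[3])
    return names

def fraction_targeted(bed_genes: Set[str], annotation_genes: Set[str]) -> float:
    """Fraction of annotation genes that have at least one target in the BED file."""
    if not annotation_genes:
        return 0.0
    return len(bed_genes & annotation_genes) / len(annotation_genes)

# Hypothetical target file and annotation gene list.
toy_bed = io.StringIO(
    "track name=targets\n"
    "chr1\t100\t200\tGENE1\n"
    "chr1\t300\t400\tGENE2\n"
    "chr2\t100\t200\tGENE2\n"
)
toy_annotation = {"GENE1", "GENE2", "GENE3", "GENE4"}
print(fraction_targeted(gene_names_from_bed(toy_bed), toy_annotation))
```

Note that this counts a gene as targeted if any of its regions is in the BED file, so it says nothing about how many of that gene's exons or bases are actually captured.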
-
Originally posted by NextGenSeq:
I assume you are interested in this since you are doing whole exome sequence enrichment and subsequent sequencing.
Different vendors have different amounts of "whole exome" coverage. We found that the Agilent SureSelect only enriches for ~89% of the human whole exome.
Thanks,
sm
-
I assume you are interested in this since you are doing whole exome sequence enrichment and subsequent sequencing.
Different vendors have different amounts of "whole exome" coverage. We found that the Agilent SureSelect only enriches for ~89% of the human whole exome.
-
Thanks Guys. I understand that it is acceptable to remove redundancy at exon level.
@frozenlyse : your end number (exons) seems to match mine.
How do I deal with gene-level coverage? Many genes overlap each other, as noted in my first post.
Total # bases in RefSeq Genes : 2,011,862,672
Is it acceptable to remove redundancy when counting bases across all human genes? In a way this would lead us to underestimate coverage, because overlapping genes can be co-expressed, right?
Thanks for your time to help me understand this.
Best,
-Abhi
-
If you just want a base pair count for different annotations, you can use the UCSC Table Browser: choose the genome build you are using and the annotation you are interested in, then press "summary/statistics" at the bottom. E.g. for hg18 RefSeq you get
item count 34,702
item bases 1,166,592,699 (40.49%)
item total 2,020,112,601 (70.11%)
smallest item 33
average item 58,213
biggest item 2,304,634
block count 347,347
block bases 66,601,430 (2.31%)
block total 104,526,351 (3.63%)
smallest block 3
average block 301
biggest block 59,461
The "block" lines are what you are interested in: 347,347 exons from 34,702 RefSeq genes, with a total size of about 104 Mb ("block total"). When redundancy is removed, so that overlapping exons are counted only once, about 66 Mb is covered ("block bases").
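The difference between the two block figures is the difference between summing interval lengths and taking the union of the intervals. A small sketch, assuming exons are given as (chrom, start, end) with 0-based half-open coordinates; the function names simply mirror the Table Browser labels and the toy intervals are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open (assumed)

def block_total(intervals: List[Interval]) -> int:
    """Sum of all interval lengths; overlapping bases are counted repeatedly."""
    return sum(end - start for _, start, end in intervals)

def block_bases(intervals: List[Interval]) -> int:
    """Number of distinct bases covered, after merging overlaps per chromosome."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    covered = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                cur_end = max(cur_end, end)  # overlapping or adjacent: extend
            else:
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered

# Hypothetical exons: the first two overlap by 50 bases on chr1.
toy_exons = [
    ("chr1", 100, 200),
    ("chr1", 150, 250),
    ("chr1", 300, 400),
    ("chr2", 100, 200),
]
print(block_total(toy_exons), block_bases(toy_exons))
```

The same merge applied to whole gene spans instead of exons would give the non-redundant gene-level count ("item bases" rather than "item total").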
-
Originally posted by apratap:
Clearly there are overlapping regions in each of these annotation files [...] Just wondering if I should count the bases common to two genes twice or only uniq regions should be counted.