-
Originally posted by Xi Wang
Oh. But if you have the data, you can try what I just mentioned.
And for PE reads, I don't think it can improve much, because it is the DNA fragments that are amplified. So the coverage should have some relationship with the GC content of the DNA fragments. On the other hand, the read GC content and the DNA fragment GC content are highly correlated. As a result, the relationship between read GC content and coverage largely reflects reality.
Maybe I'm not understanding the advantage. Could you show me an example?
Comment
-
Originally posted by bioinfosm
...
We notice that only 20% of reads map on-target! Is that a common thing? (Illumina 75bp PE)
We have also run a TE using SureSelect with Illumina 76bp single-end reads, and we obtained results similar to Tewhey et al. (Genome Biol 2009): 50% of uniquely aligned reads were on target, with a uniformity of capture similar to what was reported in the paper.
I am wondering if someone else has results for Illumina 76bp paired-end, as it seems from the Agilent website that the % on target should increase from 50% to 70% using the PE protocol.
Thanks
Comment
-
Originally posted by bioinfosm
Another point which I did not notice here is, # of reads actually sequenced to get 30x exome coverage for the Agilent capture stuff.
We notice that only 20% of reads map on-target! Is that a common thing? (Illumina 75bp PE)
In our case with 76-bp single-end reads, we targeted 0.09% of the genome and enriched to 35% of the sequences being "on target", which is a ~390-fold enrichment. If you were to actually convert Tewhey's numbers to solely "on target" (from 0.12% to 37%), then their claim of "about 400 fold enrichment" is actually ~290-fold. Just a small criticism.
We have just completed a 76-bp paired-end run with 4 samples multiplexed - I will let you know what we get with our alignment results.
Last edited by NGSfan; 03-17-2010, 02:45 AM.
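For anyone wanting to reproduce the fold-enrichment figure for our run, here is a minimal sketch. It assumes fold enrichment is simply the on-target percentage of reads divided by the percentage of the genome that was targeted; the function name is made up for illustration.

```python
def fold_enrichment(on_target_pct: float, target_pct_of_genome: float) -> float:
    """Observed on-target read percentage divided by the percentage expected by chance."""
    if target_pct_of_genome <= 0:
        raise ValueError("target_pct_of_genome must be positive")
    return on_target_pct / target_pct_of_genome


# Numbers from this post: 0.09% of the genome targeted, 35% of reads on target.
print(round(fold_enrichment(35.0, 0.09)))  # 389, i.e. roughly 390-fold
```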
Comment
-
Originally posted by NGSfan
I'm not clear on why converting the read coverage to a log scale would help understand distribution better. Simply visualizing coverage on a log scale will simply change the scale you're looking at, no?
Maybe I'm not understanding the advantage. Could you show me an example?
But I noticed that there are fewer regions with high GC content than regions with average GC content, and the same is true of low-GC regions. That is to say, there are more points in the figure around the average GC (x-axis), so high-coverage points are more likely to appear in this part. This is what you saw in the figure. If you take the log, the high-coverage points shrink more than the low-coverage points do. The log-scaled figure is more likely to reflect the true relationship between read coverage and GC content.
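To illustrate the compression, here is a small sketch. The depths are hypothetical values invented for this example, and adding 1 before taking the log is just one common choice so that zero-depth windows stay defined.

```python
import math


def log_scale(depths):
    """Return log10(depth + 1) for each depth; the +1 keeps zero-depth windows defined."""
    return [math.log10(d + 1) for d in depths]


# Hypothetical per-window read depths, invented for illustration only.
depths = [9, 99, 999]
for raw, logged in zip(depths, log_scale(depths)):
    print(f"{raw:>4} -> {logged:.1f}")
```

The raw values span two orders of magnitude, while the log-scaled values differ by only 2 units, so a handful of very deep windows no longer dominates the plot.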
Comment
-
If you want to be able to enrich repetitive regions without bias, look at RainDance Technologies. Using their RainStorm approach, you can design PCR primers to capture 99% or greater of your target regions. The technology will also provide better uniformity allowing for less sequencing than SureSelect.
Comment
-
Hi Bryan,
Thanks for the suggestion. The RainDance approach really does seem to be the better way to handle these biased regions, albeit on a smaller scale.
It doesn't scale as well as SureSelect to, say, 1000 genes or the whole exome.
Comment
-
Originally posted by bryan haffer
If you want to be able to enrich repetitive regions without bias, look at RainDance Technologies. Using their RainStorm approach, you can design PCR primers to capture 99% or greater of your target regions. The technology will also provide better uniformity allowing for less sequencing than SureSelect.
www.raindancetech.com
Comment
-
NGSfan,
Our lab has been using a custom SureSelect library to capture and sequence the extended HLA region (~8Mb). We have found that we get great coverage (>40X) over regions with <60% GC content while we get very poor coverage of regions with >60% GC content. I don't think that these results are out of the ordinary. I just came across the manuscript below...
Despite the ever-increasing output of Illumina sequencing data, loci with extreme base compositions are often under-represented or absent. To evaluate sources of base-composition bias, we traced genomic sequences ranging from 6% to 90% GC through the process by quantitative PCR. We identified PCR du …
They have shown that the PCR steps in the library construction process create a huge bias against regions of high GC content. They have also shown how to resolve this problem. Check it out...
DoubleA
Last edited by DoubleA; 03-23-2011, 10:08 AM.
Comment
-
DoubleA,
Thank you for your link. The manuscript is very interesting, but it only refers to the Illumina Sureselect protocol.
Does anybody have experience with the Agilent SureSelect protocol, in particular with the Herculase II enzyme? Does this enzyme recover the fragments with a high GC percentage (like the AccuPrime Taq HiFi used in the manuscript)?
Sam64
Comment
-
Hello everyone,
I recently tried three different polymerases as well as different PCR conditions (increased denaturation time) in an attempt to increase read coverage in regions with >60% GC content. I enriched 12 Illumina PE libraries with either AccuPrime, Phusion, or KAPA polymerase (4 with each polymerase). I should mention that I ligated the same adapter and incorporated a unique bar code for each library using the three-primer enrichment approach (PE 1.0, PE 2.0, and a primer containing the bar code).
Following library production, I pooled 4 libraries (all 4 created with the same polymerase) and performed a hybridization with a custom SureSelect bait library covering ~8Mb of the HLA region on human chromosome 6 (baits on ~3.8 Mb). Following hybridization and elution, I performed a final enrichment with primers covering the last 20bp of the 5' and 3' ends of the Illumina libraries. All 3 pools (12 libraries) were mixed and run on a HiSeq lane for a single 40bp run.
The coverage of the target region was a bit low (10-50X), so we'll probably sequence these libraries in the future with a PE X 100bp run. I have attached graphs of read coverage vs %GC content (20bp window). As you can see, the coverage of the GC-rich regions is pretty similar for each polymerase, and the increase in denaturation time per cycle did not help much either. Below is the % of reads mapping to regions with GC content >60%. I thought some of you might be interested in these results.
Regards,
Double A
AccuPrime 15 second denaturation/cycle: 12.4%
AccuPrime 30 second denaturation/cycle: 12.9%
AccuPrime 45 second denaturation/cycle: 13.2%
AccuPrime 60 second denaturation/cycle: 14.4%
Phusion 15 second denaturation/cycle: 13.5%
Phusion 30 second denaturation/cycle: 16.4%
Phusion 45 second denaturation/cycle: 13.1%
Phusion 60 second denaturation/cycle: 12.6%
KAPA 15 second denaturation/cycle: 13.9%
KAPA 15 second denaturation/cycle: 10.7%
KAPA 15 second denaturation/cycle: 13.5%
KAPA 15 second denaturation/cycle: 11.6%
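As a rough way to compare the three polymerases, the percentages listed above can be averaged per enzyme. This is only a sketch with the values copied from the list; the variable names are illustrative, and four data points per enzyme are too few to support any statistical claim.

```python
from statistics import mean

# % of reads mapping to regions with >60% GC content, copied from the list above.
pct_reads_gc_rich = {
    "AccuPrime": [12.4, 12.9, 13.2, 14.4],
    "Phusion": [13.5, 16.4, 13.1, 12.6],
    "KAPA": [13.9, 10.7, 13.5, 11.6],
}

# Mean and range per polymerase, rounded to one decimal place.
summary = {
    name: (round(mean(values), 1), min(values), max(values))
    for name, values in pct_reads_gc_rich.items()
}

for name, (avg, low, high) in summary.items():
    print(f"{name}: mean {avg}%, range {low}-{high}%")
```

The means land within about 1.5 percentage points of each other, in line with the observation that the choice of polymerase made little difference.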
Comment
-
coverage calc
Hey
I am trying to sequence the exome and the capture kit is 100 Mb.
The sequencing core promised 120 million reads per lane. We are using paired-end 100bp reads and our fragment size is 250 base pairs.
My calculation was: 120 million reads * 200 bp = 24 billion bases read,
so coverage = 24 billion bases / 100 Mb = 240x coverage (average).
But some people say I will get a coverage of only 120x. What could be the reason? Or is the coverage actually 240x?
Comment
-
Hi Arvi8689,
There are two things that will reduce your fold coverage with an exome capture experiment. First, at least 10% of your reads will be PCR duplicates and should be removed before alignment. Second, ~60-80% of your unique reads will be "on target". Therefore, it's likely that only 50% of your initial reads will be unique and map to your target.
Double A
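Here is a back-of-the-envelope sketch of that estimate. It assumes the 120 million figure counts read pairs (2 x 100bp sequenced per fragment) and uses the rough 50% usable fraction mentioned above; the function and parameter names are made up for illustration.

```python
def mean_coverage(n_fragments, bases_per_fragment, target_bp, usable_fraction=1.0):
    """Average depth over the target: usable sequenced bases divided by target size."""
    return n_fragments * bases_per_fragment * usable_fraction / target_bp


# Assumption: 120 million read pairs, 2 x 100bp each, 100 Mb capture target.
raw = mean_coverage(120e6, 200, 100e6)
usable = mean_coverage(120e6, 200, 100e6, usable_fraction=0.5)

print(raw)     # 240.0
print(usable)  # 120.0
```

If the core's 120 million instead counts individual 100bp reads rather than pairs, both numbers halve.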
Comment