
**Xi Wang** · 01-14-2010, 10:14 PM

I don't fully understand the definition of a base-calling duplicate. Duplicate filtering for the panda genome assembly would not lose much information, but it may lose coverage. I guess that is why the coverage of high GC-content regions (>60%) becomes relatively lower. (It is known that high-GC regions have higher sequencing coverage due to PCR.) In an RNA-seq study, this filtering may cause gene expression to be underestimated.

**mattanswers** · 01-15-2010, 10:50 AM

Hi Xi,

Would you know of a reference for high-GC regions having higher sequencing coverage due to PCR?

**zchou** · 01-15-2010, 11:40 AM

First of all, we need to know the meaning of base-calling duplicate. Does anyone have any ideas?

**Chipper** · 01-15-2010, 11:56 AM

"Base-calling duplicate ... The higher the raw cluster density, the more severe this problem is."

It must be that the software calls two reads from the same cluster. In that case, look for identical reads with near-identical coordinates.

Very GC-rich sequences actually have lower coverage: if you can't amplify a sequence well, you can't sequence it (unless you do single-molecule sequencing, of course).
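A minimal sketch of the coordinate check suggested above, in Python. It assumes each read has already been reduced to a `(lane, tile, x, y, sequence)` tuple; how those fields are extracted from read names depends on the pipeline and is not shown. The function name `find_basecalling_duplicates` and the `max_dist` threshold of 2 pixels are hypothetical choices for illustration, not values taken from any base-calling software.

```python
from collections import defaultdict
from itertools import combinations
from math import hypot


def find_basecalling_duplicates(reads, max_dist=2.0):
    """Return pairs of identical reads whose cluster coordinates nearly coincide.

    reads: iterable of (lane, tile, x, y, sequence) tuples (assumed format).
    max_dist: largest x/y distance still treated as the same cluster
              (hypothetical threshold, tune for your instrument).
    """
    # Only reads with the same lane, tile, and sequence can be duplicates.
    groups = defaultdict(list)
    for lane, tile, x, y, seq in reads:
        groups[(lane, tile, seq)].append((x, y))

    pairs = []
    for (lane, tile, seq), coords in groups.items():
        # Compare every pair of identical reads on the same tile.
        for (x1, y1), (x2, y2) in combinations(coords, 2):
            if hypot(x1 - x2, y1 - y2) <= max_dist:
                pairs.append(((lane, tile, x1, y1), (lane, tile, x2, y2), seq))
    return pairs


example_reads = [
    (1, 5, 100, 200, "ACGT"),
    (1, 5, 101, 200, "ACGT"),  # same sequence, one pixel away
    (1, 5, 500, 900, "ACGT"),  # same sequence, far away (likely a PCR duplicate)
    (1, 6, 100, 200, "ACGT"),  # same coordinates but a different tile
    (1, 5, 100, 201, "TTTT"),  # nearby but a different sequence
]
duplicate_pairs = find_basecalling_duplicates(example_reads)
```

The pairwise comparison is quadratic within each group of identical reads, which is fine for a spot check but would need a spatial index for very deep libraries.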

**Xi Wang** · 01-16-2010, 04:48 AM

Originally posted by mattanswers

Hi Xi,

Would you know of a reference for high-GC regions having higher sequencing coverage due to PCR?

It's an "old" paper.

Dohm JC, Lottaz C, Borodina T, Himmelbauer H. Substantial biases in
ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids
Res. 2008 Sep;36(16):e105. Epub 2008 Jul 26. PubMed PMID: 18660515; PubMed
Central PMCID: PMC2532726.

**Xi Wang** · 01-16-2010, 04:53 AM

Originally posted by Chipper

Very GC-rich sequences actually have lower coverage: if you can't amplify a sequence well, you can't sequence it (unless you do single-molecule sequencing, of course).

Do you mean that GC-rich sequences cannot be amplified well?

**lmf_bill** · 01-18-2010, 11:47 PM

If you check the G-C/A-T ratio of the sequencing results, you will find that they are similar. Also, if you check this ratio along the reads, you will find it is more variable at the ends, which may be due to the adapter.
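One way to check the ratio along the reads is to compute the GC fraction at each read position. The sketch below is in Python and assumes the reads are already available as plain sequence strings; the function name `gc_by_position` is made up for this example, and ambiguous bases such as `N` are simply skipped.

```python
def gc_by_position(reads):
    """Return the GC fraction at each 0-based read position.

    reads: iterable of sequence strings, possibly of different lengths.
    Bases other than A, C, G, T (for example N) are ignored.
    """
    gc_counts = []
    base_counts = []
    for seq in reads:
        seq = seq.upper()
        # Grow the counters if this read is longer than any seen so far.
        while len(base_counts) < len(seq):
            gc_counts.append(0)
            base_counts.append(0)
        for i, base in enumerate(seq):
            if base in "ACGT":
                base_counts[i] += 1
                if base in "GC":
                    gc_counts[i] += 1
    # Positions with no called bases yield NaN instead of dividing by zero.
    return [g / n if n else float("nan") for g, n in zip(gc_counts, base_counts)]


profile = gc_by_position(["GCAT", "GAAT", "CCNT"])
```

Plotting the returned list against position should make any drift toward the read ends visible.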

**mattanswers** · 01-19-2010, 03:24 PM

Originally posted by lmf_bill

If you check the G-C/A-T ratio of the sequencing results, you will find that they are similar. Also, if you check this ratio along the reads, you will find it is more variable at the ends, which may be due to the adapter.

The G-C/A-T ratio is similar for our libraries (ChIP-Seq experiment). However, the GC content of the whole genome is 36%, with exons at 43% GC and intergenic regions at 32% GC, so I was expecting a much different ratio for our libraries.

**lmf_bill** · 01-19-2010, 05:53 PM

Originally posted by mattanswers

The G-C/A-T ratio is similar for our libraries (ChIP-Seq experiment). However, the GC content of the whole genome is 36%, with exons at 43% GC and intergenic regions at 32% GC, so I was expecting a much different ratio for our libraries.

I see what you mean. You find ~50% GC in the ChIP-Seq data, and you think there is a GC bias, maybe due to base-calling duplication. That seems reasonable.

Another thing: how do you estimate the GC ratio of the genome, exons, and intergenic regions? Is it all based on the Ensembl annotation? I once estimated the GC content of the hg19 exome and found ~50% GC, only slightly lower than the AT content. Maybe I need to check further.
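For comparing such estimates, a rough Python sketch of computing GC content over annotated regions follows. It assumes the chromosome sequence is already loaded as a string and that the regions are non-overlapping, 0-based, half-open `(start, end)` intervals taken from whatever annotation is in use; the helper names `gc_fraction` and `gc_of_intervals` and the toy sequence are hypothetical.

```python
def gc_fraction(seq):
    """GC fraction of a sequence, counting only A, C, G, and T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else float("nan")


def gc_of_intervals(chrom_seq, intervals):
    """GC fraction pooled over 0-based, half-open (start, end) intervals.

    The intervals are assumed not to overlap; overlapping exons from
    different transcripts should be merged first or they are counted twice.
    """
    return gc_fraction("".join(chrom_seq[start:end] for start, end in intervals))


toy_chrom = "GGCCAATTAT"         # made-up sequence, not real data
toy_exons = [(0, 4)]             # hypothetical exon interval
toy_intergenic = [(4, 10)]       # hypothetical intergenic interval

exon_gc = gc_of_intervals(toy_chrom, toy_exons)
intergenic_gc = gc_of_intervals(toy_chrom, toy_intergenic)
genome_gc = gc_fraction(toy_chrom)
```

Differences between estimates often come down to which annotation is used and whether overlapping exons are merged, so it helps to state both when quoting a number.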

**mattanswers** · 01-20-2010, 09:13 AM

I am working with the Arabidopsis genome, which has been sequenced. The numbers I wrote come from Table 3 of Town et al., The Plant Cell, 18:1351, 2006. They seem to have used the TIGR annotation pipeline.

How to estimate error rate for short-reads and base-calling duplicate?
