**Dario1984** · 02-05-2013, 05:00 PM

Using a combination of Bioconductor packages will give you what you need. You can make raw counts in gene regions, then use the function calcNormFactors in edgeR to work out the compositional bias of each sample. To get raw coverage, you can use the coverage function from GenomicRanges, then multiply each sample's coverage by the scale factor you calculated with calcNormFactors. To export the coverage to a BedGraph or BigWig file, the export function from rtracklayer can be used.
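The coverage-then-scale step can be illustrated in a few lines of plain Python. This is only a sketch of the arithmetic, not the GenomicRanges or rtracklayer implementation: the read intervals, the sequence length and the factor of 0.5 are all hypothetical, and which factor to multiply by (for example, one derived from the library size and the calcNormFactors output) is left as an assumption.

```python
from typing import List, Tuple


def coverage(reads: List[Tuple[int, int]], length: int) -> List[int]:
    """Per-base depth for 0-based, half-open read intervals on one sequence."""
    diff = [0] * (length + 1)
    for start, end in reads:
        diff[start] += 1   # depth rises where a read begins
        diff[end] -= 1     # and falls just past where it ends
    depth, running = [], 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth


def scale(depth: List[int], factor: float) -> List[float]:
    """Multiply every position's depth by a per-sample factor."""
    return [d * factor for d in depth]


# Hypothetical reads on a 10-base sequence and a hypothetical factor.
reads = [(0, 4), (2, 6), (3, 8)]
raw = coverage(reads, 10)
scaled = scale(raw, 0.5)
print(raw)
print(scaled)
```

In R, the equivalent would be the `coverage` call followed by a multiplication of the resulting object, as described in the post.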

**puggie** · 02-06-2013, 03:44 PM

Thanks for your reply,

I have now done the analysis in R after calculating raw counts against Ensembl genes. For three of the samples (A, B and C), calcNormFactors gives:

| sample | lib.size | norm.scaling |
|--------|----------|--------------|
| A | 5604846 | 0.9273452 |
| B | 4433633 | 1.0454615 |
| C | 6520510 | 1.0314556 |

Intuitively, I would have thought that C should have been scaled down relative to B, given the difference in library sizes.

The raw counts were calculated over the exons and introns of genes, excluding intervals with fewer than 50 counts.
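A note on reading that table: in edgeR the normalisation factor is applied on top of the library size, so the quantity used for scaling is the effective library size, lib.size multiplied by the factor. The factors themselves are scaled to multiply to one, so they only capture compositional differences. A minimal sketch of that arithmetic in plain Python, using the three rows above (the variable names are illustrative only):

```python
# (library size, normalisation factor) for each sample, as posted above.
samples = {
    "A": (5604846, 0.9273452),
    "B": (4433633, 1.0454615),
    "C": (6520510, 1.0314556),
}

# Effective library size = raw library size * normalisation factor.
effective = {name: lib * factor for name, (lib, factor) in samples.items()}

# The factors should multiply to (approximately) one.
product = 1.0
for _, factor in samples.values():
    product *= factor

for name, size in effective.items():
    print(name, round(size))
print("product of factors:", round(product, 6))
```

C still ends up with the largest effective library size, so its counts are scaled down the most relative to B once the library size is taken into account.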

**Dario1984** · 02-06-2013, 04:00 PM

Make an MA plot before and after normalisation. The function is maPlot in edgeR. You will see that the data points will be centred around M = 0 after normalisation. This is based on the biological assumption that, between conditions, the majority of genes don't change in expression.
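To make the M and A values concrete, here is a small plain-Python sketch. It is not what maPlot does internally: the counts are hypothetical, genes with a zero count are simply skipped, and the "after" step just subtracts the median M, which is a crude stand-in for the TMM factors from calcNormFactors.

```python
import math
import statistics


def ma_values(x, y, lib_x, lib_y):
    """Return (M, A) pairs from counts scaled by library size.

    M is the log2 ratio and A the mean log2 abundance of the two samples.
    """
    points = []
    for cx, cy in zip(x, y):
        if cx == 0 or cy == 0:
            continue  # log of zero is undefined; skipped in this sketch
        px, py = cx / lib_x, cy / lib_y
        points.append((math.log2(px / py), 0.5 * math.log2(px * py)))
    return points


# Hypothetical counts: four unchanged genes plus one gene that dominates y.
x = [10, 20, 40, 80, 5]
y = [10, 20, 40, 80, 850]

before = ma_values(x, y, sum(x), sum(y))
m_before = [m for m, _ in before]

# Crude centring: shift every M value by the median M.
shift = statistics.median(m_before)
m_after = [m - shift for m in m_before]

print("median M before:", round(statistics.median(m_before), 3))
print("median M after:", round(statistics.median(m_after), 3))
```

The single dominant gene inflates the library size of the second sample, which pushes the unchanged genes away from M = 0 until the shift is applied.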

**puggie** · 02-07-2013, 01:46 PM

Okay I will try this.

Regarding the raw counts table, what would be the best procedure for selecting regions for normalization? Let's say I have an Ensembl annotation file of 30,000 regions in total (isoforms merged etc.). When I do the raw counting I get fewer than 10,000 regions with reads per sample, each of which may contain up to several thousand reads. I also see a pattern between the samples, e.g. a random selection of lines could look like this for 4 samples:

0 0 2 0
0 15 0 0
96 143 71 132
1 0 5 0
850 1201 1171 907
1 0 0 1

Hence the same genes seem to be active, which makes sense as the samples are from the same tissue type.

What would be the best way of building this table for edgeR? For example, should I take into account all intervals (genes) that 1. are expressed and 2. do not deviate in expression by more than a preset factor?

Or is there some recommended "general gene list" that is considered stable, like we know from the qPCR days?

**Dario1984** · 02-07-2013, 03:00 PM

It's a good idea to get rid of lowly expressed genes before calculating the normalisation factors. There is no safe gene list. I use all of them and don't filter on variance.
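A minimal sketch of such a filter in plain Python, applied to the six example rows posted above. The rule used here (at least 10 reads in at least 2 samples) is an arbitrary assumption for illustration, not a recommended threshold, and `keep_expressed` is a hypothetical helper.

```python
# The six example rows from the post above (one row per gene, four samples).
counts = [
    [0, 0, 2, 0],
    [0, 15, 0, 0],
    [96, 143, 71, 132],
    [1, 0, 5, 0],
    [850, 1201, 1171, 907],
    [1, 0, 0, 1],
]


def keep_expressed(rows, min_count=10, min_samples=2):
    """Keep rows with at least min_count reads in at least min_samples samples."""
    return [
        row for row in rows
        if sum(c >= min_count for c in row) >= min_samples
    ]


kept = keep_expressed(counts)
print(len(kept), "of", len(counts), "genes kept")
```

The normalisation factors would then be calculated from the kept rows only.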

**Richard Finney** · 02-07-2013, 03:27 PM

> It's a good idea to get rid of lowly expressed genes

Why is that?

**Dario1984** · 02-07-2013, 08:00 PM

The estimates for fold change aren't stable for those genes. A couple of extra reads here or there could change the fold change calculation drastically for a lowly expressed gene. Also, a rough rule is that about ten percent of genes are being reproducibly expressed in a cell at any one time, so unstable fold changes from spurious, low transcription would contribute the most to the calculation.
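The instability is easy to see with a little arithmetic. In this plain-Python sketch the counts are hypothetical: two extra reads are added to one sample for a gene with 2 reads and for a gene with 850 reads, and the log2 fold change is compared.

```python
import math


def log2_fc(a, b):
    """log2 fold change between two (non-zero) counts."""
    return math.log2(a / b)


extra = 2  # a couple of extra reads in one sample

# Lowly expressed gene: 2 reads in both samples, then 4 versus 2.
low_before = log2_fc(2, 2)
low_after = log2_fc(2 + extra, 2)

# Highly expressed gene: 850 reads in both samples, then 852 versus 850.
high_before = log2_fc(850, 850)
high_after = log2_fc(850 + extra, 850)

print("low-count gene: ", low_before, "->", low_after)
print("high-count gene:", high_before, "->", round(high_after, 4))
```

The same two reads double the apparent expression of the low-count gene while leaving the high-count gene essentially unchanged.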

**sisterdot** · 04-09-2013, 03:23 AM

Two options that I have not tested:

1) genomeCoverageBed has a -scale option, which could take a per-sample factor (e.g. one derived from DESeq's estimateSizeFactors), although I guess Dario1984's suggestion might be easier: "get raw coverage, you can use the coverage function from GenomicRanges, then multiply each sample's coverage by the scale factor you calculated with calcNormFactors. To export the coverage to a BedGraph or BigWig file, the export function from rtracklayer can be used."

2) using normalize_bigwig.py (RSeQC package)

The way to normalize RNA-SEQ coverage data for multiple samples?
