
**Simon Anders** · 03-04-2013, 11:27 AM

Originally posted by Marianna85

With both normalization methods I obtain very different size factors:
- using DESeq: 0.095 for one library and 10.85 for the other.
- using edgeR: 0.14 and 7.2, respectively.

This is because edgeR gives its norm factors relative to the total read count. To get expression values on a scale comparable across samples, you have to divide the counts
- for DESeq, just by the size factor
- for edgeR, by the total read counts and by the normalization factors
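A minimal base-R sketch of the two divisions described above. The count matrix is made up, the size factors are computed by hand with the median-of-ratios idea that DESeq uses, and the edgeR-style normalization factors are simply assumed to be 1 rather than computed with calcNormFactors(). It only illustrates which quantities the counts get divided by.

```r
# Toy counts (made-up numbers): rows are genes, columns are samples.
counts <- matrix(
  c( 10,  20,
    100, 200,
     50, 100,
      5,  10),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("lib1", "lib2"))
)

# DESeq-style size factors: median ratio of each sample to the per-gene
# geometric mean. All toy counts are non-zero; real data needs zero handling.
log_geo_means <- rowMeans(log(counts))
size_factors <- apply(counts, 2, function(col) {
  exp(median(log(col) - log_geo_means))
})

# DESeq: divide each column by its size factor only.
deseq_norm <- sweep(counts, 2, size_factors, "/")

# edgeR-style: the norm factors are relative to the library size, so divide
# by (library size * norm factor). These factors are assumed values.
lib_sizes <- colSums(counts)
norm_factors <- c(lib1 = 1, lib2 = 1)
edger_norm <- sweep(counts, 2, lib_sizes * norm_factors, "/")

print(size_factors)
print(deseq_norm)
print(edger_norm)
```

In this toy case lib2 is exactly twice lib1, so both approaches make the two columns identical after normalization, only on different absolute scales.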

**Marianna85** · 03-04-2013, 12:57 PM

Hi Simon,
happy to read your answer

Originally posted by Simon Anders

This is because edgeR gives its norm factors relative to the total read count. To get expression values on a scale comparable across samples, you have to divide the counts
- for DESeq, just by the size factor
- for edgeR, by the total read counts and by the normalization factors

Just an example, to understand this better.
For DESeq
gene A raw counts: 5 reads in library 1, 70 reads in library 2
library 1: 36 million reads, size factor: 0.09
library 2: 64 million reads, size factor: 10
gene A normalized counts: lib1 = 5/0.09, lib2 = 70/10

For edgeR
gene A raw counts: 5 reads in library 1, 70 reads in library 2
library 1: 36 million reads, normalization factor: 0.14
library 2: 64 million reads, normalization factor: 7.2
gene A normalized counts: lib1 = (5/36 million)/0.14, lib2 = (70/64 million)/7.2

In this case, with such a huge difference in library size, it seems better to normalize with edgeR. Is that right?

Thanks a lot.
I really appreciate your answer.

Marianna
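The arithmetic in the example above can be checked in a few lines of R. This is only a sketch using the numbers quoted in the post; the variable names are made up for illustration. Because the two methods put the normalized values on different absolute scales, the sketch compares the lib1/lib2 ratio for gene A instead of the values themselves.

```r
# Numbers taken from the post above; variable names are illustrative.
raw <- c(lib1 = 5, lib2 = 70)

# DESeq: divide by the size factor.
deseq_sf <- c(lib1 = 0.09, lib2 = 10)
deseq_norm <- raw / deseq_sf

# edgeR: divide by total read count times normalization factor.
lib_size <- c(lib1 = 36e6, lib2 = 64e6)
edger_nf <- c(lib1 = 0.14, lib2 = 7.2)
edger_norm <- raw / (lib_size * edger_nf)

# The absolute scales differ, so compare the lib1/lib2 ratio instead.
deseq_ratio <- unname(deseq_norm["lib1"] / deseq_norm["lib2"])
edger_ratio <- unname(edger_norm["lib1"] / edger_norm["lib2"])
print(c(DESeq = deseq_ratio, edgeR = edger_ratio))
```

With these inputs the ratios come out at roughly 7.9 (DESeq) and 6.5 (edgeR), so for this gene the two normalizations point in the same direction with a similar magnitude.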

**Shanrong** · 03-04-2013, 05:36 PM

I am using both edgeR and DESeq to analyze my own dataset. In general, the fold changes reported by the two are close (as they should be, right?).

If your original libraries have 36 vs 64 million total reads, the normalization factors from both methods are very weird (or at least quite unusual). Please check whether you analyzed your dataset properly.

The ideas behind normalization in edgeR and DESeq are very similar (but the implementations differ). In practice, I don't see one method as superior to the other. However, it is definitely a mistake to feed DESeq the normalization factors from edgeR, or vice versa.

**Jeremy** · 03-04-2013, 05:46 PM

Something is going wrong somewhere: 36M and 64M reads should give normalization factors with less than a 2-fold difference. The normalization factors you listed have a 50-fold difference and suggest a much greater difference in read count.
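The mismatch can be put into numbers with a short R sketch. It uses only the library sizes and factors quoted earlier in the thread; the helper `fold()` and the variable names are made up for illustration.

```r
# Values quoted earlier in the thread.
lib_size <- c(lib1 = 36e6, lib2 = 64e6)
deseq_sf <- c(lib1 = 0.095, lib2 = 10.85)
edger_nf <- c(lib1 = 0.14, lib2 = 7.2)

# Hypothetical helper: fold difference between the larger and smaller value.
fold <- function(x) unname(max(x) / min(x))

depth_fold     <- fold(lib_size)             # sequencing depth alone
deseq_fold     <- fold(deseq_sf)             # DESeq size factors
edger_fold     <- fold(edger_nf)             # edgeR norm factors
edger_eff_fold <- fold(lib_size * edger_nf)  # edgeR effective library sizes

print(round(c(depth = depth_fold, DESeq = deseq_fold,
              edgeR = edger_fold, edgeR_effective = edger_eff_fold), 1))
```

Sequencing depth alone accounts for a difference of about 1.8-fold, whereas the DESeq size factors differ by about 114-fold and the edgeR effective library sizes by about 91-fold. The two packages therefore roughly agree with each other, and both disagree with the raw read totals.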

**Marianna85** · 03-12-2013, 08:48 AM

Originally posted by Jeremy

Something is going wrong somewhere: 36M and 64M reads should give normalization factors with less than a 2-fold difference. The normalization factors you listed have a 50-fold difference and suggest a much greater difference in read count.

Hi Jeremy,
in fact I was surprised to obtain such a difference...
I have not yet figured out where the mistake in the size factor calculation is.

**Simon Anders** · 03-12-2013, 09:23 AM

Have you already looked at a scatter plot comparing the counts for the two samples? This should clarify what is going on.

**Marianna85** · 03-13-2013, 12:26 AM

Originally posted by Simon Anders

Have you already looked at a scatter plot comparing the counts for the two samples? This should clarify what is going on.

Simon, do you mean the estimateDispersions?

**Marianna85** · 03-13-2013, 01:24 AM

This is the script I used

# Read the raw count table (genes in rows, one column per sample)
CountTable = read.table( "decEggs.txt", header=TRUE, row.names=1 )
head( CountTable )
# Describe the two samples
decDesign = data.frame( row.names = colnames( CountTable ), condition = c( "stripped", "spawned" ), libType = c( "paired-end", "paired-end" ) )
decDesign
# Both libraries are paired-end, so this selection keeps both conditions
pairedSamples = decDesign$libType == "paired-end"
condition = decDesign$condition[ pairedSamples ]
library( "DESeq" )
# Build the count data set and estimate the size factors
cds = newCountDataSet( CountTable, condition )
cds = estimateSizeFactors( cds )
sizeFactors( cds )
head( counts( cds, normalized=TRUE ) )

and the size factors have a 100-fold difference...
What should I do?

**Marianna85** · 03-14-2013, 08:33 AM

**Simon Anders** · 03-14-2013, 09:21 AM

Originally posted by Marianna85

Simon, do you mean the estimateDispersions?

No, I mean a scatter plot of the reads.

Try, e.g.,

Code:

plot( log10( 1 + counts(cds)[1,] ), log10( 1 + counts(cds)[2,] ), pch="." )

to plot the raw, unnormalized read counts of the second sample versus the first on a log scale.

**Marianna85** · 03-14-2013, 09:38 AM

Hi Simon,
the plot seems empty...

may I change the axis scale?

**Simon Anders** · 03-14-2013, 09:42 AM

This will be hard to debug via the forum. You may need to get some local help.

One thing to try: if you simply type "counts(cds)", you get your table of raw counts (or, if you just want the first 100 lines, try "head( counts(cds), 100 )"). Check whether they make sense.
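A few basic checks of this kind can be sketched in base R. The matrix `m` below is simulated as a stand-in for the output of counts(cds), since the original data file is not available here; the sample names are taken from the script, everything else is made up. The share of reads taken by the single most abundant gene is included as one possible thing to look at, not as a diagnosis.

```r
set.seed(1)

# Simulated stand-in for counts(cds): 1000 genes, two samples.
m <- cbind(
  stripped = rnbinom(1000, mu = 100, size = 2),
  spawned  = rnbinom(1000, mu = 180, size = 2)
)
rownames(m) <- paste0("gene", seq_len(nrow(m)))

# Basic sanity checks on a raw count table.
checks <- list(
  dims         = dim(m),                         # genes x samples
  totals       = colSums(m),                     # reads per sample
  n_zero       = colSums(m == 0),                # genes with zero counts
  top_share    = apply(m, 2, max) / colSums(m),  # share of the top gene
  all_whole    = all(m == round(m)),             # counts must be integers
  any_negative = any(m < 0),
  any_missing  = anyNA(m)
)
str(checks)
```

On a real data set, `m <- counts(cds)` would replace the simulated matrix.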

**Simon Anders** · 03-14-2013, 09:42 AM

Sorry, I made a typo. It's

Code:

plot( log10( 1 + counts(cds)[,1] ), log10( 1 + counts(cds)[,2] ), pch="." )

**Marianna85** · 03-14-2013, 09:51 AM

of course! I defined the cds rows, not the columns.
So this is the plot...


http://dl.dropbox.com/u/33322766/plot.emf

Does anything look strange to you?

**Simon Anders** · 03-14-2013, 10:00 AM

".emf"? That's Windows extended metafile, right? Haven't seen this graphics file format in ten years, and frankly, I have no idea how to open it. Could you use something more common, please, maybe png?
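One way to produce a PNG from R is the png() graphics device. The sketch below uses simulated counts in place of counts(cds) and writes to a temporary file; the file name, image size, axis labels, and the red reference line are illustrative choices, not part of the original plot command.

```r
set.seed(1)

# Simulated stand-in for counts(cds): 1000 genes, two samples.
m <- cbind(sample1 = rnbinom(1000, mu = 100, size = 2),
           sample2 = rnbinom(1000, mu = 180, size = 2))

# Open a PNG device, draw the scatter plot, then close the device.
out_file <- file.path(tempdir(), "plot.png")
png(out_file, width = 600, height = 600)
plot(log10(1 + m[, 1]), log10(1 + m[, 2]), pch = ".",
     xlab = "log10(1 + counts), sample 1",
     ylab = "log10(1 + counts), sample 2")
abline(0, 1, col = "red")  # points on this line have equal counts
dev.off()

print(out_file)
```

The file is only written once dev.off() is called, so the plot commands have to sit between png() and dev.off().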
