**Simon Anders** · 03-04-2013, 11:27 AM

Originally posted by Marianna85

With both normalization methods I obtain very different size factors:
- using DESeq, 0.095 for one library and 10.85 for the other.
- using edgeR, 0.14 and 7.2 respectively.

This is because edgeR gives its norm factors relative to the total read count. To get expression values on a scale comparable across samples, you have to divide the counts
- for DESeq, just by the size factor
- for edgeR, by the total read counts and by the normalization factors
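
A minimal base-R sketch of the two recipes above. The counts are made up, the DESeq-style size factors are computed by hand with the median-of-ratios rule rather than by calling the package, and the edgeR norm factors are simply assumed to be 1 for this toy case (edgeR is not called here).

```r
# Toy raw counts (made-up): rows are genes, columns are libraries.
# lib2 is exactly twice as deep as lib1.
counts <- cbind(lib1 = c(10, 20, 30, 40),
                lib2 = c(20, 40, 60, 80))
rownames(counts) <- paste0("gene", 1:4)

# DESeq-style size factors by hand: per-gene geometric mean across
# libraries, then the median ratio of each library to that reference.
log_geo_means <- rowMeans(log(counts))
size_factors <- apply(counts, 2,
                      function(k) exp(median(log(k) - log_geo_means)))

# DESeq recipe: divide the counts by the size factor only.
deseq_norm <- sweep(counts, 2, size_factors, "/")

# edgeR recipe: divide by total read count times norm factor.
lib_sizes <- colSums(counts)
norm_factors <- c(lib1 = 1, lib2 = 1)  # assumed values, not computed by edgeR
edger_norm <- sweep(counts, 2, lib_sizes * norm_factors, "/")

print(size_factors)
print(deseq_norm)
print(edger_norm)
```

In this toy case both recipes make the two libraries agree gene by gene, but on different scales: the DESeq-style values stay in units of reads, while the edgeR-style values are fractions of the effective library size.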

**Marianna85** · 03-04-2013, 12:57 PM

Hi Simon,
happy to read your answer

Originally posted by Simon Anders

This is because edgeR gives its norm factors relative to the total read count. To get expression values on a scale comparable across samples, you have to divide the counts
- for DESeq, just by the size factor
- for edgeR, by the total read counts and by the normalization factors

Just an example to better understand.
For DESeq
gene A raw counts; 5 reads in library 1 - 70 reads in library 2
library 1:36 million reads size factor: 0.09
library 2:64 million reads size factor: 10
gene A normalized counts lib 1=5/0.09 - lib2=70/10

For edgeR
gene A raw counts; 5 reads in library 1 - 70 reads in library 2
library 1:36 million reads normalization factor: 0.14
library 2:64 million reads normalization factor: 7.2
gene A normalized counts lib 1=(5/36 million)/0.14 - lib2=(70/64million)/7.2

In this case, with such a huge difference in library size, it seems better to normalize with edgeR, doesn't it?

Thanks a lot.
I really appreciate your answer.

Marianna
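
The arithmetic in the example above can be checked in a few lines of base R. This is only a sketch: all numbers are copied from the post, and the variable names are made up for illustration.

```r
# Numbers copied from the example above (gene A, two libraries).
raw      <- c(lib1 = 5,    lib2 = 70)
deseq_sf <- c(lib1 = 0.09, lib2 = 10)    # DESeq size factors
lib_size <- c(lib1 = 36e6, lib2 = 64e6)  # total reads per library
edger_nf <- c(lib1 = 0.14, lib2 = 7.2)   # edgeR norm factors

# DESeq recipe: counts divided by the size factor.
deseq_norm <- raw / deseq_sf

# edgeR recipe: counts divided by (total reads * norm factor).
edger_norm <- raw / (lib_size * edger_nf)

# The absolute scales differ, so compare the lib1/lib2 ratio instead.
deseq_ratio <- deseq_norm[["lib1"]] / deseq_norm[["lib2"]]
edger_ratio <- edger_norm[["lib1"]] / edger_norm[["lib2"]]
print(c(DESeq = deseq_ratio, edgeR = edger_ratio))
```

With these inputs the lib1/lib2 ratio for gene A comes out at roughly 7.9 for the DESeq recipe and roughly 6.5 for the edgeR recipe, so the two methods disagree far less than the raw factors might suggest.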

**Shanrong** · 03-04-2013, 05:36 PM

I am using both edgeR and DESeq to analyze my own dataset. In general, the fold changes reported by the two are close (they should be, right?).

If your original libraries have 36 vs. 64 million total reads, your normalization factors from both methods are very weird (or at least quite unusual). Please check whether you analyzed your dataset properly.

The ideas behind normalization in edgeR and DESeq are very similar (but the implementations differ). In practice, I don't see either method as superior to the other. However, it is definitely a mistake to feed DESeq the normalization factors from edgeR, or vice versa.

**Jeremy** · 03-04-2013, 05:46 PM

Something is going wrong somewhere, 36M and 64M reads should give normalization factors with less than a 2-fold difference. The normalization factors you listed had a 50 fold difference and suggest a much greater difference in read count.
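
A rough sanity check of this point, sketched in base R. It assumes that, if sequencing depth were the only difference between the libraries, size factors would be roughly proportional to library size; the variable names are illustrative, and the factors are the ones reported earlier in the thread.

```r
lib_size <- c(lib1 = 36e6, lib2 = 64e6)

# Size factors expected from depth alone, scaled so that their
# geometric mean is 1 (an assumption made for this comparison).
expected_sf   <- lib_size / exp(mean(log(lib_size)))
expected_fold <- max(lib_size) / min(lib_size)

# Fold differences implied by the factors reported in the thread.
deseq_fold    <- 10.85 / 0.095                  # DESeq size factors
edger_nf_fold <- 7.2 / 0.14                     # edgeR norm factors alone
edger_eff     <- lib_size * c(0.14, 7.2)        # edgeR effective library sizes
edger_eff_fold <- edger_eff[["lib2"]] / edger_eff[["lib1"]]

print(expected_sf)
print(c(expected = expected_fold, DESeq = deseq_fold,
        edgeR_nf = edger_nf_fold, edgeR_effective = edger_eff_fold))
```

Depth alone would give a fold difference below 2, whereas the reported factors imply differences of roughly 50 to over 100, which is why the input data deserve a closer look.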

**Marianna85** · 03-12-2013, 08:48 AM

Originally posted by Jeremy

Something is going wrong somewhere, 36M and 64M reads should give normalization factors with less than a 2-fold difference. The normalization factors you listed had a 50 fold difference and suggest a much greater difference in read count.

Hi Jeremy,
in fact I was surprised to obtain such a difference...
I haven't yet figured out what the mistake in the size factor calculation is.

**Simon Anders** · 03-12-2013, 09:23 AM

Have you already looked at a scatter plot comparing the counts for the two samples? This should clarify what is going on.

**Marianna85** · 03-13-2013, 12:26 AM

Originally posted by Simon Anders

Have you already looked at a scatter plot comparing the counts for the two samples? This should clarify what is going on.

Simon, do you mean the estimateDispersions?

**Marianna85** · 03-13-2013, 01:24 AM

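This is the script I used:

Code:

# Read the raw count table: genes in rows, one column per sample
CountTable = read.table( "decEggs.txt", header=TRUE, row.names=1 )
head( CountTable )

# Describe the two samples (condition and library type)
decDesign = data.frame( row.names = colnames( CountTable ), condition = c( "stripped", "spawned" ), libType = c( "paired-end", "paired-end" ) )
decDesign

# Keep the conditions of the paired-end samples (here: both samples)
pairedSamples = decDesign$libType == "paired-end"
condition = decDesign$condition[ pairedSamples ]

# Build the DESeq data set and estimate the size factors
library( "DESeq" )
cds = newCountDataSet( CountTable, condition )
cds = estimateSizeFactors( cds )
sizeFactors( cds )
head( counts( cds, normalized=TRUE ) )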

and the size factors have a 100 fold difference...
what should I do???
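
One possible first check, sketched in base R on a made-up table (the real decEggs.txt is not shown in the thread, so the numbers below are purely illustrative). It looks at the library totals, at how many genes have nonzero counts in both libraries, and at how much of each library is taken up by a single gene.

```r
# Made-up stand-in for the real count table, which is not shown in the thread.
CountTable <- data.frame(stripped = c(0, 5, 120, 3, 0, 40),
                         spawned  = c(900, 70, 15000, 0, 12, 5200),
                         row.names = paste0("gene", 1:6))

# Total reads per library: do they match the expected sequencing depth?
totals <- colSums(CountTable)

# Genes with a nonzero count in every library; the median-of-ratios
# size factor is based on these genes only.
usable <- rowSums(CountTable == 0) == 0

# Share of each library taken up by its single most abundant gene.
top_share <- apply(CountTable, 2, max) / totals

print(totals)
print(sum(usable))
print(round(top_share, 2))
```

If the totals do not match the expected read numbers, or only a handful of genes are usable, the input table is the first thing to examine.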

**Marianna85** · 03-14-2013, 08:33 AM

**Simon Anders** · 03-14-2013, 09:21 AM

Originally posted by Marianna85

Simon, do you mean the estimateDispersions?

No, I mean a scatter plot of the reads.

Try, e.g.,

Code:

plot( log10( 1 + counts(cds)[1,] ), log10( 1 + counts(cds)[2,] ), pch="." )

to plot the raw, unnormalized read counts of the second sample versus the first on a log scale.

**Marianna85** · 03-14-2013, 09:38 AM

Hi Simon,
the plot seems empty...

may I change the axis scale?

**Simon Anders** · 03-14-2013, 09:42 AM

This will be hard to debug via the forum. You may need to get some local help.

One thing to try: if you simply type "counts(cds)", you get your table of raw counts (or, if you just want the first 100 lines, try "head( counts(cds), 100 )"). Check whether they make sense.

**Simon Anders** · 03-14-2013, 09:42 AM

Sorry, I made a typo. It should be

Code:

plot( log10( 1 + counts(cds)[,1] ), log10( 1 + counts(cds)[,2] ), pch="." )
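
To show what such a plot looks like when only sequencing depth differs, here is a sketch on simulated counts in base R. The data are made up (not the thread's data), and the 2-fold depth difference is an assumption chosen for illustration.

```r
set.seed(1)

# Simulated raw counts for two samples (made-up data);
# sample 2 is sequenced about twice as deeply as sample 1.
mu  <- rexp(2000, rate = 1 / 200)
raw <- cbind(s1 = rpois(2000, mu), s2 = rpois(2000, 2 * mu))

# Same transformation as in the plot command above.
x <- log10(1 + raw[, 1])
y <- log10(1 + raw[, 2])
plot(x, y, pch = ".",
     xlab = "log10(1 + counts, sample 1)",
     ylab = "log10(1 + counts, sample 2)")

# Reference lines: equal counts (slope 1 through the origin) and the
# vertical shift implied by a 2-fold depth difference.
abline(0, 1, col = "grey")
abline(log10(2), 1, col = "red", lty = 2)
```

On real data, a point cloud that sits far from the depth-shifted diagonal, or that splits into several bands, would point to a problem with the counts rather than with the normalization method.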

**Marianna85** · 03-14-2013, 09:51 AM

Of course! I had indexed the rows of cds, not the columns.
So this is the plot...

http://dl.dropbox.com/u/33322766/plot.emf

Does anything look strange to you?

**Simon Anders** · 03-14-2013, 10:00 AM

".emf"? That's Windows extended metafile, right? Haven't seen this graphics file format in ten years, and frankly, I have no idea how to open it. Could you use something more common, please, maybe png?
