
**dpryan** · 08-15-2014, 01:28 PM

Please don't cross-post on here and on the Bioconductor email list.

**feralBiologist** · 08-16-2014, 03:52 AM

the counts reported by edgeR are not normalized

This is the kind response from James MacDonald that I got on the Bioconductor list:

[BioC] zero rna-seq values AFTER normalisation in edgeR

https://stat.ethz.ch/pipermail/bioconductor/2014-August/061055.html

In short, the scores reported in all_counts are not normalised.

**feralBiologist** · 08-16-2014, 08:17 AM

dpryan, I see your point (and appreciate the help you have generously given on so many occasions) but the reason for cross-posting is that not everyone is following all the forums. In this case within a few hours I got help from the Bioconductor list and I was able to proceed with my work. But you never know how long this is going to take, or whether you will get a response at all. I have had questions that haven't been answered at all.

What I try to do is to always crosspost the answers, too, so that people don't respond in vain and so that other people having the same issue can benefit, too.

**Gordon Smyth** · 08-18-2014, 01:23 AM

Originally posted by feralBiologist:

> dpryan, I see your point (and appreciate the help you have generously given on so many occasions) but the reason for cross-posting is that not everyone is following all the forums.

We do ask users please not to post the same question to multiple forums simultaneously.

> In this case within a few hours I got help from the Bioconductor list and I was able to proceed with my work. But you never know how long this is going to take, or whether you will get a response at all. I have had questions that haven't been answered at all.

All reasonable questions sent to the Bioconductor mailing list get an answer. A search suggests that you have posted three questions to the Bioconductor mailing list, and that I have answered all of them myself.

The edgeR developers don't live in the same time zone as you and we can't answer everything within a few hours.

> What I try to do is to always crosspost the answers, too, so that people don't respond in vain and so that other people having the same issue can benefit, too.

But your cross post of James MacDonald's answer isn't correct. The cpm values are of course normalized; they are just not "normalized counts".

**feralBiologist** · 08-18-2014, 03:03 AM

> A search suggests that you have posted three questions to the Bioconductor mailing list, and that I have answered all of them myself.

You are right - and I once again thank you for this. I will not post edgeR questions to seqanswers anymore. In the past I have used seqanswers a lot more often than I have used bioconductor (and not just for edgeR) and not all of my questions have been answered. A quick search in seqanswers shows this. Maybe some of them were not precisely formulated - I don't know. But they made me think that help might not always come.

> But your cross post of James MacDonald's answer isn't correct.

This is how I understood James's answer. He says that counts are not affected by the normalization, and I explained on the bioconductor thread that I understood "normalisation" to comprise all the transformations performed on the raw counts. Thanks to your kind reply in bioconductor I was reminded that in edgeR "normalisation" refers to multiple transformations and that not all of them are reflected in the cpm() output. I was about to post this clarification but you were faster than me.

Once more, thank you for your assistance and for helping to create edgeR and other analytic tools that I have used.

**Gordon Smyth** · 08-18-2014, 04:22 PM

Originally posted by feralBiologist:

> Thanks to your kind reply in bioconductor I was reminded that in edgeR "normalisation" refers to multiple transformations and that not all of them are reflected in the cpm() output.

Well, the cpm values are fully normalized. The issue is rather that the cpm values produced by cpm() are just for descriptive purposes. They are not used by any of the core functions in edgeR which estimate parameters or evaluate differential expression.

**feralBiologist** · 08-19-2014, 09:14 AM

Originally posted by Gordon Smyth:

> Well, the cpm values are fully normalized. The issue is rather that the cpm values produced by cpm() are just for descriptive purposes. They are not used by any of the core functions in edgeR which estimate parameters or evaluate differential expression.

Now I am confused again. And maybe I am not the only one as the response by James MacDonald in the bioconductor thread indicates. I believe this confusion is due to the fact that "normalization" in edgeR seems to mean different things depending on the context. I might be a bit naive but to me any transformation performed on the raw score prior to computing differential expression can be described as "normalisation". This would include library size scaling, TMM, pseudocounts. You seemed to agree with James' response and he literally said "The counts are not affected by the normalization".

Now you seem to say exactly the opposite. Can you please clarify?

What I can say with certainty is that no pseudo-counts seem to have been added to the raw counts, otherwise I wouldn't have observed the zeros. What is not clear to me is whether both library scaling and TMM normalisation have been applied.
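The reasoning about zeros can be checked with a few lines of arithmetic. The sketch below is plain Python rather than edgeR code, and the function names and the `offset` parameter are invented for illustration. It assumes the scaling is purely multiplicative, in which case a raw count of zero stays exactly zero, whereas any additive pseudo-count would turn it into a small positive value.

```python
def scale_to_cpm(count, library_size):
    """Multiplicative scaling only: counts per million reads."""
    return count / library_size * 1e6


def scale_with_offset(count, library_size, offset):
    """Hypothetical variant that adds a pseudo-count before scaling."""
    return (count + offset) / library_size * 1e6


library_size = 2_000_000  # made-up total read count for one sample

zero_scaled = scale_to_cpm(0, library_size)
zero_offset = scale_with_offset(0, library_size, offset=0.5)

print(zero_scaled)  # 0.0: zero divided by anything is still zero
print(zero_offset)  # a small positive value, no longer zero
```

So seeing exact zeros in the output is consistent with scaling alone and does not, by itself, say whether a TMM factor was part of that scaling.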

**dpryan** · 08-19-2014, 01:28 PM

CPM isn't used to calculate differential expression, so it doesn't fit your definition of normalization (normalization is a generic term that doesn't really fit what you wrote). Nothing in Gordon's reply contradicts James' reply on the mailing list.

**feralBiologist** · 08-19-2014, 02:49 PM

Originally posted by dpryan:

> CPM isn't used to calculate differential expression, so it doesn't fit your definition of normalization (normalization is a generic term that doesn't really fit what you wrote). Nothing in Gordon's reply contradicts James' reply on the mailing list.

Thanks for your response but it still does not clarify the question I asked. OK, let's drop "normalisation" as it is a confusing term. What I really wanted to know is "How do you get from raw counts to cpm()'s output? What are the transformations/manipulations performed?"

One thing mentioned by Gordon Smyth is the library size scaling. Is this all? I had a look at the help info on cpm() - it does not explicitly mention anything else.
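One way to make the question concrete is to write the arithmetic out. The sketch below is plain Python, not edgeR's source. It rests on an assumption that should be checked against the cpm() help page and the edgeR User's Guide: that for a DGEList, unlogged cpm() divides each count by the sample's effective library size (library size multiplied by the normalisation factor, for example from TMM) and multiplies by one million, with no pseudo-count added. The function name, toy counts and factors are invented for illustration.

```python
def counts_per_million(counts, norm_factors):
    """Scale a genes-by-samples table of raw counts to counts per million.

    Assumption: each sample's effective library size is its column total
    multiplied by its normalisation factor.
    """
    n_samples = len(norm_factors)
    # Library size = total raw count in each sample (column sum).
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    # Effective library size folds in the normalisation factor.
    effective = [lib_sizes[j] * norm_factors[j] for j in range(n_samples)]
    return [
        [row[j] / effective[j] * 1e6 for j in range(n_samples)]
        for row in counts
    ]


# Toy data: 3 genes (rows) x 2 samples (columns), each sample has 1000 reads.
toy_counts = [[0, 10], [300, 490], [700, 500]]
toy_factors = [1.25, 0.8]  # made-up normalisation factors

cpm_values = counts_per_million(toy_counts, toy_factors)
for gene_row in cpm_values:
    print(gene_row)
```

Under these assumptions the zero in the first sample stays zero, and because of the normalisation factors a column of CPM values need not sum to exactly one million.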


zero rna-seq values AFTER normalisation in edgeR
