Homer / getDiffExpression.pl / edgeR

blumroy

Junior Member

Join Date: Aug 2013

Posts: 1
- Share
- Tweet
#1

Homer / getDiffExpression.pl / edgeR

02-10-2016, 10:47 AM

To complement our cuffdiff analysis we thought to take a parallel route using Homer:

To do so we basically followed this protocol:

makeTagDirectory pairend.sam -format sam -fragLength 250 -removeSpikes 10000 5 -genome /data//projects/mm9/ref.fasta -checkGC -tbp 1 -sspe -flip

Then, we run analyzeRepeats.pl to make read summaries across genes:
analyzeRepeats.pl rna mm9 -noadj -count genes -strand + -d MIa9_1 MIa9_2 > Scramble_vs_Target_analyzeRepeats.txt

We run this step as recommended by the Homer tutorial with the unnormalized data so the edgeR analysis, which is the following step, would be able to use an unnormalized data.

Next step we run:
getDiffExpression.pl Scramble_vs_Target_analyzeRepeats.txt MIa9_1 MIa9_2 -repeats -edgeR > Scramble_vs_Target_analyzeRepeats_edgeR.txt

After testing the execution of this command and checking closely the perl script for getDiffExpression we run into the following understanding:
The getDiffExpression step takes unnormalized data and transmits it to edgeR so edgeR can calculate log2FDR, p-value, and FDR. off the original unnormalized data. edgeR creates a model for the data and uses pseudo values in cases of zero values. edgeR finally outputs the log2FDR, p-value, and FDR as calculated after transferring the original data into a modeled space.

Now, the output of getDiffExpression.pl is composed of two parts:
The three columns at the very right end, which contain the edgeR's output (Log2FC; p-value, FDR).
AND the output also contains - the rest of the table (all the left side-end columns) which forms a matrix. The matrix is of normalized values. BUT these are not normalized values created by edgeR, but instead - normalized values that were created as part of getDiffExpression.pl script execution.
This data is only presented in the final output table next to the edgeR output (three right columns) and is aimed only for the user to see, but this normalized values were NOT actually been used as part of the edgeR analysis (unlike what I originally thought...).

My issue is that we are currently not sure in what manner the data which presented in this table was actually normalized.

Please see this link for relevant indications:

http://homer.salk.edu/homer/ngs/analyzeRNA.html

Look for explanatory indication such as the following which exactly states what I've described above:
"This command will basically add 3 columns to the end of your original file per pairwise comparison. Most important are the log2 fold change and the p-value and FDR values. The raw read counts will be normalized in this file based on the total reads in each column so that the values are more easily to compare when looking through the results."

However, our attempts to re-run the previous step again - (run analyzeRepeats.pl) - using this time the normalization options in order to obtain the normalized values that appear in the output from getDiffExpression.pl did not prove to be successful...
We tried:
analyzeRepeats.pl rna mm9 -noadj -count genes -strand + -d MIa9_1 MIa9_2 -normMatrix 1e6

And also,
analyzeRepeats.pl rna mm9 -noadj -count genes -strand + -d MIa9_1 MIa9_2 -norm 1e7

but the normalized values that we obtained were quite different from the ones that were obtained after running the getDiffExpression.pl script...

We think that it would become important to know how the data that appears in the output table from getDiffExpression.pl was actually normalized. Was it per all reads? was it per all reads in each individual library? ...

If any of the users have some more information and ideas regarding this matter we'll be glad to hear!
Alternatively, we could ignore the normalized values content of the output table, and refer only to the results of edgeR that show the statistics.

It would be interesting to hear how other users decided to normalize their rna-seq data using this approach. Which selections were finally chosen.

Thanks a lot!

Roy
Tags: analyzerepeats.pl, edger, getdiffexpression.pl, homer, rna-seq

Previous template Next

Latest Developments in Precision Medicine

by seqadmin

Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

Somatic Genomics
“We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
- Channel: Articles
05-24-2024, 01:16 PM
Recent Advances in Sequencing Analysis Tools

by seqadmin

The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
- Channel: Articles
05-06-2024, 07:48 AM

Topics	Statistics	Last Post
SIX2 Protein Identified as a Key Player in Prostate Cancer Treatment Resistance by seqadmin Started by seqadmin, Yesterday, 06:55 AM	0 responses 12 views 0 likes	Last Post by seqadmin Yesterday, 06:55 AM
Genetic Mosaicism More Prevalent Than Previously Thought by seqadmin Started by seqadmin, 05-30-2024, 03:16 PM	0 responses 24 views 0 likes	Last Post by seqadmin 05-30-2024, 03:16 PM
Comprehensive Sequencing of Great Ape Sex Chromosomes Yields Insights into Evolution and Genetic Variability by seqadmin Started by seqadmin, 05-29-2024, 01:32 PM	0 responses 29 views 0 likes	Last Post by seqadmin 05-29-2024, 01:32 PM
New Toolkit Enhances Plant Mitochondrial Genome Research by seqadmin Started by seqadmin, 05-24-2024, 07:15 AM	0 responses 215 views 0 likes	Last Post by seqadmin 05-24-2024, 07:15 AM

Seqanswers Leaderboard Ad

Announcement

Homer / getDiffExpression.pl / edgeR

Latest Articles

ad_right_rmr

News