uqfgaiti · 02-22-2014, 09:36 PM

Hi all,

I am using DESeq for DGE analysis.

I have STRANDED RNA-Seq data for 4 developmental stages with no replicates.
For a more reliable DGE analysis I need replicates, so I obtained (from another lab member) UNSTRANDED RNA-Seq data with 3 replicates per stage.

Before doing the DGE analysis, I wanted to test the correlation between these samples, just to show that similar samples “cluster” together. If they do, I can then use the unstranded data in my DGE analysis to have more replicates per stage.

I mapped the raw reads to the genome using TopHat, sorted the BAM files by name and used htseq-count to get the raw read counts for both data sets. For the stranded data I used the option -s yes and for the unstranded data I used -s no.
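A quick sanity check on the htseq-count output is to sum the reads assigned to genes in each sample, so the totals can be compared across -s settings. This is only a minimal sketch: it assumes the usual two-column, tab-separated output (gene ID, count), and that the special counters are either prefixed with "__" (newer HTSeq) or use the bare legacy names listed below. The function name is just for illustration.

```python
# Legacy (un-prefixed) names of the htseq-count special counters.
# Assumption: newer HTSeq versions prefix these with "__" instead.
SPECIAL_LEGACY = {
    "no_feature",
    "ambiguous",
    "too_low_aQual",
    "not_aligned",
    "alignment_not_unique",
}


def assigned_total(lines):
    """Sum the counts assigned to genes, skipping the special counters."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        name, count = line.split("\t")
        if name.startswith("__") or name in SPECIAL_LEGACY:
            continue  # not a gene, e.g. __no_feature
        total += int(count)
    return total
```

Calling assigned_total on an open count file (for example one file per -s setting for the same sample) gives one number per run, which makes it easy to see which setting assigns very few or very many reads.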

I used DESeq to include metadata and for normalization, and I removed the genes that have a count of 0 in every sample. I then calculated the correlation, which was really low.
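To make the filtering and normalization step concrete, here is a rough sketch of dropping all-zero genes and computing median-of-ratios size factors, which is the idea behind DESeq's size factor estimation. It is not DESeq's own code: the function names and the dict-of-lists layout (gene ID mapped to one count per sample) are assumptions made for illustration.

```python
import math
from statistics import median


def drop_all_zero(counts):
    """Keep only genes with a non-zero count in at least one sample."""
    return {g: row for g, row in counts.items() if any(c > 0 for c in row)}


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    Only genes with positive counts in every sample contribute, since the
    geometric mean is zero otherwise. Raises StatisticsError if no gene
    qualifies.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if all(c > 0 for c in row):
            log_geo_mean = sum(math.log(c) for c in row) / n_samples
            for j, c in enumerate(row):
                log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each sample's counts by its size factor."""
    return {g: [c / f for c, f in zip(row, factors)] for g, row in counts.items()}
```

With two samples where one has exactly twice the counts of the other, the size factors come out as 1/sqrt(2) and sqrt(2), and the normalized counts are identical in both samples.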

I then tried htseq-count with the option -s reverse for the stranded data and still got a really low correlation.

So I reran htseq-count on the stranded data with the option -s no, and this way I got a very similar number of total counts between the unstranded and stranded data (whereas in both previous cases the stranded ones were double in number). I then included metadata, estimated the new size factors, normalized and calculated the new correlation. Both the Pearson and the Spearman correlations were pretty good, and both a PCA and a correlogram confirmed this.
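For reference, the two correlation measures can be sketched as follows for a pair of samples. This is a plain-Python illustration, not what R's cor() does internally; the function names are assumptions, and the log2(x + 1) transform is just one common choice to keep a few highly expressed genes from dominating Pearson.

```python
import math


def log2p1(values):
    """log2(x + 1) transform of normalized counts."""
    return [math.log2(v + 1) for v in values]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Spearman only looks at the ordering of genes, so it is unaffected by any monotone transform such as log2(x + 1), while Pearson is sensitive to it.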

I'd still like to figure out a way to use the stranded counts, though. I am not sure whether I lose some information by running htseq-count with -s no on the stranded data.

What I had in mind was to use the unstranded data to estimate the level of variation and get a threshold for DE detection, but still use the stranded data as expression values. I'm not sure I can do that, though, given that one is stranded and the other is not.

I would like to hear any thoughts you have about this.

Let me know if you need more information to better understand the issue.

Thanks a lot
Federico


Low number of replicates DESeq
