Hi all,
I am using DESeq for DGE analysis.
I have STRANDED RNA-Seq data for 4 developmental stages with no replicates.
A reliable DGE analysis needs replicates, so I obtained (from another lab member) UNSTRANDED RNA-Seq data with 3 replicates per stage.
Before running the DGE analysis, I wanted to test the correlation between these samples, just to show that similar samples “cluster” together. If they do, I can then use the unstranded data in my DGE analysis to have more replicates for each stage.
I mapped the raw reads to the genome using TopHat, sorted the BAM files by name, and used htseq-count to get the raw read counts for both datasets. For the stranded data I used the option -s yes, and for the unstranded data I used -s no.
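For reference, the counting step described above can be sketched as follows. This is only an illustration: the file names are hypothetical, and the helper `htseq_count_command` is a made-up name that builds the command line without running it. The flags `-f bam`, `-r name`, and `-s yes|no|reverse` are standard htseq-count options.

```python
import shlex


def htseq_count_command(bam_path, gtf_path, stranded):
    """Build the htseq-count argument list for one name-sorted BAM file."""
    if stranded not in ("yes", "no", "reverse"):
        raise ValueError("stranded must be 'yes', 'no' or 'reverse'")
    return [
        "htseq-count",
        "-f", "bam",     # input format is BAM
        "-r", "name",    # alignments are sorted by read name
        "-s", stranded,  # strandedness mode under test
        bam_path,
        gtf_path,
    ]


# Hypothetical file names, for illustration only.
for mode in ("yes", "reverse", "no"):
    cmd = htseq_count_command("stage1_stranded.bam", "genes.gtf", mode)
    print(shlex.join(cmd))
```

Printing the commands rather than executing them makes it easy to check that the only thing that differs between runs is the `-s` value.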
I loaded the counts into DESeq together with the sample metadata, normalized them, and removed the genes that have a count of 0 in every sample. I then calculated the correlation between samples, which was really low.
I then tried htseq-count with the option -s reverse for the stranded data and still got a really low correlation.
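To make the normalization and correlation step concrete, here is a minimal pure-Python sketch of median-of-ratios size factors (the approach DESeq's size-factor estimation is based on), followed by dropping all-zero genes and computing a Pearson correlation on log-transformed values. It is not DESeq itself; the function names and the toy count matrix are assumptions made up for illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors; counts is a list of per-gene rows."""
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) <= 0:
            continue  # a zero in any sample leaves no usable geometric mean
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Drop genes that are zero in every sample, then divide by size factors."""
    sf = size_factors(counts)
    kept = [row for row in counts if any(c > 0 for c in row)]
    return [[c / s for c, s in zip(row, sf)] for row in kept]


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


# Toy matrix (invented): rows are genes, columns are two samples.
toy_counts = [[10, 20], [30, 60], [0, 0], [5, 10]]
norm = normalize(toy_counts)
sample_a, sample_b = (
    [math.log2(v + 1) for v in col] for col in zip(*norm)
)
print(size_factors(toy_counts), pearson(sample_a, sample_b))
```

In the toy matrix the second sample is simply sequenced twice as deeply, so normalization should make the two columns agree; real stranded versus unstranded counts will of course differ by more than depth.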
So I reran htseq-count on the stranded data with the option -s no, and this gave a very similar number of total counts for the unstranded and stranded data (in both earlier cases, -s yes and -s reverse, the stranded totals were double). I then included the metadata, estimated the new size factors, normalized, and calculated the new correlation. Both the Pearson and Spearman correlations were high, which a PCA and a correlogram confirmed.
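One way to compare the totals from the three strandedness settings is to sum the per-gene counts in each htseq-count output and look at the special counter rows separately. The sketch below assumes the tab-separated output layout with counter rows prefixed by `__` (such as `__no_feature`), as written by recent HTSeq releases; `summarize_htseq` is a hypothetical helper and the numbers in the example text are invented.

```python
def summarize_htseq(text):
    """Return (total assigned reads, dict of special counters) for one output."""
    assigned = 0
    special = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, value = line.rsplit("\t", 1)
        if name.startswith("__"):
            special[name] = int(value)  # e.g. __no_feature, __ambiguous
        else:
            assigned += int(value)      # ordinary gene row
    return assigned, special


# Invented example output, for illustration only.
example = "geneA\t120\ngeneB\t30\n__no_feature\t50\n__ambiguous\t0\n"
assigned, special = summarize_htseq(example)
total = assigned + sum(special.values())
print(assigned, special, special["__no_feature"] / total)
```

Running this on the -s yes, -s reverse, and -s no outputs of the same BAM file shows how many reads each mode assigns to genes versus `__no_feature`, which is usually the quickest check of which setting matches the library.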
Still, I'd like to find a way to use the stranded counts. I am not sure whether I lose information by running htseq-count with -s no on the stranded data.
What I had in mind was using the unstranded data to estimate the level of variation and set a threshold for DE detection, while still using the stranded data as the expression values. I am not sure I can do that, though, given that one dataset is stranded and the other is not.
I would like to hear any thoughts you have about this.
Let me know if you need more information to better understand the issue.
Thanks a lot
Federico