**sdriscoll** · 11-12-2012, 09:30 AM

The bundle id column can be used to group rows of results.xprs into genes. That id indicates which isoforms shared reads. Unfortunately, it can't be used to coordinate multiple files. RSEM provides this information for you, so you might give that a try. If you don't like it, at least you can have it produce the list of isoforms that should be collapsed into genes.
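As a rough illustration of the grouping described above, here is a minimal Python sketch that collects the targets belonging to each bundle. It assumes results.xprs is a tab-separated table with a header row containing `bundle_id` and `target_id` columns; the function name and the tiny in-memory table are made up for the example.

```python
import csv
import io
from collections import defaultdict


def group_by_bundle(xprs_file):
    """Map each bundle_id to the list of target_ids that share it.

    Assumes a tab-separated table with 'bundle_id' and 'target_id'
    header columns (an assumption about the results.xprs layout).
    """
    bundles = defaultdict(list)
    for row in csv.DictReader(xprs_file, delimiter="\t"):
        bundles[row["bundle_id"]].append(row["target_id"])
    return dict(bundles)


# Made-up stand-in for a results.xprs file (hypothetical IDs and values).
example = io.StringIO(
    "bundle_id\ttarget_id\teff_counts\n"
    "1\tENST_A1\t10.0\n"
    "1\tENST_A2\t5.0\n"
    "2\tENST_B1\t7.5\n"
)
bundles = group_by_bundle(example)
```

Because the ids are assigned per run, a grouping built this way only makes sense within a single results file, which is the limitation mentioned above.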

**dietmar13** · 11-12-2012, 10:13 AM

sum or average

Thank you. But why can't I use the ENSG gene annotation to collapse the ENST transcripts? What I don't know for sure is whether I should sum or average the eff. counts (taking shared reads into account: if eff. counts are already corrected for these shared reads, then I should sum the values)...

**sdriscoll** · 11-12-2012, 02:14 PM

Ah, I see. Yes, the idea is that you can add those counts to get gene-level values. Since the algorithm has already disambiguated the alignments, adding them is the correct thing to do. You can also add the FPKM values.
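A minimal sketch of that summing step in Python, assuming a tab-separated results table with `target_id` and `eff_counts` columns and a transcript-to-gene dictionary built from the ENST/ENSG annotation. The helper name, the mapping, and the sample values are hypothetical.

```python
import csv
import io
from collections import defaultdict


def sum_to_genes(xprs_file, tx2gene, column="eff_counts"):
    """Sum a per-transcript column into per-gene totals.

    tx2gene maps transcript IDs to gene IDs. Transcripts that are
    missing from the mapping are skipped.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(xprs_file, delimiter="\t"):
        gene = tx2gene.get(row["target_id"])
        if gene is None:
            continue  # transcript not in the annotation
        totals[gene] += float(row[column])
    return dict(totals)


# Made-up data: two isoforms of one gene, one isoform of another.
example = io.StringIO(
    "bundle_id\ttarget_id\teff_counts\n"
    "1\tENST_A1\t10.0\n"
    "1\tENST_A2\t5.0\n"
    "2\tENST_B1\t7.5\n"
)
tx2gene = {"ENST_A1": "ENSG_A", "ENST_A2": "ENSG_A", "ENST_B1": "ENSG_B"}
gene_counts = sum_to_genes(example, tx2gene)
```

Passing a different `column` name would sum another per-transcript column, such as the FPKM values, in the same way, provided the file has a column with that name.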

**hpimentel** · 11-12-2012, 02:44 PM

Originally posted by dietmar13:

Thank you. But why can't I use the ENSG gene annotation to collapse the ENST transcripts? What I don't know for sure is whether I should sum or average the eff. counts (taking shared reads into account: if eff. counts are already corrected for these shared reads, then I should sum the values)...

Hi dietmar. I am Harold from Lior Pachter's group.

Sure, you can collapse them as you've suggested (summing eff_counts).

The only issue here is that each of these estimates has its own variance (and the estimates also covary). Depending on your final goal, ignoring that variability might not be acceptable.
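To make the covariance point concrete: the variance of a sum of isoform estimates is the sum of every entry of their covariance matrix, not just the diagonal. The sketch below uses a made-up 2x2 covariance matrix; how such a matrix would be obtained for real estimates is outside its scope.

```python
def variance_of_sum(cov):
    """Variance of a sum of random variables, given their covariance matrix.

    Var(X1 + ... + Xn) is the sum of all entries of the matrix:
    the variances on the diagonal plus every pairwise covariance.
    """
    return sum(sum(row) for row in cov)


# Hypothetical covariance matrix for two isoforms of one gene.
# The negative off-diagonal terms are an assumption for illustration:
# reads credited to one isoform are not credited to the other.
cov = [
    [4.0, -1.5],
    [-1.5, 3.0],
]

naive = cov[0][0] + cov[1][1]   # ignores covariance
exact = variance_of_sum(cov)    # includes covariance
```

With these illustrative numbers, adding only the two variances overstates the gene-level variance, so treating the isoform estimates as independent can mislead in either direction depending on the sign of the covariance.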

I am currently working on a tool for differential expression that will be taking into account this variability. What is your end-goal for this analysis?

Thanks,

Harold

**dietmar13** · 11-12-2012, 10:22 PM

endgoal

hi harold,

we have two aims:
- differential gene expression for biological interpretation (splice isoforms and transcript variants are therefore not really important, because there are no tools that use this information)

- use of the gene expression levels for biomarker development, i.e. to build discriminative models with machine learning methods, and finally to use RT-qPCR to measure the genes in these models for diagnostic and prognostic purposes. Probes for these model genes will be selected so that they recognize as many isoforms as possible (hence the gene-level values).

if you need a beta tester for your tool: i have a 12 vs 12 matched-pairs RNA-seq set (colon cancer vs normal tissue, sequenced to only 2 million reads per sample, downloaded from the internet) which i have analysed extensively with many DE methods (DEseq, baySeq, noiseq, edgeR, limma, poissonseq, SAMseq, cuffdiff, quasiseq, EDSeq) - and the statistical power is already similar to microarrays with only 2 million reads.

dietmar

eXpress merge transcripts to genes
