
**RockChalkJayhawk** · 05-21-2010, 06:22 PM

Originally posted by lpachter

While it is OK to use raw counts to compare gene expression between samples, as Cole explained, to test differential expression of _isoforms_ it's necessary to account for uncertainty in the assignment of reads to transcripts. Converting isoform FPKMs to counts and then applying DESeq is a bad idea because the uncertainty is then not incorporated into the DE calculation.

But wouldn't the uncertainty in the isoform abundances be overcome by the addition of biological replicates? For instance, gene X has 2 isoforms, A and B. A is assigned 20% of the gene expression, while 80% is assigned to B. If the uncertainty of the abundance of A is +/- 10%, it would be difficult to assess the isoform abundance with confidence using this sample alone, as you suggest. However, if two additional replicates call the abundance of A 15% and 23%, then the uncertainty of the measurement should decrease, provided the precision of these combined measurements is that high. Is this correct? I recognize that the uncertainty in isoform abundance is a concern, but at what point will the accuracy and precision of the measurement only be a statistical exercise? In other words, would we benefit from knowing an isoform changed from 20% to 30%? Wouldn't the biological replicates inherently provide more power?

By the way, I am not a statistician, so I could be completely off base. But I am learning a lot in these discussions and thanks to everyone for participating!
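To make the replicate argument above concrete, here is a minimal sketch that treats the three calls for isoform A (20%, 15%, 23%) as independent measurements and computes their mean and standard error. It only captures between-replicate spread; it does not include the per-sample read-assignment uncertainty mentioned in the quoted post, so it is an illustration rather than a recommended test.

```python
import statistics

# Fraction of gene X expression assigned to isoform A in each replicate
# (the 20%, 15% and 23% figures from the example in the post).
isoform_a_fraction = [0.20, 0.15, 0.23]

n_replicates = len(isoform_a_fraction)
mean_fraction = statistics.mean(isoform_a_fraction)

# Sample standard deviation (n - 1 in the denominator).
sample_sd = statistics.stdev(isoform_a_fraction)

# Standard error of the mean: shrinks with the square root of the
# number of replicates.
standard_error = sample_sd / n_replicates ** 0.5

print(f"mean fraction of A: {mean_fraction:.3f}")
print(f"between-replicate SD: {sample_sd:.3f}")
print(f"standard error of the mean: {standard_error:.3f}")
```

With only three replicates the standard error is itself poorly estimated, which is one reason to be cautious about reading too much into small shifts.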

**lpachter** · 05-22-2010, 07:41 AM

You are absolutely correct that biological replicates will help with accurately estimating relative isoform level abundance, and the replicated deconvolutions inform about the variability in isoform expression. With many replicates, one can directly estimate the variability in the MLE that way. But with few replicates, it is still necessary to estimate variability by leveraging variability in other transcripts and in addition it is important to account for the uncertainty in isoform level expression.

**chrisbala** · 07-26-2010, 06:29 AM

Just curious - any updates on the ETA for biological replicates in cufflinks?

Also, another question - if I have multiple RNA-seq runs and want to predict isoforms, is the best thing to do to combine all the data into one big file and then run cufflinks? There does not appear to be an option for including multiple SAM files.

Chris

**Agent47** · 07-26-2010, 11:29 AM

Hi Chris,
I am pretty new to RNA-seq data, but my first-hand experience with Cufflinks tells me that combining files is not a very good idea. Cufflinks is a memory-intensive algorithm, and when looking to predict new isoforms it can run forever. However, if you are using a reference .gtf file for the analysis and are only concerned with the transcripts in the reference file, it may work.

Arpit

**Joann** · 07-27-2010, 07:11 AM

General wet lab comment:

Isoform expression is not only a key developmental marker, but also a response mechanism to environmental conditions and changes. Both are very non-static exercises of a given genome.

**lpachter** · 07-27-2010, 09:24 AM

>Just curious - any updates on the ETA for biological replicates in cufflinks?

We have been working this out and it will be released with the next update of Cufflinks on the website. We had planned to already have it out, but logistical issues due to summer travel have slowed us down a bit. We'll post here as soon as it's out (I hate to set a date that we don't meet, but we really are planning on wrapping this up imminently).

>also another question - if i have multiple RNAseq runs and want to predict isoforms - is the best thing to do to combine all the data into one big file and then run cufflinks? There does not appear to be an option for including multiple sam files?

This is a very good question. It's not entirely clear that the best thing to do is to combine data; for one thing, the various replicates will be useful in identifying spurious hits. We've started thinking about this because some of our collaborators are working with large case-control studies and are asking exactly the same question. For now, the best advice I can give is to merge the data.

**amackey** · 07-28-2010, 05:44 PM

Originally posted by Cole Trapnell

We will be directly supporting biological replicates within the next few weeks in both cuffdiff and cufflinks itself. We've recently worked out the math for how to handle them well in our model and improve the robustness of our statistical testing. I need a few weeks to implement the enhancements and do the testing, etc.

Any updates on this? Can cuffdiff now handle biological and/or technical (library prep) replicates?

**amackey** · 07-28-2010, 05:48 PM

Originally posted by amackey

any updates on this? Can cuffdiff now handle biological and/or technical (library prep) replicates?

Sorry, I just saw Lior's reply to the same question, posted just yesterday (I don't always notice SEQanswers' paging system...). I'll wait more patiently.

Thanks again,
-Aaron

**apratap** · 07-30-2010, 08:24 AM

Hi All..

It was really interesting to read this educating discussion.

Just wondering, for folks with single-read data, would you recommend using Tophat/Cufflinks? My impression, especially from the Cufflinks paper, is that it is basically built for PE data.

-Abhi

**chrisbala** · 08-09-2010, 01:14 PM

apratap,

I guess the documentation says that it should also work with single-end data. But I too wonder whether anyone has any benchmarks/validation for single-end data, particularly with respect to isoform prediction and the detection of differential splicing?

**sdriscoll** · 08-10-2010, 07:21 AM

Originally posted by lpachter

>Just curious - any updates on the ETA for biological replicates in cufflinks?

We have been working this out and it will be released with the next update of Cufflinks on the website. We had planned to already have it out, but logistical issues due to summer travel have slowed us down a bit. We'll post here as soon as it's out (I hate to set a date that we don't meet, but we really are planning on wrapping this up imminently).

>also another question - if i have multiple RNAseq runs and want to predict isoforms - is the best thing to do to combine all the data into one big file and then run cufflinks? There does not appear to be an option for including multiple sam files?

This is a very good question. It's not entirely clear that the best thing to do is to combine data; for one thing, the various replicates will be useful in identifying spurious hits. We've started thinking about this because some of our collaborators are working with large case-control studies and are asking exactly the same question. For now, the best advice I can give is to merge the data.

I agree - merge the replicates. Replicates are great for controlling gene expression (finding normal biological variation), but when comparing control/mutant isoforms I find it's best to merge the replicate read sets and then run them through the Tophat pipeline. The problem is that when you have only 20-30 million alignments, you'll still see splicing variations between biological replicates up to pretty high RPKM levels, which means that to make valid comparisons you're going to have to distrust a large portion of your data. While putting more reads into the mix doesn't necessarily alter the gene expression values, it does increase the robustness of those values, which should equate to more complete/robust isoforms being reported.

The more reads the better! It is only fair to then compare isoforms between control/mutant that have a similar number of alignments making up the data so you can rule out the possibility that if you DID have the same number of alignments some of the variation might go away.

**chrisbala** · 08-10-2010, 08:47 AM

combining runs

I am a bit stumped at the moment.

What would be the recommended pipeline (anyone feel free to chime in!)

What I have done (I guess the standard pipeline): Map reads from each run independently with tophat > Run cufflinks for each run > Cuffcompare > Cuffdiff

What I would like to do:

Generate a 'master' mapping/sam file by combining all of my reads and mapping in tophat > Analyze those reads in Cufflinks w/ reference gtf to produce a master gtf (w/ annotation) > Then go back to quantitate each run independently relative to the new gtf?

I can't seem to think of how to do this. I suppose I could just go back and split the "master" sam file on the basis of the run identifier? Sorry ... bioinformaticist in training...
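The splitting idea could be sketched roughly as follows, assuming (hypothetically) that every read name in the master SAM file begins with its run identifier followed by a colon, e.g. `runA:1234`. The function names and that naming convention are illustrative only; they are not part of TopHat or Cufflinks, and real read names may need a different rule.

```python
import os


def run_id_from_read_name(read_name, separator=":"):
    """Hypothetical convention: run identifier is the text before the first separator."""
    return read_name.split(separator, 1)[0]


def split_sam_by_run(sam_path, out_dir, separator=":"):
    """Write one SAM file per run identifier, copying the header into each."""
    os.makedirs(out_dir, exist_ok=True)
    header_lines = []
    handles = {}
    try:
        with open(sam_path) as sam:
            for line in sam:
                # SAM header lines start with '@' and precede all alignments.
                if line.startswith("@"):
                    header_lines.append(line)
                    continue
                read_name = line.split("\t", 1)[0]
                run_id = run_id_from_read_name(read_name, separator)
                if run_id not in handles:
                    out_path = os.path.join(out_dir, run_id + ".sam")
                    handles[run_id] = open(out_path, "w")
                    handles[run_id].writelines(header_lines)
                handles[run_id].write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

Calling `split_sam_by_run("master.sam", "per_run")` (both paths are placeholders) would return the list of run identifiers found and leave one `<run>.sam` file per run in `per_run/`, each of which could then be quantified separately against the master gtf.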

**Anna Esteve** · 08-23-2010, 01:14 AM

total mapped fragments RNA-seq

I am interested in having read counts for RNA-seq differential expression analysis. Does anybody know how to count the total number of fragments mapped if my SAM file (from Tophat) has a mixture of proper and improper pairs? I am using this formula:

read counts = FPKM x transcript length (bp) x total fragments mapped / 10^9

Thanks.
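A minimal sketch of both steps, under stated assumptions: a fragment is counted as mapped if at least one of its mates is aligned (SAM flag bit 0x4 not set), and both mates share the same read name in the SAM file. Whether this matches the denominator Cufflinks uses internally is not something this sketch claims; the function names are made up for illustration. Note also the caution earlier in the thread that counts recovered from isoform FPKMs do not carry the read-assignment uncertainty.

```python
def count_mapped_fragments(sam_path):
    """Count distinct read names with at least one aligned record.

    Proper pairs, improper pairs and singletons each count once,
    as do multi-mapped reads that appear on several lines.
    """
    mapped_names = set()
    with open(sam_path) as sam:
        for line in sam:
            if line.startswith("@"):  # skip header lines
                continue
            fields = line.split("\t")
            read_name, flag = fields[0], int(fields[1])
            if flag & 0x4:  # this record is unmapped
                continue
            mapped_names.add(read_name)
    return len(mapped_names)


def fpkm_to_fragment_count(fpkm, transcript_length_bp, total_mapped_fragments):
    """Invert FPKM = count * 1e9 / (length_bp * total_fragments).

    The 1e9 is 1e3 (bases per kilobase) times 1e6 (fragments per million).
    """
    return fpkm * transcript_length_bp * total_mapped_fragments / 1e9


# Example: FPKM of 10 on a 2,000 bp transcript with 20 million mapped fragments.
example_count = fpkm_to_fragment_count(10, 2000, 20_000_000)
print(example_count)
```

Holding every read name in a set is memory-hungry for large files; it is kept here only because it makes the counting rule easy to see.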

**adumitri** · 12-07-2010, 07:41 AM

Cuffdiff - differential expression analysis between groups of samples

We will be directly supporting biological replicates within the next few weeks in both cuffdiff and cufflinks itself. We've recently worked out the math for how to handle them well in our model and improve the robustness of our statistical testing. I need a few weeks to implement the enhancements and do the testing, etc.

Hello,

Cole mentioned on this thread in May that he is working on introducing the differential expression analysis functionality for groups of samples in Cuffdiff (e.g. control samples compared with treated samples). Is there any news about this new version of Cuffdiff? Will it be released soon or is it already available somewhere?

Given the currently available Cuffdiff version (v0.9.3), is there any viable workaround to analyze groups of samples?

Thank you,
Alexandra

**Thomas Doktor** · 12-07-2010, 07:48 AM

Replicates/groups have been supported for some time now, but paired samples are not supported (I don't know of any RNA-seq software that handles paired samples).
