**epigen** · 08-23-2010, 07:24 AM

I have a similar task and would be interested in a professional answer. Naively I'll try with HTSeq and DESeq on simple read count data and compare my samples pairwise.

**severin** · 08-23-2010, 08:08 AM

Boxplot-dendrogram

We ran into similar problems when looking at this kind of data. The resulting dendrograms for the large sets of gene lists that come out of the next generation sequencing data can be difficult to visualize. We used both a heatmap approach and a combination of a dendrogram with boxplots over a time series in the paper we just published (RNA-Seq atlas of Glycine max -- http://seqanswers.com/forums/showthread.php?t=6321).

**quix** · 08-24-2010, 06:49 PM

I have exactly the same question. Can anybody give some idea?

**schmima** · 08-24-2010, 09:58 PM

Some non-professional answers

The resulting dendrograms for the large sets of gene lists that come out of the next generation sequencing data can be difficult to visualize.

Definitely - but this applies to all large data sets. Drawing a heatmap and dendrogram with 20,000 genes in 20 samples will never look very nice - and I personally think it also does not tell you much about what is going on biologically. So either one takes a subset (as severin did in the paper) or one groups the genes in a sensible way prior to plotting (e.g. GO terms / gene families / PFAM domains etc.). Depending on the experiment, there may also be some groups that are not in focus anyway and can be left out.

So - in my opinion - I would first think about what I would like to show... If I have a time course and I'm interested in what makes the difference, I would first search for genes / gene sets (grouped together in a sensible way, e.g. by function) that show the major differences between the samples and plot only these. This should reduce the amount of data plotted, and in the case of groups it links bare gene names to a term that one understands (e.g. 'ABC transporters' tells me personally more than 'ATXGXXXXX' or a '.' in a picture).
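The grouping idea can be sketched in base R as follows. This is only a minimal illustration: the expression values, the gene names, and the group labels are all made up, and in practice the annotation would come from GO, PFAM, or a gene family table.

```r
# Toy expression matrix: genes in rows, samples in columns (values are made up).
expr <- matrix(
  c( 10, 12, 30,
     14, 16, 50,
      2,  2,  2,
      4,  4,  6,
    100, 90, 10),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), c("s1", "s2", "s3"))
)

# Hypothetical annotation: one group label per gene, in row order.
group <- c("ABC transporters", "ABC transporters",
           "kinases", "kinases", "photosynthesis")

# Sum per group, then divide by group size to get the group mean.
sums <- rowsum(expr, group)
n_per_group <- as.vector(table(group)[rownames(sums)])
group_means <- sums / n_per_group

# One row per group instead of one row per gene.
print(group_means)
```

The resulting matrix has one labelled row per group, so it could be passed to heatmap() in place of the full gene-level matrix.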

However - this requires some time course analysis, which is not entirely unproblematic (e.g. due to correlation between time points). It is also a question of what is tested / what you would like to know... I guess there may be some helpful literature on time courses and ANOVA (not that you need to use ANOVA - but I think it is a good way to learn some general principles and problems of time course studies).

**severin** · 08-25-2010, 06:28 AM

groupings

Originally posted by schmima

group the genes in a sensible way prior to plotting (e.g. GO terms / gene families / PFAM domains etc.).

I am in agreement with schmima here. One of the easiest ways to group genes is to look at the following groups: highest expressed (row sum across the time points), time-point-specific expression, and expressed in one time point significantly higher than in all other time points (this is what we did for seed over all other tissues in the paper I mentioned before).

Genes that show no expression at any time point can be removed from the analysis, which sometimes reduces your gene list substantially.
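A rough base R sketch of these filters, assuming an RPKM matrix with genes in rows and time points in columns. The values are invented, and the fold threshold is an arbitrary stand-in for a proper significance test, not the method used in the paper.

```r
# Toy RPKM matrix: genes in rows, time points in columns (values are made up).
rpkm <- matrix(
  c( 0,  0,  0,  0,
     5, 50,  4,  6,
    90, 80, 85, 95,
     1,  2,  1,  3),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("t1", "t2", "t3", "t4"))
)

# Drop genes with no expression at any time point.
expressed <- rpkm[rowSums(rpkm) > 0, , drop = FALSE]

# Rank the remaining genes by total expression across the time course.
by_total <- expressed[order(rowSums(expressed), decreasing = TRUE), , drop = FALSE]

# Flag time-point-specific genes: the highest time point is at least `fold`
# times the second highest. The threshold is an assumption, not a test.
fold <- 5
is_specific <- apply(expressed, 1, function(v) {
  s <- sort(v, decreasing = TRUE)
  s[1] >= fold * s[2]
})
specific_genes <- names(which(is_specific))

print(rownames(by_total))
print(specific_genes)
```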

I have also seen analyses that cluster the expression profiles with k-means to try to identify the major themes in the expression.
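For illustration, a k-means run on simulated profiles might look like the sketch below. The data are simulated, k = 2 is known only because of how they were generated, and scaling each gene before clustering is one common choice rather than something prescribed here.

```r
set.seed(42)
tp <- 6  # number of time points (illustrative)

# Simulated profiles: 30 genes that rise and 30 that fall over time.
rising  <- t(replicate(30, seq(1, 10, length.out = tp) + rnorm(tp, sd = 0.5)))
falling <- t(replicate(30, seq(10, 1, length.out = tp) + rnorm(tp, sd = 0.5)))
expr <- rbind(rising, falling)
rownames(expr) <- paste0("gene", seq_len(nrow(expr)))

# Scale each gene to mean 0 and sd 1 so clusters reflect the shape of the
# profile rather than the absolute expression level.
scaled <- t(scale(t(expr)))

# Several random starts make the result less dependent on initialization.
km <- kmeans(scaled, centers = 2, nstart = 10)

# Cluster sizes and the mean (scaled) profile of each cluster.
print(table(km$cluster))
print(round(km$centers, 2))
```

With real data the number of clusters is not known in advance, so it is worth trying several values of k and checking whether the cluster profiles make biological sense.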

As with most data, I strongly recommend just playing with the data, seeing what jumps out at you, and then following up on it. Look closely at the subgroups I mentioned above, and also at transcription factors and tissue-related gene families in the time series.

You can also look at change in expression rather than expression values: how does the expression change between points 1 and 2, points 2 and 3, points 1 and 3, etc.?
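One way to compute such changes is as log2 differences between time points, sketched here in base R. The matrix is a toy example and the pseudocount of 1 is an assumption added to avoid taking the log of zero.

```r
# Toy expression matrix: genes in rows, time points in columns.
expr <- matrix(c( 2,  4, 8,
                 10, 10, 5),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("geneA", "geneB"), c("t1", "t2", "t3")))

pseudo <- 1  # assumed pseudocount to avoid log2(0)
log_expr <- log2(expr + pseudo)

# Consecutive steps: t2 vs t1, t3 vs t2.
steps <- log_expr[, -1, drop = FALSE] - log_expr[, -ncol(log_expr), drop = FALSE]
colnames(steps) <- paste0(colnames(expr)[-1], "_vs_", colnames(expr)[-ncol(expr)])

# All pairwise comparisons, including t3 vs t1.
pairs <- combn(colnames(expr), 2)
all_changes <- apply(pairs, 2, function(p) log_expr[, p[2]] - log_expr[, p[1]])
colnames(all_changes) <- paste0(pairs[2, ], "_vs_", pairs[1, ])

print(round(all_changes, 2))
```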

**Sol** · 10-27-2010, 12:00 PM

I need to take the graph generated by the DEGseq MA-plot, showing the differentially expressed genes. Is there some software or a script that does this?
Thanks

**severin** · 10-27-2010, 12:57 PM

graphs and figures

Any command in R that produces a figure can typically be wrapped to write a PDF, TIFF, or JPEG file rather than drawing to an R graphics window. Look into the R help on each output type (?pdf, ?tiff, ?jpeg) for more information.

Here is a really simple pdf wrapper function:

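# Write the output of any plotting expression to a PDF file.
# Relies on R's lazy evaluation: x is only evaluated once the PDF device is open.
makepdf <- function(x, filename, w, h) {
  pdf(file = filename, width = w, height = h)
  on.exit(dev.off())  # close the device even if the plot call fails
  force(x)            # run the plotting expression now
  invisible(filename)
}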

An example of how to use it:

makepdf(plot(1:10),"plot.pdf",5,5)

**Sol** · 11-15-2010, 06:31 AM

Good morning. Do I need to normalize the data that comes out of the SOLiD analysis software, BioScope?
Thanks

**sdvie** · 11-30-2010, 08:14 AM

time courses and heat maps

From my previous experience with time course experiments (however, this was in the proteomics field), I recommend the following:
- Decide first which is your time point of reference. This has to be clear already when you design the experimental protocol.
- Use the data of this timepoint as "background"/ zero / reference (whatever you would like to call it) and then calculate the ratio of all the other time points with respect to this one.
- Once you have fold-change or log ratio values by gene per time point, you can visualize the values in a heatmap (I did this once with RPKMs using Gitools @ http://www.gitools.org)
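The three steps above can be sketched in base R (rather than Gitools) roughly as follows. The RPKM values, the choice of t0 as reference, the pseudocount, and the output file name are all assumptions made for the example.

```r
# Toy RPKM matrix: genes in rows, time points in columns (values are made up).
rpkm <- matrix(c(10, 20, 40, 80,
                 50, 50, 25, 12.5,
                  5,  5,  5,  5),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("t0", "t1", "t2", "t3")))

ref <- "t0"   # reference time point, fixed at the design stage
pseudo <- 1   # assumed pseudocount to avoid division by zero and log2(0)

# log2 ratio of every time point to the reference, gene by gene.
log_ratio <- log2((rpkm + pseudo) / (rpkm[, ref] + pseudo))

# The reference column is all zeros by construction, so drop it.
log_ratio <- log_ratio[, colnames(log_ratio) != ref, drop = FALSE]

# Heatmap with time points kept in order (no column clustering) and no
# extra scaling, since the values are already relative to the reference.
out_file <- file.path(tempdir(), "log_ratio_heatmap.pdf")
pdf(out_file, width = 5, height = 5)
heatmap(log_ratio, Colv = NA, scale = "none", margins = c(5, 6))
dev.off()
```

Keeping the columns unclustered preserves the time order, which is usually what one wants to read off a time course heatmap.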

**dphansti** · 07-14-2011, 10:41 PM

I would recommend clustering the time-course expression profiles of each gene using fuzzy c-means clustering. I am pretty sure this can be done in R fairly easily. Then you can look for enrichment of specific pathways or GO terms in each cluster. And maybe you can see which genes are regulated early, middle, and late. Perhaps middle or late genes are regulated by a transcription factor that you see increased in the early group. Just an idea.

But I would definitely look into fuzzy c-means clustering. Look at figure 7 in this paper for the type of output you can expect from it:

Rigbolt KT, Prokhorova TA, Akimov V, Henningsen J, Johansen PT, Kratchmarova I, Kassem M, Mann M, Olsen JV, Blagoev B. System-wide temporal characterization of the proteome and phosphoproteome of human embryonic stem cell differentiation. Sci Signal. 2011 Mar 15;4(164):rs3. PubMed PMID: 21406692.
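To give an idea of what fuzzy c-means does, here is a minimal base R sketch of the standard algorithm on simulated profiles. It is an illustration only and not the procedure from the paper. The function name fuzzy_cmeans, the fuzzifier m = 2, and the simulated data are assumptions for the example. For real analyses, maintained implementations exist in R packages such as e1071 (cmeans) and Mfuzz.

```r
# Minimal fuzzy c-means in base R (a sketch, not a tuned implementation).
# x: numeric matrix, genes in rows, time points in columns.
# k: number of clusters; m: fuzzifier (> 1). Both are illustrative choices.
fuzzy_cmeans <- function(x, k, m = 2, iter_max = 100, tol = 1e-6) {
  n <- nrow(x)
  # Start from random memberships whose rows sum to 1.
  u <- matrix(runif(n * k), nrow = n)
  u <- u / rowSums(u)
  for (i in seq_len(iter_max)) {
    w <- u^m
    # Cluster centers are membership-weighted means of the profiles.
    centers <- (t(w) %*% x) / colSums(w)
    # Squared Euclidean distance of every gene to every center.
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    d2 <- pmax(d2, .Machine$double.eps)  # avoid division by zero
    # Membership update: u_ij is proportional to d_ij^(-2 / (m - 1)).
    inv <- d2^(-1 / (m - 1))
    u_new <- inv / rowSums(inv)
    converged <- max(abs(u_new - u)) < tol
    u <- u_new
    if (converged) break
  }
  list(membership = u, centers = centers,
       cluster = max.col(u, ties.method = "first"))
}

# Simulated time courses: 20 genes going up and 20 going down.
set.seed(1)
tp <- 5
up   <- t(replicate(20, seq(-1, 1, length.out = tp) + rnorm(tp, sd = 0.1)))
down <- t(replicate(20, seq(1, -1, length.out = tp) + rnorm(tp, sd = 0.1)))
profiles <- rbind(up, down)

fit <- fuzzy_cmeans(profiles, k = 2)
print(table(fit$cluster))
print(round(head(fit$membership), 3))
```

Unlike k-means, every gene gets a membership value for each cluster, so genes with ambiguous profiles can be spotted by their low maximum membership and left out of the enrichment analysis.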

RNA-seq, RPKM and heatmap???
