Originally posted by sarvidsson
So I've checked your IPython notebooks, and the examples make sense to me. The Kalinka dataset is microarray data on log2-scale - correct? So I should be able to use RNA-Seq count data processed with DESeq2's rlog or vst, right? (http://www.bioconductor.org/packages....pdf#section.2) I guess I should skip your normalization step then, however...
You might find that more novices (at least here on SEQanswers) would want a walkthrough for data from an RNA-Seq experiment, e.g. from an unprocessed count table from HTSeq-count.
I could try this the hard way, but would it be feasible to cluster >=10 000 genes in reasonable time (given that they are well filtered)? If not, are there steps in the algorithm that could be parallelized to achieve that?
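For readers starting from a raw count table (for example HTSeq-count output), here is a rough sketch of getting counts onto a log2 scale before clustering. To be clear, this is not rlog or vst, which live in the R package DESeq2; it is a much simpler stand-in that uses median-of-ratios size factors followed by log2 with a pseudocount. The function names and the pseudocount of 1 are my own assumptions, not anything from GPclust or DESeq2.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors for a (genes x samples) count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Use only genes with a nonzero count in every sample, so the logs are finite.
    expressed = (counts > 0).all(axis=1)
    if not expressed.any():
        raise ValueError("no gene has nonzero counts in every sample")
    log_counts = np.log(counts[expressed])
    # Per-gene log geometric mean across samples serves as the reference.
    log_reference = log_counts.mean(axis=1, keepdims=True)
    # Per-sample median of the log ratios to the reference.
    return np.exp(np.median(log_counts - log_reference, axis=0))

def log2_normalized(counts, pseudocount=1.0):
    """Divide each sample by its size factor, then take log2(x + pseudocount)."""
    counts = np.asarray(counts, dtype=float)
    return np.log2(counts / size_factors(counts) + pseudocount)

# Toy table: the second sample was sequenced twice as deeply as the first.
counts = np.array([[10, 20], [100, 200], [5, 10], [0, 3]])
factors = size_factors(counts)
log_expr = log2_normalized(counts)
```

A plain log2 with a pseudocount does not stabilise the variance of low-count genes the way rlog or vst aim to, so filtering out weakly expressed genes first matters more with this shortcut.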
-
Originally posted by JamesHensman: I'd say your data needs to be free of outliers and other nasty behaviour.
Originally posted by JamesHensman: I've taken to filtering signals for signal-to-noise ratio, by dividing the variance of the replicate means by the mean of the replicate variances. You can still cluster 1000s of genes with gpclust, but too many genes that are just noise will confuse it.
Originally posted by JamesHensman: I would say that a method that deals with the problems you describe probably depends mostly on good data munging, rather than the method itself.
I must admit I haven't read your IPython notebooks yet, so I'd better do that now...
-
I'd say your data needs to be free of outliers and other nasty behaviour.
I've taken to filtering signals for signal-to-noise ratio, by dividing the variance of the replicate means by the mean of the replicate variances. You can still cluster 1000s of genes with gpclust, but too many genes that are just noise will confuse it.
I would say that a method that deals with the problems you describe probably depends mostly on good data munging, rather than the method itself.
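The signal-to-noise filter described above could be sketched roughly as follows. This is one reading of the description, not code from GPclust: it assumes a log-scale array shaped (genes, timepoints, replicates), takes "replicate means" to be the per-timepoint mean over replicates, and takes "replicate variances" to be the per-timepoint variance over replicates. The function names, the `eps` guard, and the threshold of 1.0 are made up for illustration.

```python
import numpy as np

def signal_to_noise(expr, eps=1e-12):
    """Per-gene ratio: variance of replicate means / mean of replicate variances."""
    expr = np.asarray(expr, dtype=float)          # (genes, timepoints, replicates)
    replicate_means = expr.mean(axis=2)           # (genes, timepoints)
    signal = replicate_means.var(axis=1, ddof=1)  # spread of the mean profile over time
    replicate_vars = expr.var(axis=2, ddof=1)     # (genes, timepoints)
    noise = replicate_vars.mean(axis=1)           # average within-timepoint spread
    # eps guards against division by zero when replicates are identical.
    return signal / np.maximum(noise, eps)

def keep_informative(expr, threshold=1.0):
    """Boolean mask of genes whose ratio exceeds a (hypothetical) threshold."""
    return signal_to_noise(expr) > threshold

# Toy data: one gene with a clear trend and tight replicates, one flat and noisy.
times = np.arange(5.0)
offsets = np.array([-0.1, 0.0, 0.1])
trending = times[:, None] + offsets[None, :]
flat = np.zeros(5)[:, None] + 10 * offsets[None, :]
expr = np.stack([trending, flat])                 # shape (2, 5, 3)

snr = signal_to_noise(expr)
mask = keep_informative(expr, threshold=1.0)
```

A sensible threshold depends on the dataset, so it is probably worth looking at the distribution of the ratio across genes before choosing one.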
-
Originally posted by JamesHensman: GPclust works well for me, but then I'm the author
You can find the code and some demo IPython notebooks here http://staffwww.dcs.sheffield.ac.uk/...n/gpclust.html
If your data is cleanish, GPclust can provide nice results like this one
Apologies for the self promotion -- I'm happy to help if there are other questions.
No offense, but my experience is that "clean" data is the exception, and is mostly encountered as example datasets in bioinformatics publications describing analysis methods. I'd be interested in a method that is robust to the problems described above.
-
GPclust
GPclust works well for me, but then I'm the author
You can find the code and some demo IPython notebooks here http://staffwww.dcs.sheffield.ac.uk/...n/gpclust.html
If your data is cleanish, GPclust can provide nice results like this one
Apologies for the self promotion -- I'm happy to help if there are other questions.
-
Not wanting to hijack the thread, but I'd be interested in trying GPclust on one large RNASeq dataset I'm working on currently. Did you use it, and if so, is it worth spending time to try it out?
-
There aren't that many since it's not a terribly popular method. Also, there are probably some papers using it on microarrays (the same concepts will apply).
-
Thank you for the quick reply!
But I do not see so many papers with SOM and RNASEQ (maybe 3?).
Do you have specific references that you have come across?
~Thanks!
-
There are a few papers that use SOM with RNAseq in general (a time-course exome-seq experiment would rarely make any sense), though I don't recall any that use it in the context of a time-course experiment (there's no reason that wouldn't work, though). Just search PubMed for them if all you need are some papers.
-
SOM (Self-organising Maps) to identify expression trends in time-course data
Hi,
I am curious to know if anyone has employed SOM (Self-organising Maps) to identify expression trends in time-course data (RNASeq/Exome seq)?
~Thanks,
Rini