  • mizar106
    Junior Member
    • Mar 2014
    • 6

    Clustering of ChIPseq data

    Hi guys!
    I have a question about analysing different types of ChIP-seq data. I want to combine them by clustering to look for meaningful epigenetic patterns in the dataset. Briefly, I generated a matrix whose rows are 200 bp genomic bins (several million rows) and whose columns are epigenetic marks. I applied PAM clustering (clara from the 'cluster' R package) to the matrix; fortunately it seems to work and is quite fast. The problem is how to determine the optimal number of clusters. I tried several approaches from different R packages (silhouette, pamk, gap statistic and so on), but none of them worked because they require too much memory in R. So my idea is to extract a subset of, say, 10,000 to 50,000 rows from the full matrix and use it to infer the optimal number of clusters. Do you think that would be valid? In that case, of course, I would need a good criterion for choosing the subset. Otherwise, I have not found any other way to set the optimal k so far. I would be very grateful if somebody could help. Thanks a lot.
    fran
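
    The subsampling idea in the post can be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's method: it is written in plain Python rather than R, it uses a simple alternating k-medoids with Manhattan distance instead of the full PAM swap search, and all function names (`choose_k`, `k_medoids`, `mean_silhouette`, `init_medoids`) are made up for the example. It draws a few random subsamples of bins, clusters each one for every candidate k, and keeps the k with the highest average silhouette width. Note that clara itself already works on subsamples internally, so the same loop could likely be built around it in R.

```python
import random

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def init_medoids(rows, k, rng):
    # Farthest-first start: one random row, then repeatedly the row
    # farthest from its nearest already-chosen medoid.
    medoids = [rng.randrange(len(rows))]
    nearest = [manhattan(r, rows[medoids[0]]) for r in rows]
    while len(medoids) < k:
        nxt = max(range(len(rows)), key=nearest.__getitem__)
        medoids.append(nxt)
        nearest = [min(d, manhattan(r, rows[nxt])) for d, r in zip(nearest, rows)]
    return medoids

def k_medoids(rows, k, rng, n_iter=20):
    # Simplified alternating k-medoids (assumption: NOT the full PAM swap phase).
    medoids = init_medoids(rows, k, rng)
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: manhattan(r, rows[medoids[c]]))
                  for r in rows]
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster emptied out
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(
                members,
                key=lambda i: sum(manhattan(rows[i], rows[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels

def mean_silhouette(rows, labels):
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, row in enumerate(rows):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # singleton clusters contribute a silhouette of 0
        a = sum(manhattan(row, rows[j]) for j in own) / (len(own) - 1)
        b = min(sum(manhattan(row, rows[j]) for j in m) / len(m)
                for lab, m in clusters.items() if lab != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(rows)

def choose_k(matrix, k_values, sample_size=300, n_repeats=3, seed=0):
    # Average the silhouette over several random subsamples so the choice
    # of k does not hinge on one lucky (or unlucky) draw of bins.
    rng = random.Random(seed)
    scores = {k: 0.0 for k in k_values}
    for _ in range(n_repeats):
        sample = rng.sample(matrix, min(sample_size, len(matrix)))
        for k in k_values:
            labels = k_medoids(sample, k, rng)
            scores[k] += mean_silhouette(sample, labels) / n_repeats
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    # Toy stand-in for a bins-by-marks matrix (hypothetical data).
    toy_rng = random.Random(42)
    toy = [(toy_rng.gauss(8 * (i % 2), 1), toy_rng.gauss(0, 1)) for i in range(400)]
    print(choose_k(toy, [2, 3, 4], sample_size=100))
```

    Repeating the draw several times is the main design choice here: if the preferred k changes a lot between subsamples, the subset is probably too small or not representative of the full set of bins. A purely random sample will also be dominated by low-signal background bins, so some form of stratified sampling may be worth considering.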
