Hello,
We are trying to process a large dataset (~200 samples, each with ~100,000 Illumina 250 bp 16S reads) using MOTHUR. We have now reached the stage of building a distance matrix to identify OTUs, but the matrix is very large (~1.7 TB, even with the cluster.split approach). Our computational resources are probably sufficient for the current dataset, but we are expecting more samples and worry that at some point the distance matrix will simply become too large to handle.
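For reference, our cluster.split step looks roughly like the call below (the file names, taxlevel, and processor count here are placeholders rather than our exact settings):

mothur > cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.03, processors=8)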
1) Is a distance matrix of this size normal?
2) Does anyone have experience with this sort of problem? Is there a way to split the clustering into sub-analyses (apart from using cluster.split) and then merge the results afterwards?
Thanks
Ashraf and Daniel