what is cluster analysis

11 months ago 36
Nature

Cluster analysis is a data analysis technique that groups a set of objects into clusters based on how similar they are to each other. It is an unsupervised machine learning-based algorithm that acts on unlabelled data. Cluster analysis is used to identify patterns and relationships within data that may not be immediately obvious. It is concerned with data matrices in which the variables have not been partitioned beforehand into criterion versus predictor subsets. The objective of cluster analysis is to find similar groups of subjects, where “similarity” between each pair of subjects means some global measure over the whole set of characteristics. Cluster analysis is not one specific algorithm, but the general task to be solved, and it can be achieved by various algorithms that differ significantly in their understanding of what constitutes a cluster and how to efficiently find them. There are many different algorithms used for cluster analysis, such as k-means, hierarchical clustering, and density-based clustering. The choice of algorithm will depend on the specific requirements of the analysis and the nature of the data being analyzed. Cluster analysis is used in various fields, including crime analysis, identifying patterns of family life trajectories, professional careers, and daily or weekly time use.