# Some methods for classification and analysis of multivariate observations

@inproceedings{MacQueen1967SomeMF, title={Some methods for classification and analysis of multivariate observations}, author={J. MacQueen}, year={1967} }

The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give partitions which are reasonably efficient in the sense of within-class variance. That is, if $p$ is the probability mass function for the population, $S = \{S_1, S_2, \dots, S_k\}$ is a partition of $E_N$, and $u_i$, $i = 1, 2, \dots, k$, is the conditional mean of $p$ over the set $S_i$, then $W^2(S) = \sum_{i=1}^{k} \int_{S_i} |z - u_i|^2 \, dp(z)$ tends…
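To make the within-class variance criterion concrete, here is a minimal Python sketch. It assumes the sequential-update form of k-means commonly attributed to this paper: the first k sample points seed the means, and each later point is assigned to its nearest mean, which is then updated as a running average. The truncated abstract above does not spell out the procedure, so treat that as an assumption. The function names (`sequential_k_means`, `within_class_variance`) are hypothetical, and the second function computes a sample analogue of W²(S), replacing the integral against p with an average over the sample.

```python
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(means: List[List[float]], z: Point) -> int:
    """Index of the mean closest to z (ties go to the lowest index)."""
    return min(range(len(means)), key=lambda i: sq_dist(means[i], z))


def sequential_k_means(points: List[Point], k: int) -> List[List[float]]:
    """Assumed sequential update: seed with the first k points, then fold
    each remaining point into its nearest mean as a running average."""
    if k < 1 or len(points) < k:
        raise ValueError("need at least k points and k >= 1")
    means = [list(p) for p in points[:k]]
    counts = [1] * k
    for z in points[k:]:
        i = nearest(means, z)
        counts[i] += 1
        means[i] = [m + (x - m) / counts[i] for m, x in zip(means[i], z)]
    return means


def within_class_variance(points: List[Point], means: List[List[float]]) -> float:
    """Sample analogue of W^2(S): partition the points by nearest mean, then
    average the squared distance of each point to its own group's centroid."""
    groups: List[List[Point]] = [[] for _ in means]
    for z in points:
        groups[nearest(means, z)].append(z)
    total = 0.0
    for group in groups:
        if not group:
            continue  # an empty set contributes nothing to the sum
        centroid = [sum(c) / len(group) for c in zip(*group)]
        total += sum(sq_dist(z, centroid) for z in group)
    return total / len(points)


if __name__ == "__main__":
    sample = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
    fitted = sequential_k_means(sample, k=2)
    print(fitted, within_class_variance(sample, fitted))
```

Because the update is a single pass, the result depends on the order of the sample. A lower value of the sample criterion indicates a tighter partition for a fixed k.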

#### 23,432 Citations

Supervised Nested Algorithm for Classification Based on K-Means

- Computer Science
- 2020

This paper presents an extension of the k-means algorithm based on the idea of recursive partitioning that can be used as a classification algorithm in the case of supervised classification and carries the integration of parametric models into trees one step further.

Implementation of the k-means Method for Single and Multi-Dimensions

- Mathematics
- 2010

Clustering is the process of grouping the data into classes or clusters, where the class label of each object is not known [1]. In the case of a dataset to be clustered consisting of n objects,…

Experiments for the Number of Clusters in K-Means

- Computer Science
- EPIA Workshops
- 2007

An adjusted iK-Means method is proposed, which performs well in the current experiment setting and is compared to the least squares and least modules version of an intelligent version of the method by Mirkin.

Method of Classification through Normal Distribution Approximation Using Estimating the Adjacent and Multidimensional Scaling

- Computer Science
- 2016 5th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI)
- 2016

This study proposes machine learning algorithms that approximate the density of the influence of the training data using the density function of a normal distribution, and proposes an improved method that, as preprocessing, relocates the training data by multidimensional scaling based on the distances between the training data.

Variable Selection in K-Means Clustering via Regularization

- Computer Science
- 2014

A new method of K-means clustering is proposed to detect variables irrelevant to the cluster structure; it calculates variable weights using an entropy regularization method.

Improved Clustering with Augmented k-means

- Mathematics
- 2017

Identifying a set of homogeneous clusters in a heterogeneous dataset is one of the most important classes of problems in statistical modeling. In the realm of unsupervised partitional clustering,…

A Comparison of K-Means and Mean Shift Algorithms

- 2021

Clustering, otherwise known as cluster analysis, is a learning problem that takes place without any human supervision. This technique has often been used, very efficiently, in data analysis,…

Semi-supervised clustering methods

- Computer Science, Mathematics
- Wiley interdisciplinary reviews. Computational statistics
- 2013

This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in many situations, including document processing and modern genetics.

Model Based Penalized Clustering for Multivariate Data

- Computer Science
- 2007

This paper develops a decision-theoretic framework that gives traditional K-means a probabilistic footing. This not only enables soft clustering but also allows the whole optimization problem to be recast in a Bayesian modeling framework, in which the number of clusters can be treated as an unknown parameter of interest, thus removing a severe constraint of the K-means algorithm.

A Comparison of Latent Class, K-Means, and K-Median Methods for Clustering Dichotomous Data

- Computer Science, Medicine
- Psychological methods
- 2017

Simulation-based comparisons of the latent class, K-means, and K-median approaches for partitioning dichotomous data found that the 3 approaches can exhibit profound differences when applied to real data.

#### References

SHOWING 1-10 OF 17 REFERENCES

On Grouping for Maximum Homogeneity

- Mathematics
- 1958

Given a set of arbitrary numbers, what is a practical procedure for grouping them so that the variance within groups is minimized? An answer to this question, including a description of an…

Comparison of Experiments

- Mathematics
- 1951

Bohnenblust, Shapley, and Sherman [2] have introduced a method of comparing two sampling procedures or experiments; essentially their concept is that one experiment α is more informative…

Hierarchical Grouping to Optimize an Objective Function

- Mathematics
- 1963

A procedure for forming hierarchical groups of mutually exclusive subsets, each of which has members that are maximally similar with respect to specified characteristics, is suggested for…

Note on Grouping

- Mathematics
- 1957

Suppose that it is required to condense observations of a variate into a small number of groups, the grouping intervals to be chosen to retain as much information as possible. One way of…

Data analysis in the social sciences: what about the details?

- Computer Science
- AFIPS '65 (Fall, part I)
- 1965

This paper attempts to demonstrate that there exists a class of techniques more suitably oriented toward the capabilities of the digital computer than are conventional analytic statistical techniques, and maintains that these techniques are capable of considering details in social science data, that is, relating the individuals described in the data.

A Tchebycheff-Like Inequality for Stochastic Processes

- Mathematics, Medicine
- Proceedings of the National Academy of Sciences of the United States of America
- 1965

Cluster analysis of multivariate data: efficiency versus interpretability of classifications

- Mathematics
- 1965

On convergence of k-means and partitions with minimum average variance

- Ann. Math. Statist.
- 1965

Decision Making Process in Pattern Recognition

- 1962