Clustering has been extensively studied for over 40 years and across many disciplines due to its broad applications. Most books on pattern classification and machine learning contain chapters on cluster analysis or unsupervised learning.

## Data Mining - Cluster Analysis

A cluster is a group of objects that belong to the same class. In other words, similar objects are grouped in one cluster and dissimilar objects are grouped in another. In cluster analysis, we first partition the data set into groups based on data similarity and then assign labels to the groups. The main advantage of clustering over classification is that it is adaptable to changes and helps single out useful features that distinguish different groups. Cluster analysis is widely used in applications such as market research, pattern recognition, data analysis, and image processing.
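The two steps described above (partition by similarity, then label the groups) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the one-dimensional "annual spend" values, the `max_gap` similarity threshold, and the label names are hypothetical and not taken from any particular method.

```python
def partition_by_gap(values, max_gap):
    """Step 1: split 1-D values into groups of similar values.

    Two neighbouring values (after sorting) are treated as similar when
    they differ by at most max_gap, a hypothetical threshold.
    """
    if not values:
        return []
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for value in ordered[1:]:
        if value - groups[-1][-1] <= max_gap:
            groups[-1].append(value)   # close to the previous value: same group
        else:
            groups.append([value])     # large gap: start a new group
    return groups


def label_groups(groups, names):
    """Step 2: attach a human-chosen label to each discovered group."""
    return dict(zip(names, groups))


# Hypothetical annual spend per customer.
spend = [120, 95, 110, 900, 950, 40, 1020]
groups = partition_by_gap(spend, max_gap=100)
labelled = label_groups(groups, ["low spenders", "high spenders"])
print(labelled)
```

Note that the labels are attached only after the groups have been found; in classification, by contrast, the labels would be known in advance.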

Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar, in some sense, to each other than to those in other groups (clusters). It is a main task of exploratory data mining and a common technique for statistical data analysis, used in many fields, including pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics, and machine learning. Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their understanding of what constitutes a cluster and how to find clusters efficiently. Popular notions of clusters include groups with small distances between cluster members, dense areas of the data space, intervals, or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including parameters such as the distance function to use, a density threshold, or the number of expected clusters) depend on the individual data set and the intended use of the results.
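As one concrete instance of the "small distances between cluster members" notion, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is only one of many clustering algorithms, and the specifics are assumptions made for illustration: Euclidean distance as the distance function, caller-supplied initial centroids (whose count fixes the number of expected clusters), and a small hypothetical 2-D data set.

```python
import math


def kmeans(points, initial_centroids, max_iterations=20):
    """Minimal k-means sketch using Euclidean distance.

    The number of clusters is len(initial_centroids). Returns the cluster
    index assigned to each point and the final centroids.
    """
    centroids = list(initial_centroids)
    k = len(centroids)
    assignment = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return assignment, centroids


# Hypothetical data: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
assignment, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(assignment)
print(centroids)
```

Changing the distance function or the number of initial centroids changes the result, which reflects the point above: parameter choices depend on the data set and on how the clusters will be used.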

The fundamental algorithms in data mining and analysis form the basis for the emerging field of data science, which includes automated methods to analyze patterns and models for all kinds of data, with applications ranging from scientific discovery to business intelligence and analytics. This textbook for senior undergraduate and graduate data mining courses provides a broad yet in-depth overview of data mining, integrating related concepts from machine learning and statistics. The main parts of the book include exploratory data analysis, pattern mining, clustering, and classification. The book lays the basic foundations of these tasks, and also covers cutting-edge topics such as kernel methods, high-dimensional data analysis, and complex graphs and networks. With its comprehensive coverage, algorithmic perspective, and wealth of examples, this book offers solid guidance in data mining for students, researchers, and practitioners alike.

Cluster analysis divides data into groups (clusters) that are meaningful, useful, or both. If meaningful groups are the goal, then the clusters should capture the natural structure of the data. In some cases, however, cluster analysis is only a useful starting point for other purposes, such as data summarization. Whether for understanding or utility, cluster analysis has long played an important role in a wide variety of fields: psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning, and data mining. There have been many applications of cluster analysis to practical problems. We provide some specific examples, organized by whether the purpose of the clustering is understanding or utility.

Clustering aims to reduce complexity by finding meaningful structures within the data. Participants are grouped according to their similarity in a predefined set of variables.

## Cluster analysis

Modern science and engineering are based on using first-principle models to describe physical, biological, and social systems.

