Clustering: Difference between revisions

Latest revision as of 02:08, 12 May 2022

Clustering is an unsupervised learning technique for grouping data points or objects that are similar to one another. In other words, clustering finds groups (clusters) in a dataset without using labels.^[1] It is one of the most widely used applications of unsupervised learning.
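The article does not name a specific algorithm at this point. As one illustration of finding clusters in unlabelled data, here is a minimal sketch of k-means (Lloyd's algorithm), a widely used clustering method that this article does not itself describe. The function name `kmeans`, its parameters, and the toy data are hypothetical, and the sketch uses only the Python standard library.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Minimal k-means (Lloyd's algorithm) on a list of 2-D points.

    Hypothetical helper for illustration. Returns (centroids, labels),
    where labels[i] is the cluster index assigned to points[i].
    """
    rng = random.Random(seed)
    # Initialise centroids by sampling k distinct points from the data.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        # Stop once neither assignments nor centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return centroids, labels


data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = kmeans(data, k=2)
print(labels)  # first three points share one label, last three share the other
```

No labels are given to the algorithm; the two groups emerge from distances alone. Which group receives label 0 depends on the random initialisation.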

Motivation

Generally, clustering can be used for one of the following purposes^[1]:

Exploratory data analysis
Summary generation
Outlier detection^[1]
Finding duplicates
Pre-processing step

Types of clustering

Clustering is sometimes divided into two subgroups^[2]:

Hard clustering: Each data point either belongs to a cluster completely or not at all. Clusters do not overlap.
Soft clustering: Each data point is assigned a probability or likelihood of belonging to each cluster. Clusters may overlap.
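To make the distinction concrete, here is a minimal sketch that assigns a single point to clusters whose centroids are already known. The function names, the example centroids, and the softmax weighting are assumptions for illustration and are not taken from the cited source. A softmax over negative squared distances matches Gaussian responsibilities with equal priors and a shared isotropic variance σ² when `temperature` is 2σ².

```python
import math


def hard_assign(point, centroids):
    """Hard clustering (hypothetical helper): index of the single nearest centroid.

    Ties go to the lower index because min() returns the first minimum.
    """
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def soft_assign(point, centroids, temperature=1.0):
    """Soft clustering (hypothetical helper): one weight per cluster, summing to 1.

    Weights are a softmax over negative squared distances to each centroid.
    """
    scores = [-math.dist(point, c) ** 2 / temperature for c in centroids]
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


centroids = [(0.0, 0.0), (4.0, 0.0)]
print(hard_assign((2.0, 0.0), centroids))  # 0
print(soft_assign((2.0, 0.0), centroids))  # [0.5, 0.5]
```

A point exactly midway between the two centroids gets equal weights under soft assignment, while hard assignment is forced to choose one cluster.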

Clustering vs classification

Algorithms

Some of the commonly used clustering algorithms are^[3]:

others

Expectation maximization:
Hierarchical cluster analysis (HCA):
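Hierarchical cluster analysis can proceed bottom-up (agglomerative) or top-down (divisive). The following is a minimal sketch of the agglomerative variant using single linkage, a linkage choice the article does not specify. The function name `agglomerative` and its parameters are hypothetical, and the naive pairwise search is meant for readability rather than speed.

```python
import math
from itertools import combinations


def agglomerative(points, n_clusters):
    """Bottom-up hierarchical clustering with single linkage (hypothetical helper).

    Starts with every point in its own cluster and repeatedly merges the two
    clusters whose closest members are nearest, until n_clusters remain.
    Returns (clusters, merges): clusters is a list of lists of point indices,
    and merges records each merge as (cluster_a, cluster_b, distance).
    """
    clusters = [[i] for i in range(len(points))]
    merges = []

    def linkage(a, b):
        # Single linkage: distance between the closest pair across two clusters.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        (ia, ib), d = min(
            ((pair, linkage(clusters[pair[0]], clusters[pair[1]]))
             for pair in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1],
        )
        merges.append((clusters[ia], clusters[ib], d))
        merged = clusters[ia] + clusters[ib]
        # combinations() yields ia < ib, so deleting ib leaves index ia valid.
        del clusters[ib]
        clusters[ia] = merged
    return clusters, merges


pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0), (20.0, 0.0)]
clusters, merges = agglomerative(pts, n_clusters=2)
print(clusters)                  # [[0, 1, 2, 3], [4]]
print([m[2] for m in merges])    # [1.0, 1.0, 4.0]
```

Recording the distance at which each merge happens is what allows hierarchical methods to draw a dendrogram; cutting the hierarchy at a different level gives a different number of clusters.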

Applications

External links

From Hard to Soft Clustering Pavel A. Pevzner

References

[1] Intro to Clustering, Coursera
[2] An Introduction to Clustering and different methods of clustering, analyticsvidhya.com
[3] Real World Applications of Unsupervised Learning, pythonistaplanet.com

@@ Line 1: / Line 1: @@
-'''Clustering''' is an [[unsupervised learning]] technique. It is used for grouping data points, or objects that are somehow similar. Clustering means finding clusters in a dataset, unsupervised.<ref name="MLPython">[https://www.coursera.org/learn/machine-learning-with-python/lecture/Nlxjw/intro-to-clustering Intro to Clustering]Coursera</ref>
+'''Clustering''' is an [[unsupervised learning]] technique. It is used for grouping data points, or objects that are somehow similar. Clustering means finding clusters in a dataset, unsupervised.<ref name="MLPython">[https://www.coursera.org/learn/machine-learning-with-python/lecture/Nlxjw/intro-to-clustering Intro to Clustering]Coursera</ref> Clustering is one of the most used applications of unsupervised learning.
 == Motivation ==
@@ Line 13: / Line 13: @@
 Some divide clustering into two subgroups<ref>[https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/ An Introduction to Clustering and different methods of clustering]analyticsvidhya.com</ref>:
-* [[Hard clustering]]: Each data point either belongs to a cluster completely or not.
+* [[Hard clustering]]: Each data point either belongs to a cluster completely or not. Clusters do not overlap.
-* [[Soft clustering]]: A probability or likelihood is assigned for putting data points into separate clusters.
+* [[Soft clustering]]: A probability or likelihood is assigned for putting data points into separate clusters. Clusters may overlap.
 == Clustering vs classification ==