Clustering: Difference between revisions

'''Clustering''' is an [[unsupervised learning]] technique for grouping data points or objects that are similar to one another. It finds the clusters in a dataset without relying on labels.<ref name="MLPython">[https://www.coursera.org/learn/machine-learning-with-python/lecture/Nlxjw/intro-to-clustering Intro to Clustering], Coursera</ref> Clustering is one of the most common applications of unsupervised learning.
 
== Motivation ==
 
Generally, clustering can be used for one of the following purposes<ref name="MLPython"/>:
* [[Exploratory data analysis]]
* Summary generation
* [[Outlier detection]][https://www.youtube.com/watch?v=hGKY6BAqJ6o]
* Finding duplicates
* Pre-processing step
 
== Types of clustering ==
 
Clustering methods are sometimes divided into two subgroups<ref>[https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/ An Introduction to Clustering and different methods of clustering], analyticsvidhya.com</ref>:
* [[Hard clustering]]: Each data point either belongs to a cluster completely or does not belong to it at all. Clusters do not overlap.
* [[Soft clustering]]: Each data point is assigned a probability or likelihood of belonging to each cluster, so clusters may overlap.
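The difference between the two subgroups can be illustrated with a minimal Python sketch. It assumes one-dimensional data and cluster centers that are already known. The function names, the <code>stiffness</code> parameter and the exponential weighting are illustrative assumptions, not taken from the cited sources or from any particular library.

```python
import math


def hard_assign(point, centers):
    """Hard clustering: return the index of the single nearest center."""
    return min(range(len(centers)), key=lambda k: abs(point - centers[k]))


def soft_assign(point, centers, stiffness=1.0):
    """Soft clustering: return one membership weight per center (weights sum to 1).

    The exponential weighting is an assumption made for illustration only.
    """
    squared = [(point - c) ** 2 for c in centers]
    nearest = min(squared)
    # Subtracting the smallest distance keeps at least one score equal to 1,
    # so the normalisation below never divides by zero.
    scores = [math.exp(-stiffness * (d - nearest)) for d in squared]
    total = sum(scores)
    return [s / total for s in scores]


centers = [0.0, 4.0]  # hypothetical cluster centers
for x in (0.5, 2.0, 3.9):
    memberships = [round(w, 3) for w in soft_assign(x, centers)]
    print(x, hard_assign(x, centers), memberships)
```

A point halfway between the two centers is still forced into a single cluster by <code>hard_assign</code> (ties go to the first center here), whereas <code>soft_assign</code> gives it equal membership in both.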
 
== Clustering vs classification ==
 
== Algorithms ==
 
Some of the commonly used clustering algorithms are<ref name="pythonistaplanet">[https://pythonistaplanet.com/applications-of-unsupervised-learning/ Real World Applications of Unsupervised Learning], pythonistaplanet.com</ref>:
 
* [[Partition-based clustering]]<ref name="MLPython"/>:
** [[K-means clustering|K-means]]
** [[K-median]]
** [[Fuzzy c-means]]
* [[Hierarchical clustering]]<ref name="MLPython"/>:
** [[Agglomerative clustering]]
** [[Divisive clustering]]
* [[Density-based clustering]]<ref name="MLPython"/>:
** [[DBSCAN]]
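
As an illustration of the partition-based family, the following is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It assumes points given as tuples of numbers and Euclidean distance. The function name <code>k_means</code>, its parameters and the sample data are hypothetical and chosen only for this example; a real project would more likely use a library implementation.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=0):
    """Cluster `points` (tuples of numbers) into k groups; return (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct points as starting centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers would not move
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers, labels


# Hypothetical sample data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = k_means(points, k=2)
print(centers)
print(labels)
```

Because every point receives exactly one label, this is a hard clustering. The result generally depends on the initial centers, which is why the sketch fixes a random seed.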
 
 
=== Others ===

* [[Expectation maximization]]
* [[Hierarchical cluster analysis]] (HCA)
 
== Applications ==
 
== External links ==
 
* [https://www.youtube.com/watch?v=xtDMHPVDDKk From Hard to Soft Clustering] [[wikipedia:Pavel A. Pevzner|Pavel A. Pevzner]]
 
== References ==
 
* [https://datafloq.com/read/7-innovative-uses-of-clustering-algorithms/6224]
* [https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/]
* [https://www.coursera.org/lecture/ml-foundations/other-examples-of-clustering-cmh30]

Latest revision as of 02:08, 12 May 2022

Clustering is an unsupervised learning technique for grouping data points or objects that are similar to one another. It finds the clusters in a dataset without relying on labels.[1] Clustering is one of the most common applications of unsupervised learning.

Motivation

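Generally, clustering can be used for one of the following purposes[1]:

  • Exploratory data analysis
  • Summary generation
  • Outlier detection
  • Finding duplicates
  • Pre-processing step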

Types of clustering

Clustering methods are sometimes divided into two subgroups[2]:

  • Hard clustering: Each data point either belongs to a cluster completely or does not belong to it at all. Clusters do not overlap.
  • Soft clustering: Each data point is assigned a probability or likelihood of belonging to each cluster, so clusters may overlap.

Clustering vs classification

Algorithms

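Some of the commonly used clustering algorithms are[3]:

  • Partition-based clustering[1]:
      • K-means
      • K-median
      • Fuzzy c-means
  • Hierarchical clustering[1]:
      • Agglomerative clustering
      • Divisive clustering
  • Density-based clustering[1]:
      • DBSCAN

Others

  • Expectation maximization
  • Hierarchical cluster analysis (HCA)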

Applications

External links

References