Clustering: Difference between revisions

From Machinelearning
 
'''Clustering''' is an [[unsupervised learning]] technique for grouping data points or objects that are similar to one another. It finds clusters in a dataset without using labels.<ref name="MLPython">[https://www.coursera.org/learn/machine-learning-with-python/lecture/Nlxjw/intro-to-clustering Intro to Clustering], Coursera</ref> Clustering is one of the most widely used applications of unsupervised learning.
 
== Motivation ==
 
Clustering is generally used for one or more of the following purposes:<ref name="MLPython"/>
* [[Exploratory data analysis]]
* Summary generation
* [[Outlier detection]] [https://www.youtube.com/watch?v=hGKY6BAqJ6o]
* Finding duplicates
* Pre-processing step


== Types of clustering ==


Clustering is sometimes divided into two subgroups:<ref>[https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/ An Introduction to Clustering and different methods of clustering], analyticsvidhya.com</ref>
* [[Hard clustering]]: Each data point belongs entirely to one cluster or not at all. Clusters do not overlap.
* [[Soft clustering]]: Each data point is assigned a probability or likelihood of belonging to each cluster. Clusters may overlap.
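
The difference between the two subgroups can be illustrated with a minimal Python sketch. It assumes the cluster centers are already known and uses squared Euclidean distance. The function names and the <code>temperature</code> parameter are hypothetical, and the exponential weighting is just one possible way to produce soft memberships, not a method taken from the cited sources.

```python
import math

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def hard_assign(point, centers):
    """Hard clustering: return the index of the single nearest center."""
    distances = [squared_distance(point, c) for c in centers]
    return distances.index(min(distances))

def soft_assign(point, centers, temperature=1.0):
    """Soft clustering: return one membership weight per center (weights sum to 1).

    Assumption for illustration: weights decay exponentially with squared distance.
    """
    distances = [squared_distance(point, c) for c in centers]
    smallest = min(distances)  # subtract the minimum for numerical stability
    scores = [math.exp(-(d - smallest) / temperature) for d in distances]
    total = sum(scores)
    return [s / total for s in scores]

centers = [(0.0, 0.0), (4.0, 0.0)]   # hypothetical, pre-computed cluster centers
point = (1.0, 0.0)

hard_label = hard_assign(point, centers)      # one cluster index
soft_weights = soft_assign(point, centers)    # one weight per cluster
midpoint_weights = soft_assign((2.0, 0.0), centers)
```

Here the hard assignment puts the point in exactly one cluster, while the soft assignment spreads membership over both clusters. A point halfway between the two centers receives equal weights.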
 
== Clustering vs classification ==


== Algorithms ==
Some of the commonly used clustering algorithms are:<ref name="pythonistaplanet">[https://pythonistaplanet.com/applications-of-unsupervised-learning/ Real World Applications of Unsupervised Learning], pythonistaplanet.com</ref>


* [[Partitioned-based clustering]]<ref name="MLPython"/>
** [[K-means clustering|K-means]]
** [[K-median]]
** [[Fuzzy c-means]]
* [[Hierarchical clustering]]<ref name="MLPython"/>
** [[Agglomerative clustering]]
** [[Divisive clustering]]
* [[Density-based clustering]]<ref name="MLPython"/>
** [[DBSCAN]]
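
As an illustration of the first family in the list above, the following is a minimal Python sketch of K-means using the standard alternation between an assignment step and an update step. The function names, the sample data, the <code>seed</code> parameter and the random initialization are assumptions made for this example. It is not taken from the cited sources and is not a production implementation.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, iterations=100, seed=0):
    """Return (labels, centers) after alternating assignment and update steps."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # assumption: start from k distinct data points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, new_labels) if label == j]
            if members:               # keep the old center if a cluster is empty
                centers[j] = mean(members)
        if new_labels == labels:      # assignments stopped changing
            break
        labels = new_labels
    return labels, centers

# Hypothetical sample data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = k_means(points, k=2)
```

On this sample data the first three points end up in one cluster and the last three in the other. K-means is a hard clustering method, so each point receives exactly one label.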
 
 
=== Others ===

* [[Expectation maximization]]
* [[Hierarchical cluster analysis]] (HCA)
 
== Applications ==
 
== External links ==
 
* [https://www.youtube.com/watch?v=xtDMHPVDDKk From Hard to Soft Clustering] [[wikipedia:Pavel A. Pevzner|Pavel A. Pevzner]]


== References ==

Latest revision as of 02:08, 12 May 2022
