Clustering: Difference between revisions
| (13 intermediate revisions by the same user not shown) | |||
| Line 1: | Line 1: | ||
'''Clustering''' is an [[unsupervised learning]] technique. It is used for grouping data points, or objects that are somehow similar. | '''Clustering''' is an [[unsupervised learning]] technique. It is used for grouping data points, or objects that are somehow similar. Clustering means finding clusters in a dataset, unsupervised.<ref name="MLPython">[https://www.coursera.org/learn/machine-learning-with-python/lecture/Nlxjw/intro-to-clustering Intro to Clustering]Coursera</ref> Clustering is one of the most used applications of unsupervised learning. | ||
== Motivation == | |||
Generally, clustering can be used for one of the following purposes<ref name="MLPython"/>: | |||
* [[Exploratory data analysis]] | |||
* Summary generation | |||
* [[Outlier detection]][https://www.youtube.com/watch?v=hGKY6BAqJ6o] | |||
* Finding duplicates | |||
* Pre-processing step | |||
== Types of clustering == | == Types of clustering == | ||
Some divide clustering into two subgroups<ref>[https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/ An Introduction to Clustering and different methods of clustering]analyticsvidhya.com</ref>: | Some divide clustering into two subgroups<ref>[https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/ An Introduction to Clustering and different methods of clustering]analyticsvidhya.com</ref>: | ||
* Hard clustering: Each data point either belongs to a cluster completely or not. | * [[Hard clustering]]: Each data point either belongs to a cluster completely or not. Clusters do not overlap. | ||
* Soft clustering: A probability or likelihood is assigned for putting data points into separate clusters. | * [[Soft clustering]]: A probability or likelihood is assigned for putting data points into separate clusters. Clusters may overlap. | ||
== Clustering vs classification == | |||
== Algorithms == | == Algorithms == | ||
| Line 11: | Line 22: | ||
Some of the commonly used clustering algorithms are<ref name="pythonistaplanet">[https://pythonistaplanet.com/applications-of-unsupervised-learning/ Real World Applications of Unsupervised Learning]pythonistaplanet.com</ref>: | Some of the commonly used clustering algorithms are<ref name="pythonistaplanet">[https://pythonistaplanet.com/applications-of-unsupervised-learning/ Real World Applications of Unsupervised Learning]pythonistaplanet.com</ref>: | ||
* [[K-means clustering|K-means]]: | * [[Partitioned-based clustering]]<ref name="MLPython"/>: | ||
** [[K-means clustering|K-means]]: | |||
** [[K-median]]: | |||
** [[Fuzzy c-means]]: | |||
* [[Hierarchical clustering]]<ref name="MLPython"/>: | |||
** [[Agglomerative clustering]]: | |||
** [[Divisive clustering]]: | |||
* [[Density-based clustering]]<ref name="MLPython"/>: | |||
** [[DBSCAN]]: | |||
=== others === | |||
* [[Expectation maximization]]: | * [[Expectation maximization]]: | ||
* [[Hierarchical cluster analysis]] (HCA): | * [[Hierarchical cluster analysis]] (HCA): | ||
== Applications == | |||
== External links == | |||
* [https://www.youtube.com/watch?v=xtDMHPVDDKk From Hard to Soft Clustering] [[wikipedia:Pavel A. Pevzner|Pavel A. Pevzner]] | |||
== References == | == References == | ||
Latest revision as of 02:08, 12 May 2022
Clustering is an unsupervised learning technique that groups data points or objects that are similar in some way. In other words, it finds clusters in a dataset without using labels.[1] It is one of the most widely used applications of unsupervised learning.
Motivation
Generally, clustering can be used for one of the following purposes[1]:
- Exploratory data analysis
- Summary generation
- Outlier detection (https://www.youtube.com/watch?v=hGKY6BAqJ6o)
- Finding duplicates
- Pre-processing step
Types of clustering
Some authors divide clustering into two types[2]:
- Hard clustering: Each data point belongs to exactly one cluster, so clusters do not overlap.
- Soft clustering: Each data point is assigned a probability or likelihood of belonging to each cluster, so clusters may overlap.
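The difference between the two types can be illustrated with a minimal sketch. It assumes the cluster centers are already known and uses hypothetical helper names (`hard_assign`, `soft_assign`, and the `stiffness` parameter are illustrative, not taken from the cited sources). The soft membership here is a simple softmax over negative squared distances, which is only one of several possible ways to produce overlapping memberships.

```python
import math


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def hard_assign(point, centers):
    """Hard clustering: return the index of the single nearest center."""
    distances = [squared_distance(point, c) for c in centers]
    return distances.index(min(distances))


def soft_assign(point, centers, stiffness=1.0):
    """Soft clustering: return one membership weight per center.

    The weights are non-negative and sum to 1. `stiffness` is an
    illustrative parameter: larger values push the result toward
    a hard assignment.
    """
    scores = [-stiffness * squared_distance(point, c) for c in centers]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Two assumed cluster centers and one point that lies closer to the first.
centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

hard_label = hard_assign(point, centers)    # a single cluster index
soft_weights = soft_assign(point, centers)  # one weight per cluster
```

With hard assignment the point gets one label only, while soft assignment keeps a (possibly very small) weight for every cluster, which is what allows clusters to overlap.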
Clustering vs classification
Algorithms
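Some of the commonly used clustering algorithms are[3]:
- Partitioned-based clustering[1]:
  - K-means
  - K-median
  - Fuzzy c-means
- Hierarchical clustering[1]:
  - Agglomerative clustering
  - Divisive clustering
- Density-based clustering[1]:
  - DBSCAN
others
- Expectation maximization
- Hierarchical cluster analysis (HCA)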
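As an illustration of one algorithm named in this article, K-means, the following is a minimal sketch of the usual alternating procedure (assign each point to its nearest center, then move each center to the mean of its points). It is written in plain Python under stated assumptions: the function name `k_means`, its parameters, the random initialisation from the data points, and the toy data are all illustrative choices, not something taken from the cited sources.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    """Return (centers, labels) after alternating assignment and update steps."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct data points chosen at random.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point takes the label of its nearest center.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            # An empty cluster keeps its previous center.
            new_centers.append(mean_point(members) if members else centers[j])
        if new_centers == centers:  # no center moved, so the result is stable
            break
        centers = new_centers
    return centers, labels


# Toy data (made up for illustration): two visibly separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, labels = k_means(points, k=2)
```

K-means is a hard clustering method: each point ends up with exactly one label. In general the result depends on the initial centers, so practical implementations often run several initialisations and keep the best one.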
Applications
External links
- From Hard to Soft Clustering (https://www.youtube.com/watch?v=xtDMHPVDDKk), Pavel A. Pevzner
References
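- [1] Intro to Clustering, Coursera. https://www.coursera.org/learn/machine-learning-with-python/lecture/Nlxjw/intro-to-clustering
- [2] An Introduction to Clustering and different methods of clustering, analyticsvidhya.com. https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/
- [3] Real World Applications of Unsupervised Learning, pythonistaplanet.com. https://pythonistaplanet.com/applications-of-unsupervised-learning/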