Hierarchical clustering

Latest revision as of 15:22, 14 April 2020

'''Hierarchical clustering''', also known as hierarchical cluster analysis, is a [[clustering]] algorithm that groups similar objects into groups called clusters.<ref>[https://www.displayr.com/what-is-hierarchical-clustering/ What is Hierarchical Clustering?] displayr.com</ref> Hierarchical clustering is often associated with heatmaps.<ref>[https://www.youtube.com/watch?v=7xHsRkOdVwo StatQuest: Hierarchical Clustering] youtube.com</ref>
   
   
== Strategies ==
Strategies for hierarchical clustering generally fall into two types<ref name="intro">[https://www.coursera.org/learn/machine-learning-with-python/lecture/cHku3/intro-to-hierarchical-clustering Intro to Hierarchical Clustering] Coursera</ref>:
* Divisive: This type starts from the assumption that all the feature vectors form a single set, then hierarchically divides this group into different sets.<ref name="ssa">[https://www.sciencedirect.com/topics/computer-science/divisive-clustering Divisive Clustering] sciencedirect.com</ref>
* Agglomerative: The idea is to ensure that nearby points end up in the same cluster.<ref>[https://www.youtube.com/watch?v=XJ3194AmH40 Agglomerative Clustering: how it works] youtube.com</ref> In this type, partitions are visualized using a tree structure called a [[dendrogram]].<ref name="ssa"/>
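As an illustration of the agglomerative strategy, here is a minimal sketch using SciPy (an assumption; the article does not prescribe any library): each point starts in its own cluster, the two closest clusters are merged repeatedly, and the resulting dendrogram is cut so that two clusters remain.

```python
# Minimal agglomerative-clustering sketch (assumes numpy and scipy).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two well-separated groups of 2-D points.
points = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],   # group near the origin
    [5.0, 5.0], [5.1, 5.2], [5.2, 5.1],   # group far away
])

# Agglomerative clustering: start with each point as its own cluster
# and repeatedly merge the two closest clusters ("single" linkage).
Z = linkage(points, method="single")

# Cut the resulting dendrogram so that two clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The linkage matrix `Z` encodes the full merge hierarchy, so the same tree can be re-cut at any level without re-running the clustering.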


== Advantages vs disadvantages<ref>[https://www.coursera.org/learn/machine-learning-with-python/lecture/h2iAD/more-on-hierarchical-clustering Hierarchical Clustering] Coursera</ref> ==


=== Advantages ===

* Hierarchical clustering does not require the number of clusters to be specified in advance.
* It is easy to implement.
* Hierarchical clustering produces a dendrogram, which helps with understanding the data.
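The dendrogram can also be inspected programmatically; a small sketch, assuming SciPy is available (`no_plot=True` returns the tree layout instead of drawing it):

```python
# Inspect a dendrogram's structure without plotting (assumes numpy/scipy).
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Four 1-D points forming two obvious pairs.
data = np.array([[0.0], [0.5], [10.0], [10.5]])

# Build the merge tree ("average" linkage is an arbitrary choice here).
Z = linkage(data, method="average")

# no_plot=True computes the tree layout (leaf order, merge heights)
# without requiring matplotlib, which is enough to inspect the structure.
tree = dendrogram(Z, no_plot=True)
print(tree["ivl"])  # leaf labels in left-to-right dendrogram order
```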

=== Disadvantages ===

* The algorithm can never undo a previous step: once clusters have been merged (or split), the decision is final.
* The time complexity (O(n³) for the naive agglomerative algorithm) can result in very long computation times in comparison with efficient algorithms such as K-means.
* For a large data set, it can become difficult to determine the correct number of clusters from the dendrogram.
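The first two drawbacks are visible in a from-scratch sketch of single-linkage agglomeration (illustrative Python, not an optimized implementation): every pass scans all cluster pairs, which is why the naive algorithm costs roughly O(n³) time, and a merge, once made, is never revisited.

```python
# Naive single-linkage agglomerative clustering (illustrative sketch).
import math

def single_linkage(points, k):
    """Merge clusters until only k remain; returns a list of clusters."""
    clusters = [[p] for p in points]          # start: one cluster per point
    while len(clusters) > k:
        best = (math.inf, 0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)        # the merge is permanent
    return clusters

pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 5.2)]
print(single_linkage(pts, 2))
```

Because the merge in each pass is permanent, a bad early merge propagates to the final clustering; this is the irreversibility noted above.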

== References ==
<references />