Briefly about Cluster Analysis
The concept
Assume a data set defined in two dimensions, as presented below:
Let the horizontal and vertical positions represent the values along the two dimensions. Using the Euclidean distance measure, the relations between the data points a to f can be represented in one dimension as shown below, reflecting the hierarchical clustering of the data in terms of the measured distances.
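The idea of building a hierarchy from measured distances can be sketched in Python. This is a minimal sketch, assuming single linkage: at each step, the two clusters whose closest members are nearest to each other are merged. The text does not say which linkage rule was used. The coordinates for a to f are hypothetical, since the values from the figure are not available, and the names `POINTS`, `euclidean` and `single_linkage` are illustrative only.

```python
import math
from itertools import combinations

# Hypothetical 2-D coordinates for the points a to f
# (the values from the original figure are not available).
POINTS = {
    "a": (1.0, 1.0),
    "b": (1.0, 2.0),
    "c": (4.0, 4.0),
    "d": (4.0, 5.5),
    "e": (8.0, 1.0),
    "f": (8.0, 3.0),
}


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def single_linkage(points):
    """Agglomerative clustering with single linkage.

    Returns the merges as (cluster1, cluster2, distance) in the order they happen;
    the distances are the heights a dendrogram would show.
    """
    clusters = [frozenset([name]) for name in sorted(points)]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest to each other.
        best = None
        for c1, c2 in combinations(clusters, 2):
            dist = min(euclidean(points[m], points[n]) for m in c1 for n in c2)
            if best is None or dist < best[2]:
                best = (c1, c2, dist)
        c1, c2, dist = best
        # Replace the two clusters by their union.
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
        merges.append((c1, c2, dist))
    return merges


if __name__ == "__main__":
    for c1, c2, dist in single_linkage(POINTS):
        print("".join(sorted(c1)), "+", "".join(sorted(c2)), f"at {dist:.3f}")
```

With these hypothetical coordinates the merges are a + b at 1.000, c + d at 1.500, e + f at 2.000, ab + cd at 3.606 and ef + abcd at 4.123.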
Distances
Assume the following two data sets (x and y):
In cluster analysis, the distance between the two data sets can be calculated with a number of different measures. The most important are the following:
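Euclidean distance
distance(x, y) = sqrt(sum((xi-yi)^2))
Squared Euclidean distance
distance(x, y) = sum((xi-yi)^2)
Manhattan distance
distance(x, y) = sum(abs(xi-yi))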
Chebychev or Chessboard distance
distance(x, y) = max(abs(xi-yi))
The Chessboard distance between the above data sets is
A numerical example:
If a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, then the Chessboard distance is 3.
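The distance measures listed above can be sketched in Python as plain functions over two equal-length sequences. The function names are illustrative and do not come from any library. The numerical example is reproduced under the assumption that x = (a, b, c) and y = (d, e, f); the text does not state how the six values are split between the two data sets, but this split gives the stated Chessboard distance of 3.

```python
import math


def _differences(x, y):
    """Pairwise differences xi - yi; both data sets must have the same length."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of values")
    return [xi - yi for xi, yi in zip(x, y)]


def squared_euclidean(x, y):
    # sum((xi - yi)^2)
    return sum(diff ** 2 for diff in _differences(x, y))


def euclidean(x, y):
    # sqrt(sum((xi - yi)^2))
    return math.sqrt(squared_euclidean(x, y))


def manhattan(x, y):
    # sum(abs(xi - yi))
    return sum(abs(diff) for diff in _differences(x, y))


def chebychev(x, y):
    # max(abs(xi - yi)), also called the Chessboard distance
    return max(abs(diff) for diff in _differences(x, y))


if __name__ == "__main__":
    a, b, c, d, e, f = 1, 2, 3, 4, 5, 6
    # Assumption: x = (a, b, c) and y = (d, e, f)
    x = (a, b, c)
    y = (d, e, f)
    print(chebychev(x, y))  # 3
```

For the same assumed split, every difference is 3, so the Manhattan distance is 9, the squared Euclidean distance is 27 and the Euclidean distance is sqrt(27), about 5.196.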
Bray-Curtis distance
distance(x, y) = sum(abs(xi-yi)) / sum(xi+yi)
The Bray-Curtis distance between the above data sets is
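The Bray-Curtis distance can be sketched in the same way. This sketch assumes the common definition for non-negative data (such as counts): the sum of absolute differences divided by the sum of all values in both data sets, which keeps the result between 0 and 1. The function name `bray_curtis` is illustrative, and the example data reuse the assumed split x = (1, 2, 3), y = (4, 5, 6).

```python
def bray_curtis(x, y):
    """Bray-Curtis distance: sum(abs(xi - yi)) / sum(xi + yi).

    Assumes non-negative values, for which the result lies in [0, 1].
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of values")
    if any(v < 0 for v in list(x) + list(y)):
        raise ValueError("this sketch only accepts non-negative values")
    total = sum(xi + yi for xi, yi in zip(x, y))
    if total == 0:
        raise ValueError("the distance is undefined when all values are zero")
    return sum(abs(xi - yi) for xi, yi in zip(x, y)) / total


if __name__ == "__main__":
    # Assumed data sets, as in the Chessboard example: x = (1, 2, 3), y = (4, 5, 6)
    print(bray_curtis((1, 2, 3), (4, 5, 6)))
```

For these values the absolute differences sum to 9 and all values together sum to 21, giving 9 / 21, about 0.429.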