Different topics

Briefly about Cluster Analysis

The concept

Assume a data set defined in two dimensions, as presented below:

[Figure: the data points a to f plotted in two dimensions]

Let the horizontal and vertical positions represent the values along the two dimensions. The relations between the data points a to f can be expressed in one dimension using the Euclidean distance measure, as shown below,

[Figure: one-dimensional representation of the distances between a to f]

reflecting the hierarchical clustering of the data in terms of the measured distances.
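
As a rough illustration of how measured distances give rise to a hierarchy, the following Python sketch merges the two closest clusters step by step. The coordinates of a to f are hypothetical (they are not read from the figure), and single linkage is an assumed merging rule; the text does not say which rule was used.

```python
import math

# Hypothetical 2-D coordinates for the points a to f (assumed for illustration).
points = {
    "a": (1.0, 1.0),
    "b": (1.5, 1.2),
    "c": (5.0, 5.0),
    "d": (5.5, 5.1),
    "e": (5.2, 6.0),
    "f": (9.0, 1.0),
}


def single_linkage(points):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    Returns the merges in order as (cluster_1, cluster_2, distance) tuples.
    """
    clusters = [(name,) for name in sorted(points)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: Euclidean distance between the closest members.
                d = min(math.dist(points[p], points[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


for left, right, d in single_linkage(points):
    print(f"merge {left} + {right} at distance {d:.2f}")
```

Reading the printed merges from top to bottom gives the hierarchy: points that merge at small distances are close relatives, while a point that joins last is the most isolated one.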

Distances

Assume the following two data sets (x and y):

$x = (x_1, x_2, x_3)$ and $y = (y_1, y_2, y_3)$

In cluster analysis, the distance between the two data sets can be calculated with a number of different measures. The most important are the following:

Euclidean distance

The Euclidean distance between the above data sets is

$$\text{distance}(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2}$$

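A minimal Python sketch of the Euclidean distance follows. The function name `euclidean_distance` and the sample values are assumptions chosen for illustration; the values are the same ones used in the numerical Chessboard example of this section.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared differences between paired elements."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Illustrative values (assumed): x = (1, 2, 3), y = (4, 5, 6).
x = [1, 2, 3]
y = [4, 5, 6]
print(euclidean_distance(x, y))  # square root of 27
```
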
Squared Euclidean distance

The Squared Euclidean distance between the above data sets is

$$\text{distance}(x, y) = (x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2$$

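The squared variant simply omits the square root, which keeps the ordering of distances unchanged while weighting large differences more heavily. A short sketch, with a hypothetical function name and assumed sample values:

```python
def squared_euclidean_distance(x, y):
    """Sum of squared differences between paired elements (no square root)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


# Illustrative values (assumed): each difference is 3, so the sum is 9 + 9 + 9.
print(squared_euclidean_distance([1, 2, 3], [4, 5, 6]))  # 27
```
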
Manhattan distance

The Manhattan distance between the above data sets is

$$\text{distance}(x, y) = |x_1 - y_1| + |x_2 - y_2| + |x_3 - y_3|$$

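The Manhattan distance adds up the absolute differences along each dimension. The sketch below is one possible implementation; the function name and the sample values are assumptions.

```python
def manhattan_distance(x, y):
    """Sum of absolute differences between paired elements."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


# Illustrative values (assumed): |1-4| + |2-5| + |3-6|.
print(manhattan_distance([1, 2, 3], [4, 5, 6]))  # 9
```
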
Chebyshev or Chessboard distance

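$$\text{distance}(x, y) = \max_i |x_i - y_i|$$

The Chessboard distance between the above data sets is

$$\text{distance}(x, y) = \max(|x_1 - y_1|, |x_2 - y_2|, |x_3 - y_3|)$$

A numerical example:
If $x_1 = 1$, $x_2 = 2$, $x_3 = 3$, $y_1 = 4$, $y_2 = 5$, $y_3 = 6$, then the Chessboard distance is $\max(3, 3, 3) = 3$.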
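
The numerical example can be checked with a few lines of Python. The function name is hypothetical, and the assignment of the six values to x and y is an assumption inferred from the stated result of 3.

```python
def chebyshev_distance(x, y):
    """Largest absolute difference between paired elements."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return max(abs(xi - yi) for xi, yi in zip(x, y))


# Values from the numerical example: every paired difference equals 3.
print(chebyshev_distance([1, 2, 3], [4, 5, 6]))  # 3
```
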

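Bray-Curtis distance

$$\text{distance}(x, y) = \frac{\sum_i |x_i - y_i|}{\sum_i (x_i + y_i)}$$

The Bray-Curtis distance between the above data sets is

$$\text{distance}(x, y) = \frac{|x_1 - y_1| + |x_2 - y_2| + |x_3 - y_3|}{(x_1 + y_1) + (x_2 + y_2) + (x_3 + y_3)}$$
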
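A sketch of the Bray-Curtis distance in Python, assuming the common definition as the summed absolute differences divided by the sum of all values. The function name and the sample values are assumptions, and raising an error for a zero denominator is a design choice of this sketch rather than part of the definition.

```python
def bray_curtis_distance(x, y):
    """Summed absolute differences divided by the sum of all values."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    numerator = sum(abs(xi - yi) for xi, yi in zip(x, y))
    denominator = sum(xi + yi for xi, yi in zip(x, y))
    if denominator == 0:
        # The ratio is undefined here; this sketch chooses to fail loudly.
        raise ValueError("undefined when the sum of all values is zero")
    return numerator / denominator


# Illustrative values (assumed): numerator 9, denominator 21.
print(bray_curtis_distance([1, 2, 3], [4, 5, 6]))
```

For non-negative data the result lies between 0 (identical data sets) and 1 (no overlap), which makes it convenient for comparing counts or abundances.
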
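Canberra distance

$$\text{distance}(x, y) = \sum_i \frac{|x_i - y_i|}{|x_i| + |y_i|}$$

The Canberra distance between the above data sets is

$$\text{distance}(x, y) = \frac{|x_1 - y_1|}{|x_1| + |y_1|} + \frac{|x_2 - y_2|}{|x_2| + |y_2|} + \frac{|x_3 - y_3|}{|x_3| + |y_3|}$$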
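
The Canberra distance can be sketched in Python as below, assuming the usual definition as a sum of per-element ratios. The function name and the sample values are assumptions; skipping terms in which both elements are zero is a common convention adopted here, not something stated in the text.

```python
def canberra_distance(x, y):
    """Sum of |xi - yi| / (|xi| + |yi|) over all paired elements."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total = 0.0
    for xi, yi in zip(x, y):
        denominator = abs(xi) + abs(yi)
        if denominator != 0:
            # Terms of the form 0/0 are skipped (treated as contributing 0).
            total += abs(xi - yi) / denominator
    return total


# Illustrative values (assumed): 3/5 + 3/7 + 3/9.
print(canberra_distance([1, 2, 3], [4, 5, 6]))
```

Because every term is scaled by the size of its own elements, differences between small values count as much as differences between large ones.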