NAG Library Routine Document
1 Purpose
G03EJF computes a cluster indicator variable from the results of G03ECF.
2 Specification
SUBROUTINE G03EJF (N, CD, IORD, DORD, K, DLEVEL, IC, IFAIL)
INTEGER N, IORD(N), K, IC(N), IFAIL
REAL (KIND=nag_wp) CD(N-1), DORD(N), DLEVEL
3 Description
Given a distance or dissimilarity matrix for n objects, cluster analysis aims to group the n objects into a number of more or less homogeneous groups or clusters. With agglomerative clustering methods (see G03ECF), a hierarchical tree is produced by starting with n clusters, each with a single object, and then, at each of n-1 stages, merging two clusters to form a larger cluster until all objects are in a single cluster. G03EJF takes the information from the tree and produces the clusters that exist at a given distance. This is equivalent to taking the dendrogram (see G03EHF) and drawing a line across it at a given distance to produce clusters.
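The idea of drawing a line across the dendrogram can be sketched in a few lines of Python. This is an illustration only, not the algorithm G03EJF uses: the function name `cut_at_distance`, the `merges` list of (object, object, distance) tuples, and the distances themselves are hypothetical, and the sketch assumes that a merge made exactly at the chosen distance counts as having happened.

```python
def cut_at_distance(n, merges, level):
    """Return the clusters that exist at `level` as sorted lists of objects.

    n      -- number of objects, labelled 1..n
    merges -- hypothetical list of (object_a, object_b, distance) tuples,
              one per agglomeration stage, where object_a and object_b are
              any members of the two clusters being merged
    level  -- distance at which the line is drawn across the dendrogram
    """
    parent = list(range(n + 1))  # parent[i] == i marks a root; index 0 is unused

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, dist in merges:
        # Assumption: a merge made exactly at `level` has already happened.
        if dist <= level:
            parent[find(a)] = find(b)

    clusters = {}
    for obj in range(1, n + 1):
        clusters.setdefault(find(obj), []).append(obj)
    return sorted(clusters.values())


# Five objects merged in four stages (made-up distances).
merges = [(1, 2, 1.0), (4, 5, 2.0), (3, 4, 3.5), (1, 3, 6.0)]
example_clusters = cut_at_distance(5, merges, 3.0)
```

Raising `level` can only apply more merges, so the number of clusters never increases as the line moves up the dendrogram.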
As an alternative to giving the distance at which clusters are required, you can specify the number of clusters required and G03EJF will compute the corresponding distance. However, it may not be possible to compute the number of clusters required due to ties in the distance matrix.
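The effect of ties can be illustrated with a short Python sketch. It assumes the clustering distances are held in a sorted list `cd` of length n-1 and that a cut at a given distance applies every merge at or below that distance; the function name `level_for_k` and the data are hypothetical, and the distance G03EJF itself reports may be chosen differently.

```python
from bisect import bisect_right


def level_for_k(cd, k):
    """Return (level, clusters_found) for a request of k clusters.

    cd -- clustering distances in increasing order (length n-1)
    k  -- requested number of clusters; this sketch handles 1 <= k <= n-1
    """
    n = len(cd) + 1
    if not 1 <= k < n:
        raise ValueError("this sketch handles 1 <= k <= n-1 only")
    # Reaching k clusters needs the first n-k merges; cut at the last of them.
    level = cd[n - k - 1]
    # Every merge tied at `level` is applied too, which can overshoot.
    merges_done = bisect_right(cd, level)
    return level, n - merges_done


cd = [1.0, 2.0, 2.0, 5.0]           # made-up distances with a tie at 2.0
requested_three = level_for_k(cd, 3)
requested_four = level_for_k(cd, 4)
```

With these distances a request for three clusters cannot be met: the two merges tied at 2.0 are both applied, leaving two clusters, whereas a request for four clusters is met exactly.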
If there are k clusters then the indicator variable will assign a value between 1 and k to each object to indicate to which cluster it belongs. Object 1 always belongs to cluster 1.
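One numbering that satisfies this convention is to number clusters in order of first appearance, so that the first object always receives the value 1. The Python sketch below shows that scheme; the function name `indicator_from_labels` and the input labels are hypothetical, and how G03EJF numbers the remaining clusters is not stated in this document.

```python
def indicator_from_labels(raw_labels):
    """Convert arbitrary cluster labels into an indicator with values 1..k.

    raw_labels -- one hashable label per object, in object order
    Returns (ic, k), where ic[i] is the cluster number of object i+1.
    """
    numbering = {}
    ic = []
    for label in raw_labels:
        if label not in numbering:
            # First time this cluster is seen: give it the next free number.
            numbering[label] = len(numbering) + 1
        ic.append(numbering[label])
    return ic, len(numbering)


# Five objects whose clusters carry made-up labels.
ic, k = indicator_from_labels(["b", "b", "a", "c", "c"])
```

Because the first object is always the first label seen, it is assigned to cluster 1 regardless of how the input labels were named.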
4 References
Everitt B S (1974) Cluster Analysis. Heinemann
Krzanowski W J (1990) Principles of Multivariate Analysis. Oxford University Press
5 Parameters
- 1: N – INTEGER, Input
On entry: n, the number of objects.
Constraint: N ≥ 2.
- 2: CD(N-1) – REAL (KIND=nag_wp) array, Input
On entry: the clustering distances in increasing order as returned by G03ECF.
Constraint: CD(i+1) ≥ CD(i), for i = 1, 2, ..., N-2.
- 3: IORD(N) – INTEGER array, Input
On entry: the objects in dendrogram order as returned by G03ECF.
- 4: DORD(N) – REAL (KIND=nag_wp) array, Input
On entry: the clustering distances corresponding to the order in IORD.
- 5: K – INTEGER, Input/Output
On entry: indicates if a specified number of clusters is required.
If K > 0 then G03EJF will attempt to find K clusters.
If K ≤ 0 then G03EJF will find the clusters based on the distance given in DLEVEL.
Constraint: K ≤ N.
On exit: the number of clusters produced, k.
- 6: DLEVEL – REAL (KIND=nag_wp), Input/Output
On entry: if K ≤ 0, DLEVEL must contain the distance at which clusters are produced. Otherwise DLEVEL need not be set.
Constraint: if K ≤ 0, DLEVEL > 0.0.
On exit: if K > 0 on entry, DLEVEL contains the distance at which the required number of clusters are found. Otherwise DLEVEL remains unchanged.
- 7: IC(N) – INTEGER array, Output
On exit: IC(i) indicates to which of k clusters the ith object belongs, for i = 1, 2, ..., n.
- 8: IFAIL – INTEGER, Input/Output
On entry: IFAIL must be set to 0, -1 or 1. If you are unfamiliar with this parameter you should refer to Section 3.3 in the Essential Introduction for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, then the value 1 is recommended. Otherwise, if you are not familiar with this parameter, the recommended value is 0. When the value -1 or 1 is used it is essential to test the value of IFAIL on exit.
On exit: IFAIL = 0 unless the routine detects an error or a warning has been flagged (see Section 6).
6 Error Indicators and Warnings
If on entry IFAIL = 0 or -1, explanatory error messages are output on the current error message unit (as defined by X04AAF).
Errors or warnings detected by the routine:
IFAIL = 1
On entry, N < 2,
or K > N,
or K ≤ 0 and DLEVEL ≤ 0.0.
IFAIL = 2
On entry, CD is not in increasing order,
or DORD is incompatible with CD.
IFAIL = 3
On entry, K = 1,
or K ≤ 0 and DLEVEL ≥ CD(N-1),
or K ≤ 0 and DLEVEL < CD(1).
Note: on exit with this value of IFAIL the trivial clustering solution is returned.
IFAIL = 4
The precise number of clusters requested is not possible because of tied clustering distances. The actual number of clusters, less than the number requested, is returned in K.
7 Accuracy
The accuracy will depend upon the accuracy of the distances in CD.
8 Further Comments
A fixed number of clusters can be found using the non-hierarchical method used in G03EFF.
9 Example
Data consisting of three variables on five objects are input. Euclidean squared distances are computed using G03EAF and median clustering is performed using G03ECF. A dendrogram is produced by G03EHF and printed. G03EJF finds two clusters and the results are printed.
9.1 Program Text
Program Text (g03ejfe.f90)
9.2 Program Data
Program Data (g03ejfe.d)
9.3 Program Results
Program Results (g03ejfe.r)