# NAG Toolbox: nag_mv_cluster_hier_indicator (g03ej)

## Purpose

nag_mv_cluster_hier_indicator (g03ej) computes a cluster indicator variable from the results of nag_mv_cluster_hier (g03ec).

## Syntax

[k, dlevel, ic, ifail] = g03ej(cd, iord, dord, k, dlevel, 'n', n)
[k, dlevel, ic, ifail] = nag_mv_cluster_hier_indicator(cd, iord, dord, k, dlevel, 'n', n)

## Description

Given a distance or dissimilarity matrix for $n$ objects, cluster analysis aims to group the $n$ objects into a number of more or less homogeneous groups or clusters. With agglomerative clustering methods (see nag_mv_cluster_hier (g03ec)), a hierarchical tree is produced by starting with $n$ clusters, each containing a single object, and then, at each of $n-1$ stages, merging two clusters to form a larger cluster until all objects are in a single cluster. nag_mv_cluster_hier_indicator (g03ej) takes the information from the tree and produces the clusters that exist at a given distance. This is equivalent to taking the dendrogram (see nag_mv_cluster_hier_dendrogram (g03eh)) and drawing a line across at a given distance to produce clusters.
As an alternative to giving the distance at which clusters are required, you can specify the number of clusters required and nag_mv_cluster_hier_indicator (g03ej) will compute the corresponding distance. However, it may not be possible to compute the number of clusters required due to ties in the distance matrix.
If there are $k$ clusters then the indicator variable will assign a value between $1$ and $k$ to each object to indicate to which cluster it belongs. Object $1$ always belongs to cluster $1$.
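
The idea of cutting the dendrogram at a given distance can be illustrated without the toolbox. The following is a minimal sketch in plain MATLAB, not the NAG implementation. It assumes that two objects adjacent in dendrogram order stay in the same cluster when the joining distance between them does not exceed the cut distance; this rule is inferred from the example on this page rather than stated in the documentation. The names `startsNew` and `labels` are illustrative only.

```matlab
% Dendrogram order and joining distances taken from the example on this page.
iord = [1; 3; 5; 2; 4];
dord = [2; 6.5; 14.125; 1; 14.125];
dlevel = 6.5;                      % distance at which the dendrogram is cut
n = numel(iord);

% Assumption: dord(i) is the distance at which object iord(i) joins
% object iord(i+1), so a new cluster starts wherever dord(i) > dlevel.
startsNew = [true; dord(1:n-1) > dlevel];

% Running count of cluster starts gives a label per dendrogram position.
labels = cumsum(startsNew);

% Map labels from dendrogram order back to the original object order.
ic = zeros(n, 1);
ic(iord) = labels;
disp(ic.');
```

Clusters in this sketch are numbered in dendrogram order, which may not match the numbering the function itself uses in every case.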

## References

Everitt B S (1974) Cluster Analysis Heinemann
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press

## Parameters

### Compulsory Input Parameters

1:     cd(${\mathbf{n}}-1$) – double array
The clustering distances in increasing order as returned by nag_mv_cluster_hier (g03ec).
Constraint: ${\mathbf{cd}}\left(\mathit{i}+1\right)\ge {\mathbf{cd}}\left(\mathit{i}\right)$, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}-2$.
2:     iord(n) – int64/int32/nag_int array
n, the dimension of the array, must satisfy the constraint ${\mathbf{n}}\ge 2$.
The objects in dendrogram order as returned by nag_mv_cluster_hier (g03ec).
3:     dord(n) – double array
n, the dimension of the array, must satisfy the constraint ${\mathbf{n}}\ge 2$.
The clustering distances corresponding to the order in iord.
4:     k – int64/int32/nag_int scalar
Indicates if a specified number of clusters is required.
If ${\mathbf{k}}>0$ then nag_mv_cluster_hier_indicator (g03ej) will attempt to find k clusters.
If ${\mathbf{k}}\le 0$ then nag_mv_cluster_hier_indicator (g03ej) will find the clusters based on the distance given in dlevel.
Constraint: ${\mathbf{k}}\le {\mathbf{n}}$.
5:     dlevel – double scalar
If ${\mathbf{k}}\le 0$, dlevel must contain the distance at which clusters are produced. Otherwise dlevel need not be set.
Constraint: if ${\mathbf{k}}\le 0$, ${\mathbf{dlevel}}>0.0$.

### Optional Input Parameters

1:     n – int64/int32/nag_int scalar
Default: The dimension of the arrays iord, dord. (An error is raised if these dimensions are not equal.)
$n$, the number of objects.
Constraint: ${\mathbf{n}}\ge 2$.


### Output Parameters

1:     k – int64/int32/nag_int scalar
The number of clusters produced, $k$.
2:     dlevel – double scalar
If ${\mathbf{k}}>0$ on entry, dlevel contains the distance at which the required number of clusters are found. Otherwise dlevel remains unchanged.
3:     ic(n) – int64/int32/nag_int array
${\mathbf{ic}}\left(\mathit{i}\right)$ indicates to which of $k$ clusters the $\mathit{i}$th object belongs, for $\mathit{i}=1,2,\dots ,n$.
4:     ifail – int64/int32/nag_int scalar
${\mathrm{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).

## Error Indicators and Warnings

Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

${\mathbf{ifail}}=1$
 On entry, ${\mathbf{k}}>{\mathbf{n}}$, or ${\mathbf{k}}\le 0$ and ${\mathbf{dlevel}}\le 0.0$, or ${\mathbf{n}}<2$.
${\mathbf{ifail}}=2$
 On entry, cd is not in increasing order, or dord is incompatible with cd.
${\mathbf{ifail}}=3$
 On entry, ${\mathbf{k}}=1$, or ${\mathbf{k}}={\mathbf{n}}$, or ${\mathbf{dlevel}}\ge {\mathbf{cd}}\left({\mathbf{n}}-1\right)$, or ${\mathbf{dlevel}}<{\mathbf{cd}}\left(1\right)$.
 Note: on exit with this value of ifail the trivial clustering solution is returned.
W ${\mathbf{ifail}}=4$
 The precise number of clusters requested is not possible because of tied clustering distances. The actual number of clusters, less than the number requested, is returned in k.
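
Some of these conditions can be checked before calling the function. The following plain-MATLAB sketch mirrors the documented conditions for ifail = 1, the ordering part of ifail = 2, and the trivial-solution conditions of ifail = 3. It is a hypothetical pre-check, not part of the toolbox, and it does not test whether dord is compatible with cd because this page does not define that condition. The names `issues` and `isTrivial` are illustrative.

```matlab
% Inputs from the example on this page.
cd = [1; 2; 6.5; 14.125];
iord = int64([1; 3; 5; 2; 4]);
dord = [2; 6.5; 14.125; 1; 14.125];
k = int64(2);
dlevel = 0;

n = numel(iord);
issues = {};

% Conditions documented for ifail = 1.
if n < 2 || k > n || (k <= 0 && dlevel <= 0)
    issues{end+1} = 'ifail = 1 expected: check k, dlevel and n';
end
% Array lengths implied by the parameter descriptions.
if numel(dord) ~= n || numel(cd) ~= n - 1
    issues{end+1} = 'array lengths are inconsistent with n';
end
% Ordering condition documented for ifail = 2.
if any(diff(cd) < 0)
    issues{end+1} = 'ifail = 2 expected: cd is not in increasing order';
end

% Conditions documented for ifail = 3 (trivial clustering returned).
isTrivial = false;
if isempty(issues)
    isTrivial = (k == 1) || (k == n) || ...
        (k <= 0 && (dlevel >= cd(end) || dlevel < cd(1)));
end

fprintf('%d issue(s) found, trivial solution expected: %d\n', ...
    numel(issues), isTrivial);
```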

## Accuracy

The accuracy will depend upon the accuracy of the distances in cd and dord (see nag_mv_cluster_hier (g03ec)).

## Further Comments

A fixed number of clusters can be found using the non-hierarchical method used in nag_mv_cluster_kmeans (g03ef).

## Example

```
function nag_mv_cluster_hier_indicator_example
cd = [1;
2;
6.5;
14.125];
iord = [int64(1);3;5;2;4];
dord = [2;
6.5;
14.125;
1;
14.125];
k = int64(2);
dlevel = 0;
[kOut, dlevelOut, ic, ifail] = nag_mv_cluster_hier_indicator(cd, iord, dord, k, dlevel)
```
```

kOut =

2

dlevelOut =

6.5000

ic =

1
2
1
2
1

ifail =

0

```
