# NAG Toolbox: nag_mv_discrim_mahal (g03db)

## Purpose

nag_mv_discrim_mahal (g03db) computes Mahalanobis squared distances for group or pooled variance-covariance matrices. It is intended for use after nag_mv_discrim (g03da).

## Syntax

[d, ifail] = g03db(equal, mode, gmn, gc, nobs, isx, x, 'nvar', nvar, 'ng', ng, 'm', m)
[d, ifail] = nag_mv_discrim_mahal(equal, mode, gmn, gc, nobs, isx, x, 'nvar', nvar, 'ng', ng, 'm', m)
Note: the interface to this routine has changed since earlier releases of the toolbox. At Mark 22, ng was made optional.

## Description

Consider $p$ variables observed on ${n}_{g}$ populations or groups. Let ${\bar{x}}_{j}$ be the sample mean and ${S}_{j}$ the within-group variance-covariance matrix for the $j$th group, and let ${x}_{k}$ be the $k$th sample point in a dataset. A measure of the distance of the point from the $j$th population or group is given by the Mahalanobis distance, ${D}_{kj}$:
$$D_{kj}^{2}={\left({x}_{k}-{\bar{x}}_{j}\right)}^{\mathrm{T}}{S}_{j}^{-1}\left({x}_{k}-{\bar{x}}_{j}\right).$$
If the pooled estimate of the variance-covariance matrix, $S$, is used rather than the within-group variance-covariance matrices, then the distance is:
$$D_{kj}^{2}={\left({x}_{k}-{\bar{x}}_{j}\right)}^{\mathrm{T}}{S}^{-1}\left({x}_{k}-{\bar{x}}_{j}\right).$$
Instead of using the variance-covariance matrices $S$ and ${S}_{j}$, nag_mv_discrim_mahal (g03db) uses the upper triangular matrices $R$ and ${R}_{j}$ supplied by nag_mv_discrim (g03da) such that $S={R}^{\mathrm{T}}R$ and ${S}_{j}={R}_{j}^{\mathrm{T}}{R}_{j}$. ${D}_{kj}^{2}$ can then be calculated as ${z}^{\mathrm{T}}z$, where $z$ is the solution of the triangular system ${R}_{j}^{\mathrm{T}}z=\left({x}_{k}-{\bar{x}}_{j}\right)$ or ${R}^{\mathrm{T}}z=\left({x}_{k}-{\bar{x}}_{j}\right)$ as appropriate.
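The triangular-solve formulation can be checked by hand for a single point. The sketch below is plain MATLAB and does not call the toolbox; it takes the factor of the first group, the first group mean and the first sample point from the Example section, and the variable names (R1, xbar1, xk, z, d2) are illustrative only. Because ${S}_{1}={R}_{1}^{\mathrm{T}}{R}_{1}$, the quadratic form reduces to ${z}^{\mathrm{T}}z$ with $z$ obtained from the lower triangular system ${R}_{1}^{\mathrm{T}}z={x}_{k}-{\bar{x}}_{1}$.

```matlab
% Upper triangular factor R_1 of the first group's variance-covariance
% matrix (values copied from the Example section; S_1 = R1' * R1).
R1    = [-0.3326727521153484, -0.3723518779712077;
          0,                  -1.987589395382754];
xbar1 = [1.0433, -0.603417];      % mean of group 1
xk    = [1.6292, -0.9163];        % first sample point

% Solve the lower triangular system R1' * z = (xk - xbar1)'.
z  = R1' \ (xk - xbar1)';
d2 = z' * z;                      % squared Mahalanobis distance

% Cross-check against the direct definition with S_1^{-1}.
S1       = R1' * R1;
d2_check = (xk - xbar1) * (S1 \ (xk - xbar1)');
fprintf('%.4f %.4f\n', d2, d2_check);   % both about 3.3393
```

The value agrees with the entry d(1,1) printed in the Example section to the four decimal places shown there.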
A particular case is when the distance between the group or population means is to be estimated. The Mahalanobis squared distance between the $i$th and $j$th groups is:
$$D_{ij}^{2}={\left({\bar{x}}_{i}-{\bar{x}}_{j}\right)}^{\mathrm{T}}{S}_{j}^{-1}\left({\bar{x}}_{i}-{\bar{x}}_{j}\right)$$
or
$$D_{ij}^{2}={\left({\bar{x}}_{i}-{\bar{x}}_{j}\right)}^{\mathrm{T}}{S}^{-1}\left({\bar{x}}_{i}-{\bar{x}}_{j}\right).$$
Note: ${D}_{jj}^{2}=0$, and when the pooled variance-covariance matrix is used ${D}_{ij}^{2}={D}_{ji}^{2}$, so in this case only the lower triangular values of ${D}_{ij}^{2}$, $i>j$, are computed.
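As a rough illustration of the pooled case, the following plain MATLAB sketch (not a toolbox call) fills only the strictly lower triangle of the matrix of squared distances between group means. The pooled factor and the group means are copied from the Example section; the loop structure and variable names are assumptions made for illustration and say nothing about how the routine is implemented internally.

```matlab
% Pooled factor R (S = R' * R) and group means from the Example section.
R   = [-0.5099642881287538, -0.279705472386133;
        0,                  -1.217327847040481];
gmn = [1.0433,  -0.603417;
       2.00727, -0.20604;
       2.70974,  1.5998];
ng  = size(gmn, 1);

% Only entries with i > j are computed: D(j,j) = 0 by definition and
% D(i,j) = D(j,i) when the pooled matrix is used.
D = zeros(ng);
for i = 2:ng
    for j = 1:i-1
        z = R' \ (gmn(i, :) - gmn(j, :))';   % solve R' * z = xbar_i - xbar_j
        D(i, j) = z' * z;
    end
end
disp(D);
```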

## Parameters

### Compulsory Input Parameters

1:     equal – string (length ≥ 1)
Indicates whether or not the within-group variance-covariance matrices are assumed to be equal and the pooled variance-covariance matrix used.
${\mathbf{equal}}=\text{'E'}$
The within-group variance-covariance matrices are assumed equal and the matrix $R$ stored in the first $p\left(p+1\right)/2$ elements of gc is used.
${\mathbf{equal}}=\text{'U'}$
The within-group variance-covariance matrices are assumed to be unequal and the matrices ${R}_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,{n}_{g}$, stored in the remainder of gc are used.
Constraint: ${\mathbf{equal}}=\text{'E'}$ or $\text{'U'}$.
2:     mode – string (length ≥ 1)
Indicates whether distances from sample points are to be calculated or distances between the group means.
${\mathbf{mode}}=\text{'S'}$
The distances between the sample points given in x and the group means are calculated.
${\mathbf{mode}}=\text{'M'}$
The distances between the group means are calculated.
Constraint: ${\mathbf{mode}}=\text{'M'}$ or $\text{'S'}$.
3:     gmn(ldgmn,nvar) – double array
ldgmn, the first dimension of the array, must satisfy the constraint $\mathit{ldgmn}\ge {\mathbf{ng}}$.
The $\mathit{j}$th row of gmn contains the means of the $p$ selected variables for the $\mathit{j}$th group, for $\mathit{j}=1,2,\dots ,{n}_{g}$. These are returned by nag_mv_discrim (g03da).
4:     gc($\left({\mathbf{ng}}+1\right)\times {\mathbf{nvar}}\times \left({\mathbf{nvar}}+1\right)/2$) – double array
The first $p\left(p+1\right)/2$ elements of gc should contain the upper triangular matrix $R$ and the next ${n}_{g}$ blocks of $p\left(p+1\right)/2$ elements should contain the upper triangular matrices ${R}_{j}$. All matrices must be stored packed by column. These matrices are returned by nag_mv_discrim (g03da). If ${\mathbf{equal}}=\text{'E'}$, only the first $p\left(p+1\right)/2$ elements are referenced; if ${\mathbf{equal}}=\text{'U'}$, only the elements $p\left(p+1\right)/2+1$ to $\left({n}_{g}+1\right)p\left(p+1\right)/2$ are referenced.
Constraints:
• if ${\mathbf{equal}}=\text{'E'}$, the diagonal elements of $R\ne 0.0$;
• if ${\mathbf{equal}}=\text{'U'}$, the diagonal elements of the ${R}_{\mathit{j}}\ne 0.0$, for $\mathit{j}=1,2,\dots ,{\mathbf{ng}}$.
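To make the packed-by-column layout concrete, here is a small plain MATLAB sketch that unpacks gc into full upper triangular matrices. It uses the gc vector from the Example section (so $p=2$ and ${n}_{g}=3$); the names len, mask and Rall are illustrative only and are not part of the toolbox interface.

```matlab
p   = 2;                          % number of variables (nvar)
ng  = 3;                          % number of groups
gc  = [-0.5099642881287538; -0.279705472386133;  -1.217327847040481;
       -0.3326727521153484; -0.3723518779712077; -1.987589395382754;
       -0.4603014906920608; -0.7041634974247672;  0.4737334252803499;
        0.7451327720614629; -0.3251057349548681; -0.4275545007358186];
len = p * (p + 1) / 2;            % packed length of one triangular matrix

% Logical indexing walks the upper triangle column by column, which is
% the same order as packed-by-column storage: R(1,1), R(1,2), R(2,2), ...
mask = triu(true(p));
Rall = zeros(p, p, ng + 1);       % Rall(:,:,1) = R, Rall(:,:,j+1) = R_j
for b = 0:ng
    Rb = zeros(p);
    Rb(mask) = gc(b * len + (1:len));
    Rall(:, :, b + 1) = Rb;
end

% The constraints above require nonzero diagonal elements.
for b = 1:(ng + 1)
    assert(all(diag(Rall(:, :, b)) ~= 0));
end
disp(Rall(:, :, 2));              % R_1, the factor for the first group
```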
5:     nobs – int64, int32 or nag_int scalar
If ${\mathbf{mode}}=\text{'S'}$, the number of sample points in x for which distances are to be calculated.
If ${\mathbf{mode}}=\text{'M'}$, nobs is not referenced.
Constraint: if ${\mathbf{mode}}=\text{'S'}$, ${\mathbf{nobs}}\ge 1$.
6:     isx($:$) – int64, int32 or nag_int array
Note: the dimension of the array isx must be at least $\mathrm{max}\left(1,{\mathbf{m}}\right)$.
If ${\mathbf{mode}}=\text{'S'}$, ${\mathbf{isx}}\left(\mathit{l}\right)$ indicates if the $\mathit{l}$th variable in x is to be included in the distance calculations. If ${\mathbf{isx}}\left(\mathit{l}\right)>0$ the $\mathit{l}$th variable is included, for $\mathit{l}=1,2,\dots ,{\mathbf{m}}$; otherwise the $\mathit{l}$th variable is not referenced.
If ${\mathbf{mode}}=\text{'M'}$, isx is not referenced.
Constraint: if ${\mathbf{mode}}=\text{'S'}$, ${\mathbf{isx}}\left(l\right)>0$ for nvar values of $l$.
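The selection rule can be pictured with a logical mask. The sketch below is plain MATLAB with hypothetical data; the names sel and xs are illustrative, and the idea that the selected columns line up, in order, with the columns of gmn is an assumption rather than something stated here.

```matlab
% Hypothetical data: m = 3 variables per observation, nvar = 2 of them used.
isx  = [1; 0; 1];                 % include variables 1 and 3, skip variable 2
nvar = 2;
x    = [1.0, 9.0, 2.0;
        3.0, 9.0, 4.0];

sel = (isx > 0)';                 % logical mask over the columns of x
assert(nnz(sel) == nvar);         % constraint: exactly nvar positive entries
xs  = x(:, sel);                  % columns that enter the distance calculation
disp(xs);
```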
7:     x(ldx, $:$) – double array
The first dimension, ldx, of the array x must satisfy
• if ${\mathbf{mode}}=\text{'S'}$, $\mathit{ldx}\ge {\mathbf{nobs}}$;
• otherwise, $\mathit{ldx}\ge 1$.
The second dimension of the array must be at least $\mathrm{max}\left(1,{\mathbf{m}}\right)$.
If ${\mathbf{mode}}=\text{'S'}$, the $\mathit{k}$th row of x must contain ${x}_{\mathit{k}}$. That is, ${\mathbf{x}}\left(\mathit{k},\mathit{l}\right)$ must contain the $\mathit{k}$th sample value for the $\mathit{l}$th variable, for $\mathit{k}=1,2,\dots ,{\mathbf{nobs}}$ and $\mathit{l}=1,2,\dots ,{\mathbf{m}}$. Otherwise x is not referenced.

### Optional Input Parameters

1:     nvar – int64, int32 or nag_int scalar
Default: the second dimension of the array gmn.
$p$, the number of variables in the variance-covariance matrices as specified to nag_mv_discrim (g03da).
Constraint: ${\mathbf{nvar}}\ge 1$.
2:     ng – int64, int32 or nag_int scalar
Default: the first dimension of the array gmn.
The number of groups, ${n}_{g}$.
Constraint: ${\mathbf{ng}}\ge 2$.
3:     m – int64, int32 or nag_int scalar
Default: the dimension of the array isx and the second dimension of the array x.
If ${\mathbf{mode}}=\text{'S'}$, the number of variables in the data array x.
If ${\mathbf{mode}}=\text{'M'}$, m is not referenced.
Constraint: if ${\mathbf{mode}}=\text{'S'}$, ${\mathbf{m}}\ge {\mathbf{nvar}}$.

### Input Parameters Omitted from the MATLAB Interface

ldgmn, ldx, ldd, wk

### Output Parameters

1:     d(ldd,ng) – double array
The squared distances.
If ${\mathbf{mode}}=\text{'S'}$, ${\mathbf{d}}\left(\mathit{k},\mathit{j}\right)$ contains the squared distance of the $\mathit{k}$th sample point from the $\mathit{j}$th group mean, ${D}_{\mathit{k}\mathit{j}}^{2}$, for $\mathit{k}=1,2,\dots ,{\mathbf{nobs}}$ and $\mathit{j}=1,2,\dots ,{n}_{g}$.
If ${\mathbf{mode}}=\text{'M'}$ and ${\mathbf{equal}}=\text{'U'}$, ${\mathbf{d}}\left(\mathit{i},\mathit{j}\right)$ contains the squared distance between the $\mathit{i}$th mean and the $\mathit{j}$th mean, ${D}_{\mathit{i}\mathit{j}}^{2}$, for $\mathit{i}=1,2,\dots ,{n}_{g}$ and $\mathit{j}=1,2,\dots ,\mathit{i}-1,\mathit{i}+1,\dots ,{n}_{g}$. The elements ${\mathbf{d}}\left(\mathit{i},\mathit{i}\right)$ are not referenced, for $\mathit{i}=1,2,\dots ,{n}_{g}$.
If ${\mathbf{mode}}=\text{'M'}$ and ${\mathbf{equal}}=\text{'E'}$, ${\mathbf{d}}\left(\mathit{i},\mathit{j}\right)$ contains the squared distance between the $\mathit{i}$th mean and the $\mathit{j}$th mean, ${D}_{\mathit{i}\mathit{j}}^{2}$, for $\mathit{i}=1,2,\dots ,{n}_{g}$ and $\mathit{j}=1,2,\dots ,\mathit{i}-1$. Since ${D}_{\mathit{i}\mathit{j}}={D}_{\mathit{j}\mathit{i}}$, the elements ${\mathbf{d}}\left(\mathit{i},\mathit{j}\right)$ are not referenced, for $\mathit{i}=1,2,\dots ,{n}_{g}$ and $\mathit{j}=\mathit{i}+1,\dots ,{n}_{g}$.
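When the pooled matrix is used with group means, only the strictly lower triangle of d carries results, so a caller who wants a full symmetric matrix has to build it. A minimal sketch in plain MATLAB follows; the matrix d here is made up for illustration (it is not output from the routine), with -1 standing in for elements that are not referenced.

```matlab
% Hypothetical d for ng = 3: only entries with i > j are meaningful.
d = [-1.0, -1.0, -1.0;
      2.5, -1.0, -1.0;
      7.1,  1.3, -1.0];

L     = tril(d, -1);              % keep the strictly lower triangle only
Dfull = L + L';                   % mirror it; the diagonal stays zero
disp(Dfull);
```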
2:     ifail – int64, int32 or nag_int scalar
${\mathrm{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).

## Error Indicators and Warnings

Errors or warnings detected by the function:
${\mathbf{ifail}}=1$
On entry,
• ${\mathbf{nvar}}<1$,
• or ${\mathbf{ng}}<2$,
• or $\mathit{ldgmn}<{\mathbf{ng}}$,
• or ${\mathbf{mode}}=\text{'S'}$ and ${\mathbf{nobs}}<1$,
• or ${\mathbf{mode}}=\text{'S'}$ and ${\mathbf{m}}<{\mathbf{nvar}}$,
• or ${\mathbf{mode}}=\text{'S'}$ and $\mathit{ldx}<{\mathbf{nobs}}$,
• or ${\mathbf{mode}}=\text{'S'}$ and $\mathit{ldd}<{\mathbf{nobs}}$,
• or ${\mathbf{mode}}=\text{'M'}$ and $\mathit{ldd}<{\mathbf{ng}}$,
• or ${\mathbf{equal}}\ne \text{'E'}$ or $\text{'U'}$,
• or ${\mathbf{mode}}\ne \text{'M'}$ or $\text{'S'}$.
${\mathbf{ifail}}=2$
On entry,
• ${\mathbf{mode}}=\text{'S'}$ and the number of variables indicated by isx is not equal to nvar,
• or ${\mathbf{equal}}=\text{'E'}$ and a diagonal element of $R$ is zero,
• or ${\mathbf{equal}}=\text{'U'}$ and a diagonal element of ${R}_{j}$ for some $j$ is zero.

## Accuracy

The accuracy will depend upon the accuracy of the input R$R$ or Rj${R}_{j}$ matrices.

## Further Comments

If the distances are to be used for discrimination, see also nag_mv_discrim_group (g03dc).
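For a quick look at which group each sample point is closest to, the squared distances can be compared row by row. The sketch below is plain MATLAB applied to the d matrix printed in the Example section. It is only a nearest-mean rule chosen for illustration and is not a substitute for the allocation rules of nag_mv_discrim_group (g03dc); the names dmin and grp are illustrative.

```matlab
% Squared distances copied from the Example output (rows: sample points,
% columns: groups).
d = [ 3.3393,  0.7521,  50.9283;
     20.7771,  5.6559,   0.0597;
     21.3631,  4.8411,  19.4978;
      0.7184,  6.2803, 124.7323;
     55.0003, 88.8604,  71.7852;
     36.1703, 15.7849,  15.7489];

% For each row, find the smallest squared distance and its column index.
[dmin, grp] = min(d, [], 2);
disp(grp');                       % nearest groups: 2 3 2 1 1 3
```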

## Example

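```
function nag_mv_discrim_mahal_example
% Use the within-group matrices R_j (unequal variance-covariance matrices).
equal = 'U';
% Distances between the sample points in x and the group means.
mode = 'Sample points';
% Group means: one row per group, one column per variable.
gmean = [1.0433, -0.603417;
         2.00727, -0.20604;
         2.70974, 1.5998];
% Packed upper triangular factors: R first, then R_1, R_2 and R_3.
gc = [-0.5099642881287538;
      -0.279705472386133;
      -1.217327847040481;
      -0.3326727521153484;
      -0.3723518779712077;
      -1.987589395382754;
      -0.4603014906920608;
      -0.7041634974247672;
      0.4737334252803499;
      0.7451327720614629;
      -0.3251057349548681;
      -0.4275545007358186];
nobs = int64(6);
% Both variables in x are included in the distance calculations.
isx = [int64(1); 1];
% Sample points: one row per observation.
x = [1.6292, -0.9163;
     2.5572, 1.6094;
     2.5649, -0.2231;
     0.9555, -2.3026;
     3.4012, -2.3026;
     3.0204, -0.2231];
[d, ifail] = nag_mv_discrim_mahal(equal, mode, gmean, gc, nobs, isx, x)
```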
```

d =

3.3393    0.7521   50.9283
20.7771    5.6559    0.0597
21.3631    4.8411   19.4978
0.7184    6.2803  124.7323
55.0003   88.8604   71.7852
36.1703   15.7849   15.7489

ifail =

0

```
```
function g03db_example
% Same data as above, calling the routine by its short name g03db.
equal = 'U';
mode = 'Sample points';
% Group means: one row per group, one column per variable.
gmean = [1.0433, -0.603417;
         2.00727, -0.20604;
         2.70974, 1.5998];
% Packed upper triangular factors: R first, then R_1, R_2 and R_3.
gc = [-0.5099642881287538;
      -0.279705472386133;
      -1.217327847040481;
      -0.3326727521153484;
      -0.3723518779712077;
      -1.987589395382754;
      -0.4603014906920608;
      -0.7041634974247672;
      0.4737334252803499;
      0.7451327720614629;
      -0.3251057349548681;
      -0.4275545007358186];
nobs = int64(6);
isx = [int64(1); 1];
% Sample points: one row per observation.
x = [1.6292, -0.9163;
     2.5572, 1.6094;
     2.5649, -0.2231;
     0.9555, -2.3026;
     3.4012, -2.3026;
     3.0204, -0.2231];
[d, ifail] = g03db(equal, mode, gmean, gc, nobs, isx, x)
```


© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013