Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

Chapter Contents
Chapter Introduction
NAG Toolbox

# NAG Toolbox: nag_correg_robustm_corr_huber (g02hk)

## Purpose

nag_correg_robustm_corr_huber (g02hk) computes a robust estimate of the covariance matrix for an expected fraction of gross errors.

## Syntax

[cov, theta, nit, ifail] = g02hk(x, eps, 'n', n, 'm', m, 'maxit', maxit, 'nitmon', nitmon, 'tol', tol)
[cov, theta, nit, ifail] = nag_correg_robustm_corr_huber(x, eps, 'n', n, 'm', m, 'maxit', maxit, 'nitmon', nitmon, 'tol', tol)
Note: the interface to this routine has changed since earlier releases of the toolbox:
Mark 22: n has been made optional
Mark 23: nitmon, tol now optional
.

## Description

For a set of n$n$ observations on m$m$ variables in a matrix X$X$, a robust estimate of the covariance matrix, C$C$, and a robust estimate of location, θ$\theta$, are given by
 C = τ2(ATA) − 1, $C=τ2(ATA)-1,$
where τ2${\tau }^{2}$ is a correction factor and A$A$ is a lower triangular matrix found as the solution to the following equations:
 zi = A(xi − θ), $zi=A(xi-θ),$
 n 1/n ∑ w(‖zi‖2)zi = 0, i = 1
$1n ∑i= 1nw(‖zi‖2)zi=0,$
and
 n 1/n ∑ u(‖zi‖2)ziziT − I = 0, i = 1
$1n∑i=1nu(‖zi‖2)zi ziT -I=0,$
 where xi${x}_{i}$ is a vector of length m$m$ containing the elements of the i$i$th row of x, zi${z}_{i}$ is a vector of length m$m$, I$I$ is the identity matrix and 0$0$ is the zero matrix, and w$w$ and u$u$ are suitable functions.
nag_correg_robustm_corr_huber (g02hk) uses weight functions:
 u(t) = (au)/(t2), if ​t < au2 u(t) = 1, if ​au2 ≤ t ≤ bu2 u(t) = (bu)/(t2), if ​t > bu2
$u(t)= aut2, if ​tbu2$
and
 w(t) = 1, if ​t ≤ cw w(t) = (cw)/t, if ​t > cw
$w(t)= 1, if ​t≤cw w(t)= cwt, if ​t>cw$
for constants au${a}_{u}$, bu${b}_{u}$ and cw${c}_{w}$.
These functions solve a minimax problem considered by Huber (see Huber (1981)). The values of au${a}_{u}$, bu${b}_{u}$ and cw${c}_{w}$ are calculated from the expected fraction of gross errors, ε$\epsilon$ (see Huber (1981) and Marazzi (1987)). The expected fraction of gross errors is the estimated proportion of outliers in the sample.
In order to make the estimate asymptotically unbiased under a Normal model a correction factor, τ2${\tau }^{2}$, is calculated, (see Huber (1981) and Marazzi (1987)).
The matrix C$C$ is calculated using nag_correg_robustm_corr_user_deriv (g02hl). Initial estimates of θj${\theta }_{j}$, for j = 1,2,,m$j=1,2,\dots ,m$, are given by the median of the j$j$th column of X$X$ and the initial value of A$A$ is based on the median absolute deviation (see Marazzi (1987)). nag_correg_robustm_corr_huber (g02hk) is based on routines in ROBETH; see Marazzi (1987).

## References

Huber P J (1981) Robust Statistics Wiley
Marazzi A (1987) Weights for bounded influence regression in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 3 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

## Parameters

### Compulsory Input Parameters

1:     x(ldx,m) – double array
ldx, the first dimension of the array, must satisfy the constraint ldxn$\mathit{ldx}\ge {\mathbf{n}}$.
x(i,j)${\mathbf{x}}\left(\mathit{i},\mathit{j}\right)$ must contain the i$\mathit{i}$th observation for the j$\mathit{j}$th variable, for i = 1,2,,n$\mathit{i}=1,2,\dots ,{\mathbf{n}}$ and j = 1,2,,m$\mathit{j}=1,2,\dots ,{\mathbf{m}}$.
2:     eps – double scalar
ε$\epsilon$, the expected fraction of gross errors expected in the sample.
Constraint: 0.0eps < 1.0$0.0\le {\mathbf{eps}}<1.0$.

### Optional Input Parameters

1:     n – int64int32nag_int scalar
Default: The first dimension of the array x.
n$n$, the number of observations.
Constraint: n > 1${\mathbf{n}}>1$.
2:     m – int64int32nag_int scalar
Default: The second dimension of the array x.
m$m$, the number of columns of the matrix X$X$, i.e., number of independent variables.
Constraint: 1mn$1\le {\mathbf{m}}\le {\mathbf{n}}$.
3:     maxit – int64int32nag_int scalar
The maximum number of iterations that will be used during the calculation of the covariance matrix.
Default: 150$150$
Constraint: maxit > 0${\mathbf{maxit}}>0$.
4:     nitmon – int64int32nag_int scalar
Indicates the amount of information on the iteration that is printed.
nitmon > 0${\mathbf{nitmon}}>0$
The value of A$A$, θ$\theta$ and δ$\delta$ (see Section [Accuracy]) will be printed at the first and every nitmon iterations.
nitmon0${\mathbf{nitmon}}\le 0$
No iteration monitoring is printed.
When printing occurs the output is directed to the current advisory message unit (see nag_file_set_unit_advisory (x04ab)).
Default: 0$0$
5:     tol – double scalar
The relative precision for the final estimates of the covariance matrix.
Default: 5e-5$5e-5$
Constraint: tol > 0.0${\mathbf{tol}}>0.0$.

ldx wk

### Output Parameters

1:     cov(m × (m + 1) / 2${\mathbf{m}}×\left({\mathbf{m}}+1\right)/2$) – double array
A robust estimate of the covariance matrix, C$C$. The upper triangular part of the matrix C$C$ is stored packed by columns. Cij${C}_{ij}$ is returned in cov((j × (j1) / 2 + i))${\mathbf{cov}}\left(\left(j×\left(j-1\right)/2+i\right)\right)$, ij$i\le j$.
2:     theta(m) – double array
The robust estimate of the location parameters θj${\theta }_{\mathit{j}}$, for j = 1,2,,m$\mathit{j}=1,2,\dots ,m$.
3:     nit – int64int32nag_int scalar
The number of iterations performed.
4:     ifail – int64int32nag_int scalar
${\mathrm{ifail}}={\mathbf{0}}$ unless the function detects an error (see [Error Indicators and Warnings]).

## Error Indicators and Warnings

Errors or warnings detected by the function:
ifail = 1${\mathbf{ifail}}=1$
 On entry, n ≤ 1${\mathbf{n}}\le 1$, or m < 1${\mathbf{m}}<1$, or n < m${\mathbf{n}}<{\mathbf{m}}$, or ldx < n$\mathit{ldx}<{\mathbf{n}}$, or eps < 0.0${\mathbf{eps}}<0.0$, or eps ≥ 1.0${\mathbf{eps}}\ge 1.0$, or tol ≤ 0.0${\mathbf{tol}}\le 0.0$, or maxit ≤ 0${\mathbf{maxit}}\le 0$.
ifail = 2${\mathbf{ifail}}=2$
 On entry, a variable has a constant value, i.e., all elements in a column of X$X$ are identical.
ifail = 3${\mathbf{ifail}}=3$
The iterative procedure to find C$C$ has failed to converge in maxit iterations.
ifail = 4${\mathbf{ifail}}=4$
The iterative procedure to find C$C$ has become unstable. This may happen if the value of eps is too large for the sample.

## Accuracy

On successful exit the accuracy of the results is related to the value of tol; see Section [Parameters]. At an iteration let
 (i) d1 = $d1=\text{}$ the maximum value of the absolute relative change in A$A$ (ii) d2 = $d2=\text{}$ the maximum absolute change in u(‖zi‖2)$u\left({‖{z}_{i}‖}_{2}\right)$ (iii) d3 = $d3=\text{}$ the maximum absolute relative change in θj${\theta }_{j}$
and let δ = max (d1,d2,d3)$\delta =\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(d1,d2,d3\right)$. Then the iterative procedure is assumed to have converged when δ < tol$\delta <{\mathbf{tol}}$

The existence of A$A$, and hence C$C$, will depend upon the function u$u$ (see Marazzi (1987)); also if X$X$ is not of full rank a value of A$A$ will not be found. If the columns of X$X$ are almost linearly related, then convergence will be slow.

## Example

```function nag_correg_robustm_corr_huber_example
x = [3.4, 6.9, 12.2;
6.4, 2.5, 15.1;
4.9, 5.5, 14.2;
7.3, 1.9, 18.2;
8.8, 3.6, 11.7;
8.4, 1.3, 17.9;
5.3, 3.1, 15;
2.7, 8.1, 7.7;
6.1, 3, 21.9;
5.3, 2.2, 13.9];
epsilon = 0.1;
[covar, theta, nit, ifail] = nag_correg_robustm_corr_huber(x, epsilon)
```
```

covar =

3.4611
-3.6806
5.3477
4.6818
-6.6445
14.4389

theta =

5.8178
3.6813
15.0369

nit =

23

ifail =

0

```
```function g02hk_example
x = [3.4, 6.9, 12.2;
6.4, 2.5, 15.1;
4.9, 5.5, 14.2;
7.3, 1.9, 18.2;
8.8, 3.6, 11.7;
8.4, 1.3, 17.9;
5.3, 3.1, 15;
2.7, 8.1, 7.7;
6.1, 3, 21.9;
5.3, 2.2, 13.9];
epsilon = 0.1;
[covar, theta, nit, ifail] = g02hk(x, epsilon)
```
```

covar =

3.4611
-3.6806
5.3477
4.6818
-6.6445
14.4389

theta =

5.8178
3.6813
15.0369

nit =

23

ifail =

0

```