NAG Toolbox: nag_correg_ssqmat (g02bu)

Purpose

nag_correg_ssqmat (g02bu) calculates the sample means and sums of squares and cross-products, or sums of squares and cross-products of deviations from the mean, in a single pass for a set of data. The data may be weighted.

Syntax

[sw, wmean, c, ifail] = g02bu(x, 'mean', mean, 'n', n, 'm', m, 'wt', wt)
[sw, wmean, c, ifail] = nag_correg_ssqmat(x, 'mean', mean, 'n', n, 'm', m, 'wt', wt)
Note: the interface to this routine has changed since earlier releases of the toolbox:
Mark 22: n has been made optional.
Mark 24: mean has been made optional.

Description

nag_correg_ssqmat (g02bu) is an adaptation of West's WV2 algorithm; see West (1979). This function calculates the (optionally weighted) sample means and (optionally weighted) sums of squares and cross-products, or sums of squares and cross-products of deviations from the (weighted) mean, for a sample of $n$ observations on $m$ variables $X_j$, for $j=1,2,\dots,m$. The algorithm makes a single pass through the data.
For the first $i-1$ observations let the mean of the $j$th variable be $\bar{x}_j(i-1)$, the cross-product about the mean for the $j$th and $k$th variables be $c_{jk}(i-1)$ and the sum of weights be $W_{i-1}$. These are updated by the $i$th observation, $x_{ij}$, for $j=1,2,\dots,m$, with weight $w_i$ as follows:
$$W_i = W_{i-1} + w_i$$
$$\bar{x}_j(i) = \bar{x}_j(i-1) + \frac{w_i}{W_i}\left(x_{ij} - \bar{x}_j(i-1)\right), \quad j=1,2,\dots,m$$
and
$$c_{jk}(i) = c_{jk}(i-1) + \frac{w_i}{W_i}\left(x_{ij} - \bar{x}_j(i-1)\right)\left(x_{ik} - \bar{x}_k(i-1)\right) W_{i-1}, \quad j=1,2,\dots,m \text{ and } k=j,j+1,\dots,m.$$
The algorithm is initialized by taking $\bar{x}_j(1)=x_{1j}$, the first observation, and $c_{jk}(1)=0.0$.
For the unweighted case $w_i=1$ and $W_i=i$ for all $i$.
Note that only the upper triangle of the matrix is calculated and returned packed by column.
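The update formulae above can be sketched in a few lines of plain Python. This is only an illustration of the recurrences, not the NAG implementation: the function name `ssq_about_mean` and its list-based interface are assumptions made for this sketch, and it covers only the case of deviations about the mean. Starting from a zero state is equivalent to the initialization described above, because the first observation with positive weight sets the means to that observation and adds nothing to the cross-products.

```python
def ssq_about_mean(x, wt=None):
    """One-pass weighted means and cross-products of deviations about the mean.

    Illustrative helper (hypothetical name), not the NAG routine.
    x  : list of n rows, each holding m values
    wt : optional list of n non-negative weights (None means unweighted)
    Returns (sw, wmean, c), with c the upper triangle packed by columns.
    """
    n = len(x)
    if n < 1 or len(x[0]) < 1:
        raise ValueError("need at least one observation and one variable")
    m = len(x[0])
    if wt is None:
        wt = [1.0] * n  # unweighted case: w_i = 1, hence W_i = i
    if len(wt) != n or any(w < 0.0 for w in wt):
        raise ValueError("wt must hold n non-negative weights")

    sw = 0.0                        # running sum of weights, W_i
    wmean = [0.0] * m               # running means
    c = [0.0] * (m * (m + 1) // 2)  # packed upper triangle

    for row, w in zip(x, wt):
        if w == 0.0:
            continue                # a zero weight changes nothing; skipping avoids 0/0
        w_prev = sw                 # W_{i-1}
        sw += w                     # W_i = W_{i-1} + w_i
        ratio = w / sw              # w_i / W_i
        # deviations of the new observation from the previous means
        dev = [row[j] - wmean[j] for j in range(m)]
        for k in range(m):
            for j in range(k + 1):
                c[k * (k + 1) // 2 + j] += ratio * dev[j] * dev[k] * w_prev
        for j in range(m):
            wmean[j] += ratio * dev[j]
    return sw, wmean, c


# Data and weights taken from the Example section of this page
x = [[9.1231, 3.7011, 4.523],
     [0.931, 0.09, 0.887],
     [0.0009, 0.0099, 0.0999]]
sw, wmean, c = ssq_about_mean(x, wt=[0.13, 1.307, 0.37])
```

With the example data, the sketch should agree with the documented output of the toolbox function to the four decimal places printed there.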

References

Chan T F, Golub G H and Leveque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–555

Parameters

Compulsory Input Parameters

1:     x(ldx,m) – double array
ldx, the first dimension of the array, must satisfy the constraint $\mathit{ldx}\ge \mathbf{n}$.
$\mathbf{x}(i,j)$ must contain the $i$th observation on the $j$th variable, for $i=1,2,\dots,n$ and $j=1,2,\dots,m$.

Optional Input Parameters

1:     mean – string (length ≥ 1)
Indicates whether nag_correg_ssqmat (g02bu) is to calculate sums of squares and cross-products, or sums of squares and cross-products of deviations about the mean.
$\mathbf{mean}=\text{'M'}$
The sums of squares and cross-products of deviations about the mean are calculated.
$\mathbf{mean}=\text{'Z'}$
The sums of squares and cross-products are calculated.
Default: $\text{'M'}$
Constraint: $\mathbf{mean}=\text{'M'}$ or $\text{'Z'}$.
2:     n – int64, int32 or nag_int scalar
Default: The first dimension of the array x.
$n$, the number of observations in the dataset.
Constraint: $\mathbf{n}\ge 1$.
3:     m – int64, int32 or nag_int scalar
Default: The second dimension of the array x.
$m$, the number of variables.
Constraint: $\mathbf{m}\ge 1$.
4:     wt(:) – double array
Note: the dimension of the array wt must be at least $\mathbf{n}$ if $\mathit{weight}=\text{'W'}$, and at least $1$ otherwise.
The optional weights of each observation.
If $\mathit{weight}=\text{'U'}$, wt is not referenced.
If $\mathit{weight}=\text{'W'}$, $\mathbf{wt}(i)$ must contain the weight for the $i$th observation.
Constraint: if $\mathit{weight}=\text{'W'}$, $\mathbf{wt}(i)\ge 0.0$, for $i=1,2,\dots,n$.
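The two settings of mean are linked by a standard identity: the weighted sum of cross-products $\sum_i w_i x_{ij} x_{ik}$ equals the cross-product of deviations about the mean plus $W\,\bar{x}_j\,\bar{x}_k$, where $W$ is the sum of weights. The sketch below applies that identity to a packed upper triangle. The function name `uncorrected_from_deviations` is hypothetical and the code is independent of the toolbox; it only illustrates how the 'M' and 'Z' results relate.

```python
def uncorrected_from_deviations(sw, wmean, c):
    """Convert packed cross-products about the mean into cross-products about zero.

    Illustrative helper (hypothetical name).
    sw    : sum of weights
    wmean : list of m (weighted) means
    c     : upper triangle of deviations about the mean, packed by columns
    """
    m = len(wmean)
    if len(c) != m * (m + 1) // 2:
        raise ValueError("c must have (m*m + m)/2 elements")
    out = list(c)
    for k in range(m):
        for j in range(k + 1):
            # add back the term removed by centring: W * mean_j * mean_k
            out[k * (k + 1) // 2 + j] += sw * wmean[j] * wmean[k]
    return out


# Three unweighted observations (1, 2), (3, 4), (5, 6):
# means are (3, 4) and every cross-product about the mean is 8.
raw = uncorrected_from_deviations(3.0, [3.0, 4.0], [8.0, 8.0, 8.0])
```

For this small data set the raw sums can be checked by hand: $1+9+25=35$, $2+12+30=44$ and $4+16+36=56$.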

Input Parameters Omitted from the MATLAB Interface

weight, ldx

Output Parameters

1:     sw – double scalar
The sum of weights.
If $\mathit{weight}=\text{'U'}$, sw contains the number of observations, $n$.
2:     wmean(m) – double array
The sample means. $\mathbf{wmean}(j)$ contains the mean for the $j$th variable.
3:     c($(\mathbf{m}\times\mathbf{m}+\mathbf{m})/2$) – double array
The cross-products.
If $\mathbf{mean}=\text{'M'}$, c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products of deviations about the mean.
If $\mathbf{mean}=\text{'Z'}$, c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products.
These are stored packed by columns, i.e., the cross-product between the $j$th and $k$th variable, $k\ge j$, is stored in $\mathbf{c}(k\times(k-1)/2+j)$.
4:     ifail – int64, int32 or nag_int scalar
$\mathbf{ifail}=0$ unless the function detects an error (see Error Indicators and Warnings).
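The packed storage of c can be expanded into a full symmetric matrix with a short loop. The sketch below is a plain Python illustration with a hypothetical helper name, `unpack_upper`. Because Python indexes from zero, the 1-based position $k\times(k-1)/2+j$ given above becomes `k * (k + 1) // 2 + j` for zero-based `j <= k`.

```python
def unpack_upper(c, m):
    """Expand a column-packed upper triangle into a full symmetric m-by-m matrix.

    Illustrative helper (hypothetical name); returns a list of m rows.
    """
    if len(c) != m * (m + 1) // 2:
        raise ValueError("c must have (m*m + m)/2 elements")
    full = [[0.0] * m for _ in range(m)]
    for k in range(m):            # column index, zero-based
        for j in range(k + 1):    # row index, j <= k
            value = c[k * (k + 1) // 2 + j]
            full[j][k] = value    # upper triangle
            full[k][j] = value    # mirror into the lower triangle
    return full


# Packed values as printed in the Example section of this page
c = [8.7569, 3.6978, 1.5905, 4.0707, 1.6861, 1.9297]
full = unpack_upper(c, 3)
```

Here `full[0]` holds the first row (8.7569, 3.6978, 4.0707), so the packed order is $c_{11}, c_{12}, c_{22}, c_{13}, c_{23}, c_{33}$.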

Error Indicators and Warnings

Errors or warnings detected by the function:
$\mathbf{ifail}=1$
 On entry, $\mathbf{m}<1$, or $\mathbf{n}<1$, or $\mathit{ldx}<\mathbf{n}$.
$\mathbf{ifail}=2$
 On entry, $\mathbf{mean}\ne \text{'M'}$ or $\text{'Z'}$.
$\mathbf{ifail}=3$
 On entry, $\mathit{weight}\ne \text{'W'}$ or $\text{'U'}$.
$\mathbf{ifail}=4$
 On entry, $\mathit{weight}=\text{'W'}$, and a value of $\mathbf{wt}<0.0$.

Accuracy

For a detailed discussion of the accuracy of this algorithm see Chan et al. (1982) or West (1979).

Further Comments

nag_correg_ssqmat_to_corrmat (g02bw) may be used to calculate the correlation coefficients from the cross-products of deviations about the mean. The cross-products of deviations about the mean may be scaled to give a variance-covariance matrix.
The means and cross-products produced by nag_correg_ssqmat (g02bu) may be updated by adding or removing observations using nag_correg_ssqmat_update (g02bt).
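As a rough illustration of these follow-on calculations, the sketch below derives correlation coefficients and a scaled covariance from a packed array of cross-products of deviations about the mean. It is plain Python and does not call or reproduce the toolbox functions; the helper names are hypothetical. The correlation is the usual $c_{jk}/\sqrt{c_{jj}c_{kk}}$. The covariance divisor $\mathit{sw}-1$ is an assumption made for this sketch, and other divisors may suit weighted data better.

```python
import math


def packed_index(j, k):
    """Zero-based position of element (j, k), j <= k, in the column-packed upper triangle."""
    return k * (k + 1) // 2 + j


def correlations(c, m):
    """Correlation coefficients, packed like c (illustrative helper)."""
    r = [0.0] * len(c)
    for k in range(m):
        for j in range(k + 1):
            denom = math.sqrt(c[packed_index(j, j)] * c[packed_index(k, k)])
            if denom == 0.0:
                raise ValueError("a variable has zero variance")
            r[packed_index(j, k)] = c[packed_index(j, k)] / denom
    return r


def covariances(c, sw):
    """Scale cross-products by 1 / (sw - 1); the divisor is an assumption of this sketch."""
    if sw <= 1.0:
        raise ValueError("sum of weights must exceed 1 for this divisor")
    return [value / (sw - 1.0) for value in c]


# Packed values and sum of weights as printed in the Example section of this page
c = [8.7569, 3.6978, 1.5905, 4.0707, 1.6861, 1.9297]
r = correlations(c, 3)
v = covariances(c, 1.8070)
```

The diagonal entries of `r` are 1 by construction, which gives a quick sanity check on the packed indexing.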

Example

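```
function nag_correg_ssqmat_example
% Weights, one per observation (row of x)
wt = [0.13, 1.307, 0.37];
% Data matrix: 3 observations (rows) on 3 variables (columns)
x = [9.1231, 3.7011, 4.523;
     0.931, 0.09, 0.887;
     0.0009, 0.0099, 0.0999];
% Weighted means and packed cross-products of deviations about the mean
[sw, wmean, c, ifail] = nag_correg_ssqmat(x, 'wt', wt)
```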
```

sw =

1.8070

wmean =

1.3299
0.3334
0.9874

c =

8.7569
3.6978
1.5905
4.0707
1.6861
1.9297

ifail =

0

```
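```
function g02bu_example
% Weights, one per observation (row of x)
wt = [0.13, 1.307, 0.37];
% Data matrix: 3 observations (rows) on 3 variables (columns)
x = [9.1231, 3.7011, 4.523;
     0.931, 0.09, 0.887;
     0.0009, 0.0099, 0.0999];
% Same computation as above, called through the short name
[sw, wmean, c, ifail] = g02bu(x, 'wt', wt)
```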
```

sw =

1.8070

wmean =

1.3299
0.3334
0.9874

c =

8.7569
3.6978
1.5905
4.0707
1.6861
1.9297

ifail =

0

```