
# NAG Library Routine Document: G02BXF

Note:  before using this routine, please read the Users' Note for your implementation to check the interpretation of bold italicised terms and other implementation-dependent details.

## 1  Purpose

G02BXF calculates the sample means, the standard deviations, the variance-covariance matrix, and the matrix of Pearson product-moment correlation coefficients for a set of data. Weights may be used.

## 2  Specification

    SUBROUTINE G02BXF (WEIGHT, N, M, X, LDX, WT, XBAR, STD, V, LDV, R, IFAIL)
    INTEGER             N, M, LDX, LDV, IFAIL
    REAL (KIND=nag_wp)  X(LDX,M), WT(*), XBAR(M), STD(M), V(LDV,M), R(LDV,M)
    CHARACTER(1)        WEIGHT

## 3  Description

G02BXF computes the (optionally weighted) means and the sums of squares and cross-products of deviations about the means in a single pass through the data, using the updating algorithm implemented by G02BUF. The variance-covariance matrix, the standard deviations and the Pearson product-moment correlation matrix are then computed from these basic results, the correlation matrix by means of G02BWF.
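The following is a minimal Python sketch of a single-pass weighted updating scheme of the kind described by West (1979), followed by the derivation of the covariance, standard deviation and correlation results from the accumulated sums. It is an illustration only and is not the NAG source: the function names `accumulate` and `summarise` are hypothetical, and the exact order of operations inside G02BUF and G02BWF is not documented here.

```python
import math

def accumulate(rows, weights=None):
    """One pass over the data: return (sum of weights, means, cross-products).

    rows[i][j] is observation i of variable j; weights=None means unit weights.
    """
    m = len(rows[0])
    sw = 0.0                               # running sum of weights
    mean = [0.0] * m                       # running (weighted) means
    c = [[0.0] * m for _ in range(m)]      # sums of squares and cross-products
    for i, row in enumerate(rows):
        w = 1.0 if weights is None else weights[i]
        if w == 0.0:
            continue                       # zero-weight rows contribute nothing
        sw_new = sw + w
        delta = [x - mu for x, mu in zip(row, mean)]   # deviations from old means
        factor = w * sw / sw_new
        for j in range(m):
            for k in range(m):
                c[j][k] += factor * delta[j] * delta[k]
            mean[j] += w * delta[j] / sw_new
        sw = sw_new
    return sw, mean, c

def summarise(c, divisor):
    """Derive covariance matrix, standard deviations and correlations from c."""
    m = len(c)
    v = [[c[j][k] / divisor for k in range(m)] for j in range(m)]
    std = [math.sqrt(v[j][j]) for j in range(m)]
    r = [[0.0] * m for _ in range(m)]      # stays zero where a variance is zero
    for j in range(m):
        for k in range(m):
            denom = math.sqrt(c[j][j] * c[k][k])
            if denom > 0.0:
                r[j][k] = c[j][k] / denom
    return v, std, r

# Unit weights: the divisor is the sum of weights minus one, i.e. N - 1.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
sw, xbar, c = accumulate(rows)
v, std, r = summarise(c, divisor=sw - 1.0)
```

Because each observation updates the means and the cross-product sums together, the data never need to be revisited, which is the property the description above relies on.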

## 4  References

Chan T F, Golub G H and Leveque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances. Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: An improved method. Comm. ACM 22 532–535

## 5  Parameters

1:     WEIGHT – CHARACTER(1), Input
On entry: indicates whether weights are to be used.
${\mathbf{WEIGHT}}=\text{'U'}$
Weights are not used and unit weights are assumed.
${\mathbf{WEIGHT}}=\text{'W'}$ or $\text{'V'}$
Weights are used and must be supplied in WT. The only difference between ${\mathbf{WEIGHT}}=\text{'W'}$ and ${\mathbf{WEIGHT}}=\text{'V'}$ is in computing the variance. If ${\mathbf{WEIGHT}}=\text{'W'}$ the divisor for the variance is the sum of the weights minus one, and if ${\mathbf{WEIGHT}}=\text{'V'}$ the divisor is the number of observations with nonzero weights minus one. The former is useful if the weights represent the frequency of the observed values.
Constraint: ${\mathbf{WEIGHT}}=\text{'U'}$, $\text{'V'}$ or $\text{'W'}$.
2:     N – INTEGER, Input
On entry: the number of data observations in the sample.
Constraint: ${\mathbf{N}}>1$.
3:     M – INTEGER, Input
On entry: the number of variables.
Constraint: ${\mathbf{M}}\ge 1$.
4:     X(LDX,M) – REAL (KIND=nag_wp) array, Input
On entry: ${\mathbf{X}}\left(\mathit{i},\mathit{j}\right)$ must contain the $\mathit{i}$th observation for the $\mathit{j}$th variable, for $\mathit{i}=1,2,\dots ,{\mathbf{N}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{M}}$.
5:     LDX – INTEGER, Input
On entry: the first dimension of the array X as declared in the (sub)program from which G02BXF is called.
Constraint: ${\mathbf{LDX}}\ge {\mathbf{N}}$.
6:     WT($*$) – REAL (KIND=nag_wp) array, Input
Note: the dimension of the array WT must be at least ${\mathbf{N}}$ if ${\mathbf{WEIGHT}}=\text{'W'}$ or $\text{'V'}$, and at least $1$ otherwise.
On entry: the optional weights.
If ${\mathbf{WEIGHT}}=\text{'W'}$ or $\text{'V'}$, ${\mathbf{WT}}\left(i\right)$ must contain the weight for the $i$th observation. When ${\mathbf{WEIGHT}}=\text{'W'}$ the effective number of observations is given by the sum of these weights as opposed to the number of nonzero weights when ${\mathbf{WEIGHT}}=\text{'V'}$.
If ${\mathbf{WEIGHT}}=\text{'U'}$, WT is not referenced.
Constraint: if ${\mathbf{WEIGHT}}=\text{'W'}$ or $\text{'V'}$, $\sum _{\mathit{i}=1}^{{\mathbf{N}}}{\mathbf{WT}}\left(\mathit{i}\right)>1.0$, ${\mathbf{WT}}\left(\mathit{i}\right)\ge 0.0$, for $\mathit{i}=1,2,\dots ,{\mathbf{N}}$.
7:     XBAR(M) – REAL (KIND=nag_wp) array, Output
On exit: the sample means. ${\mathbf{XBAR}}\left(j\right)$ contains the mean of the $j$th variable.
8:     STD(M) – REAL (KIND=nag_wp) array, Output
On exit: the standard deviations. ${\mathbf{STD}}\left(j\right)$ contains the standard deviation for the $j$th variable.
9:     V(LDV,M) – REAL (KIND=nag_wp) array, Output
On exit: the variance-covariance matrix. ${\mathbf{V}}\left(\mathit{j},\mathit{k}\right)$ contains the covariance between variables $\mathit{j}$ and $\mathit{k}$, for $\mathit{j}=1,2,\dots ,{\mathbf{M}}$ and $\mathit{k}=1,2,\dots ,{\mathbf{M}}$.
10:   LDV – INTEGER, Input
On entry: the first dimension of the arrays R and V as declared in the (sub)program from which G02BXF is called.
Constraint: ${\mathbf{LDV}}\ge {\mathbf{M}}$.
11:   R(LDV,M) – REAL (KIND=nag_wp) array, Output
On exit: the matrix of Pearson product-moment correlation coefficients. ${\mathbf{R}}\left(j,k\right)$ contains the correlation coefficient between variables $j$ and $k$.
12:   IFAIL – INTEGER, Input/Output
On entry: IFAIL must be set to $0$, $-1$ or $1$. If you are unfamiliar with this parameter you should refer to Section 3.3 in the Essential Introduction for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value $-1$ or $1$ is recommended. If the output of error messages is undesirable, then the value $1$ is recommended. Otherwise, because for this routine the values of the output parameters may be useful even if ${\mathbf{IFAIL}}\ne {\mathbf{0}}$ on exit, the recommended value is $-1$. When the value $-1$ or $1$ is used it is essential to test the value of IFAIL on exit.
On exit: ${\mathbf{IFAIL}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).
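As an illustration of how the WEIGHT option changes the variance divisor described under parameter 1, the following Python sketch computes the divisor for each option. The helper name `variance_divisor` is hypothetical and is not part of the NAG Library; for 'U' it assumes unit weights, so the divisor reduces to N minus one.

```python
def variance_divisor(weight, n, wt=None):
    """Divisor applied to the sums of squares and cross-products.

    weight : 'U', 'W' or 'V', as for the WEIGHT parameter
    n      : number of observations
    wt     : sequence of n non-negative weights (ignored when weight == 'U')
    """
    if weight == "U":
        return float(n - 1)                 # unit weights: sum of weights is n
    if weight == "W":
        return sum(wt[:n]) - 1.0            # sum of the weights minus one
    if weight == "V":
        nonzero = sum(1 for w in wt[:n] if w != 0.0)
        return float(nonzero - 1)           # number of nonzero weights minus one
    raise ValueError("weight must be 'U', 'V' or 'W'")

# Four observations, one of which has zero weight.
wt = [2.0, 0.0, 3.0, 1.0]
div_u = variance_divisor("U", 4)        # 3.0
div_w = variance_divisor("W", 4, wt)    # 5.0
div_v = variance_divisor("V", 4, wt)    # 2.0
```

With frequency-style weights such as these, 'W' treats the sample as six observations, whereas 'V' treats it as three distinct observations that merely differ in importance.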

## 6  Error Indicators and Warnings

If on entry ${\mathbf{IFAIL}}={\mathbf{0}}$ or $-{\mathbf{1}}$, explanatory error messages are output on the current error message unit (as defined by X04AAF).
Note: G02BXF may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the routine:
${\mathbf{IFAIL}}=1$
 On entry, ${\mathbf{M}}<1$, or ${\mathbf{N}}\le 1$, or ${\mathbf{LDX}}<{\mathbf{N}}$, or ${\mathbf{LDV}}<{\mathbf{M}}$.
${\mathbf{IFAIL}}=2$
 On entry, ${\mathbf{WEIGHT}}\ne \text{'U'}$, $\text{'V'}$ or $\text{'W'}$.
${\mathbf{IFAIL}}=3$
 On entry, ${\mathbf{WEIGHT}}=\text{'W'}$ or $\text{'V'}$ and at least one element of WT is negative, i.e., ${\mathbf{WT}}\left(i\right)<0.0$ for some $i$.
${\mathbf{IFAIL}}=4$
${\mathbf{WEIGHT}}=\text{'W'}$ and the sum of weights is not greater than $1.0$, or ${\mathbf{WEIGHT}}=\text{'V'}$ and fewer than $2$ observations have nonzero weights.
${\mathbf{IFAIL}}=5$
A variable has a zero variance. In this case V and STD are returned as calculated but R will contain zero for any correlation involving a variable with zero variance.
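The entry checks behind ${\mathbf{IFAIL}}=1$ to $4$ can be mirrored in a few lines of Python, which may be useful for validating data before a call. This is a sketch under assumptions: the function name `check_inputs` is hypothetical, and the order in which the routine itself tests the conditions is assumed to follow the numbering above. ${\mathbf{IFAIL}}=5$ is not covered because it can only be detected after the variances have been computed.

```python
def check_inputs(weight, n, m, ldx, ldv, wt=None):
    """Return the IFAIL value (0 to 4) that the documented entry checks imply."""
    if m < 1 or n <= 1 or ldx < n or ldv < m:
        return 1                                   # bad dimensions
    if weight not in ("U", "V", "W"):
        return 2                                   # unknown WEIGHT option
    if weight in ("W", "V"):
        w = wt[:n]
        if any(x < 0.0 for x in w):
            return 3                               # negative weight
        if weight == "W" and sum(w) <= 1.0:
            return 4                               # sum of weights not > 1
        if weight == "V" and sum(1 for x in w if x != 0.0) < 2:
            return 4                               # fewer than 2 nonzero weights
    return 0

ok = check_inputs("W", n=3, m=2, ldx=3, ldv=2, wt=[1.0, 2.0, 0.5])
bad_dims = check_inputs("U", n=1, m=2, ldx=1, ldv=2)
bad_option = check_inputs("Z", n=3, m=2, ldx=3, ldv=2)
negative = check_inputs("V", n=3, m=2, ldx=3, ldv=2, wt=[1.0, -1.0, 2.0])
too_light = check_inputs("W", n=3, m=2, ldx=3, ldv=2, wt=[0.5, 0.25, 0.0])
too_few = check_inputs("V", n=3, m=2, ldx=3, ldv=2, wt=[4.0, 0.0, 0.0])
```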

## 7  Accuracy

For a discussion of the accuracy of the one-pass algorithm see Chan et al. (1982) and West (1979).
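To see why an updating formulation matters for accuracy, the following Python sketch compares the textbook formula (sum of squares minus squared sum) with an unweighted updating scheme on data that have a large mean relative to their spread. The function names are hypothetical, and the updating scheme shown is a generic one in the style discussed in the references, not the NAG implementation.

```python
def naive_variance(xs):
    """Textbook one-pass formula: prone to catastrophic cancellation."""
    n = len(xs)
    total = 0.0
    sumsq = 0.0
    for x in xs:
        total += x
        sumsq += x * x
    return (sumsq - total * total / n) / (n - 1)

def updating_variance(xs):
    """Updating formula: accumulates squared deviations about the running mean."""
    mean = 0.0
    ss = 0.0
    for i, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / i
        ss += delta * (x - mean)
    return ss / (len(xs) - 1)

# The exact sample variance of these four values is 30.
data = [1.0e9 + 4.0, 1.0e9 + 7.0, 1.0e9 + 13.0, 1.0e9 + 16.0]
naive = naive_variance(data)
stable = updating_variance(data)
```

In IEEE double precision the updating version recovers the variance of 30, while the textbook formula loses all significant digits because it subtracts two nearly equal numbers of order $10^{18}$.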

## 8  Further Comments

None.

## 9  Example

The data are some of the results from the 1988 Olympic Decathlon. They are the times (in seconds) for the 100m and 400m races and the distances (in metres) for the long jump, high jump and shot. Twenty observations are input and the correlation matrix is computed and printed.

### 9.1  Program Text

Program Text (g02bxfe.f90)

### 9.2  Program Data

Program Data (g02bxfe.d)

### 9.3  Program Results

Program Results (g02bxfe.r)