For a set of
$n$ observations classified by two variables, with
$r$ and
$c$ levels respectively, a two-way table of frequencies with
$r$ rows and
$c$ columns can be computed.
Two statistics that can be used to measure the association between the two classification variables are the Pearson
${\chi}^{2}$ statistic,
${\displaystyle \sum _{i=1}^{r}}{\displaystyle \sum _{j=1}^{c}}\frac{{\left({n}_{ij}-{f}_{ij}\right)}^{2}}{{f}_{ij}}$, and the likelihood ratio test statistic,
$2{\displaystyle \sum _{i=1}^{r}}{\displaystyle \sum _{j=1}^{c}}{n}_{ij}\times \mathrm{log}\left({n}_{ij}/{f}_{ij}\right)$, where
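${f}_{ij}$ are the fitted values from the model that assumes the effects due to the classification variables are additive, i.e., there is no association. These values are the expected cell frequencies and are given by ${f}_{ij}={n}_{i.}{n}_{.j}/n$, where ${n}_{i.}$ is the total of row $i$ and ${n}_{.j}$ is the total of column $j$.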
Under the hypothesis of no association between the two classification variables, both these statistics have, approximately, a
${\chi}^{2}$-distribution with
$\left(c-1\right)\left(r-1\right)$ degrees of freedom. This distribution is arrived at under the assumption that the expected cell frequencies,
${f}_{ij}$, are not too small. For a discussion of this point see
Everitt (1977). He concludes by saying, ‘... in the majority of cases the chi-square criterion may be used for tables with expectations in excess of
$0.5$ in the smallest cell’.
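As a rough illustration of the definitions above, the following Python sketch computes the expected frequencies, the cell contributions to ${\chi}^{2}$, both test statistics and the degrees of freedom for a small table. It is not the G11AAF implementation: the function name `two_way_statistics` and the counts in `table` are hypothetical, the expected frequencies are assumed to be (row total × column total)/$n$, cells with ${n}_{ij}=0$ are assumed to contribute nothing to the likelihood ratio statistic, and all row and column totals are assumed to be positive.

```python
import math

def two_way_statistics(nobs):
    """Return (expected, contributions, chi2, g, df) for a table of counts.

    Hypothetical helper for illustration only; assumes every row and
    column total is positive so that no expected frequency is zero.
    """
    r, c = len(nobs), len(nobs[0])
    row_tot = [sum(row) for row in nobs]
    col_tot = [sum(nobs[i][j] for i in range(r)) for j in range(c)]
    n = sum(row_tot)

    # Expected cell frequencies under no association.
    expected = [[row_tot[i] * col_tot[j] / n for j in range(c)]
                for i in range(r)]

    # Cell contributions (n_ij - f_ij)^2 / f_ij to the Pearson statistic.
    contrib = [[(nobs[i][j] - expected[i][j]) ** 2 / expected[i][j]
                for j in range(c)] for i in range(r)]
    chi2 = sum(sum(row) for row in contrib)

    # Likelihood ratio statistic; empty cells are skipped (assumption).
    g = 2.0 * sum(nobs[i][j] * math.log(nobs[i][j] / expected[i][j])
                  for i in range(r) for j in range(c) if nobs[i][j] > 0)

    df = (r - 1) * (c - 1)
    return expected, contrib, chi2, g, df

table = [[10, 20], [30, 40]]  # hypothetical counts, not from the source
expected, contrib, chi2, g, df = two_way_statistics(table)
print(expected, chi2, g, df)
```

The returned `expected` and `contrib` tables play the roles that the text assigns to ${f}_{ij}$ and to the individual terms of the Pearson sum.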
In the case of the
$2\times 2$ table, i.e.,
$c=2$ and
$r=2$, the
${\chi}^{2}$ approximation can be improved by using Yates' continuity correction factor. This decreases the absolute value of
$\left({n}_{ij}-{f}_{ij}\right)$ by
$\frac{1}{2}$. For
$2\times 2$ tables with a small value of
$n$ the exact probabilities from Fisher's test are computed. These are based on the hypergeometric distribution and are computed using
G01BLF. A two-tail probability is computed as
$\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(1,2{p}_{u},2{p}_{l}\right)$, where
${p}_{u}$ and
${p}_{l}$ are the upper and lower one-tail probabilities from the hypergeometric distribution.
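The two $2\times 2$ adjustments described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the method used by G11AAF or G01BLF: the function names `yates_chi2` and `fisher_two_tail` and the example counts are hypothetical, both one-tail probabilities are assumed to include the observed count, and the corrected deviation is clamped at zero when the absolute deviation is below $\frac{1}{2}$ (a choice made here, not stated in the text).

```python
from math import comb

def yates_chi2(table):
    """Pearson chi-square for a 2x2 table with Yates' continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            f = rows[i] * cols[j] / n  # expected frequency
            # Reduce |n_ij - f_ij| by 1/2; clamping at 0 is an assumption.
            dev = max(abs(table[i][j] - f) - 0.5, 0.0)
            chi2 += dev * dev / f
    return chi2

def fisher_two_tail(table):
    """Two-tail probability min(1, 2*p_u, 2*p_l) from the hypergeometric law."""
    (a, b), (c, d) = table
    n = a + b + c + d
    r1, c1 = a + b, a + c
    lo, hi = max(0, r1 + c1 - n), min(r1, c1)
    denom = comb(n, c1)
    # Probability of each possible count in cell (1,1) with margins fixed.
    pmf = {k: comb(r1, k) * comb(n - r1, c1 - k) / denom
           for k in range(lo, hi + 1)}
    p_l = sum(p for k, p in pmf.items() if k <= a)  # lower one-tail
    p_u = sum(p for k, p in pmf.items() if k >= a)  # upper one-tail
    return min(1.0, 2.0 * p_u, 2.0 * p_l)

example = [[3, 1], [1, 3]]  # hypothetical counts, not from the source
print(yates_chi2(example), fisher_two_tail(example))
```

Exact integer binomial coefficients from `math.comb` keep the sketch simple for small $n$; a production routine would typically evaluate the hypergeometric terms in a more numerically careful way.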
 1: NROW – INTEGER Input
On entry: $r$, the number of rows in the contingency table.
Constraint:
${\mathbf{NROW}}\ge 2$.
 2: NCOL – INTEGER Input
On entry: $c$, the number of columns in the contingency table.
Constraint:
${\mathbf{NCOL}}\ge 2$.
 3: NOBS(LDNOBS,NCOL) – INTEGER array Input
On entry: the contingency table.
${\mathbf{NOBS}}\left(\mathit{i},\mathit{j}\right)$ must contain ${n}_{\mathit{i}\mathit{j}}$, for $\mathit{i}=1,2,\dots ,r$ and $\mathit{j}=1,2,\dots ,c$.
Constraint:
${\mathbf{NOBS}}\left(\mathit{i},\mathit{j}\right)\ge 0$, for $\mathit{i}=1,2,\dots ,r$ and $\mathit{j}=1,2,\dots ,c$.
 4: LDNOBS – INTEGER Input
On entry: the first dimension of the arrays
NOBS,
EXPT and
CHIST as declared in the (sub)program from which G11AAF is called.
Constraint:
${\mathbf{LDNOBS}}\ge {\mathbf{NROW}}$.
 5: EXPT(LDNOBS,NCOL) – REAL (KIND=nag_wp) array Output
On exit: the table of expected values.
${\mathbf{EXPT}}\left(\mathit{i},\mathit{j}\right)$ contains ${f}_{\mathit{i}\mathit{j}}$, for $\mathit{i}=1,2,\dots ,r$ and $\mathit{j}=1,2,\dots ,c$.
 6: CHIST(LDNOBS,NCOL) – REAL (KIND=nag_wp) array Output
On exit: the table of ${\chi}^{2}$ contributions.
${\mathbf{CHIST}}\left(\mathit{i},\mathit{j}\right)$ contains $\frac{{\left({n}_{\mathit{i}\mathit{j}}-{f}_{\mathit{i}\mathit{j}}\right)}^{2}}{{f}_{\mathit{i}\mathit{j}}}$, for $\mathit{i}=1,2,\dots ,r$ and $\mathit{j}=1,2,\dots ,c$.
 7: PROB – REAL (KIND=nag_wp) Output
On exit: if
$c=2$,
$r=2$ and
$n\le 40$ then
PROB contains the two-tail significance level for Fisher's exact test; otherwise
PROB contains the significance level from the Pearson
${\chi}^{2}$ statistic.
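The choice described for PROB can be sketched in Python as below. This is only an illustration: the names `uses_fisher` and `chi2_upper_tail` are hypothetical, the significance level from the Pearson statistic is assumed to be the upper tail probability of the ${\chi}^{2}$-distribution with DF degrees of freedom, and the power series for the incomplete gamma function is a convenient choice made here rather than the method used by the routine. The series is only suitable for moderate values of the statistic.

```python
import math

def uses_fisher(nrow, ncol, n):
    """True when the text says PROB comes from Fisher's exact test."""
    return nrow == 2 and ncol == 2 and n <= 40

def chi2_upper_tail(x, df):
    """Upper tail probability of a chi-square variate with df degrees of freedom.

    Uses the series for the regularised lower incomplete gamma function
    P(a, z) with a = df/2 and z = x/2, then returns 1 - P(a, z).
    """
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    # Sum z^k / (a (a+1) ... (a+k)) until the terms are negligible.
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))

print(uses_fisher(2, 2, 30))
print(chi2_upper_tail(2.0, 2))  # for df = 2 the exact value is exp(-x/2)
```

Because the result is formed as one minus the lower tail, very small significance levels lose relative accuracy in this sketch.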
 8: CHI – REAL (KIND=nag_wp) Output
On exit: the Pearson ${\chi}^{2}$ statistic.
 9: G – REAL (KIND=nag_wp) Output
On exit: the likelihood ratio test statistic.
 10: DF – REAL (KIND=nag_wp) Output
On exit: the degrees of freedom for the statistics.
 11: IFAIL – INTEGER Input/Output

On entry:
IFAIL must be set to
$0$,
$-1\text{ or }1$. If you are unfamiliar with this parameter you should refer to
Section 3.3 in the Essential Introduction for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value
$-1\text{ or }1$ is recommended. If the output of error messages is undesirable, then the value
$1$ is recommended. Otherwise, because for this routine the values of the output parameters may be useful even if
${\mathbf{IFAIL}}\ne {\mathbf{0}}$ on exit, the recommended value is
$-1$.
When the value $-\mathbf{1}\text{ or }1$ is used it is essential to test the value of IFAIL on exit.
On exit:
${\mathbf{IFAIL}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see
Section 6).
If on entry
${\mathbf{IFAIL}}={\mathbf{0}}$ or
$-{\mathbf{1}}$, explanatory error messages are output on the current error message unit (as defined by
X04AAF).
For the accuracy of the probabilities for Fisher's exact test see
G01BLF.
The routine
G01AFF allows for the automatic amalgamation of rows and columns. In most circumstances this is not recommended; see
Everitt (1977).
Multidimensional contingency tables can be analysed using log-linear models fitted by
G02GBF.
The data below, taken from
Everitt (1977), is from
$141$ patients with brain tumours. The row classification variable is the site of the tumour: frontal lobes, temporal lobes and other cerebral areas. The column classification variable is the type of tumour: benign, malignant and other cerebral tumours.
The data is read in and the statistics computed and printed.