# NAG Toolbox: nag_nonpar_test_chisq (g08cg)

## Purpose

nag_nonpar_test_chisq (g08cg) computes the test statistic for the $\chi^2$ goodness-of-fit test for data with a chosen number of class intervals.

## Syntax

[chisq, p, ndf, eval, chisqi, ifail] = g08cg(ifreq, cb, dist, par, npest, prob, 'nclass', nclass)
[chisq, p, ndf, eval, chisqi, ifail] = nag_nonpar_test_chisq(ifreq, cb, dist, par, npest, prob, 'nclass', nclass)

## Description

The $\chi^2$ goodness-of-fit test performed by nag_nonpar_test_chisq (g08cg) is used to test the null hypothesis that a random sample arises from a specified distribution against the alternative hypothesis that the sample does not arise from the specified distribution.
Given a sample of size $n$, denoted by $x_1, x_2, \dots, x_n$, drawn from a random variable $X$, and that the data has been grouped into $k$ classes,
$$x \le c_1, \qquad c_{i-1} < x \le c_i, \quad i = 2, 3, \dots, k-1, \qquad x > c_{k-1},$$
then the $\chi^2$ goodness-of-fit test statistic is defined by
$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},$$
where $O_i$ is the observed frequency of the $i$th class, and $E_i$ is the expected frequency of the $i$th class.
The expected frequencies are computed as
$$E_i = p_i \times n,$$
where $p_i$ is the probability that $X$ lies in the $i$th class, that is
$$p_1 = P(X \le c_1), \qquad p_i = P(c_{i-1} < X \le c_i), \quad i = 2, 3, \dots, k-1, \qquad p_k = P(X > c_{k-1}).$$
These probabilities are either taken from a common probability distribution or are supplied by you. The available probability distributions within this function are:
• Normal distribution with mean $\mu$, variance $\sigma^2$;
• uniform distribution on the interval $[a,b]$;
• exponential distribution with probability density function $(\mathrm{pdf}) = \lambda e^{-\lambda x}$;
• $\chi^2$-distribution with $f$ degrees of freedom; and
• gamma distribution with $\mathrm{pdf} = \frac{x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)\beta^{\alpha}}$.
You must supply the frequencies and classes. Given a set of data and classes the frequencies may be calculated using nag_stat_frequency_table (g01ae).
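As an illustration of the grouping itself, the following minimal sketch counts frequencies in plain MATLAB using the right-closed classes $c_{i-1} < x \le c_i$ defined above. The sample values are hypothetical, and the sketch is not a description of how nag_stat_frequency_table (g01ae) works internally.

```matlab
% Hypothetical raw sample and class boundaries (illustrative values only).
x  = [0.05; 0.20; 0.21; 0.40; 0.55; 0.79; 0.80; 0.95];
cb = [0.2; 0.4; 0.6; 0.8];          % k - 1 upper class boundaries

k     = numel(cb) + 1;              % number of classes
edges = [-Inf; cb(:); Inf];         % class i is (edges(i), edges(i+1)]
ifreq = zeros(k, 1);
for i = 1:k
    % Right-closed intervals: a value equal to cb(i) belongs to class i.
    ifreq(i) = sum(x > edges(i) & x <= edges(i+1));
end
ifreq = int64(ifreq);               % integer type chosen here is an assumption
disp(ifreq.')                       % counts per class: 2 2 1 2 1
```

The explicit comparison is used because the classes are closed on the right, so a value that falls exactly on a boundary is counted in the lower class.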
nag_nonpar_test_chisq (g08cg) returns the $\chi^2$ test statistic, $X^2$, together with its degrees of freedom and the upper tail probability from the $\chi^2$-distribution associated with the test statistic. Note that the use of the $\chi^2$-distribution as an approximation to the distribution of the test statistic improves as the expected values in each class increase.
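The formulas above can be checked by hand in base MATLAB. The sketch below uses the data from the example at the end of this page (uniform distribution on $[0,1]$) and evaluates the upper tail probability with the built-in `gammainc`; it illustrates the arithmetic only and makes no claim about how the NAG function is implemented.

```matlab
% Data from this page's example: uniform distribution on [0, 1].
O  = [26; 16; 22; 19; 17];          % observed frequencies O_i
cb = [0.2; 0.4; 0.6; 0.8];          % upper class boundaries c_1 .. c_{k-1}
a = 0; b = 1;                       % uniform parameters
npest = 0;                          % no parameters estimated from the data

n = sum(O);                         % sample size
F = (cb - a) / (b - a);             % uniform CDF at each boundary
p = diff([0; F; 1]);                % class probabilities p_i
E = p * n;                          % expected frequencies E_i = p_i * n
chisqi = (O - E).^2 ./ E;           % per-class contributions
chisq  = sum(chisqi);               % test statistic X^2
ndf    = numel(O) - 1 - npest;      % degrees of freedom
% Upper tail of chi-square(ndf) via the regularized incomplete gamma function.
pval   = gammainc(chisq / 2, ndf / 2, 'upper');
fprintf('chisq = %.4f, ndf = %d, p = %.4f\n', chisq, ndf, pval);
% prints: chisq = 3.3000, ndf = 4, p = 0.5089
```

Each expected frequency is 20 here, so the individual contributions are 1.8, 0.8, 0.2, 0.05 and 0.45, which sum to 3.3.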

## References

Conover W J (1980) Practical Nonparametric Statistics Wiley
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin
Siegel S (1956) Non-parametric Statistics for the Behavioral Sciences McGraw–Hill

## Parameters

### Compulsory Input Parameters

1:     ifreq(nclass) – int64/int32/nag_int array
nclass, the dimension of the array, must satisfy the constraint $\mathbf{nclass} \ge 2$.
$\mathbf{ifreq}(i)$ must specify the frequency of the $i$th class, $O_i$, for $i = 1, 2, \dots, k$.
Constraint: $\mathbf{ifreq}(i) \ge 0$, for $i = 1, 2, \dots, k$.
2:     cb($\mathbf{nclass} - 1$) – double array
$\mathbf{cb}(i)$ must specify the upper boundary value for the $i$th class, for $i = 1, 2, \dots, k-1$.
Constraint: $\mathbf{cb}(1) < \mathbf{cb}(2) < \cdots < \mathbf{cb}(\mathbf{nclass} - 1)$. For the exponential, gamma and $\chi^2$-distributions $\mathbf{cb}(1) \ge 0.0$.
3:     dist – string (length ≥ 1)
Indicates for which distribution the test is to be carried out.
$\mathbf{dist} = \text{'N'}$
The Normal distribution is used.
$\mathbf{dist} = \text{'U'}$
The uniform distribution is used.
$\mathbf{dist} = \text{'E'}$
The exponential distribution is used.
$\mathbf{dist} = \text{'C'}$
The $\chi^2$-distribution is used.
$\mathbf{dist} = \text{'G'}$
The gamma distribution is used.
$\mathbf{dist} = \text{'A'}$
You must supply the class probabilities in the array prob.
Constraint: $\mathbf{dist} = \text{'N'}$, $\text{'U'}$, $\text{'E'}$, $\text{'C'}$, $\text{'G'}$ or $\text{'A'}$.
4:     par($2$) – double array
Must contain the parameters of the distribution which is being tested. If you supply the probabilities (i.e., $\mathbf{dist} = \text{'A'}$) the array par is not referenced.
If a Normal distribution is used then $\mathbf{par}(1)$ and $\mathbf{par}(2)$ must contain the mean, $\mu$, and the variance, $\sigma^2$, respectively.
If a uniform distribution is used then $\mathbf{par}(1)$ and $\mathbf{par}(2)$ must contain the boundaries $a$ and $b$ respectively.
If an exponential distribution is used then $\mathbf{par}(1)$ must contain the parameter $\lambda$. $\mathbf{par}(2)$ is not used.
If a $\chi^2$-distribution is used then $\mathbf{par}(1)$ must contain the number of degrees of freedom. $\mathbf{par}(2)$ is not used.
If a gamma distribution is used then $\mathbf{par}(1)$ and $\mathbf{par}(2)$ must contain the parameters $\alpha$ and $\beta$ respectively.
Constraints:
• if $\mathbf{dist} = \text{'N'}$, $\mathbf{par}(2) > 0.0$;
• if $\mathbf{dist} = \text{'U'}$, $\mathbf{par}(1) < \mathbf{par}(2)$ and $\mathbf{par}(1) \le \mathbf{cb}(1)$ and $\mathbf{par}(2) \ge \mathbf{cb}(\mathbf{nclass} - 1)$;
• if $\mathbf{dist} = \text{'E'}$, $\mathbf{par}(1) > 0.0$;
• if $\mathbf{dist} = \text{'C'}$, $\mathbf{par}(1) > 0.0$;
• if $\mathbf{dist} = \text{'G'}$, $\mathbf{par}(1) > 0.0$ and $\mathbf{par}(2) > 0.0$.
5:     npest – int64/int32/nag_int scalar
The number of estimated parameters of the distribution.
Constraint: $0 \le \mathbf{npest} < \mathbf{nclass} - 1$.
6:     prob(nclass) – double array
nclass, the dimension of the array, must satisfy the constraint $\mathbf{nclass} \ge 2$.
If you are supplying the probability distribution (i.e., $\mathbf{dist} = \text{'A'}$) then $\mathbf{prob}(i)$ must contain the probability that $X$ lies in the $i$th class.
If $\mathbf{dist} \ne \text{'A'}$, prob is not referenced.
Constraint: if $\mathbf{dist} = \text{'A'}$, $\sum_{i=1}^{k} \mathbf{prob}(i) = 1.0$ and $\mathbf{prob}(i) > 0.0$, for $i = 1, 2, \dots, k$.
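When the class probabilities are supplied directly, they have to be built from the class boundaries before the call. The sketch below shows one way this might be done in base MATLAB for a Normal model, using `erfc` for the distribution function. The parameter values, the boundaries and the tolerance used in the sum check are all illustrative assumptions, not values taken from the NAG documentation.

```matlab
% Hypothetical example: class probabilities for a Normal(mu, sigma^2) model,
% of the kind that could be passed in prob when dist = 'A'.
mu = 0; sigma2 = 1;                 % illustrative parameter values
cb = [-1; 0; 1];                    % k - 1 = 3 boundaries, so k = 4 classes

z = (cb - mu) / sqrt(sigma2);       % standardized boundaries
F = 0.5 * erfc(-z / sqrt(2));       % Normal CDF at each boundary
prob = diff([0; F; 1]);             % p_1 = F(c_1), ..., p_k = 1 - F(c_{k-1})

% Checks mirroring the documented constraints on prob.
% The tolerance 1e-12 is an assumption made for this sketch.
assert(all(prob > 0), 'every class probability must be positive');
assert(abs(sum(prob) - 1) < 1e-12, 'class probabilities must sum to 1');
disp(prob.')
```

Differencing the distribution function over the padded vector `[0; F; 1]` yields the two open-ended tail classes and the interior classes in one step.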

### Optional Input Parameters

1:     nclass – int64/int32/nag_int scalar
Default: The dimension of the arrays ifreq, prob. (An error is raised if these dimensions are not equal.)
$k$, the number of classes into which the data is divided.
Constraint: $\mathbf{nclass} \ge 2$.


### Output Parameters

1:     chisq – double scalar
The test statistic, $X^2$, for the $\chi^2$ goodness-of-fit test.
2:     p – double scalar
The upper tail probability from the $\chi^2$-distribution associated with the test statistic, $X^2$, and the number of degrees of freedom.
3:     ndf – int64/int32/nag_int scalar
Contains $(\mathbf{nclass} - 1 - \mathbf{npest})$, the degrees of freedom associated with the test.
4:     eval(nclass) – double array
$\mathbf{eval}(i)$ contains the expected frequency for the $i$th class, $E_i$, for $i = 1, 2, \dots, k$.
5:     chisqi(nclass) – double array
$\mathbf{chisqi}(i)$ contains the contribution from the $i$th class to the test statistic, that is, $(O_i - E_i)^2 / E_i$, for $i = 1, 2, \dots, k$.
6:     ifail – int64/int32/nag_int scalar
$\mathbf{ifail} = 0$ unless the function detects an error (see Error Indicators and Warnings).

## Error Indicators and Warnings

Note: nag_nonpar_test_chisq (g08cg) may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

$\mathbf{ifail} = 1$
On entry, $\mathbf{nclass} < 2$.
$\mathbf{ifail} = 2$
On entry, dist is invalid.
$\mathbf{ifail} = 3$
On entry, $\mathbf{npest} < 0$, or $\mathbf{npest} \ge \mathbf{nclass} - 1$.
$\mathbf{ifail} = 4$
On entry, $\mathbf{ifreq}(i) < 0.0$ for some $i$, for $i = 1, 2, \dots, k$.
$\mathbf{ifail} = 5$
On entry, the elements of cb are not in ascending order. That is, $\mathbf{cb}(i) \le \mathbf{cb}(i-1)$ for some $i$, for $i = 2, 3, \dots, k-1$.
$\mathbf{ifail} = 6$
On entry, $\mathbf{dist} = \text{'E'}$, $\text{'C'}$ or $\text{'G'}$ and $\mathbf{cb}(1) < 0.0$. No negative class boundary values are valid for the exponential, gamma or $\chi^2$-distributions.
$\mathbf{ifail} = 7$
On entry, the values provided in par are invalid.
$\mathbf{ifail} = 8$
On entry, with $\mathbf{dist} = \text{'A'}$, $\mathbf{prob}(i) \le 0.0$ for some $i$, for $i = 1, 2, \dots, k$, or $\sum_{i=1}^{k} \mathbf{prob}(i) \ne 1.0$.
$\mathbf{ifail} = 9$
An expected frequency is equal to zero when the observed frequency was not.
W $\mathbf{ifail} = 10$
This is a warning that expected values for certain classes are less than $1.0$. This implies that we cannot be confident that the $\chi^2$-distribution is a good approximation to the distribution of the test statistic.
W $\mathbf{ifail} = 11$
The solution obtained when calculating the probability for a certain class for the gamma or $\chi^2$-distribution did not converge in $600$ iterations. The solution may be an adequate approximation.
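After a call that returns the warning $\mathbf{ifail} = 10$, it can help to see which classes are responsible. The following is a minimal sketch in base MATLAB; the expected frequencies are hypothetical values standing in for the returned eval array, and the variable is named `evalv` here only to avoid shadowing MATLAB's built-in `eval` function.

```matlab
% Hypothetical expected frequencies, standing in for the eval output array.
evalv = [0.4; 3.1; 12.5; 3.6; 0.4];

% Classes whose expected frequency is below the 1.0 threshold of the warning.
small = find(evalv < 1.0);
if ~isempty(small)
    fprintf('Expected frequency below 1.0 in class(es): %s\n', mat2str(small.'));
end
```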

## Accuracy

The computations are believed to be stable.

The time taken by nag_nonpar_test_chisq (g08cg) is dependent both on the distribution chosen and on the number of classes, $k$.

## Example

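```
function nag_nonpar_test_chisq_example
% Observed frequencies for five classes.
ifreq = [int64(26);16;22;19;17];
% Upper boundaries of the first four classes.
cb = [0.2; 0.4; 0.6; 0.8];
dist = 'U';          % test against the uniform distribution
par = [0; 1];        % interval [a, b] = [0, 1]
npest = int64(0);    % no parameters were estimated from the data
% prob is not referenced unless dist = 'A'; only its length (nclass) matters.
prob = zeros(5, 1);
[chisq, p, ndf, eval, chisqi, ifail] = nag_nonpar_test_chisq(ifreq, cb, dist, par, npest, prob)
```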
```

chisq =

3.3000

p =

0.5089

ndf =

4

eval =

20.0000
20.0000
20.0000
20.0000
20.0000

chisqi =

1.8000
0.8000
0.2000
0.0500
0.4500

ifail =

0

```
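```
function g08cg_example
% Observed frequencies for five classes.
ifreq = [int64(26);16;22;19;17];
% Upper boundaries of the first four classes.
cb = [0.2; 0.4; 0.6; 0.8];
dist = 'U';          % test against the uniform distribution
par = [0; 1];        % interval [a, b] = [0, 1]
npest = int64(0);    % no parameters were estimated from the data
% prob is not referenced unless dist = 'A'; only its length (nclass) matters.
prob = zeros(5, 1);
[chisq, p, ndf, eval, chisqi, ifail] = g08cg(ifreq, cb, dist, par, npest, prob)
```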
```

chisq =

3.3000

p =

0.5089

ndf =

4

eval =

20.0000
20.0000
20.0000
20.0000
20.0000

chisqi =

1.8000
0.8000
0.2000
0.0500
0.4500

ifail =

0

```