# NAG Toolbox: nag_correg_glm_predict (g02gp)

## Purpose

nag_correg_glm_predict (g02gp) allows prediction from a generalized linear model fit via nag_correg_glm_normal (g02ga), nag_correg_glm_binomial (g02gb), nag_correg_glm_poisson (g02gc) or nag_correg_glm_gamma (g02gd).

## Syntax

[eta, seeta, pred, sepred, ifail] = g02gp(errfn, link, mean, x, isx, b, cov, vfobs, 'n', n, 'm', m, 'ip', ip, 't', t, 'off', off, 'wt', wt, 's', s, 'a', a)
[eta, seeta, pred, sepred, ifail] = nag_correg_glm_predict(errfn, link, mean, x, isx, b, cov, vfobs, 'n', n, 'm', m, 'ip', ip, 't', t, 'off', off, 'wt', wt, 's', s, 'a', a)
Note: the interface to this routine has changed since earlier releases of the toolbox:
Mark 23: t, off, wt, s and a are now optional (t defaults to a vector of 1s), and the arguments weight and offset have been dropped from the interface.

## Description

A generalized linear model consists of the following elements:
 (i) A suitable distribution for the dependent variable $y$.
 (ii) A linear model, with linear predictor $\eta = X\beta$, where $X$ is a matrix of independent variables and $\beta$ a column vector of $p$ parameters.
 (iii) A link function $g(\cdot)$ between the expected value of $y$ and the linear predictor, that is $g(E(y)) = g(\mu) = \eta$, so that $\mu = g^{-1}(\eta)$.
In order to predict from a generalized linear model, that is, to estimate a value for the dependent variable, $y$, given a set of independent variables $X$, the matrix $X$ must be supplied, along with values for the parameters $\beta$ and their associated variance-covariance matrix, $C$. Suitable values for $\beta$ and $C$ are usually estimated by first fitting the prediction model to a training dataset with known responses, using for example nag_correg_glm_normal (g02ga), nag_correg_glm_binomial (g02gb), nag_correg_glm_poisson (g02gc) or nag_correg_glm_gamma (g02gd). The predicted variable and its standard error can then be obtained from:
$$\hat{y} = g^{-1}(\eta), \qquad \mathrm{se}(\hat{y}) = \sqrt{\left(\left.\frac{\partial g^{-1}(x)}{\partial x}\right|_{x=\eta}\right)^{2} \mathrm{se}(\eta)^{2} + I_{\mathrm{fobs}}\,\mathrm{Var}(y)}$$
where
$$\eta = o + X\beta, \qquad \mathrm{se}(\eta) = \sqrt{\mathrm{diag}\left(XCX^{T}\right)},$$
$o$ is a vector of offsets, and $I_{\mathrm{fobs}} = 0$ if the variance of future observations is not taken into account and $1$ otherwise. Here $\mathrm{diag}\,A$ denotes the vector of diagonal elements of the matrix $A$, and the square root is taken elementwise.
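The two formulas above can be evaluated directly for a small case. The MATLAB sketch below assumes a Normal error distribution with a reciprocal link, $\eta = 1/\mu$, so that $\partial g^{-1}(x)/\partial x = -1/x^2$, and takes $\mathrm{Var}(y_i) = \varphi$ (the Normal case with unit weights). Every number in it is invented for illustration, and it works with the full covariance matrix C rather than the packed cov array; it is not the toolbox's implementation.

```matlab
% Invented inputs: intercept column plus one covariate, two new observations
X     = [1 32;
         1 18];
beta  = [-0.02; 0.06];              % model parameters
C     = [ 4e-6 -1e-6;
         -1e-6  2e-6];              % full covariance matrix of beta
o     = zeros(2, 1);                % offsets
phi   = 0.13;                       % scale parameter (assumed Normal errors)
Ifobs = 1;                          % include the variance of future observations

ginv  = @(e) 1 ./ e;                % inverse of the reciprocal link
dginv = @(e) -1 ./ e .^ 2;          % derivative of g^{-1}, evaluated at eta

eta    = o + X * beta;                        % linear predictor
seeta  = sqrt(diag(X * C * X'));              % se(eta), elementwise square root
yhat   = ginv(eta);                           % predicted values
vary   = phi * ones(2, 1);                    % Var(y_i) for Normal errors, w_i = 1
seyhat = sqrt(dginv(eta) .^ 2 .* seeta .^ 2 + Ifobs * vary);

disp([eta, seeta, yhat, seyhat])
```

Setting Ifobs = 0 leaves only the delta-method term, which is the standard error of the estimated mean rather than of a new observation.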
If required, the variance for the $i$th future observation, $\mathrm{Var}(y_i)$, can be calculated as:
$$\mathrm{Var}(y_i) = \frac{\varphi\, V(\theta)}{w_i}$$
where $w_i$ is a weight, $\varphi$ is the scale (or dispersion) parameter, and $V(\theta)$ is the variance function. Both the scale parameter and the variance function depend on the distribution used for $y$:

| Distribution | $V(\theta)$ | $\varphi$ |
|---|---|---|
| Poisson | $\mu_i$ | $1$ |
| binomial | $\mu_i(t_i-\mu_i)/t_i$ | $1$ |
| Normal | $1$ | supplied by you |
| gamma | $\mu_i^2$ | supplied by you |

For the Normal and gamma distributions, the scale parameter, $\varphi$, is supplied by you. This value is usually obtained from the function used to fit the prediction model. In many cases, for a Normal error structure, $\varphi = \hat{\sigma}^2$, i.e., the estimated variance.
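The table above can be turned into a few lines of MATLAB. This is a sketch with invented inputs (mu, t, w and s are illustrative, not values from the toolbox) that computes $\mathrm{Var}(y_i) = \varphi V(\theta)/w_i$ for a chosen error distribution code.

```matlab
% Illustrative inputs
mu    = [2; 5];        % predicted means
t     = [10; 10];      % binomial denominators (used only when errfn = 'B')
w     = [1; 2];        % weights w_i
s     = 0.5;           % scale parameter supplied for 'N' and 'G'
errfn = 'B';

switch errfn
  case 'P'             % Poisson
    V = mu;
    phi = 1;
  case 'B'             % binomial
    V = mu .* (t - mu) ./ t;
    phi = 1;
  case 'N'             % Normal
    V = ones(size(mu));
    phi = s;
  case 'G'             % gamma
    V = mu .^ 2;
    phi = s;
  otherwise
    error('errfn must be ''B'', ''G'', ''N'' or ''P''');
end

vary = phi * V ./ w;   % variance of each future observation
disp(vary)
```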

## Parameters

### Compulsory Input Parameters

1:     errfn – string (length ≥ 1)
Indicates the distribution used to model the dependent variable, y$y$.
errfn = 'B'
The binomial distribution is used.
errfn = 'G'
The gamma distribution is used.
errfn = 'N'
The Normal (Gaussian) distribution is used.
errfn = 'P'
The Poisson distribution is used.
Constraint: errfn = 'B', 'G', 'N' or 'P'.
2:     link – string (length ≥ 1)
Indicates which link function is to be used.
link = 'C'
A complementary log-log link is used.
link = 'E'
An exponent link is used.
link = 'G'
A logistic link is used.
link = 'I'
An identity link is used.
link = 'L'
A log link is used.
link = 'P'
A probit link is used.
link = 'R'
A reciprocal link is used.
link = 'S'
A square root link is used.
Details on the functional form of the different links can be found in the G02 Chapter Introduction.
Constraints:
• if errfn = 'B', link = 'C', 'G' or 'P';
• otherwise link = 'E', 'I', 'L', 'R' or 'S'.
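To make the non-binomial link codes concrete, the sketch below writes each link as $\eta = g(\mu)$ and stores its inverse $g^{-1}$, which is what maps the linear predictor to a prediction. It assumes the standard textbook forms noted in the comments; the exact definitions used by the NAG routines (including the binomial links 'C', 'G' and 'P', which also involve $t_i$) are those in the G02 Chapter Introduction. The struct ginv and the value of a are illustrative only.

```matlab
% Inverse links mu = g^{-1}(eta) for the non-binomial link codes,
% assuming the standard forms eta = g(mu) given in the comments.
a = 0.5;                           % power for the exponent link (illustrative)
ginv = struct();
ginv.E = @(eta) eta .^ (1 / a);    % exponent:    eta = mu^a
ginv.I = @(eta) eta;               % identity:    eta = mu
ginv.L = @(eta) exp(eta);          % log:         eta = log(mu)
ginv.R = @(eta) 1 ./ eta;          % reciprocal:  eta = 1/mu
ginv.S = @(eta) eta .^ 2;          % square root: eta = sqrt(mu)

% The mean implied by eta = 2 under each link
codes = fieldnames(ginv);
for k = 1:numel(codes)
  f = ginv.(codes{k});
  fprintf('link = ''%s'': mu = %g\n', codes{k}, f(2));
end
```

The constraint a ≠ 0.0 for link = 'E' is visible here: the inverse needs the power 1/a.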
3:     mean – string (length ≥ 1)
Indicates if a mean term is to be included.
mean = 'M'
A mean (intercept) term will be included in the model.
mean = 'Z'
The model will pass through the origin (zero-point).
Constraint: mean = 'M' or 'Z'.
4:     x(ldx,m) – double array
ldx, the first dimension of the array, must satisfy the constraint ldx ≥ n.
x(i,j) must contain the $i$th observation for the $j$th independent variable, for $i = 1,2,\dots,n$ and $j = 1,2,\dots,m$.
5:     isx(m) – int64, int32 or nag_int array
m, the dimension of the array, must satisfy the constraint m ≥ 1.
Indicates which independent variables are to be included in the model.
If isx(j) > 0, the $j$th independent variable is included in the regression model.
Constraints:
• isx(j) ≥ 0, for $j = 1,2,\dots,m$;
• if mean = 'M', exactly ip − 1 values of isx must be > 0;
• if mean = 'Z', exactly ip values of isx must be > 0.
6:     b(ip) – double array
ip, the dimension of the array, must satisfy the constraint ip > 0.
The model parameters, $\beta$.
If mean = 'M', b(1) must contain the mean parameter and b(i+1) the coefficient of the variable contained in column $j$ of x, where isx(j) is the $i$th positive value in the array isx.
If mean = 'Z', b(i) must contain the coefficient of the variable contained in column $j$ of x, where isx(j) is the $i$th positive value in the array isx.
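The mapping from isx and mean to the columns that multiply b can be sketched in a few lines of MATLAB. This only illustrates the indexing rules above on made-up data (offsets are ignored); it is not code taken from the toolbox.

```matlab
% Illustrative data: n = 2 observations of m = 3 candidate variables
x      = [1 10 100;
          2 20 200];
isx    = int64([1 0 1]);     % use columns 1 and 3 of x
mean_p = 'M';                % include a mean (intercept) term
b      = [0.5; 2; -1];       % b(1): mean, b(2): column 1, b(3): column 3

% Columns with isx(j) > 0, kept in increasing order of j
X = x(:, isx > 0);
if strcmp(mean_p, 'M')
  X = [ones(size(x, 1), 1), X];   % leading column of ones for the mean
end

% Selected columns plus the mean must match ip = numel(b)
assert(size(X, 2) == numel(b));

eta = X * b;                 % linear predictor without offsets
disp(eta)
```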
7:     cov(ip × (ip + 1)/2) – double array
The upper triangular part of the variance-covariance matrix, $C$, of the model parameters. This matrix should be supplied packed by column, i.e., the covariance between parameters $\beta_i$ and $\beta_j$, that is the values stored in b(i) and b(j), should be supplied in cov(j × (j − 1)/2 + i), for $i = 1,2,\dots,ip$ and $j = i,\dots,ip$.
Constraint: the matrix represented in cov must be a valid variance-covariance matrix.
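A short MATLAB sketch of this packing scheme, using an invented 3 × 3 covariance matrix: it fills the packed vector with the index formula above, checks it against a one-line equivalent based on logical indexing (which visits the upper-triangle mask column by column), and unpacks it again. The variable name cov_packed is illustrative; it avoids shadowing MATLAB's built-in cov function.

```matlab
% Illustrative 3-by-3 covariance matrix (symmetric, positive definite)
C  = [4 1 2;
      1 5 3;
      2 3 6];
ip = size(C, 1);

% Pack the upper triangle by column: C(i,j), i <= j, goes to j*(j-1)/2 + i
cov_packed = zeros(ip * (ip + 1) / 2, 1);
for j = 1:ip
  for i = 1:j
    cov_packed(j * (j - 1) / 2 + i) = C(i, j);
  end
end

% Same ordering in one line: logical indexing walks the mask column-major
cov2 = C(triu(true(ip)));

% Unpack back into a full symmetric matrix
Cfull = zeros(ip);
for j = 1:ip
  for i = 1:j
    Cfull(i, j) = cov_packed(j * (j - 1) / 2 + i);
    Cfull(j, i) = Cfull(i, j);
  end
end
```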
8:     vfobs – logical scalar
If vfobs = true, the variance of future observations is included in the standard error of the predicted variable (i.e., $I_{\mathrm{fobs}} = 1$); otherwise $I_{\mathrm{fobs}} = 0$.

### Optional Input Parameters

1:     n – int64, int32 or nag_int scalar
Default: the dimension of the arrays t, off, wt and the first dimension of the array x. (An error is raised if these dimensions are not equal.)
$n$, the number of observations.
Constraint: n ≥ 1.
2:     m – int64, int32 or nag_int scalar
Default: the dimension of the array isx and the second dimension of the array x. (An error is raised if these dimensions are not equal.)
$m$, the total number of independent variables.
Constraint: m ≥ 1.
3:     ip – int64, int32 or nag_int scalar
Default: the dimension of the array b.
The number of independent variables in the model, including the mean or intercept if present.
Constraint: ip > 0.
4:     t(:) – double array
Note: the dimension of the array must be at least n if errfn = 'B', and at least 1 otherwise.
If errfn = 'B', t(i) must contain the binomial denominator, $t_i$, for the $i$th observation.
Otherwise t is not referenced.
Default: 1
Constraint: if errfn = 'B', t(i) ≥ 0.0, for $i = 1,2,\dots,n$.
5:     off(:) – double array
Note: the dimension of the array must be at least n if offset = 'Y', and at least 1 otherwise.
If offset = 'Y', off(i) must contain the offset $o_i$, for the $i$th observation.
Otherwise off is not referenced.
6:     wt(:) – double array
Note: the dimension of the array must be at least n if weight = 'W' and vfobs = true, and at least 1 otherwise.
If weight = 'W' and vfobs = true, wt(i) must contain the weight, $w_i$, for the $i$th observation.
If the variance of future observations is not included in the standard error of the predicted variable, wt is not referenced.
Constraint: if vfobs = true and weight = 'W', wt(i) ≥ 0.0, for $i = 1,2,\dots,n$.
7:     s – double scalar
If errfn = 'N' or 'G' and vfobs = true, the scale parameter, $\varphi$.
Otherwise s is not referenced and $\varphi = 1$.
Default: 0
Constraint: if errfn = 'N' or 'G' and vfobs = true, s > 0.0.
8:     a – double scalar
If link = 'E', a must contain the power of the exponential.
If link ≠ 'E', a is not referenced.
Default: 0
Constraint: if link = 'E', a ≠ 0.0.

### Input Parameters Omitted from the MATLAB Interface

offset, weight, ldx

### Output Parameters

1:     eta(n) – double array
The linear predictor, $\eta$.
2:     seeta(n) – double array
The standard error of the linear predictor, $\mathrm{se}(\eta)$.
3:     pred(n) – double array
The predicted value, $\hat{y}$.
4:     sepred(n) – double array
The standard error of the predicted value, $\mathrm{se}(\hat{y})$. If pred(i) could not be calculated, then nag_correg_glm_predict (g02gp) returns ifail = 22, and sepred(i) is set to −99.0.
5:     ifail – int64, int32 or nag_int scalar
ifail = 0 unless the function detects an error (see [Error Indicators and Warnings]).

## Error Indicators and Warnings

Note: nag_correg_glm_predict (g02gp) may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

ifail = 1
On entry, errfn ≠ 'B', 'P', 'G' or 'N'.
ifail = 2
On entry, errfn = 'B' and link ≠ 'G', 'P' or 'C', or errfn ≠ 'B' and link ≠ 'E', 'I', 'L', 'R' or 'S'.
ifail = 3
On entry, mean ≠ 'M' or 'Z'.
ifail = 4
On entry, offset ≠ 'Y' or 'N'.
ifail = 5
On entry, vfobs = true and weight ≠ 'U' or 'W'.
ifail = 6
On entry, n < 1.
ifail = 8
On entry, ldx < n.
ifail = 9
On entry, m ≤ 0.
ifail = 10
On entry, the number of nonzero elements in isx is not consistent with ip.
ifail = 11
On entry, ip < 1.
ifail = 12
On entry, errfn = 'B' and t(i) < 0.0 for at least one $i = 1,2,\dots,n$.
ifail = 14
On entry, vfobs = true, weight = 'W' and wt(i) < 0.0 for at least one $i = 1,2,\dots,n$.
ifail = 15
On entry, vfobs = true, errfn = 'G' or 'N' and s ≤ 0.0.
ifail = 16
On entry, link = 'E' and a = 0.0.
ifail = 18
On entry, the supplied covariance matrix has at least one diagonal element < 0.0.
W ifail = 22
On exit, at least one predicted value could not be calculated as required. sepred is set to −99.0 for the affected predicted values.
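Because sepred uses −99.0 as a marker rather than a real standard error, it can be worth masking the affected entries before further use. A small sketch of one way to do that; the output values below are invented stand-ins for a call that returned the warning ifail = 22, and replacing them with NaN is a user choice, not toolbox behaviour.

```matlab
% Invented outputs standing in for a call that returned ifail = 22
ifail  = 22;
pred   = [0.49; 0.0];
sepred = [0.36; -99.0];

if ifail == 22
  bad = (sepred == -99.0);       % predictions that could not be calculated
  pred(bad)   = NaN;
  sepred(bad) = NaN;
  fprintf('%d prediction(s) could not be calculated\n', nnz(bad));
end
```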

## Accuracy

Not applicable.

## Further Comments

None.

## Example

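```matlab
function nag_correg_glm_predict_example
% Fit a Normal GLM with a reciprocal link to training data using
% nag_correg_glm_normal (g02ga), then predict at two new values of x.
errfn  = 'N';          % Normal (Gaussian) error distribution
link   = 'R';          % reciprocal link
mean_p = 'M';          % include a mean (intercept) term
x      = [1; 2; 3; 4; 5];
y      = [25; 10; 6; 4; 3];
isx    = [int64(1)];   % include the single independent variable
s      = 0;            % scale parameter to be estimated by the fit
vfobs  = true;         % include the variance of future observations
ip     = int64(2);     % intercept plus one slope

% Call routine to fit model to training data
[s, rss, idf, b, irank, se, covar, vOut, ifail] = ...
  nag_correg_glm_normal(link, mean_p, x, isx, ip, y, s);

if (ifail == 0)
  % Display parameter estimates for training data
  fprintf('\nResidual sum of squares =  %12.4e  Degrees of freedom =  %d\n', rss, idf);
  fprintf('\n      Estimate     Standard error\n');
  for i = 1:2
    fprintf('%14.4f %14.4f\n', b(i), se(i));
  end

  % New values of the independent variable at which to predict
  x = [32; 18];

  % b, covar and the scale parameter s all come from the fit above
  [eta, seeta, pred, sepred, ifail] = ...
    nag_correg_glm_predict(errfn, link, mean_p, x, isx, b, covar, vfobs, 's', s);

  if (ifail == 0)
    % Display predicted values
    fprintf('\n  I      ETA          SE(ETA)      Predicted    SE(Predicted)\n');
    for i = 1:2
      fprintf('%3d%13.5f%13.5f%13.5f%13.5f\n', i, eta(i), seeta(i), pred(i), sepred(i));
    end
  end
end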
```
```

Residual sum of squares =    3.8717e-01  Degrees of freedom =  3

      Estimate     Standard error
       -0.0239         0.0028
        0.0638         0.0026

  I      ETA          SE(ETA)      Predicted    SE(Predicted)
  1      2.01807      0.08168      0.49552      0.35981
  2      1.12472      0.04476      0.88911      0.36098

```
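As a check on the formulas in the Description, the printed predictions and their standard errors can be reproduced, to the printed precision, from the printed ETA and SE(ETA) columns alone. This sketch assumes the reciprocal link used in the example ($\hat{y} = 1/\eta$, so $\partial g^{-1}(x)/\partial x = -1/x^2$) and that the scale parameter estimated by nag_correg_glm_normal (g02ga) equals the residual mean square rss/idf; small differences in the last digit come from rounding of the printed values.

```matlab
% Values copied from the example output above
eta   = [2.01807; 1.12472];
seeta = [0.08168; 0.04476];
rss   = 3.8717e-01;
idf   = 3;

phi    = rss / idf;                                       % assumed scale estimate
pred   = 1 ./ eta;                                        % reciprocal link
sepred = sqrt((1 ./ eta .^ 2) .^ 2 .* seeta .^ 2 + phi);  % vfobs = true, w_i = 1

fprintf('%13.5f%13.5f\n', [pred, sepred]');
```

Almost all of SE(Predicted) here comes from $\varphi$: the delta-method term contributes well under 1% of the variance, which is why both rows have similar standard errors despite their different SE(ETA).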