hide long namesshow long names
hide short namesshow short names
Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

NAG Toolbox: nag_correg_glm_predict (g02gp)

Purpose

nag_correg_glm_predict (g02gp) allows prediction from a generalized linear model fit via nag_correg_glm_normal (g02ga), nag_correg_glm_binomial (g02gb), nag_correg_glm_poisson (g02gc) or nag_correg_glm_gamma (g02gd).

Syntax

[eta, seeta, pred, sepred, ifail] = g02gp(errfn, link, mean, x, isx, b, cov, vfobs, 'n', n, 'm', m, 'ip', ip, 't', t, 'off', off, 'wt', wt, 's', s, 'a', a)
[eta, seeta, pred, sepred, ifail] = nag_correg_glm_predict(errfn, link, mean, x, isx, b, cov, vfobs, 'n', n, 'm', m, 'ip', ip, 't', t, 'off', off, 'wt', wt, 's', s, 'a', a)
Note: the interface to this routine has changed since earlier releases of the toolbox:
Mark 23: wt, off, s, a now optional, weight & offset dropped from interface, t now optional, default to vector of 1s
.

Description

A generalized linear model consists of the following elements:
(i) A suitable distribution for the dependent variable yy.
(ii) A linear model, with linear predictor η = Xβη=Xβ, where XX is a matrix of independent variables and ββ a column vector of pp parameters.
(iii) A link function g(.)g(.) between the expected value of yy and the linear predictor, that is E(y) = μ = g(η)E(y)=μ=g(η).
In order to predict from a generalized linear model, that is estimate a value for the dependent variable, yy, given a set of independent variables XX, the matrix XX must be supplied, along with values for the parameters ββ and their associated variance-covariance matrix, CC. Suitable values for ββ and CC are usually estimated by first fitting the prediction model to a training dataset with known responses, using for example nag_correg_glm_normal (g02ga), nag_correg_glm_binomial (g02gb), nag_correg_glm_poisson (g02gc) or nag_correg_glm_gamma (g02gd). The predicted variable, and its standard error can then be obtained from:
= g1(η) ,   se() = sqrt( ((δg1(x))/(δx)) η ) se(η) + Ifobs Var(y)
y^ = g-1(η) ,   se( y^ ) = ( δg-1(x) δx ) η se(η) + Ifobs Var(y)
where
η = o + Xβ ,   se(η) = diagsqrt(XCXT) ,
η=o+Xβ ,   se(η) = diagXCXT ,
oo is a vector of offsets and Ifobs = 0Ifobs=0, if the variance of future observations is not taken into account, and 11 otherwise. Here diagAdiagA indicates the diagonal elements of matrix AA.
If required, the variance for the iith future observation, Var(yi)Var(yi), can be calculated as:
Var(yi) = (φ V(θ))/(wi)
Var(yi) = ϕ V(θ) wi
where wiwi is a weight, φϕ is the scale (or dispersion) parameter, and V(θ)V(θ) is the variance function. Both the scale parameter and the variance function depend on the distribution used for the yy, with:
Poisson V(θ) = μiV(θ)=μi, φ = 1ϕ=1
binomial V(θ) = (μi(tiμi))/(ti)V(θ)=μi(ti-μi)ti, φ = 1ϕ=1
Normal V(θ) = 1V(θ)=1
gamma V(θ) = μi2V(θ)=μi2
In the cases of a Normal and gamma error structure, the scale parameter (φϕ), is supplied by you. This value is usually obtained from the function used to fit the prediction model. In many cases, for a Normal error structure, φ=σ̂2ϕ=σ^2, i.e., the estimated variance.

References

McCullagh P and Nelder J A (1983) Generalized Linear Models Chapman and Hall

Parameters

Compulsory Input Parameters

1:     errfn – string (length ≥ 1)
Indicates the distribution used to model the dependent variable, yy.
errfn = 'B'errfn='B'
The binomial distribution is used.
errfn = 'G'errfn='G'
The gamma distribution is used.
errfn = 'N'errfn='N'
The Normal (Gaussian) distribution is used.
errfn = 'P'errfn='P'
The Poisson distribution is used.
Constraint: errfn = 'B'errfn='B', 'G''G', 'N''N' or 'P''P'.
Indicates which link function to be used.
link = 'C'link='C'
A complementary log-log link is used.
link = 'E'link='E'
An exponent link is used.
link = 'G'link='G'
A logistic link is used.
link = 'I'link='I'
An identity link is used.
link = 'L'link='L'
A log link is used.
link = 'P'link='P'
A probit link is used.
link = 'R'link='R'
A reciprocal link is used.
link = 'S'link='S'
A square root link is used.
Details on the functional form of the different links can be found in the G02 Chapter Introduction.
Constraints:
  • if errfn = 'B'errfn='B', link = 'C'link='C', 'G''G' or 'P''P';
  • otherwise link = 'E'link='E', 'I''I', 'L''L', 'R''R' or 'S''S'.
3:     mean – string (length ≥ 1)
Indicates if a mean term is to be included.
mean = 'M'mean='M'
A mean term, intercept, will be included in the model.
mean = 'Z'mean='Z'
The model will pass through the origin, zero-point.
Constraint: mean = 'M'mean='M' or 'Z''Z'.
4:     x(ldx,m) – double array
ldx, the first dimension of the array, must satisfy the constraint ldxnldxn.
x(i,j)xij must contain the iith observation for the jjth independent variable, for i = 1,2,,ni=1,2,,n and j = 1,2,,mj=1,2,,m.
5:     isx(m) – int64int32nag_int array
m, the dimension of the array, must satisfy the constraint m1m1.
Indicates which independent variables are to be included in the model.
If isx(j) > 0isxj>0, the jjth independent variable is included in the regression model.
Constraints:
  • isx(j)0isxj0, for i = 1,2,,mi=1,2,,m;
  • if mean = 'M'mean='M', exactly ip1ip-1 values of isx must be > 0>0;
  • if mean = 'Z'mean='Z', exactly ip values of isx must be > 0>0.
6:     b(ip) – double array
ip, the dimension of the array, must satisfy the constraint ip > 0ip>0.
The model parameters, ββ.
If mean = 'M'mean='M', b(1)b1 must contain the mean parameter and b(i + 1)bi+1 the coefficient of the variable contained in the jjth independent x, where isx(j)isxj is the iith positive value in the array isx.
If mean = 'Z'mean='Z', b(i)bi must contain the coefficient of the variable contained in the jjth independent x, where isx(j)isxj is the iith positive value in the array isx.
7:     cov(ip × (ip + 1) / 2ip×(ip+1)/2) – double array
The upper triangular part of the variance-covariance matrix, CC, of the model parameters. This matrix should be supplied packed by column, i.e., the covariance between parameters βiβi and βjβj, that is the values stored in b(i)bi and b(j)bj, should be supplied in cov(j × (j1) / 2 + i)covj×(j-1)/2+i, for i = 1,2,,ipi=1,2,,ip and j = i,,ipj=i,,ip.
Constraint: the matrix represented in cov must be a valid variance-covariance matrix.
8:     vfobs – logical scalar
If vfobs = truevfobs=true, the variance of future observations is included in the standard error of the predicted variable (i.e., Ifobs = 1Ifobs=1), otherwise Ifobs = 0Ifobs=0.

Optional Input Parameters

1:     n – int64int32nag_int scalar
Default: The dimension of the arrays t, off, wt and the first dimension of the array x. (An error is raised if these dimensions are not equal.)
nn, the number of observations.
Constraint: n1n1.
2:     m – int64int32nag_int scalar
Default: The dimension of the array isx and the second dimension of the array x. (An error is raised if these dimensions are not equal.)
mm, the total number of independent variables.
Constraint: m1m1.
3:     ip – int64int32nag_int scalar
Default: The dimension of the array b.
The number of independent variables in the model, including the mean or intercept if present.
Constraint: ip > 0ip>0.
4:     t( : :) – double array
Note: the dimension of the array must be at least nn if errfn = 'B'errfn='B', and at least 11 otherwise.
If errfn = 'B'errfn='B', t(i)ti must contain the binomial denominator, titi, for the iith observation.
Otherwise t is not referenced.
Default: 11
Constraint: if errfn = 'B'errfn='B', t(i)0.0ti0.0, for i = 1,2,,ni=1,2,,n.
5:     off( : :) – double array
Note: the dimension of the array must be at least nn if offset = 'Y'offset='Y', and at least 11 otherwise.
If offset = 'Y'offset='Y', off(i)offi must contain the offset oioi, for the iith observation.
Otherwise off is not referenced.
6:     wt( : :) – double array
Note: the dimension of the array must be at least nn if weight = 'W'weight='W' and vfobs = truevfobs=true, and at least 11 otherwise.
If weight = 'W'weight='W' and vfobs = truevfobs=true, wt(i)wti must contain the weight, wiwi, for the iith observation.
If the variance of future observations is not included in the standard error of the predicted variable, wt is not referenced.
Constraint: if vfobs = truevfobs=true and weight = 'W'weight='W', wt(i)0wti0., for i = 1,2,,ii=1,2,,i.
7:     s – double scalar
If errfn = 'N'errfn='N' or 'G''G' and vfobs = truevfobs=true, the scale parameter, φϕ.
Otherwise s is not referenced and φ = 1ϕ=1.
Default: 00
Constraint: if errfn = 'N'errfn='N' or 'G''G' and vfobs = truevfobs=true, s > 0.0s>0.0.
8:     a – double scalar
If link = 'E'link='E', a must contain the power of the exponential.
If link'E'link'E', a is not referenced.
Default: 00
Constraint: if link = 'E'link='E', a0.0a0.0.

Input Parameters Omitted from the MATLAB Interface

offset weight ldx

Output Parameters

1:     eta(n) – double array
The linear predictor, ηη.
2:     seeta(n) – double array
The standard error of the linear predictor, se(η)se(η).
3:     pred(n) – double array
The predicted value, y^.
4:     sepred(n) – double array
The standard error of the predicted value, se()se(y^). If pred(i)predi could not be calculated, then nag_correg_glm_predict (g02gp) returns ifail = 22ifail=22, and sepred(i)sepredi is set to 99.0-99.0.
5:     ifail – int64int32nag_int scalar
ifail = 0ifail=0 unless the function detects an error (see [Error Indicators and Warnings]).

Error Indicators and Warnings

Note: nag_correg_glm_predict (g02gp) may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

  ifail = 1ifail=1
On entry, errfn'B'errfn'B', 'P''P', 'G''G' or 'N''N'.
  ifail = 2ifail=2
On entry, errfn = 'B'errfn='B' and link'G'link'G', 'P''P' or 'C''C' or errfn'B'errfn'B' and link'E'link'E', 'I''I', 'L''L', 'R''R' or 'S''S'.
  ifail = 3ifail=3
On entry, mean'M'mean'M' or 'Z''Z'.
  ifail = 4ifail=4
On entry, offset'Y'offset'Y' or 'N''N'.
  ifail = 5ifail=5
On entry, vfobs = truevfobs=true and weight'U'weight'U' or 'W''W'.
  ifail = 6ifail=6
On entry, n < 1n<1.
  ifail = 8ifail=8
On entry, ldx < nldx<n.
  ifail = 9ifail=9
On entry, m0m0.
  ifail = 10ifail=10
On entry, number of nonzero elements in isx is not consistent with ip.
  ifail = 11ifail=11
On entry, ip < 1ip<1.
  ifail = 12ifail=12
On entry, errfn = 'B'errfn='B' and t(i) < 0.0ti<0.0 for at least one i = 1,2,,ni=1,2,,n.
  ifail = 14ifail=14
On entry, vfobs = truevfobs=true, weight = 'W'weight='W' and wt(i) < 0.0wti<0.0 for at least one i = 1,2,,ni=1,2,,n.
  ifail = 15ifail=15
On entry, vfobs = truevfobs=true, errfn = 'G'errfn='G' or 'N''N' and s0.0s0.0.
  ifail = 16ifail=16
On entry, link = 'E'link='E' and a = 0.0a=0.0.
  ifail = 18ifail=18
On entry, supplied covariance matrix has at least one diagonal element < 0.0<0.0.
W ifail = 22ifail=22
On exit, at least one predicted value could not be calculated as required. sepred is set to 99.0-99.0 for affected predicted values.

Accuracy

Not applicable.

Further Comments

None.

Example

function nag_correg_glm_predict_example
errfn = 'N';
link = 'R';
mean_p = 'M';
x = [1; 2; 3; 4; 5];
y = [25; 10; 6; 4; 3];
isx = [int64(1)];
s = 0;
vfobs = true;
ip = int64(2);

% Call routine to fit model to training data
[s, rss, idf, b, irank, se, covar, vOut, ifail] = ...
     nag_correg_glm_normal(link, mean_p, x, isx, ip, y, s);

if (ifail == 0)
  % Display parameter estimates for training data
  fprintf('\nResidual sum of squares =  %12.4e  Degrees of freedom =  %d\n', rss, idf)
  fprintf('\n      Estimate     Standard error\n');
  for i = 1:2
    fprintf('%14.4f %14.4f\n', b(i), se(i));
  end
  x = [32; 18];
end

[eta, seeta, pred, sepred, ifail] = ...
     nag_correg_glm_predict(errfn, link, mean_p, x, isx, b, covar, vfobs, 's', s);

if (ifail == 0)
  % Display predicted values
  fprintf('\n  I      ETA          SE(ETA)      Predicted    SE(Predicted)\n');
  for i = 1:2
    fprintf('%3d%13.5f%13.5f%13.5f%13.5f\n', i, eta(i), seeta(i), pred(i), sepred(i));
  end
end
 

Residual sum of squares =    3.8717e-01  Degrees of freedom =  3

      Estimate     Standard error
       -0.0239         0.0028
        0.0638         0.0026

  I      ETA          SE(ETA)      Predicted    SE(Predicted)
  1      2.01807      0.08168      0.49552      0.35981
  2      1.12472      0.04476      0.88911      0.36098

function g02gp_example
errfn = 'N';
link = 'R';
mean_p = 'M';
x = [1; 2; 3; 4; 5];
y = [25; 10; 6; 4; 3];
isx = [int64(1)];
s = 0;
vfobs = true;
ip = int64(2);

% Call routine to fit model to training data
[s, rss, idf, b, irank, se, covar, vOut, ifail] = ...
     g02ga(link, mean_p, x, isx, ip, y, s);

if (ifail == 0)
  % Display parameter estimates for training data
  fprintf('\nResidual sum of squares =  %12.4e  Degrees of freedom =  %d\n', rss, idf)
  fprintf('\n      Estimate     Standard error\n');
  for i = 1:2
    fprintf('%14.4f %14.4f\n', b(i), se(i));
  end
  x = [32; 18];
end

[eta, seeta, pred, sepred, ifail] = ...
     g02gp(errfn, link, mean_p, x, isx, b, covar, vfobs, 's', s);

if (ifail == 0)
  % Display predicted values
  fprintf('\n  I      ETA          SE(ETA)      Predicted    SE(Predicted)\n');
  for i = 1:2
    fprintf('%3d%13.5f%13.5f%13.5f%13.5f\n', i, eta(i), seeta(i), pred(i), sepred(i));
  end
end
 

Residual sum of squares =    3.8717e-01  Degrees of freedom =  3

      Estimate     Standard error
       -0.0239         0.0028
        0.0638         0.0026

  I      ETA          SE(ETA)      Predicted    SE(Predicted)
  1      2.01807      0.08168      0.49552      0.35981
  2      1.12472      0.04476      0.88911      0.36098


PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013