NAG C Library Manual

# NAG Library Function Document: nag_regsn_ridge_opt (g02kac)

## 1  Purpose

nag_regsn_ridge_opt (g02kac) calculates a ridge regression, optimizing the ridge parameter according to one of four prediction error criteria.

## 2  Specification

 #include <nag.h>
 #include <nagg02.h>
 void nag_regsn_ridge_opt (Nag_OrderType order, Integer n, Integer m, const double x[], Integer pdx, const Integer isx[], Integer ip, double tau, const double y[], double *h, Nag_PredictError opt, Integer *niter, double tol, double *nep, Nag_EstimatesOption orig, double b[], double vif[], double res[], double *rss, Integer *df, Nag_OptionLOO optloo, double perr[], NagError *fail)

## 3  Description

A linear model has the form:
 $y = c + X \beta + \epsilon ,$
where
• $y$ is an $n$ by $1$ matrix of values of a dependent variable;
• $c$ is a scalar intercept term;
• $X$ is an $n$ by $m$ matrix of values of independent variables;
• $\beta$ is an $m$ by $1$ matrix of unknown values of parameters;
• $\epsilon$ is an $n$ by $1$ matrix of unknown random errors such that variance of $\epsilon ={\sigma }^{2}I$.
Let $\stackrel{~}{X}$ be the mean-centred $X$ and $\stackrel{~}{y}$ the mean-centred $y$. Furthermore, $\stackrel{~}{X}$ is scaled such that the diagonal elements of the cross product matrix ${\stackrel{~}{X}}^{\mathrm{T}}\stackrel{~}{X}$ are one. The linear model now takes the form:
 $\tilde{y} = \tilde{X} \tilde{\beta} + \epsilon .$
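The centring and scaling step can be sketched in a few lines of Python. This is an illustrative sketch only, not the library's internal code; the function name `standardise` and the handling shown are hypothetical, and constant (zero-variance) columns are assumed absent:

```python
import math

def standardise(X, y):
    """Mean-centre y, and mean-centre and scale the columns of X so that
    the diagonal of X~^T X~ is one. Illustrative sketch, not NAG code;
    assumes no column of X is constant."""
    n = len(X)
    m = len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(m)]
    ybar = sum(y) / n
    # Column scales: the square root of the centred sum of squares, so that
    # each scaled column has unit cross-product with itself.
    scale = [math.sqrt(sum((row[j] - xbar[j]) ** 2 for row in X))
             for j in range(m)]
    Xt = [[(row[j] - xbar[j]) / scale[j] for j in range(m)] for row in X]
    yt = [yi - ybar for yi in y]
    return Xt, yt, xbar, ybar, scale
```

The means and scales are returned because they are needed later to transform parameter estimates for the standardized data back to the original data (compare the discussion of the orig argument below).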
Ridge regression estimates the parameters $\stackrel{~}{\beta }$ in a penalised least squares sense by finding the $\stackrel{~}{b}$ that minimizes
 $\| \tilde{X} \tilde{b} - \tilde{y} \|^2 + h \| \tilde{b} \|^2 , \quad h > 0 ,$
where $‖·‖$ denotes the ${\ell }_{2}$-norm and $h$ is a scalar regularisation or ridge parameter. For a given value of $h$, the parameter estimates $\stackrel{~}{b}$ are found by evaluating
 $\tilde{b} = ( \tilde{X}^{\mathrm{T}} \tilde{X} + h I )^{-1} \tilde{X}^{\mathrm{T}} \tilde{y} .$
Note that if $h=0$ the ridge regression solution is equivalent to the ordinary least squares solution.
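For a very small standardized problem the ridge estimates can be computed directly from the normal equations above. The Python sketch below is hypothetical and limited to two predictors, solving $( \tilde{X}^{\mathrm{T}} \tilde{X} + h I ) \tilde{b} = \tilde{X}^{\mathrm{T}} \tilde{y}$ by Cramer's rule; it is not the method g02kac uses (the function works with an SVD instead):

```python
def ridge_2pred(Xt, yt, h):
    """Solve (Xt^T Xt + h I) b = Xt^T yt for two predictors by Cramer's
    rule. Illustrative only; g02kac uses an SVD rather than this inverse."""
    # Entries of the 2 x 2 matrix Xt^T Xt + h I.
    a11 = sum(r[0] * r[0] for r in Xt) + h
    a22 = sum(r[1] * r[1] for r in Xt) + h
    a12 = sum(r[0] * r[1] for r in Xt)
    # Right-hand side Xt^T yt.
    c1 = sum(r[0] * yi for r, yi in zip(Xt, yt))
    c2 = sum(r[1] * yi for r, yi in zip(Xt, yt))
    det = a11 * a22 - a12 * a12
    return [(c1 * a22 - c2 * a12) / det, (a11 * c2 - a12 * c1) / det]
```

Setting `h = 0.0` reproduces the ordinary least squares solution, while increasing `h` shrinks both coefficients toward zero.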
Rather than calculate the inverse of (${\stackrel{~}{X}}^{\mathrm{T}}\stackrel{~}{X}+hI$) directly, nag_regsn_ridge_opt (g02kac) uses the singular value decomposition (SVD) of $\stackrel{~}{X}$. After decomposing $\stackrel{~}{X}$ into $UD{V}^{\mathrm{T}}$ where $U$ and $V$ are orthogonal matrices and $D$ is a diagonal matrix, the parameter estimates become
 $\tilde{b} = V ( D^{\mathrm{T}} D + h I )^{-1} D^{\mathrm{T}} U^{\mathrm{T}} \tilde{y} .$
A consequence of introducing the ridge parameter is that the effective number of parameters, $\gamma$, in the model is given by the sum of diagonal elements of
 $D^{\mathrm{T}} D ( D^{\mathrm{T}} D + h I )^{-1} ,$
see Moody (1992) for details.
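Since $D$ is diagonal, the trace above reduces to a sum over the singular values $d_i$ of $\tilde{X}$: $\gamma = \sum_i d_i^2 / ( d_i^2 + h )$. A minimal sketch, assuming the singular values are already available (the values passed in are hypothetical):

```python
def effective_parameters(singular_values, h):
    """gamma = sum_i d_i^2 / (d_i^2 + h), the trace of
    D^T D (D^T D + h I)^-1. Illustrative sketch only."""
    return sum(d * d / (d * d + h) for d in singular_values)
```

With `h = 0` this gives the number of nonzero singular values (the rank of $\tilde{X}$), and as `h` grows the effective number of parameters shrinks toward zero.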
Any multi-collinearity in the design matrix $X$ may be highlighted by calculating the variance inflation factors for the fitted model. The $j$th variance inflation factor, ${v}_{j}$, is a scaled version of the multiple correlation coefficient between independent variable $j$ and the other independent variables, ${R}_{j}$, and is given by
 $v_j = \frac{1}{1 - R_j} , \quad j = 1, 2, \dots, m .$
The $m$ variance inflation factors are calculated as the diagonal elements of the matrix:
 $( \tilde{X}^{\mathrm{T}} \tilde{X} + h I )^{-1} \tilde{X}^{\mathrm{T}} \tilde{X} ( \tilde{X}^{\mathrm{T}} \tilde{X} + h I )^{-1} ,$
which, using the SVD of $\stackrel{~}{X}$, is equivalent to the diagonal elements of the matrix:
 $V ( D^{\mathrm{T}} D + h I )^{-1} D^{\mathrm{T}} D ( D^{\mathrm{T}} D + h I )^{-1} V^{\mathrm{T}} .$
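Because $D$ is diagonal, the $j$th diagonal element of this matrix is $\sum_k V_{jk}^2 \, d_k^2 / ( d_k^2 + h )^2$. A hedged sketch, assuming a hypothetical orthogonal matrix `V` (given as a list of rows) and singular values are supplied:

```python
def vif_from_svd(V, singular_values, h):
    """Diagonal of V (D^T D + h I)^-1 D^T D (D^T D + h I)^-1 V^T,
    i.e., the m variance inflation factors. Illustrative sketch only;
    V and the singular values here are hypothetical inputs."""
    m = len(V)
    return [sum(V[j][k] ** 2 * d * d / (d * d + h) ** 2
                for k, d in enumerate(singular_values))
            for j in range(m)]
```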
Although parameter estimates $\stackrel{~}{b}$ are calculated by using $\stackrel{~}{X}$, it is usual to report the parameter estimates $b$ associated with $X$. These are calculated from $\stackrel{~}{b}$, and the means and scalings of $X$. Optionally, either $\stackrel{~}{b}$ or $b$ may be calculated.
The function can minimize one of four criteria while calculating a suitable value for $h$:
(a) Generalised cross-validation (GCV):
 $\frac{n s}{ ( n - \gamma )^2 } ;$
(b) Unbiased estimate of variance (UEV):
 $\frac{s}{ n - \gamma } ;$
(c) Future prediction error (FPE):
 $\frac{1}{n} \left( s + \frac{2 \gamma s}{ n - \gamma } \right) ;$
(d) Bayesian information criterion (BIC):
 $\frac{1}{n} \left( s + \frac{ \log \left( n \right) \gamma s }{ n - \gamma } \right) ;$
where $s$ is the sum of squares of residuals. However, the function returns all four of the above prediction errors regardless of which one was used to optimize the ridge parameter, $h$. Furthermore, the function will optionally return the leave-one-out cross-validation (LOOCV) estimate of prediction error.
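Given the residual sum of squares $s$, the number of observations $n$ and the effective number of parameters $\gamma$, the four criteria are simple arithmetic. A minimal sketch (the function name and its return order mirror the GCV, UEV, FPE, BIC order used for the perr array, but the code itself is illustrative, not the library's):

```python
import math

def prediction_errors(s, n, gamma):
    """The four criteria in the order GCV, UEV, FPE, BIC.
    s: residual sum of squares; gamma: effective number of parameters.
    Illustrative sketch of the formulas in Section 3, not NAG code."""
    gcv = n * s / (n - gamma) ** 2
    uev = s / (n - gamma)
    fpe = (s + 2.0 * gamma * s / (n - gamma)) / n
    bic = (s + math.log(n) * gamma * s / (n - gamma)) / n
    return gcv, uev, fpe, bic
```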

## 4  References

Hastie T, Tibshirani R and Friedman J (2003) *The Elements of Statistical Learning: Data Mining, Inference and Prediction* Springer Series in Statistics
Moody J E (1992) The effective number of parameters: an analysis of generalisation and regularisation in nonlinear learning systems In: *Neural Information Processing Systems* (eds J E Moody, S J Hanson and R P Lippmann) **4** 847–854 Morgan Kaufmann, San Mateo, CA

## 5  Arguments

1: order – Nag_OrderType (Input)
On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by ${\mathbf{order}}=\mathrm{Nag_RowMajor}$. See Section 3.2.1.3 in the Essential Introduction for a more detailed explanation of the use of this argument.
Constraint: ${\mathbf{order}}=\mathrm{Nag_RowMajor}$ or Nag_ColMajor.
2: n – Integer (Input)
On entry: $n$, the number of observations.
Constraint: ${\mathbf{n}}>1$.
3: m – Integer (Input)
On entry: the number of independent variables available in the data matrix $X$.
Constraint: ${\mathbf{m}}\le {\mathbf{n}}$.
4: x[$\mathit{dim}$] – const double (Input)
Note: the dimension, dim, of the array x must be at least
• $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{pdx}}×{\mathbf{m}}\right)$ when ${\mathbf{order}}=\mathrm{Nag_ColMajor}$;
• $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{n}}×{\mathbf{pdx}}\right)$ when ${\mathbf{order}}=\mathrm{Nag_RowMajor}$.
The $\left(i,j\right)$th element of the matrix $X$ is stored in
• ${\mathbf{x}}\left[\left(j-1\right)×{\mathbf{pdx}}+i-1\right]$ when ${\mathbf{order}}=\mathrm{Nag_ColMajor}$;
• ${\mathbf{x}}\left[\left(i-1\right)×{\mathbf{pdx}}+j-1\right]$ when ${\mathbf{order}}=\mathrm{Nag_RowMajor}$.
On entry: the values of independent variables in the data matrix $X$.
5: pdx – Integer (Input)
On entry: the stride separating row or column elements (depending on the value of order) in the array x.
Constraints:
• if ${\mathbf{order}}=\mathrm{Nag_ColMajor}$, ${\mathbf{pdx}}\ge {\mathbf{n}}$;
• if ${\mathbf{order}}=\mathrm{Nag_RowMajor}$, ${\mathbf{pdx}}\ge {\mathbf{m}}$.
6: isx[m] – const Integer (Input)
On entry: indicates which $m$ independent variables are included in the model.
${\mathbf{isx}}\left[j-1\right]=1$
The $j$th variable in x will be included in the model.
${\mathbf{isx}}\left[j-1\right]=0$
Variable $j$ is excluded.
Constraint: ${\mathbf{isx}}\left[\mathit{j}-1\right]=0\text{​ or ​}1$, for $\mathit{j}=1,2,\dots ,{\mathbf{m}}$.
7: ip – Integer (Input)
On entry: the number of independent variables included in the model.
Constraints:
• $1\le {\mathbf{ip}}\le {\mathbf{m}}$;
• Exactly ip elements of isx must be equal to $1$.
8: tau – double (Input)
On entry: singular values of the SVD of the data matrix $X$ that are less than tau will be set equal to zero.
Suggested value: ${\mathbf{tau}}=0.0$
Constraint: ${\mathbf{tau}}\ge 0.0$.
9: y[n] – const double (Input)
On entry: the $n$ values of the dependent variable $y$.
10: h – double * (Input/Output)
On entry: an initial value for the ridge regression parameter $h$; used as a starting point for the optimization.
Constraint: ${\mathbf{h}}>0.0$.
On exit: h is the optimized value of the ridge regression parameter $h$.
11: opt – Nag_PredictError (Input)
On entry: the measure of prediction error used to optimize the ridge regression parameter $h$. The value of opt must be set equal to one of:
${\mathbf{opt}}=\mathrm{Nag_GCV}$
Generalised cross-validation (GCV);
${\mathbf{opt}}=\mathrm{Nag_UEV}$
Unbiased estimate of variance (UEV);
${\mathbf{opt}}=\mathrm{Nag_FPE}$
Future prediction error (FPE);
${\mathbf{opt}}=\mathrm{Nag_BIC}$
Bayesian information criterion (BIC).
Constraint: ${\mathbf{opt}}=\mathrm{Nag_GCV}$, $\mathrm{Nag_UEV}$, $\mathrm{Nag_FPE}$ or $\mathrm{Nag_BIC}$.
12: niter – Integer * (Input/Output)
On entry: the maximum number of iterations allowed to optimize the ridge regression parameter $h$.
Constraint: ${\mathbf{niter}}\ge 1$.
On exit: the number of iterations used to optimize the ridge regression parameter $h$ within tol.
13: tol – double (Input)
On entry: the iterative optimization of the ridge regression parameter $h$ will halt when consecutive values of $h$ lie within tol of each other.
Constraint: ${\mathbf{tol}}>0.0$.
14: nep – double * (Output)
On exit: the number of effective parameters, $\gamma$, in the model.
15: orig – Nag_EstimatesOption (Input)
On entry: if ${\mathbf{orig}}=\mathrm{Nag_EstimatesOrig}$, the parameter estimates $b$ are calculated for the original data; otherwise ${\mathbf{orig}}=\mathrm{Nag_EstimatesStand}$ and the parameter estimates $\stackrel{~}{b}$ are calculated for the standardized data.
Constraint: ${\mathbf{orig}}=\mathrm{Nag_EstimatesOrig}$ or $\mathrm{Nag_EstimatesStand}$.
16: b[${\mathbf{ip}}+1$] – double (Output)
On exit: contains the intercept and parameter estimates for the fitted ridge regression model in the order indicated by isx. The first element of b contains the estimate for the intercept; ${\mathbf{b}}\left[\mathit{j}\right]$ contains the parameter estimate for the $\mathit{j}$th independent variable in the model, for $\mathit{j}=1,2,\dots ,{\mathbf{ip}}$.
17: vif[ip] – double (Output)
On exit: the variance inflation factors in the order indicated by isx. For the $\mathit{j}$th independent variable in the model, ${\mathbf{vif}}\left[\mathit{j}-1\right]$ is the value of ${v}_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,{\mathbf{ip}}$.
18: res[n] – double (Output)
On exit: ${\mathbf{res}}\left[\mathit{i}-1\right]$ is the value of the $\mathit{i}$th residual for the fitted ridge regression model, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$.
19: rss – double * (Output)
On exit: the sum of squares of residual values.
20: df – Integer * (Output)
On exit: the degrees of freedom for the residual sum of squares rss.
21: optloo – Nag_OptionLOO (Input)
On entry: if ${\mathbf{optloo}}=\mathrm{Nag_WantLOO}$, the leave-one-out cross-validation estimate of prediction error is calculated; otherwise no such estimate is calculated and ${\mathbf{optloo}}=\mathrm{Nag_NoLOO}$.
Constraint: ${\mathbf{optloo}}=\mathrm{Nag_NoLOO}$ or $\mathrm{Nag_WantLOO}$.
22: perr[$5$] – double (Output)
On exit: the first four elements contain, in this order, the measures of prediction error: GCV, UEV, FPE and BIC.
If ${\mathbf{optloo}}=\mathrm{Nag_WantLOO}$, ${\mathbf{perr}}\left[4\right]$ is the LOOCV estimate of prediction error; otherwise ${\mathbf{perr}}\left[4\right]$ is not referenced.
23: fail – NagError * (Input/Output)
The NAG error argument (see Section 3.6 in the Essential Introduction).

## 6  Error Indicators and Warnings

NE_2_INT_ARG_CONS
On entry, ${\mathbf{ip}}=〈\mathit{\text{value}}〉$ and ${\mathbf{m}}=〈\mathit{\text{value}}〉$.
Constraint: $1\le {\mathbf{ip}}\le {\mathbf{m}}$.
NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_BAD_PARAM
On entry, argument $〈\mathit{\text{value}}〉$ had an illegal value.
NE_INT
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{n}}>1$.
On entry, ${\mathbf{niter}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{niter}}\ge 1$.
On entry, ${\mathbf{pdx}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{pdx}}>0$.
NE_INT_2
On entry, ${\mathbf{m}}=〈\mathit{\text{value}}〉$ and ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{m}}\le {\mathbf{n}}$.
On entry, ${\mathbf{pdx}}=〈\mathit{\text{value}}〉$ and ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{pdx}}\ge {\mathbf{n}}$.
On entry, ${\mathbf{pdx}}=〈\mathit{\text{value}}〉$ and ${\mathbf{m}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{pdx}}\ge {\mathbf{m}}$.
NE_INT_ARG_CONS
On entry, ${\mathbf{ip}}=〈\mathit{\text{value}}〉$.
Constraint: $\mathrm{sum}\left({\mathbf{isx}}\right)={\mathbf{ip}}$.
NE_INT_ARRAY_VAL_1_OR_2
On entry, ${\mathbf{isx}}\left[〈\mathit{\text{value}}〉\right]=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{isx}}\left[j-1\right]=0$ or $1$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_REAL
On entry, ${\mathbf{h}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{h}}>0.0$.
On entry, ${\mathbf{tau}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{tau}}\ge 0.0$.
On entry, ${\mathbf{tol}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{tol}}>0.0$.
NE_SVD_FAIL
SVD failed to converge.
NW_TOO_MANY_ITER
Maximum number of iterations used.

## 7  Accuracy

Not applicable.

## 8  Further Comments

nag_regsn_ridge_opt (g02kac) allocates internally $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(5×\left({\mathbf{n}}-1\right),2×{\mathbf{ip}}×{\mathbf{ip}}\right)+\left({\mathbf{n}}+3\right)×{\mathbf{ip}}+{\mathbf{n}}$ elements of double precision storage.

## 9  Example

This example reads in data from an experiment to model body fat, and a ridge regression is calculated that optimizes GCV prediction error.

### 9.1  Program Text

Program Text (g02kace.c)

### 9.2  Program Data

Program Data (g02kace.d)

### 9.3  Program Results

Program Results (g02kace.r)