NAG Library Function Document
nag_smooth_spline_fit (g10abc)
1 Purpose
nag_smooth_spline_fit (g10abc) fits a cubic smoothing spline for a given smoothing parameter.
2 Specification
#include <nag.h> 
#include <nagg10.h> 
void 
nag_smooth_spline_fit (Nag_SmoothFitType mode,
Integer n,
const double x[],
const double y[],
const double weights[],
double rho,
double yhat[],
double coeff[],
double *rss,
double *df,
double res[],
double h[],
double comm_ar[],
NagError *fail) 

3 Description
nag_smooth_spline_fit (g10abc) fits a cubic smoothing spline to a set of $n$ observations (${x}_{i}$, ${y}_{i}$), for $i=1,2,\dots ,n$. The spline provides a flexible smooth function for situations in which a simple polynomial or nonlinear regression model is unsuitable.
Cubic smoothing splines arise as the unique real-valued solution function $f$, with absolutely continuous first derivative and squared-integrable second derivative, which minimizes:
$$\sum _{i=1}^{n}{w}_{i}{\left\{{y}_{i}-f\left({x}_{i}\right)\right\}}^{2}+\rho {\int }_{-\infty }^{\infty }{\left\{{f}^{\prime \prime }\left(x\right)\right\}}^{2}dx,$$
where ${w}_{i}$ is the (optional) weight for the $i$th observation and $\rho $ is the smoothing parameter. This criterion consists of two parts: the first measures the fit of the curve, and the second the smoothness of the curve. The value of the smoothing parameter $\rho $ weights these two aspects; larger values of $\rho $ give a smoother fitted curve but, in general, a poorer fit. For details of how the cubic spline can be estimated see Hutchinson and de Hoog (1985) and Reinsch (1967).
The fitted values, $\hat{y}={\left({\hat{y}}_{1},{\hat{y}}_{2},\dots ,{\hat{y}}_{n}\right)}^{\mathrm{T}}$, and weighted residuals, ${r}_{i}$, can be written as
$$\hat{y}=Hy\quad \text{and}\quad {r}_{i}=\sqrt{{w}_{i}}\left({y}_{i}-{\hat{y}}_{i}\right)$$
for a matrix $H$. The residual degrees of freedom for the spline is $\mathrm{trace}\left(I-H\right)$ and the diagonal elements of $H$, ${h}_{ii}$, are the leverages.
The parameter $\rho $ can be chosen in a number of ways. The fit can be inspected for a number of different values of $\rho $. Alternatively, the degrees of freedom for the spline, which determine the value of $\rho $, can be specified, or the (generalized) cross-validation can be minimized to give $\rho $; see nag_smooth_spline_estim (g10acc) for further details.
nag_smooth_spline_fit (g10abc) requires the ${x}_{i}$ to be strictly increasing. If two or more observations have the same ${x}_{i}$-value then they should be replaced by a single observation with ${y}_{i}$ equal to the (weighted) mean of the $y$ values and weight, ${w}_{i}$, equal to the sum of the weights. This operation can be performed by nag_order_data (g10zac).
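The reduction described above can be sketched in plain C as follows. This is an illustration only, not the implementation of nag_order_data (g10zac). The function name merge_tied_x and its argument layout are hypothetical, and the sketch assumes the data are already sorted by $x$ and that all weights are positive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a NAG function).
 * Collapses runs of equal x values in data already sorted by x.
 * Each run becomes one observation whose y is the weighted mean of the
 * run and whose weight is the sum of the run's weights. A null w is
 * treated as unit weights. The output arrays must have room for n
 * entries. Returns the number of distinct observations written. */
size_t merge_tied_x(size_t n, const double x[], const double y[],
                    const double w[], double xo[], double yo[], double wo[])
{
    size_t m = 0;
    size_t i = 0;

    while (i < n) {
        double wsum = 0.0;   /* sum of weights in the current run */
        double wysum = 0.0;  /* sum of weight * y in the current run */
        size_t j = i;

        while (j < n && x[j] == x[i]) {
            double wj = (w != NULL) ? w[j] : 1.0;
            wsum += wj;
            wysum += wj * y[j];
            ++j;
        }
        /* The input is assumed to be sorted in non-decreasing order. */
        assert(j == n || x[j] > x[i]);

        xo[m] = x[i];
        yo[m] = wysum / wsum;
        wo[m] = wsum;
        ++m;
        i = j;
    }
    return m;
}
```

The returned count is the number of distinct observations, which is the value to pass as n, and the output arrays satisfy the strictly increasing requirement on x.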
The computation is split into three phases.
(i) Compute matrices needed to fit spline.
(ii) Fit spline for a given value of $\rho $.
(iii) Compute spline coefficients.
When fitting the spline for several different values of
$\rho $, phase
(i) need only be carried out once and then phase
(ii) repeated for different values of
$\rho $. If the spline is being fitted as part of an iterative weighted least squares procedure, phases
(i) and
(ii) have to be repeated for each set of weights. In either case, phase
(iii) will often only have to be performed after the final fit has been computed.
The algorithm is based on
Hutchinson (1986).
4 References
Hastie T J and Tibshirani R J (1990) Generalized Additive Models Chapman and Hall
Hutchinson M F (1986) Algorithm 642: A fast procedure for calculating minimum cross-validation cubic smoothing splines ACM Trans. Math. Software 12 150–153
Hutchinson M F and de Hoog F R (1985) Smoothing noisy data with spline functions Numer. Math. 47 99–106
Reinsch C H (1967) Smoothing by spline functions Numer. Math. 10 177–183
5 Arguments
 1:
mode – Nag_SmoothFitType Input
On entry: indicates in which mode the function is to be used.
 ${\mathbf{mode}}=\mathrm{Nag\_SmoothFitPartial}$
 Initialization and fitting is performed. This partial fit can be used in an iterative weighted least squares context where the weights are changing at each call to nag_smooth_spline_fit (g10abc) or when the coefficients are not required.
 ${\mathbf{mode}}=\mathrm{Nag\_SmoothFitQuick}$
 Fitting only is performed. Initialization must have been performed previously by a call to nag_smooth_spline_fit (g10abc) with ${\mathbf{mode}}=\mathrm{Nag\_SmoothFitPartial}$. This quick fit may be called repeatedly with different values of rho without reinitialization.
 ${\mathbf{mode}}=\mathrm{Nag\_SmoothFitFull}$
 Initialization and full fitting is performed and the function coefficients are calculated.
Constraint:
${\mathbf{mode}}=\mathrm{Nag\_SmoothFitPartial}$, $\mathrm{Nag\_SmoothFitQuick}$ or $\mathrm{Nag\_SmoothFitFull}$.
 2:
n – Integer Input
On entry:
$n$, the number of distinct observations.
Constraint:
${\mathbf{n}}\ge 3$.
 3:
x[n] – const double Input
On entry: the distinct and ordered values ${x}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
Constraint:
${\mathbf{x}}\left[\mathit{i}-1\right]<{\mathbf{x}}\left[\mathit{i}\right]$, for $\mathit{i}=1,2,\dots ,n-1$.
 4:
y[n] – const double Input

On entry: the values ${y}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
 5:
weights[n] – const double Input

On entry: weights must contain the $n$ weights, if they are required. Otherwise, weights must be set to the null pointer (double *)0.
Constraint: if weights are required, then ${\mathbf{weights}}\left[i-1\right]>0.0$, for $i=1,2,\dots ,n$.
 6:
rho – double Input
On entry: $\rho $, the smoothing parameter.
Constraint:
${\mathbf{rho}}\ge 0.0$.
 7:
yhat[n] – double Output

On exit: the fitted values, ${\hat{y}}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
 8:
coeff[$\left({\mathbf{n}}-1\right)\times 3$] – double Input/Output
On entry: if ${\mathbf{mode}}=\mathrm{Nag\_SmoothFitQuick}$, coeff must be unaltered from the previous call to nag_smooth_spline_fit (g10abc) with ${\mathbf{mode}}=\mathrm{Nag\_SmoothFitPartial}$. Otherwise coeff need not be set.
On exit: if ${\mathbf{mode}}=\mathrm{Nag\_SmoothFitFull}$, coeff contains the spline coefficients. More precisely, the value of the spline at $t$ is given by
$\left(\left({\mathbf{coeff}}\left[\left(i-1\right)\times \left({\mathbf{n}}-1\right)+2\right]\times d+{\mathbf{coeff}}\left[\left(i-1\right)\times \left({\mathbf{n}}-1\right)+1\right]\right)\times d+{\mathbf{coeff}}\left[\left(i-1\right)\times \left({\mathbf{n}}-1\right)+0\right]\right)\times d+{\hat{y}}_{i}$, where ${x}_{i}\le t<{x}_{i+1}$ and $d=t-{x}_{i}$.
If ${\mathbf{mode}}=\mathrm{Nag\_SmoothFitPartial}$ or $\mathrm{Nag\_SmoothFitQuick}$, coeff contains information that will be used in a subsequent call to nag_smooth_spline_fit (g10abc) with ${\mathbf{mode}}=\mathrm{Nag\_SmoothFitQuick}$.

 9:
rss – double * Output
On exit: the (weighted) residual sum of squares.
 10:
df – double * Output
On exit: the residual degrees of freedom.
 11:
res[n] – double Output

On exit: the (weighted) residuals, ${r}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
 12:
h[n] – double Output

On exit: the leverages, ${h}_{\mathit{i}\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
 13:
comm_ar[$9\times {\mathbf{n}}+14$] – double Input/Output
On entry: if
${\mathbf{mode}}=\mathrm{Nag\_SmoothFitQuick}$,
comm_ar must be unaltered from the previous call to nag_smooth_spline_fit (g10abc) with
${\mathbf{mode}}=\mathrm{Nag\_SmoothFitPartial}$. Otherwise
comm_ar is used as workspace and need not be set.
On exit: if
${\mathbf{mode}}=\mathrm{Nag\_SmoothFitPartial}$ or
$\mathrm{Nag\_SmoothFitQuick}$,
comm_ar contains information that will be used in a subsequent call to nag_smooth_spline_fit (g10abc) with
${\mathbf{mode}}=\mathrm{Nag\_SmoothFitQuick}$.
 14:
fail – NagError * Input/Output

The NAG error argument (see
Section 3.6 in the Essential Introduction).
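The evaluation rule given for coeff when ${\mathbf{mode}}=\mathrm{Nag\_SmoothFitFull}$ is a nested (Horner) product in $d=t-{x}_{i}$. The following sketch evaluates one cubic piece in that form. The function name eval_spline_piece is hypothetical, and the three coefficients for interval $i$ are passed in explicitly as c0, c1 and c2 (the entries with offsets $+0$, $+1$ and $+2$ in the expression above), so the sketch makes no assumption about how coeff is indexed.

```c
#include <assert.h>

/* Hypothetical helper (not a NAG function).
 * Evaluates one cubic piece of the spline at t, where x_i <= t < x_{i+1}:
 *   s(t) = ((c2 * d + c1) * d + c0) * d + yhat_i,  with d = t - x_i.
 * c0, c1 and c2 are the three coefficients for interval i, looked up
 * from coeff by the caller. yhat_i is the fitted value at x_i. */
double eval_spline_piece(double c0, double c1, double c2,
                         double yhat_i, double x_i, double t)
{
    double d = t - x_i;

    /* The caller is expected to have located the interval containing t. */
    assert(d >= 0.0);
    return ((c2 * d + c1) * d + c0) * d + yhat_i;
}
```

At $t={x}_{i}$ the expression reduces to the fitted value ${\hat{y}}_{i}$, which gives a quick check that the correct interval and coefficients were selected.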
6 Error Indicators and Warnings
 NE_BAD_PARAM
On entry, argument
mode had an illegal value.
 NE_INT_ARG_LT
On entry, ${\mathbf{n}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{n}}\ge 3$.
 NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact
NAG for assistance.
 NE_NOT_STRICTLY_INCREASING
The sequence x is not strictly increasing: ${\mathbf{x}}\left[\langle \mathit{\text{value}}\rangle \right]=\langle \mathit{\text{value}}\rangle $, ${\mathbf{x}}\left[\langle \mathit{\text{value}}\rangle \right]=\langle \mathit{\text{value}}\rangle $.
 NE_REAL_ARG_LT
On entry, ${\mathbf{rho}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{rho}}\ge 0.0$.
 NE_REAL_ARRAY_CONS
On entry, ${\mathbf{weights}}\left[\langle \mathit{\text{value}}\rangle \right]=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{weights}}\left[i-1\right]>0$, for $i=1,2,\dots ,n$.
7 Accuracy
Accuracy depends on the value of $\rho $ and the position of the $x$ values. The values of ${x}_{i}-{x}_{i-1}$ and ${w}_{i}$ are scaled and $\rho $ is transformed to avoid underflow and overflow problems.
8 Further Comments
The time taken by nag_smooth_spline_fit (g10abc) is of order $n$.
Regression splines with a small
$\left(<n\right)$ number of knots can be fitted by
nag_1d_spline_fit_knots (e02bac) and
nag_1d_spline_fit (e02bec).
9 Example
The data, given by
Hastie and Tibshirani (1990), is the age,
${x}_{i}$, and C-peptide concentration (pmol/ml),
${y}_{i}$, from a study of the factors affecting insulin-dependent diabetes mellitus in children. The data is input, reduced to a strictly ordered set by
nag_order_data (g10zac) and a series of splines fit using a range of values for the smoothing parameter,
$\rho $.
9.1 Program Text
Program Text (g10abce.c)
9.2 Program Data
Program Data (g10abce.d)
9.3 Program Results
Program Results (g10abce.r)