PDF version (NAG web site
, 64bit version, 64bit version)
NAG Toolbox: nag_fit_1dspline_auto (e02be)
Purpose
nag_fit_1dspline_auto (e02be) computes a cubic spline approximation to an arbitrary set of data points. The knots of the spline are located automatically, but a single parameter must be specified to control the tradeoff between closeness of fit and smoothness of fit.
Syntax
[n, lamda, c, fp, wrk, iwrk, ifail] = e02be(start, x, y, w, s, n, lamda, wrk, iwrk, 'm', m, 'nest', nest)
[n, lamda, c, fp, wrk, iwrk, ifail] = nag_fit_1dspline_auto(start, x, y, w, s, n, lamda, wrk, iwrk, 'm', m, 'nest', nest)
Note: the interface to this routine has changed since earlier releases of the toolbox:
Mark 22: lwrk has been removed from the interface.
Description
nag_fit_1dspline_auto (e02be) determines a smooth cubic spline approximation $s(x)$ to the set of data points $(x_r, y_r)$, with weights $w_r$, for $r = 1, 2, \dots, m$.
The spline is given in the B-spline representation
$$s(x) = \sum_{i=1}^{n-4} c_i N_i(x), \qquad (1)$$
where $N_i(x)$ denotes the normalized cubic B-spline defined upon the knots $\lambda_i, \lambda_{i+1}, \dots, \lambda_{i+4}$.
The total number $n$ of these knots and their values $\lambda_1, \dots, \lambda_n$ are chosen automatically by the function. The knots $\lambda_5, \dots, \lambda_{n-4}$ are the interior knots; they divide the approximation interval $[x_1, x_m]$ into $n - 7$ subintervals. The coefficients $c_1, c_2, \dots, c_{n-4}$ are then determined as the solution of the following constrained minimization problem:
minimize
$$\eta = \sum_{i=5}^{n-4} \delta_i^2 \qquad (2)$$
subject to the constraint
$$\theta = \sum_{r=1}^{m} \epsilon_r^2 \le S, \qquad (3)$$
where $\delta_i$ stands for the discontinuity jump in the third-order derivative of $s(x)$ at the interior knot $\lambda_i$, $\epsilon_r$ denotes the weighted residual $w_r (y_r - s(x_r))$, and $S$ is a non-negative number to be specified by you.
The quantity $\eta$ can be seen as a measure of the (lack of) smoothness of $s(x)$, while closeness of fit is measured through $\theta$. By means of the parameter $S$, 'the smoothing factor', you can then control the balance between these two (usually conflicting) properties. If $S$ is too large, the spline will be too smooth and signal will be lost (underfit); if $S$ is too small, the spline will pick up too much noise (overfit). In the extreme cases the function will return an interpolating spline $(\theta = 0)$ if $S$ is set to zero, and the weighted least squares cubic polynomial $(\eta = 0)$ if $S$ is set very large. Experimenting with $S$ values between these two extremes should result in a good compromise. (See Section [Choice of S] for advice on choice of $S$.)
The method employed is outlined in Section [Outline of Method Used] and fully described in Dierckx (1975), Dierckx (1981) and Dierckx (1982). It involves an adaptive strategy for locating the knots of the cubic spline (depending on the function underlying the data and on the value of $S$), and an iterative method for solving the constrained minimization problem once the knots have been determined.
Values of the computed spline, or of its derivatives or definite integral, can subsequently be computed by calling
nag_fit_1dspline_eval (e02bb),
nag_fit_1dspline_deriv (e02bc) or
nag_fit_1dspline_integ (e02bd), as described in
Section [Evaluation of Computed Spline].
References
Dierckx P (1975) An algorithm for smoothing, differentiating and integration of experimental data using spline functions J. Comput. Appl. Math. 1 165–184
Dierckx P (1981) An improved algorithm for curve fitting with spline functions Report TW54 Department of Computer Science, Katholieke Universiteit Leuven
Dierckx P (1982) A fast algorithm for smoothing data on a rectangular grid while using spline functions SIAM J. Numer. Anal. 19 1286–1304
Reinsch C H (1967) Smoothing by spline functions Numer. Math. 10 177–183
Parameters
Compulsory Input Parameters
 1:
start – string (length ≥ 1)
Must be set to 'C' or 'W'.
 start = 'C'
 The function will build up the knot set starting with no interior knots. No values need be assigned to the parameters n, lamda, wrk or iwrk.
 start = 'W'
 The function will restart the knot-placing strategy using the knots found in a previous call of the function. In this case, the parameters n, lamda, wrk and iwrk must be unchanged from that previous call. This warm start can save much time in searching for a satisfactory value of s.
Constraint: start = 'C' or 'W'.
 2:
x(m) – double array
m, the dimension of the array, must satisfy the constraint m ≥ 4.
The values $x_r$ of the independent variable (abscissa) $x$, for $r = 1, 2, \dots, m$.
Constraint: $x_1 < x_2 < \cdots < x_m$.
 3:
y(m) – double array
m, the dimension of the array, must satisfy the constraint m ≥ 4.
The values $y_r$ of the dependent variable (ordinate) $y$, for $r = 1, 2, \dots, m$.
 4:
w(m) – double array
m, the dimension of the array, must satisfy the constraint m ≥ 4.
The values $w_r$ of the weights, for $r = 1, 2, \dots, m$. For advice on the choice of weights, see Section [Weighting of data points] in the E02 Chapter Introduction.
Constraint: w(r) > 0.0, for $r = 1, 2, \dots, m$.
 5:
s – double scalar
The smoothing factor, $S$.
If $S = 0.0$, the function returns an interpolating spline.
If $S$ is smaller than machine precision, it is assumed equal to zero.
For advice on the choice of $S$, see Sections [Description] and [Choice of S].
Constraint: s ≥ 0.0.
 6:
n – int64/int32/nag_int scalar
If the warm start option is used, the value of n must be left unchanged from the previous call.
 7:
lamda(nest) – double array
nest, the dimension of the array, must satisfy the constraint nest ≥ 8. In most practical situations, nest = m/2 is sufficient. nest never needs to be larger than m + 4, the number of knots needed for interpolation (s = 0.0).
If the warm start option is used, the values lamda(1), lamda(2), …, lamda(n) must be left unchanged from the previous call.
 8:
wrk(lwrk) – double array
lwrk, the dimension of the array, must satisfy the constraint lwrk ≥ 4 × m + 16 × nest + 41.
If the warm start option is used, on entry the values wrk(1), …, wrk(n) must be left unchanged from the previous call.
 9:
iwrk(nest) – int64/int32/nag_int array
nest, the dimension of the array, must satisfy the constraint nest ≥ 8. In most practical situations, nest = m/2 is sufficient. nest never needs to be larger than m + 4, the number of knots needed for interpolation (s = 0.0).
If the warm start option is used, on entry the values iwrk(1), …, iwrk(n) must be left unchanged from the previous call.
This array is used as workspace.
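The dimension constraints above can be turned into a few lines of set-up code. The sketch below sizes the arrays for a cold start; taking nest = m + 4 (the largest value ever needed) instead of m/2 is a conservative choice made for this illustration, and the integer type int64 simply follows the Example section.

```matlab
% Array set-up for a cold start (illustrative sizes, not prescribed by the library).
m    = 15;                       % number of data points, must be >= 4
nest = max(8, m + 4);            % m + 4 knots suffice even for s = 0.0
lwrk = 4*m + 16*nest + 41;       % minimum permitted length of wrk

n     = int64(0);                % ignored when start = 'C'
lamda = zeros(nest, 1);          % knot array, filled on exit
wrk   = zeros(lwrk, 1);          % real workspace
iwrk  = zeros(nest, 1, 'int64'); % integer workspace

fprintf('nest = %d, lwrk = %d\n', nest, lwrk);
```

Larger arrays are also acceptable, since the constraints are lower bounds only.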
Optional Input Parameters
 1:
m – int64/int32/nag_int scalar
Default: the dimension of the arrays x, y, w. (An error is raised if these dimensions are not equal.)
$m$, the number of data points.
Constraint: m ≥ 4.
 2:
nest – int64/int32/nag_int scalar
Default: the dimension of the arrays lamda, iwrk. (An error is raised if these dimensions are not equal.)
An overestimate for the number, $n$, of knots required.
Constraint: nest ≥ 8. In most practical situations, nest = m/2 is sufficient. nest never needs to be larger than m + 4, the number of knots needed for interpolation (s = 0.0).
Input Parameters Omitted from the MATLAB Interface
 lwrk
Output Parameters
 1:
n – int64/int32/nag_int scalar
The total number, $n$, of knots of the computed spline.
 2:
lamda(nest) – double array
The knots of the spline, i.e., the positions of the interior knots lamda(5), lamda(6), …, lamda(n − 4) as well as the positions of the additional knots lamda(1) = lamda(2) = lamda(3) = lamda(4) = x(1) and lamda(n − 3) = lamda(n − 2) = lamda(n − 1) = lamda(n) = x(m) needed for the B-spline representation.
 3:
c(nest) – double array
The coefficient $c_i$ of the B-spline $N_i(x)$ in the spline approximation $s(x)$, for $i = 1, 2, \dots, n - 4$.
 4:
fp – double scalar
The sum of the squared weighted residuals, $\theta$, of the computed spline approximation. If fp = 0.0, this is an interpolating spline. fp should equal s within a relative tolerance of 0.001 unless n = 8, when the spline has no interior knots and so is simply a cubic polynomial. For knots to be inserted, s must be set to a value below the value of fp produced in this case.
 5:
wrk(lwrk) – double array
 6:
iwrk(nest) – int64/int32/nag_int array
This array is used as workspace.
 7:
ifail – int64/int32/nag_int scalar
ifail = 0 unless the function detects an error (see [Error Indicators and Warnings]).
Error Indicators and Warnings
Errors or warnings detected by the function:
 ifail = 1
On entry, start ≠ 'C' or 'W',
or m < 4,
or s < 0.0,
or s = 0.0 and nest < m + 4,
or nest < 8,
or lwrk < 4 × m + 16 × nest + 41.
 ifail = 2
The weights are not all strictly positive.
 ifail = 3
The values of x(r), for r = 1, 2, …, m, are not in strictly increasing order.
 ifail = 4
The number of knots required is greater than nest. Try increasing nest and, if necessary, supplying larger arrays for the parameters lamda, c, wrk and iwrk. However, if nest is already large, say nest > m/2, then this error exit may indicate that s is too small.
 ifail = 5
The iterative process used to compute the coefficients of the approximating spline has failed to converge. This error exit may occur if s has been set very small. If the error persists with increased s, contact NAG.
If ifail = 4 or 5, a spline approximation is returned, but it fails to satisfy the fitting criterion (see (2) and (3)), perhaps by only a small amount.
Accuracy
On successful exit, the approximation returned is such that its weighted sum of squared residuals $\theta$ (as in (3)) is equal to the smoothing factor $S$, up to a specified relative tolerance of 0.001, except that if $n = 8$, $\theta$ may be significantly less than $S$: in this case the computed spline is simply a weighted least squares polynomial approximation of degree 3, i.e., a spline with no interior knots.
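The accuracy statement can be checked after a call with a couple of lines. The sketch below uses the values printed in the Example section (s = 1, fp = 1.0003, nOut = 9); the variable names relErr and meetsCriterion are introduced here for illustration only.

```matlab
% Values taken from the Example section of this page.
s    = 1;        % smoothing factor passed to the function
fp   = 1.0003;   % weighted sum of squared residuals returned
nOut = 9;        % total number of knots returned

relErr = abs(fp - s) / s;        % relative deviation of fp from s
if nOut == 8
    % No interior knots: the cubic polynomial may have fp well below s.
    meetsCriterion = fp <= s;
else
    % Otherwise fp should match s to a relative tolerance of 0.001.
    meetsCriterion = relErr <= 0.001;
end
fprintf('relative deviation = %.1e, criterion met: %d\n', relErr, meetsCriterion);
```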
Further Comments
Timing
The time taken for a call of nag_fit_1dspline_auto (e02be) depends on the complexity of the shape of the data, the value of the smoothing factor $S$, and the number of data points. If nag_fit_1dspline_auto (e02be) is to be called for different values of $S$, much time can be saved by setting start = 'W' after the first call.
Choice of S
If the weights have been correctly chosen (see Section [Weighting of data points] in the E02 Chapter Introduction), the standard deviation of $w_r y_r$ would be the same for all $r$, equal to $\sigma$, say. In this case, choosing the smoothing factor $S$ in the range $\sigma^2 (m \pm \sqrt{2m})$, as suggested by Reinsch (1967), is likely to give a good start in the search for a satisfactory value. Otherwise, experimenting with different values of $S$ will be required from the start, taking account of the remarks in Section [Description].
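As a worked instance of the Reinsch range, the following sketch evaluates the two end points for a data set of 15 points. The value sigma = 0.1 is purely an assumed noise level chosen for illustration; in practice it has to come from knowledge of the data.

```matlab
m     = 15;      % number of data points
sigma = 0.1;     % ASSUMED common standard deviation of w(r)*y(r)

% End points of the range sigma^2 * (m -/+ sqrt(2m)) suggested by Reinsch (1967).
sLow  = sigma^2 * (m - sqrt(2*m));
sHigh = sigma^2 * (m + sqrt(2*m));
fprintf('Try s between %.4f and %.4f\n', sLow, sHigh);
```

The range is centred on sigma^2 * m, which is the expected value of the weighted sum of squared residuals when the spline reproduces the underlying function exactly.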
In that case, in view of computation time and memory requirements, it is recommended to start with a very large value for $S$ and so determine the least squares cubic polynomial; the value returned in fp, call it $\theta_0$, gives an upper bound for $S$. Then progressively decrease the value of $S$ to obtain closer fits, say by a factor of 10 in the beginning, i.e., $S = \theta_0 / 10$, $S = \theta_0 / 100$, and so on, and more carefully as the approximation shows more details.
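To make the role of $\theta_0$ concrete, the sketch below computes the weighted least squares cubic polynomial directly in plain MATLAB for the data of the Example section and then lists a few trial smoothing factors. It does not call the NAG function; it only mirrors what a call with a very large s is described as returning in fp. The names V, W, coef, theta0 and sTrial are introduced for this illustration.

```matlab
% Data and weights from the Example section.
x = [0; 0.5; 1; 1.5; 2; 2.5; 3; 4; 4.5; 5; 5.5; 6; 7; 7.5; 8];
y = [1.1; 0.372; 0.431; 1.69; 2.11; 3.1; 4.23; 4.35; 4.81; 4.61; ...
     4.79; 5.23; 6.35; 7.19; 7.97];
w = [1; 2; 1.5; 1; 3; 1; 0.5; 1; 2; 2.5; 1; 3; 1; 2; 1];

% Weighted least squares cubic: minimise sum((w.*(y - p(x))).^2).
V    = [x.^3, x.^2, x, ones(size(x))];   % monomial basis for a cubic
W    = diag(w);                          % row scaling by the weights
coef = (W*V) \ (W*y);                    % least squares solution

% theta0 is the weighted sum of squared residuals of that polynomial.
theta0 = sum((w .* (y - V*coef)).^2);

% Trial smoothing factors theta0/10, theta0/100, ...
sTrial = theta0 ./ 10.^(1:4);
disp(theta0); disp(sTrial);
```

Any s at or above theta0 leaves the spline without interior knots, so the useful search range lies below it.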
The number of knots of the spline returned, and their location, generally depend on the value of $S$ and on the behaviour of the function underlying the data. However, if nag_fit_1dspline_auto (e02be) is called with start = 'W', the knots returned may also depend on the smoothing factors of the previous calls. Therefore if, after a number of trials with different values of $S$ and start = 'W', a fit can finally be accepted as satisfactory, it may be worthwhile to call nag_fit_1dspline_auto (e02be) once more with the selected value for $S$ but now using start = 'C'. Often, nag_fit_1dspline_auto (e02be) then returns an approximation with the same quality of fit but with fewer knots, which is therefore better if data reduction is also important.
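The search strategy described in this section might be organised as in the sketch below: one cold start, warm starts for the remaining trial values, then a final cold start with the accepted value. The call signature is the one given in the Syntax section; the trial values in sTrial are hypothetical, and the exist guard is an addition of this sketch so that it also runs where the toolbox is not installed.

```matlab
% Data and weights from the Example section.
x = [0; 0.5; 1; 1.5; 2; 2.5; 3; 4; 4.5; 5; 5.5; 6; 7; 7.5; 8];
y = [1.1; 0.372; 0.431; 1.69; 2.11; 3.1; 4.23; 4.35; 4.81; 4.61; ...
     4.79; 5.23; 6.35; 7.19; 7.97];
w = [1; 2; 1.5; 1; 3; 1; 0.5; 1; 2; 2.5; 1; 3; 1; 2; 1];

m = numel(x);  nest = m + 4;  lwrk = 4*m + 16*nest + 41;
n = int64(0);  lamda = zeros(nest, 1);
wrk = zeros(lwrk, 1);  iwrk = zeros(nest, 1, 'int64');

sTrial  = [100, 10, 1];                        % HYPOTHETICAL decreasing sequence
haveNag = exist('nag_fit_1dspline_auto') ~= 0; % guard added for this sketch
start   = 'C';                                 % first call is a cold start
for k = 1:numel(sTrial)
    if haveNag
        % n, lamda, wrk and iwrk are passed back in unchanged for warm starts.
        [n, lamda, c, fp, wrk, iwrk, ifail] = ...
            nag_fit_1dspline_auto(start, x, y, w, sTrial(k), n, lamda, wrk, iwrk);
        fprintf('s = %g: n = %d, fp = %g, ifail = %d\n', sTrial(k), n, fp, ifail);
    end
    start = 'W';                               % later calls reuse the knots
end

% Final cold start with the accepted value, which often needs fewer knots.
if haveNag
    [nFinal, lamdaFinal, cFinal, fpFinal, ~, ~, ifail] = ...
        nag_fit_1dspline_auto('C', x, y, w, sTrial(end), int64(0), ...
            zeros(nest, 1), zeros(lwrk, 1), zeros(nest, 1, 'int64'));
end
```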
Outline of Method Used
If $S = 0$, the requisite number of knots is known in advance, i.e., $n = m + 4$; the interior knots are located immediately as $\lambda_i = x_{i-2}$, for $i = 5, 6, \dots, n - 4$. The corresponding least squares spline (see nag_fit_1dspline_knots (e02ba)) is then an interpolating spline and therefore a solution of the problem.
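For the interpolation case the complete knot set can be written down directly. The sketch below does so for the abscissae of the Example section. The four coincident knots at each end are inferred from the example output, where lamdaOut begins 0 0 0 0 and ends 8 8 8 8; treat that placement as an assumption of this illustration.

```matlab
% Abscissae from the Example section.
x = [0; 0.5; 1; 1.5; 2; 2.5; 3; 4; 4.5; 5; 5.5; 6; 7; 7.5; 8];
m = numel(x);

n     = m + 4;                 % number of knots needed when S = 0
lamda = zeros(n, 1);
lamda(1:4)   = x(1);           % coincident knots at the left end (assumed)
lamda(5:n-4) = x(3:m-2);       % interior knots: lamda(i) = x(i-2), i = 5..n-4
lamda(n-3:n) = x(m);           % coincident knots at the right end (assumed)

disp(lamda.');
```

With m = 15 this gives n = 19 knots, of which n - 8 = 11 are interior, so the second and the last-but-one data points are the only abscissae that are not knots.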
If $S > 0$, a suitable knot set is built up in stages (starting with no interior knots in the case of a cold start but with the knot set found in a previous call if a warm start is chosen). At each stage, a spline is fitted to the data by least squares (see nag_fit_1dspline_knots (e02ba)) and $\theta$, the weighted sum of squares of residuals, is computed. If $\theta > S$, new knots are added to the knot set to reduce $\theta$ at the next stage. The new knots are located in intervals where the fit is particularly poor, their number depending on the value of $S$ and on the progress made so far in reducing $\theta$. Sooner or later, we find that $\theta \le S$ and at that point the knot set is accepted. The function then goes on to compute the (unique) spline which has this knot set and which satisfies the full fitting criterion specified by (2) and (3). The theoretical solution has $\theta = S$. The function computes the spline by an iterative scheme which is ended when $\theta = S$ within a relative tolerance of 0.001. The main part of each iteration consists of a linear least squares computation of special form, done in a similarly stable and efficient manner as in nag_fit_1dspline_knots (e02ba).
An exception occurs when the function finds at the start that, even with no interior knots $(n = 8)$, the least squares spline already has its weighted sum of squares of residuals $\le S$. In this case, since this spline (which is simply a cubic polynomial) also has an optimal value for the smoothness measure $\eta$, namely zero, it is returned at once as the (trivial) solution. It will usually mean that $S$ has been chosen too large.
For further details of the algorithm and its use, see
Dierckx (1981).
Evaluation of Computed Spline
The value of the computed spline at a given value x may be obtained in the double variable s by the call:
[s, ifail] = e02bb(lamda, c, x);
where n, lamda and c are the output parameters of nag_fit_1dspline_auto (e02be).
The values of the spline and its first three derivatives at a given value x may be obtained in the double array s of dimension at least 4 by the call:
[s, ifail] = e02bc(lamda, c, x, left);
where if left = 1, left-hand derivatives are computed and if left ≠ 1, right-hand derivatives are calculated. The value of left is only relevant if x is an interior knot (see nag_fit_1dspline_deriv (e02bc)).
The value of the definite integral of the spline over the interval x(1) to x(m) can be obtained in the double variable dint by the call:
[dint, ifail] = e02bd(lamda, c);
(see nag_fit_1dspline_integ (e02bd)).
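To show what the B-spline representation (1) means numerically, the sketch below evaluates a spline from its knots and coefficients in plain MATLAB using the standard de Boor recurrence. This is a swapped-in textbook technique for illustration, not a description of how nag_fit_1dspline_eval (e02bb) is implemented. The knots and coefficients are the rounded values printed in the Example section, so the results are approximate; xEval and sEval are names chosen here.

```matlab
% Knots and coefficients copied from the Example output (n = 9).
lamda = [0; 0; 0; 0; 4; 8; 8; 8; 8];
c     = [1.3201; 1.3542; 5.5510; 4.7031; 8.2277];
n     = numel(lamda);

xEval = [0; 2; 4; 8];            % points inside [lamda(4), lamda(n-3)]
sEval = zeros(size(xEval));

for q = 1:numel(xEval)
    xq = xEval(q);
    % Find j with lamda(j) <= xq < lamda(j+1); searching only up to n-4
    % assigns the right end point to the last interval.
    j = find(lamda(4:n-4) <= xq, 1, 'last') + 3;
    d = c(j-3:j);                % the four coefficients active on this interval
    for r = 1:3                  % three levels of convex combinations (cubic)
        for i = 4:-1:r+1
            gi    = j - 4 + i;   % global index of the coefficient d(i)
            alpha = (xq - lamda(gi)) / (lamda(gi+4-r) - lamda(gi));
            d(i)  = (1 - alpha)*d(i-1) + alpha*d(i);
        end
    end
    sEval(q) = d(4);             % value of the spline at xq
end
disp([xEval, sEval]);
```

At the two ends of the interval the spline equals the first and last coefficient, which is a convenient sanity check on any evaluation code.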
Example
function nag_fit_1dspline_auto_example
% Cold start: build the knot set from scratch.
start = 'C';
% Data points (abscissae strictly increasing) and positive weights.
x = [0; 0.5; 1; 1.5; 2; 2.5; 3; 4; 4.5; 5; 5.5; 6; 7; 7.5; 8];
y = [1.1; 0.372; 0.431; 1.69; 2.11; 3.1; 4.23; 4.35; 4.81; 4.61; ...
     4.79; 5.23; 6.35; 7.19; 7.97];
w = [1; 2; 1.5; 1; 3; 1; 0.5; 1; 2; 2.5; 1; 3; 1; 2; 1];
% Smoothing factor.
s = 1;
% n, lamda, wrk and iwrk need no meaningful values on a cold start.
n = int64(0);
lamda = zeros(54, 1);
wrk = zeros(1105, 1);
iwrk = zeros(54, 1, 'int64');
[nOut, lamdaOut, c, fp, wrkOut, iwrkOut, ifail] = ...
    nag_fit_1dspline_auto(start, x, y, w, s, n, lamda, wrk, iwrk);
% Display the results.
nOut, lamdaOut, c, fp, ifail
nOut =
9
lamdaOut =
0
0
0
0
4
8
8
8
8
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
c =
1.3201
1.3542
5.5510
4.7031
8.2277
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
fp =
1.0003
ifail =
0
function e02be_example
% Cold start: build the knot set from scratch.
start = 'C';
% Data points (abscissae strictly increasing) and positive weights.
x = [0; 0.5; 1; 1.5; 2; 2.5; 3; 4; 4.5; 5; 5.5; 6; 7; 7.5; 8];
y = [1.1; 0.372; 0.431; 1.69; 2.11; 3.1; 4.23; 4.35; 4.81; 4.61; ...
     4.79; 5.23; 6.35; 7.19; 7.97];
w = [1; 2; 1.5; 1; 3; 1; 0.5; 1; 2; 2.5; 1; 3; 1; 2; 1];
% Smoothing factor.
s = 1;
% n, lamda, wrk and iwrk need no meaningful values on a cold start.
n = int64(0);
lamda = zeros(54, 1);
wrk = zeros(1105, 1);
iwrk = zeros(54, 1, 'int64');
[nOut, lamdaOut, c, fp, wrkOut, iwrkOut, ifail] = ...
    e02be(start, x, y, w, s, n, lamda, wrk, iwrk);
% Display the results.
nOut, lamdaOut, c, fp, ifail
(The output is identical to that of nag_fit_1dspline_auto_example above.)
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013