# How to Choose Knots

This example shows how to select and optimize knots using the `optknt`

and `newknt`

commands from Curve Fitting Toolbox™.

### Sample Data

Here are some sample data, much used for testing spline approximation with variable knots, the so-called Titanium Heat Data. They record some property of titanium measured as a function of temperature.

[xx,yy] = titanium; plot(xx,yy,'x'); axis([500 1100 .55 2.25]); title('The Titanium Heat Data'); hold on

Notice the rather sharp peak. We'll use these data to illustrate some methods for knot selection.

First, we pick a few data points from these somewhat rough data. We will interpolate using this subset, then compare results to the full dataset.

pick = [1 5 11 21 27 29 31 33 35 40 45 49]; tau = xx(pick); y = yy(pick); plot(tau,y,'ro'); legend({'Full Dataset' 'Subsampled Data'}, 'location','NW');

### General Considerations

A spline of order `k`

with `n+k`

knots has `n`

degrees of freedom. Since we have 12 data sites, `tau(1) < ... < tau(12)`

, a fit with a cubic spline, i.e., a fourth order spline, requires a knot sequence `t`

of length 12+4.

Moreover, the knot sequence `t`

must satisfy the *Schoenberg-Whitney conditions*, i.e., must be such that the i-th data site lies in the support of the i-th B-spline, i.e.,

` t(i) < tau(i) < t(i+k) for all i,`

with equality allowed only in case of a knot of multiplicity `k`

.

One way to choose a knot sequence satisfying all these conditions is as the optimal knots, of Gaffney/Powell and Micchelli/Rivlin/Winograd.

### Optimal Knots

In *optimal* spline interpolation, to values at sites

` tau(1), ..., tau(n)`

say, the knots are chosen so as to minimize the constant in a standard error formula. Specifically, the first and the last data site are chosen as k-fold knots. The remaining `n-k`

knots are supplied by `optknt`

.

Here is the beginning of the help from `optknt`

:

OPTKNT Optimal knot distribution.

`OPTKNT(TAU,K) returns an `optimal' knot sequence for`

`interpolation at data sites TAU(1), ..., TAU(n) by splines of`

`order K. TAU must be an increasing sequence, but this is not`

`checked.`

`OPTKNT(TAU,K,MAXITER) specifies the number MAXITER of iterations`

`to be tried, the default being 10.`

`The interior knots of this knot sequence are the n-K`

`sign-changes in any absolutely constant function h ~= 0 that`

`satisfies`

` integral{ f(x)h(x) : TAU(1) < x < TAU(n) } = 0`

`for all splines f of order K with knot sequence TAU.`

### Trying OPTKNT

We try using `optknt`

for interpolation on our example, interpolating by cubic splines to data

` (tau(i), y(i)), for i = 1, ..., n.`

k = 4; osp = spapi( optknt(tau,k), tau,y); fnplt(osp,'r'); hl = legend({'Full Dataset' 'Subsampled Data' ... 'Cubic Spline Interpolant Using Optimal knots'}, ... 'location','NW'); hl.Position = hl.Position-[.14,0,0,0];

This is a bit disconcerting!

Here, marked by stars, are also the interior optimal knots:

xi = fnbrk(osp,'knots'); xi([1:k end+1-(1:k)]) = []; plot(xi,repmat(1.4, size(xi)),'*'); hl = legend({'Full Dataset' 'Subsampled Data' ... 'Cubic Spline Interpolant Using Optimal knots' ... 'Optimal Knots'}, 'location','NW'); hl.Position = hl.Position-[.14,0,0,0];

### What Happened?

The knot choice for optimal interpolation is designed to make the maximum over *all* functions `f`

of the ratio

` norm(f - If) / norm(D^k f)`

as small as possible, where the numerator is the norm of the interpolation error, `f - If`

, and the denominator is the norm of the `k`

-th derivative of the interpolant, `D^k f`

. Since our data imply that `D^k f`

is rather large, the interpolation error near the flat part of the data is of acceptable size for such an `optimal' scheme.

Actually, for these data, the ordinary cubic spline interpolant provided by `csapi`

does quite well:

cs = csapi(tau,y); fnplt(cs,'g',2); hl = legend({'Full Dataset' 'Subsampled Data' ... 'Cubic Spline Interpolant Using Optimal knots' ... 'Optimal Knots' 'Cubic Spline Interpolant Using CSAPI'}, ... 'location','NW'); hl.Position = hl.Position-[.14,0,0,0]; hold off

### Knot Choice for Least Squares Approximation

Knots must be selected when doing least-squares approximation by splines. One approach is to use equally-spaced knots to begin with, then use `newknt`

with the approximation obtained for a better knot distribution.

The next sections illustrate these steps with the full titanium heat data set.

### Least Squares Approximation with Uniform Knot Sequence

We start with a uniform knot sequence.

unif = linspace(xx(1), xx(end), 2+fix(length(xx)/4)); sp = spap2(augknt(unif, k), k, xx, yy); plot(xx,yy,'x'); hold on fnplt(sp,'r'); axis([500 1100 .55 2.25]); title('The Titanium Heat Data'); hl = legend({'Full Dataset' ... 'Least Squares Cubic Spline Using Uniform Knots'}, ... 'location','NW'); hl.Position = hl.Position-[.14,0,0,0];

This is not at all satisfactory. So we use `newknt`

for a spline approximation of the same order and with the same number of polynomial pieces, but the breaks better distributed.

### Using NEWKNT to Improve the Knot Distribution

spgood = spap2(newknt(sp), k, xx,yy); fnplt(spgood,'g',1.5); hl = legend({'Full Dataset' ... 'Least Squares Cubic Spline Using Uniform Knots' ... 'Least Squares Cubic Spline Using NEWKNT'}, ... 'location','NW'); hl.Position = hl.Position-[.14,0,0,0]; hold off

This is quite good. Incidentally, even one interior knot fewer would not have sufficed in this case.