Program: SURFACEFIT
Purpose: Given values Z at positions (X,Y) and optional standard errors for Z
from a text file with data in columns, find best fit parameters for
a user given model function F(X,Y)
Category: ANALYSIS, FITTING
File: surfacefit.src
Author: M.G.R. Vogelaar
Keywords:
FILENAME= Enter name of file:
Name of text file with columns which represent values
Z at positions X, Y. A fourth column optionally represents
1 sigma errors of Z. These values are used as weights in the fit.
COLS= Enter columns for X, Y, Z, sigmaZ ... [1,2,3,4]:
First column should be X. Second is Y and the third is Z.
If a fourth column is omitted then all the weights are set
to 1 in the fit. Else, weights are calculated as
w_i = 1/(sigmaZ_i^2)
SCALE= Multiply Z and sigmaZ with factor .... [1.0]:
If your Z values are very big or very small, it is often
wise to scale your data to smaller values because it is
numerically more stable and less dependent on good initial
estimates (see INITVALS=).
The default is 1.0 which implies that the data is not scaled
at all.
FXY= Enter F(x,y) with parameters a0,..,a9 [a0+a1*x+a2*y]:
Specify the model with parameters a0, a1 etc. for which we
want a 'best fit'. The data is stored in x and y.
If the default is used, then the program uses an optimized
version of the function evaluations needed for the fit.
A more generic surface is:
FXY=a0+a1*x+a2*y+a3*x*y+a4*x*x+a5*y*y
INITVALS= Init. values for a0, a1,.. [0,0,...]:
Initial estimates of the parameters for which you want to
find a 'best' value. If the values of Z are near 1.0 then
the default values (all zero) often suffice.
Note that you need to enter exactly n values where n is the
number of best-fit parameters.
RESTABLE= Plot a table with the residuals ... [Y]/N:
For each position in the data, print the
position, the Z value, the Z value of the model
with the 'best fit' parateters and the difference
between these two Z values (the residual)
CHAUVENET= Remove Outliers with Chauvenet ... [Y]/N:
After fitting the model parameters, we have an
impression of what outliers are.
Filter possible outliers based on Chauvenet's criterion
Calculate mean and standard deviation of the residuals.
Express residual-mean in standard deviations of the
sample: d = |res_i-mean|/std
The probability to find this value in a range
mean-d*sigma and mean+d*sigma is P = erf(d/sqrt(2))
The probabilty to find a residual outside this
interval is Q=1-P
According to Chauvenet we can discard data points where
Q < N/2
The fit will be repeated with the outliers removed.
Note that Chauvenet can be applied only one.
TRIALS= Number of trials for bootstrap ... [0]:
If you need a realistic error on your fitted parameters
then apply a bootstrap method. The number of random
samples is given in this keyword. For each sub sample
a fit is made and the error is given by
the standard deviation of (an_boot - an_first)
where an_boot is parameter n for all the bootstrap
samples and an_first is parameter n for the first fit.
The default is TRIALS=0 which implies no bootstrapping.
Note that if the least squares fitting of your model
is sensitive to its initial estimates, these bootstrapping
errors can be unreasonable big. Usually this is a hint
to scale the data.
Note: For functions that are supplied by a user, the
function evaluation is done by Python's eval() function.
This implies that this evaluation is not optimal.
So if you want a reliable bootstrap error estimation
(e.g.: TRIALS=5000) then the procedure can take a while.
PLOT= Do you want a 3d plot ... [Y]:
Display a gridded (i.e. interpolated) version of your
data and the fitted curve. The plot is interactive, i.e.
one can rotate, pan and zoom the plot.
Description: Fit parameters of any model given by a user, given data from a
text file on disk. The parameters in the function that describes
the model are a0, a1, .. a9. If your data file has a column
with a standard error for each datapoint and you selected
that column (as the fourth column), then you get a meaningful
value for Chi square and the reduced Chi square and the
estimated errors. We list both the errors derived from the
covariance matrix (made by the fit routine) and the scaled errors,
which are derived from the covariance errors while assuming
a reduced Chi square equal to 1.0.
It seems that bootstrapping (TRIALS=) indicate that these scaled
errors are more realistic.
Example:
FILENAME= testplane.txt
COLS= 1 2 3
SCALE= 1e-13
FXY= a0+a1*x+a2*y+a3*x*x+a4*y*y+a5*x*y
INITVALS= 0::6
TRIALS= 200
PLOT=
Test Data:
! X Y Z
5.0 3996.0 -1.485E+17
397.0 3736.0 -1.328E+17
1995.0 3994.0 -2.147E+17
1994.0 7.0 -6.402E+16
10.0 59.0 -3.415E+13
1177.0 16.0 -2.231E+16
600.0 1610.0 -3.017E+16
1420.0 2070.0 -7.314E+16
1141.0 1160.0 -3.385E+16
990.0 3085.0 -1.052E+17
574.0 2389.0 -5.877E+16
1740.0 1586.0 -7.291E+16
1456.0 3278.0 -1.354E+17
864.0 3846.0 -1.505E+17
1604.0 3030.0 -1.282E+17
1332.0 3658.0 -1.544E+17
554.0 3198.0 -1.006E+17
1643.0 900.0 -5.141E+16
442.0 1937.0 -3.828E+16
1266.0 1661.0 -5.205E+16
Results:
Program scaled data as Z = Z*1e-13
Results for F(x,y)=a0+a1*x+a2*y+a3*x*x+a4*y*y+a5*x*y
====================================================
par value cov.error std error
a0 0.278127 0.944506 1.93969
a1 -0.000673375 0.00167908 0.00344825
a2 -0.000103869 0.000800445 0.00164384
a3 -0.00160955 7.79177e-07 1.60017e-06
a4 -0.000930062 1.74786e-07 3.58951e-07
a5 -2.83247e-05 2.3408e-07 4.80722e-07
Chi^2 min: 59.0453
Reduced Chi^2: 4.21752
Iterations: 12
Function ev: 95
Status: 2
Residuals from Z - Zmodel:
Residuals max: 3.65061
Residuals min: 0.0327222
Residuals standard deviation: 1.175
Standard errors with bootstrap:
sigma a0: 5.52513
sigma a1: 0.00792682
sigma a2: 0.00247251
sigma a3: 3.08476e-06
sigma a4: 4.59987e-07
sigma a5: 1.10049e-06
The value Chi^2 min is a measure of goodness of fit.
Note that it scales both with the values of Z (high values
of Z result in high values of the residuals) and with
the weights.
The reduced Chi-square gives a relation between the covariance
errors and the standard deviation of the mother sample.
If this value is ~1 then the errors from the covariance
matrix can be used. If not, or when you have equal weights,
then you should use the standard errors derived from the
assumption that the reduced Chi-square is 1.
If the weights are not under- or over estimated, then the
bootstrap errors will look like the standard errors.
Notes: This is a GIPSY task written in Python. It uses kmpfit,
the least squares routine in the Kapteyn Package.
Python enables us to evaluate model expressions given
by a user. The fit routine does not need expressions for
the parameter derivatives, so we end up with a flexible
program that enables a user to explore different models.
Updates: Nov 20, 2011: VOG, Document created.