Program:      SURFACEFIT

Purpose:      Given values Z at positions (X,Y) and optional standard errors for Z
              from a text file with data in columns, find best fit parameters for 
              a user given model function F(X,Y)
              
Category:     ANALYSIS, FITTING

File:         surfacefit.src

Author:       M.G.R. Vogelaar


Keywords:

 FILENAME=    Enter name of file: 

              Name of text file with columns which represent values
              Z at positions X, Y. A fourth column optionally represents
              1 sigma errors of Z. These values are used as weights in the fit.


 COLS=        Enter columns for X, Y, Z, sigmaZ ... [1,2,3,4]:

              First column should be X. Second is Y and the third is Z.
              If a fourth column is omitted then all the weights are set
              to 1 in the fit. Else, weights are calculated as 
              w_i = 1/(sigmaZ_i^2)


 SCALE=       Multiply Z and sigmaZ with factor .... [1.0]:

              If your Z values are very big or very small, it is often 
              wise to scale your data to smaller values because it is
              numerically more stable and less dependent on good initial
              estimates (see INITVALS=).
              The default is 1.0 which implies that the data is not scaled
              at all.


 FXY=         Enter F(x,y) with parameters a0,..,a9    [a0+a1*x+a2*y]:

              Specify the model with parameters a0, a1 etc. for which we
              want a 'best fit'. The data is stored in x and y.
              If the default is used, then the program uses an optimized
              version of the function evaluations needed for the fit.
              A more generic surface is:
              FXY=a0+a1*x+a2*y+a3*x*y+a4*x*x+a5*y*y
              

 INITVALS=    Init. values for a0, a1,.. [0,0,...]:
      
              Initial estimates of the parameters for which you want to 
              find a 'best' value. If the values of Z are near 1.0 then
              the default values (all zero) often suffice.
              Note that you need to enter exactly n values where n is the
              number of best-fit parameters.


 RESTABLE=    Plot a table with the residuals ... [Y]/N:

              For each position in the data, print the
              position, the Z value, the Z value of the model
              with the 'best fit' parateters and the difference
              between these two Z values (the residual)


 CHAUVENET=   Remove Outliers with Chauvenet ... [Y]/N:

              After fitting the model parameters, we have an
              impression of what outliers are.
              Filter possible outliers based on Chauvenet's criterion
              Calculate mean and standard deviation of the residuals.
              Express residual-mean in standard deviations of the
              sample: d = |res_i-mean|/std
              The probability to find this value in a range
              mean-d*sigma and mean+d*sigma is P = erf(d/sqrt(2))
              The probabilty to find a residual outside this
              interval is Q=1-P
              According to Chauvenet we can discard data points where
              Q < N/2
              The fit will be repeated with the outliers removed.
              Note that Chauvenet can be applied only one.


 TRIALS=      Number of trials for bootstrap ... [0]:

              If you need a realistic error on your fitted parameters
              then apply a bootstrap method. The number of random 
              samples is given in this keyword. For each sub sample 
              a fit is made and the error is given by
              the standard deviation of (an_boot - an_first)
              where an_boot is parameter n for all the bootstrap
              samples and an_first is parameter n for the first fit.

              The default is TRIALS=0 which implies no bootstrapping.
              Note that if the least squares fitting of your model
              is sensitive to its initial estimates, these bootstrapping
              errors can be unreasonable big. Usually this is a hint
              to scale the data.

              Note: For functions that are supplied by a user, the 
              function evaluation is done by Python's eval() function.
              This implies that this evaluation is not optimal.
              So if you want a reliable bootstrap error estimation
              (e.g.: TRIALS=5000) then the procedure can take a while.


 PLOT=        Do you want a 3d plot ... [Y]:

              Display a gridded (i.e. interpolated) version of your
              data and the fitted curve. The plot is interactive, i.e.
              one can rotate, pan and zoom the plot.


Description:  Fit parameters of any model given by a user, given data from a 
              text file on disk. The parameters in the function that describes
              the model are a0, a1, .. a9. If your data file has a column
              with a standard error for each datapoint and you selected
              that column (as the fourth column), then you get a meaningful
              value for Chi square and the reduced Chi square and the
              estimated errors. We list both the errors derived from the
              covariance matrix (made by the fit routine) and the scaled errors,
              which are derived from the covariance errors while assuming
              a reduced Chi square equal to 1.0.
              It seems that bootstrapping (TRIALS=) indicate that these scaled
              errors are more realistic.

Example:      
              FILENAME= testplane.txt
              COLS= 1 2 3
              SCALE= 1e-13
              FXY= a0+a1*x+a2*y+a3*x*x+a4*y*y+a5*x*y
              INITVALS= 0::6
              TRIALS= 200      
              PLOT=

              Test Data:

!       X                         Y                        Z
        5.0                     3996.0                  -1.485E+17
        397.0                   3736.0                  -1.328E+17
        1995.0                  3994.0                  -2.147E+17
        1994.0                  7.0                     -6.402E+16
        10.0                    59.0                    -3.415E+13
        1177.0                  16.0                    -2.231E+16
        600.0                   1610.0                  -3.017E+16
        1420.0                  2070.0                  -7.314E+16
        1141.0                  1160.0                  -3.385E+16
        990.0                   3085.0                  -1.052E+17
        574.0                   2389.0                  -5.877E+16
        1740.0                  1586.0                  -7.291E+16
        1456.0                  3278.0                  -1.354E+17
        864.0                   3846.0                  -1.505E+17
        1604.0                  3030.0                  -1.282E+17
        1332.0                  3658.0                  -1.544E+17
        554.0                   3198.0                  -1.006E+17
        1643.0                  900.0                   -5.141E+16
        442.0                   1937.0                  -3.828E+16
        1266.0                  1661.0                  -5.205E+16


              Results:

Program scaled data as Z = Z*1e-13
Results for F(x,y)=a0+a1*x+a2*y+a3*x*x+a4*y*y+a5*x*y
====================================================
  par           value       cov.error       std error
   a0        0.278127        0.944506         1.93969
   a1    -0.000673375      0.00167908      0.00344825
   a2    -0.000103869     0.000800445      0.00164384
   a3     -0.00160955     7.79177e-07     1.60017e-06
   a4    -0.000930062     1.74786e-07     3.58951e-07
   a5    -2.83247e-05      2.3408e-07     4.80722e-07

Chi^2 min:      59.0453
Reduced Chi^2:  4.21752
Iterations:     12
Function ev:    95
Status:         2

Residuals from Z - Zmodel:
Residuals max:                 3.65061
Residuals min:                 0.0327222
Residuals standard deviation:  1.175

Standard errors with bootstrap:
sigma a0: 5.52513
sigma a1: 0.00792682
sigma a2: 0.00247251
sigma a3: 3.08476e-06
sigma a4: 4.59987e-07
sigma a5: 1.10049e-06

              The value Chi^2 min is a measure of goodness of fit.
              Note that it scales both with the values of Z (high values
              of Z result in high values of the residuals) and with
              the weights.
              The reduced Chi-square gives a relation between the covariance
              errors and the standard deviation of the mother sample.
              If this value is ~1 then the errors from the covariance 
              matrix can be used. If not, or when you have equal weights,
              then you should use the standard errors derived from the
              assumption that the reduced Chi-square is 1.
              
              If the weights are not under- or over estimated, then the
              bootstrap errors will look like the standard errors.
             
Notes:        This is a GIPSY task written in Python. It uses kmpfit,
              the least squares routine in the Kapteyn Package.
              Python enables us to evaluate model expressions given 
              by a user. The fit routine does not need expressions for
              the parameter derivatives, so we end up with a flexible 
              program that enables a user to explore different models.


Updates:      Nov 20,  2011: VOG, Document created.