Fit

The fit command fits a user-supplied real-valued expression to a set of data points, using the nonlinear least-squares Marquardt-Levenberg algorithm. There can be up to 12 independent variables, there is always 1 dependent variable, and any number of parameters can be fitted. Optionally, error estimates can be input for weighting the data points.

The basic use of fit is best explained by a simple example:

     f(x) = a + b*x + c*x**2
     fit f(x) 'measured.dat' using 1:2 via a,b,c
     plot 'measured.dat' u 1:2, f(x)

Syntax:

     fit {<ranges>} <expression>
         '<datafile>' {datafile-modifiers}
         {{unitweights} | {y|xy|z}error | errors <var1>{,<var2>,...}}
         via '<parameter file>' | <var1>{,<var2>,...}

Ranges may be specified to filter the data used in fitting. Out-of-range data points are ignored. The syntax is

     [{dummy_variable=}{<min>}{:<max>}],
analogous to plot; see plot ranges.

<expression> can be any valid gnuplot expression, although the most common is a previously user-defined function of the form f(x) or f(x,y). It must be real-valued. The names of the independent variables are set by the set dummy command, or in the <ranges> part of the command (see below); by default, the first two are called x and y. Furthermore, the expression should depend on one or more variables whose values are to be determined by the fitting procedure.

<datafile> is treated as in the plot command. All the plot datafile modifiers (using, every,...) except smooth are applicable to fit. See plot datafile.

The datafile contents can be interpreted flexibly by providing a using qualifier, as with plot commands. For example, to generate the independent variable x as the sum of columns 2 and 3, while taking z from column 6 and requesting equal weights:

     fit ... using ($2+$3):6

In the absence of a using specification, the fit implicitly assumes there is only a single independent variable. If the file itself, or the using specification, contains only a single column of data, the line number is taken as the independent variable. If a using specification is given, there can be up to 12 independent variables (and more if specially configured at compile time).

The unitweights option, which is the default, causes all data points to be weighted equally. This can be changed by using the errors keyword to read error estimates of one or more of the variables from the data file. These error estimates are interpreted as the standard deviation s of the corresponding variable value and used to compute a weight for the datum as 1/s**2.
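The two weighting modes can be sketched as follows. The file name 'measured.dat', its column layout (x, y, standard deviation of y) and the starting values are assumptions made for illustration only.

```gnuplot
# Hypothetical data file: column 1 = x, column 2 = y, column 3 = s (std. dev. of y)
f(x) = a + b*x
a = 1; b = 1                     # starting values (illustrative)

# Default behaviour: every data point carries the same weight.
fit f(x) 'measured.dat' using 1:2 unitweights via a,b

# Column 3 is read as the standard deviation s of the dependent variable,
# so each point is weighted by 1/s**2.
fit f(x) 'measured.dat' using 1:2:3 yerror via a,b
```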

When error estimates are given for the independent variables, these weights are further multiplied by fitting function derivatives according to the "effective variance method" (Jay Orear, Am. J. Phys., Vol. 50, 1982).
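For orientation, the effective variance method is usually stated as below. This is the textbook form of the method, not a transcription of gnuplot's internals; the symbols (f for the fitting function, x_i for the independent variables, sigma for the supplied standard deviations, w for the resulting weight) are notation chosen here.

```latex
\sigma_{\mathrm{eff}}^{2}
  = \sigma_{z}^{2}
  + \sum_{i} \left( \frac{\partial f}{\partial x_{i}} \right)^{2} \sigma_{x_{i}}^{2},
\qquad
w = \frac{1}{\sigma_{\mathrm{eff}}^{2}}
```

With no errors on the independent variables, the sum vanishes and the weight reduces to 1/s**2 as described above.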

The errors keyword is to be followed by a comma-separated list of one or more variable names for which errors are to be input; the dependent variable z must always be among them, while independent variables are optional. For each variable in this list, an additional column will be read from the file, containing that variable's error estimate. Again, flexible interpretation is possible by providing the using qualifier. Note that the number of independent variables is thus implicitly given by the total number of columns in the using qualifier, minus 1 (for the dependent variable), minus the number of variables in the errors qualifier.

As an example, if one has 2 independent variables, and errors for the first independent variable and the dependent variable, one uses the errors x,z qualifier, and a using qualifier with 5 columns, which are interpreted as x:y:z:sx:sz (where x and y are the independent variables, z the dependent variable, and sx and sz the standard deviations of x and z).
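A minimal sketch of this case, assuming a hypothetical file 'surface.dat' whose columns are laid out as x, y, z, sx, sz, and an illustrative model g(x,y):

```gnuplot
# Hypothetical file layout: 1 = x, 2 = y, 3 = z, 4 = sx, 5 = sz
g(x,y) = a*x + b*y + c
a = 1; b = 1; c = 0              # starting values (illustrative)

# 5 using columns - 1 dependent variable - 2 error columns = 2 independent variables
fit g(x,y) 'surface.dat' using 1:2:3:4:5 errors x,z via a,b,c
```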

A few shorthands for the errors qualifier are available: yerror (for fits with one independent variable) and zerror (for the general case) are both equivalent to errors z, indicating that there is a single extra column with errors of the dependent variable.

xyerror, for the case of one independent variable, indicates that there are two extra columns, with errors of both the independent and the dependent variable. In this case the errors on x and y are treated by Orear's effective variance method.
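As a sketch of the two-error-column case, assume a hypothetical file 'measured.dat' with columns x, y, sx, sy:

```gnuplot
# Hypothetical file layout: 1 = x, 2 = y, 3 = sx, 4 = sy
f(x) = a + b*x
a = 1; b = 1                     # starting values (illustrative)

# Two extra columns: errors of the independent and of the dependent variable
fit f(x) 'measured.dat' using 1:2:3:4 xyerror via a,b
```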

Note that yerror and xyerror are similar in both form and interpretation to the yerrorlines and xyerrorlines 2D plot styles.

With the command set fit v4 the fit command syntax is compatible with gnuplot version 4. In this case there must be two more using qualifiers (z and s) than there are independent variables, unless there is only one variable. gnuplot then uses the following formats, depending on the number of columns given in the using specification:

     z                           # 1 independent variable (line number)
     x:z                         # 1 independent variable (1st column)
     x:z:s                       # 1 independent variable (3 columns total)
     x:y:z:s                     # 2 independent variables (4 columns total)
     x1:x2:x3:z:s                # 3 independent variables (5 columns total)
     x1:x2:x3:...:xN:z:s         # N independent variables (N+2 columns total)

Beware that this means you have to supply z-errors s in a fit with two or more independent variables. If you want unit weights, you need to supply them explicitly, e.g. by using the format x:y:z:(1).
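For instance, a two-variable fit with unit weights in version 4 compatibility mode might look like this. The file 'surface.dat' and the model g(x,y) are assumptions for illustration.

```gnuplot
# Version 4 compatible syntax: the last using column is always the z-error s.
set fit v4
g(x,y) = a*x**2 + b*y**2 + c*x*y
a = 1; b = 1; c = 1              # starting values (illustrative)

# The constant expression (1) supplies s = 1 for every point, i.e. unit weights.
fit g(x,y) 'surface.dat' using 1:2:3:(1) via a,b,c
```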

The dummy variable names may be changed when specifying a range as noted above. The first range corresponds to the first using spec, and so on. A range may also be given for z (the dependent variable), in which case data points for which f(x,...) is out of the z range will not contribute to the residual being minimized.
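A short sketch of a range that both renames the dummy variable and filters the data, assuming a hypothetical two-column file 'measured.dat':

```gnuplot
a0 = 1; a1 = 1                   # starting values (illustrative)

# Rename the independent variable to t and use only points with 0 <= t <= 10;
# points outside that range are ignored.
fit [t=0:10] a0 + a1*t 'measured.dat' using 1:2 via a0,a1
```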

Multiple datasets may be simultaneously fit with functions of one independent variable by making y a 'pseudo-variable', e.g., the dataline number, and fitting as two independent variables. See fit multi-branch.

The via qualifier specifies which parameters are to be optimized, either directly, or by referencing a parameter file.

Examples:

     f(x) = a*x**2 + b*x + c
     g(x,y) = a*x**2 + b*y**2 + c*x*y
     set fit limit 1e-6
     fit f(x) 'measured.dat' via 'start.par'
     fit f(x) 'measured.dat' using 3:($7-5) via 'start.par'
     fit f(x) './data/trash.dat' using 1:2:3 yerror via a, b, c
     fit g(x,y) 'surface.dat' using 1:2:3 via a, b, c
     fit a0 + a1*x/(1 + a2*x/(1 + a3*x)) 'measured.dat' via a0,a1,a2,a3
     fit a*x + b*y 'surface.dat' using 1:2:3 via a,b
     fit [*:*][yaks=*:*] a*x+b*yaks 'surface.dat' u 1:2:3 via a,b

     fit [][][t=*:*] a*x + b*y + c*t 'foo.dat' using 1:2:3:4 via a,b,c

     set dummy x1, x2, x3, x4, x5
     h(x1,x2,x3,x4,x5) = a*x1 + b*x2 + c*x3 + d*x4 + e*x5
     fit h(x1,x2,x3,x4,x5) 'foo.dat' using 1:2:3:4:5:6 via a,b,c,d,e

After each iteration step, detailed information about the current state of the fit is written to the display. The same information about the initial and final states is written to a log file, "fit.log". This file is always appended to, so as to not lose any previous fit history; it should be deleted or renamed as desired. By using the command set fit logfile, the name of the log file can be changed.
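To keep the history of separate runs apart, the log file can be renamed before fitting. The file names below are placeholders.

```gnuplot
# Write the fit log to a run-specific file instead of "fit.log"
set fit logfile 'run1.log'
f(x) = a + b*x
a = 1; b = 1                     # starting values (illustrative)
fit f(x) 'measured.dat' using 1:2 via a,b
```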

If activated by using set fit errorvariables, the error for each fitted parameter will be stored in a variable named like the parameter, but with "_err" appended. Thus the errors can be used as input for further computations.
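A minimal sketch of reading the error variables after a fit, assuming a hypothetical file 'measured.dat' and a linear model:

```gnuplot
set fit errorvariables
f(x) = a + b*x
a = 1; b = 1                     # starting values (illustrative)
fit f(x) 'measured.dat' using 1:2 via a,b

# After the fit, a_err and b_err hold the errors of a and b.
print sprintf("a = %g +/- %g", a, a_err)
print sprintf("b = %g +/- %g", b, b_err)
```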

If set fit prescale is activated, fit parameters are prescaled by their initial values. This helps the Marquardt-Levenberg routine converge more quickly and reliably in cases where parameters differ in size by several orders of magnitude.
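The kind of problem where prescaling may help can be sketched as follows. The decay model, the file 'decay.dat' and the starting values are invented for illustration; they simply show two parameters that differ by many orders of magnitude.

```gnuplot
set fit prescale
f(x) = amp*exp(-x/tau)
amp = 1e6; tau = 1e-3            # starting values of very different size (illustrative)
fit f(x) 'decay.dat' using 1:2 via amp,tau
```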

The fit may be interrupted by pressing Ctrl-C (Ctrl-Break in wgnuplot). After the current iteration completes, you have the option to (1) stop the fit and accept the current parameter values, (2) continue the fit, (3) execute a gnuplot command as specified by set fit script or the environment variable FIT_SCRIPT. The default is replot, so if you had previously plotted both the data and the fitting function in one graph, you can display the current state of the fit.
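As a sketch, the command run by option (3) could be set explicitly before starting a long fit. The plot command and file name are assumptions, and the string form of set fit script shown here should be checked against the help text of the gnuplot version in use.

```gnuplot
f(x) = a + b*x
a = 1; b = 1                     # starting values (illustrative)

# Command to execute when option (3) is chosen after Ctrl-C
set fit script "plot 'measured.dat' using 1:2, f(x)"
fit f(x) 'measured.dat' using 1:2 via a,b
```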

Once fit has finished, the save fit command may be used to store final values in a file for subsequent use as a parameter file. See save fit for details.
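A sketch of the round trip, with 'final.par' and 'measured.dat' as placeholder file names:

```gnuplot
f(x) = a + b*x
a = 1; b = 1                     # starting values (illustrative)
fit f(x) 'measured.dat' using 1:2 via a,b

# Store the final parameter values
save fit 'final.par'

# Later: restart the fit from the saved values
fit f(x) 'measured.dat' using 1:2 via 'final.par'
```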

