stats {<ranges>} 'filename' {matrix | using N{:M}} {name 'prefix'} {{no}output}
This command prepares a statistical summary of the data in one or two columns of a file. The using specifier is interpreted in the same way as for plot commands. See plot for details on the index, every, and using directives. Data points are filtered against both xrange and yrange before analysis. See set xrange. The summary is printed to the screen by default. Output can be redirected to a file by prior use of the command set print, or suppressed altogether using the nooutput option.
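As a rough sketch of the output options described above, the following sequence first writes the summary to a file and then repeats the analysis silently. The file names data.dat and summary.txt are placeholders chosen for illustration.

```gnuplot
# Hypothetical input file 'data.dat'; column 2 is analysed.
set print 'summary.txt'              # redirect printed output to a file
stats 'data.dat' using 2             # the summary goes to summary.txt
set print                            # restore the default print destination

stats 'data.dat' using 2 nooutput    # no printed summary; the variables are still set
print STATS_mean, STATS_stddev
```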
In addition to printed output, the program stores the individual statistics in three sets of variables. The first set reports how the data is laid out in the file. The array of column headers is generated only if the option set datafile columnheaders is in effect.
STATS_records       | total number N of in-range data records
STATS_outofrange    | number of records filtered out by range limits
STATS_invalid       | number of invalid/incomplete/missing records
STATS_blank         | number of blank lines in the file
STATS_blocks        | number of indexable blocks of data in the file
STATS_columns       | number of data columns in the first row of data
STATS_column_header | array of strings holding column headers found
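The layout variables can be inspected directly after a stats command. The sketch below uses a made-up inline data block and an explicit yrange. Since the value 300 lies outside [0:100], the description above suggests it should be counted in STATS_outofrange rather than in STATS_records.

```gnuplot
# Inline data block used purely for illustration.
$DATA << EOD
1 10
2 20
3 300
EOD

set yrange [0:100]                 # a single column is treated as y
stats $DATA using 2 nooutput
print STATS_records                # in-range records
print STATS_outofrange             # records rejected by the yrange limits
print STATS_columns                # columns found in the first data row
set yrange [*:*]                   # return to autoscaling
```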
The second set reports properties of the in-range data from a single column. This column is treated as y. If the y axis is autoscaled then no range limits are applied. Otherwise only values in the range [ymin:ymax] are considered.
If two columns are analysed jointly by a single stats command, the suffix "_x" or "_y" is appended to each variable name. I.e. STATS_min_x is the minimum value found in the first column, while STATS_min_y is the minimum value found in the second column. In this case points are filtered by testing against both xrange and yrange.
STATS_min          | minimum value of in-range data points
STATS_max          | maximum value of in-range data points
STATS_index_min    | index i for which data[i] == STATS_min
STATS_index_max    | index i for which data[i] == STATS_max
STATS_mean         | mean value of the in-range data points
STATS_stddev       | population standard deviation of the in-range data
STATS_ssd          | sample standard deviation of the in-range data
STATS_lo_quartile  | value of the lower (1st) quartile boundary
STATS_median       | median value
STATS_up_quartile  | value of the upper (3rd) quartile boundary
STATS_sum          | sum
STATS_sumsq        | sum of squares
STATS_skewness     | skewness of the in-range data points
STATS_kurtosis     | kurtosis of the in-range data points
STATS_adev         | mean absolute deviation of the in-range data
STATS_mean_err     | standard error of the mean value
STATS_stddev_err   | standard error of the standard deviation
STATS_skewness_err | standard error of the skewness
STATS_kurtosis_err | standard error of the kurtosis
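A possible use of these single-column variables is to compare the same column from two files. The sketch assumes that the name option from the synopsis replaces the default STATS prefix with the given string, so that the two result sets do not overwrite each other. The file names file1.dat and file2.dat are placeholders.

```gnuplot
# Hypothetical input files; column 2 of each is analysed separately.
stats 'file1.dat' using 2 name 'A' nooutput    # results stored as A_mean, A_stddev, ...
stats 'file2.dat' using 2 name 'B' nooutput    # results stored as B_mean, B_stddev, ...

print sprintf("file1: mean = %g +/- %g", A_mean, A_mean_err)
print sprintf("file2: mean = %g +/- %g", B_mean, B_mean_err)

if (A_mean < B_mean) {
    print "file1 has the smaller mean"
}
```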
The third set of variables is only relevant to analysis of two data columns.
STATS_correlation   | sample correlation coefficient between x and y values
STATS_slope         | A corresponding to a linear fit y = Ax + B
STATS_slope_err     | uncertainty of A
STATS_intercept     | B corresponding to a linear fit y = Ax + B
STATS_intercept_err | uncertainty of B
STATS_sumxy         | sum of x*y
STATS_pos_min_y     | x coordinate of a point with minimum y value
STATS_pos_max_y     | x coordinate of a point with maximum y value
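The two-column variables lend themselves to annotating a plot. The following is only a sketch: data.dat is a placeholder for a file with x in column 1 and y in column 2, and f(x) is a helper function introduced here for illustration.

```gnuplot
# 'data.dat' is a placeholder for a file with x in column 1 and y in column 2.
stats 'data.dat' using 1:2 nooutput

# Straight line built from the fit parameters reported by stats.
f(x) = STATS_slope * x + STATS_intercept

# Mark the point with the largest y value.
set label 1 sprintf("max y = %g", STATS_max_y) at STATS_pos_max_y, STATS_max_y

plot 'data.dat' using 1:2 with points title 'data', \
     f(x) with lines title sprintf("y = %.3g x + %.3g", STATS_slope, STATS_intercept)
```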
Keyword matrix indicates that the input consists of a matrix (see matrix); the usual statistics are generated by considering all matrix elements. The matrix dimensions are saved in variables STATS_size_x and STATS_size_y.
STATS_size_x | number of matrix columns
STATS_size_y | number of matrix rows
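A small sketch of matrix input, using a made-up 2x3 inline data block. It assumes that the overall statistics are stored under the same unsuffixed names as in the single-column case.

```gnuplot
# Illustrative matrix with 2 rows and 3 columns.
$M << EOD
1 2 3
4 5 6
EOD

stats $M matrix nooutput
print STATS_size_x    # number of matrix columns (3 here)
print STATS_size_y    # number of matrix rows (2 here)
print STATS_mean      # mean over all matrix elements (assumed variable name)
```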
The index reported in STATS_index_xxx corresponds to the value of pseudo-column 0 ($0) in plot commands. I.e. the first point has index 0, the last point has index N-1.
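To illustrate the zero-based indexing, consider this sketch with three made-up values.

```gnuplot
# Illustrative data: the minimum is the second point, the maximum the third.
$DATA << EOD
5
3
8
EOD

stats $DATA using 1 nooutput
print STATS_index_min    # minimum (3) is the second point, so index 1
print STATS_index_max    # maximum (8) is the last point, so index N-1 = 2
```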
Data values are sorted to find the median and quartile boundaries. Positions in the sorted list are counted from 1. If the total number of points N is odd, the median is the value of sorted point (N+1)/2. If N is even, the median is the mean of sorted points N/2 and (N+2)/2. Equivalent treatment is used for the quartile boundaries.
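The even-N case of the median rule can be checked with a short sketch on made-up data. The values are deliberately given out of order, since stats sorts them before locating the median.

```gnuplot
# Four illustrative values; sorted order is 1 2 3 4.
$DATA << EOD
4
1
3
2
EOD

stats $DATA using 1 nooutput
print STATS_median    # N = 4 (even): mean of sorted points 2 and 3, (2 + 3)/2 = 2.5
```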
For an example of using the stats command to annotate a subsequent plot, see stats.dem (http://www.gnuplot.info/demo/stats.html).
The stats command in this version of gnuplot can handle log-scaled data, but
not the content of time/date fields (set xdata time or set ydata time).
This restriction may be relaxed in a future version.