tabhist - histogram plotter and gaussian fit for tabular data
tabhist plots a histogram and computes
the first 4 moments of points obtained from a column (or set of columns) of
an ascii table(5NEMO). It calculates the mean, dispersion, skewness and
kurtosis. It then plots the data in the form of a histogram, optionally
with a ’best’ gaussian fit and residuals. It can also plot a cumulative histogram
of the data (see also tabtrend(1NEMO)).
The median is also optionally computed,
for which an alternate sorting routine can be selected (if available).
If multiple columns are used, they are appended to each other and treated
like a single long column. If you need simple statistics for multiple columns,
use tabstat instead.
Using the nsigma keyword, outliers can be removed
and statistics re-computed.
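
As an illustration of the four moments named above, here is a minimal Python sketch. It is not taken from the tabhist source. The normalization (dividing by N rather than N-1) and the use of excess kurtosis (zero for a gaussian) are assumptions made for this example; tabhist's own conventions may differ. The function name moments is hypothetical.

```python
import math


def moments(values):
    """Return (mean, dispersion, skewness, kurtosis) for a list of numbers.

    Assumed conventions (not confirmed by the man page): central moments
    are divided by N, and the kurtosis is the excess kurtosis.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 about the mean.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    sigma = math.sqrt(m2)
    if sigma == 0.0:
        # Skewness and kurtosis are undefined when all values are equal.
        return mean, 0.0, float("nan"), float("nan")
    skewness = m3 / sigma ** 3
    kurtosis = m4 / sigma ** 4 - 3.0
    return mean, sigma, skewness, kurtosis
```

If several columns are appended into one long list first, the same function gives the statistics of the combined data, which matches the treatment of multiple columns described above.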
The following parameters are recognized
in any order if the keyword is also given:
Points that fall exactly on the boundary of a bin are added to the bin on the
right side. In other words, a bin includes its left edge
but is open on the right side, i.e. [---). Note this behavior is exactly
opposite to that of SM, which was used to compare results.
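
The half-open bin rule can be sketched in a few lines of Python. This is an illustration only, not the tabhist implementation. The function name bin_counts is hypothetical, and dropping a value equal to the last edge is an assumption of this sketch: the text above does not say how tabhist treats the upper end of the range.

```python
from bisect import bisect_right


def bin_counts(values, edges):
    """Count values in half-open bins [edges[i], edges[i+1]).

    edges must be sorted in increasing order. Values below the first
    edge, or at or above the last edge, are skipped (an assumption of
    this sketch).
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v >= edges[-1]:
            continue
        # bisect_right returns the index just past any edge equal to v,
        # so a value sitting exactly on an edge lands in the bin to its right.
        counts[bisect_right(edges, v) - 1] += 1
    return counts
```

With edges 0, 1, 2, the value 1.0 is counted in the second bin, not the first.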
- input file, in tabular
format [no default].
- column(s) from which the (X) values are
taken. Multiple columns can be given in the usual nemoinp(3NEMO)
notation for integers. [default 1].
- Minimum value along X-axis to include
[default: autoscaling on minimum of datapoints].
- Maximum value
along X-axis to include [default: autoscaling on maximum of datapoints].
- If one number is given, it is the number of (equal size) bins
between min and max. If more than one number is given, they are the bin
edges (thus one more value than the number of bins needs to be supplied). One rule
of thumb is that the number of bins be the square root of the number of
values. [default: 16].
- Maximum count value per bin plotted
along the Y-axis [default: autoscaling].
- Maximum number of
lines allowed to be read from the data file if the data comes from a pipe as opposed
to a file. A regular file will use the number of lines in the file as default.
- Take log of Y-axis? This option is really redundant,
can handle any transformation [default: f].
- Label along the X-axis [default: value].
- Label along the Y-axis
[default: N or log(N)].
- Random verbiage, will be plotted along
right top of plot for identification [default: none]. The left top will
contain the filename infile.
- Table output? If false, a plot using
your current yapp(5NEMO) device will be given. If true, a simple ascii version
of a histogram is shown. The default choice of bins=16 keeps all the information
on a simple 80x24 screen. [Default: f]
- If true, plot output will
contain a gaussian fit. The "best" gaussian fit is the one that has the
same mean, dispersion and integrated area as the data. [Default: t].
- If true, the residual (data - fit) will be plotted as a dashed line. [Default:
- Plot a cumulative histogram instead. If set, the keywords gauss,
residual, and ylog are automatically re-set to false. [Default: f].
- Normalize the cumulative histogram to 1. By default the maxcount is used.
- Compute median too? This can be time-consuming for
large numbers of points. [Default: t].
- Fast median calculator in
case N is large. No sort will be used, but instead an iterative O(N) method
will be used. [Default: f].
- robust=t|f Use a robust estimator to remove outliers
before taking statistics again. [Default: f].
- Also compute the Mean
Absolute Deviation (MAD). [Default: f].
- Dual pass over the data,
subtracting the mean. This can be important if the data have a large average
value with a small spread around the mean. Notice that in this mode the
mean is subtracted but never added back to the reported min/max/mean/median etc. Default:
- Delete points more than nsigma times the sigma from the mean.
After each deleted point, sigma and mean are recomputed. [Default: none]
- If given, this is the name of the file to which a template python script is
written; the script can serve as a starting point for more elaborate plotting. Default: none.
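
Two of the options above describe small algorithms: the "best" gaussian with the same mean, dispersion and integrated area as the data, and the nsigma outlier removal. The Python sketch below illustrates both. It is not the tabhist source. The function names are hypothetical. Evaluating the gaussian at bin centers and removing the most deviant point first are assumptions made for this example.

```python
import math


def mean_sigma(values):
    """Mean and dispersion (dividing by N, an assumed convention)."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, sigma


def gaussian_model(mean, sigma, npoints, edges):
    """Expected counts per bin for a gaussian with the given mean and sigma.

    The curve is scaled so that its integrated area matches a histogram
    of npoints values; it is evaluated at each bin center (assumption).
    """
    model = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        center = 0.5 * (lo + hi)
        density = math.exp(-0.5 * ((center - mean) / sigma) ** 2) / (
            sigma * math.sqrt(2.0 * math.pi)
        )
        model.append(npoints * (hi - lo) * density)
    return model


def residuals(counts, model):
    """Residual per bin, defined as data minus fit."""
    return [c - m for c, m in zip(counts, model)]


def clip_outliers(values, nsigma):
    """Remove points further than nsigma*sigma from the mean, one at a time.

    After each removal the mean and sigma are recomputed. Removing the
    most deviant point first is an assumption of this sketch.
    """
    data = list(values)
    while len(data) > 1:
        mean, sigma = mean_sigma(data)
        worst = max(data, key=lambda v: abs(v - mean))
        if abs(worst - mean) <= nsigma * sigma:
            break
        data.remove(worst)
    return data
```

Recomputing the statistics after every single removal matters because one large outlier inflates sigma and can hide smaller outliers until it is gone.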
http://arxiv.org/ps/0807.4820 (choosing the binning for a histogram)
xx-mar-88 V1.0: created PJT
15-Apr-88 V1.1: higher order moments, Y scale PJT
1-jun-88 V2.0: new name, code same PJT
28-oct-88 V2.0a: updated doc + labels plotting done PJT
13-nov-93 V2.7: added gaussian model + residuals PJT
11-jul-96 V2.8: log scale is now 10-based, not e PJT
12-apr-97 V3.0: added cumulative option PJT
24-apr-98 V3.0a: fix median calculation for restricted range PJT
22-dec-99 V3.1a: optional median, fix N=1 reporting bug PJT
24-jan-00 documentation updated with program PJT
7-jun-01 3.2: added nsigma, corrected man page options PJT
7-may-03 4.0: multiple columns allowed PJT
28-jan-05 5.0: separate xmin/xmax=, added sort=, fix median if nsigma PJT
1-jun-10 6.0: bins= now allowed to have manual edges PJT
22-aug-12 6.2: added torben= option for fast large-N median PJT
16-jan-14 6.4: added mad= PJT
8-jan-2020 7.0: added pyplot= PJT
2-mar-2020 7.1: added norm= PJT