This HTML automatically generated with rman for NEMO
Table of Contents
tabhist - histogram plotter and gaussian fit for tabular data
tabhist
in=infile [parameter=value]
tabhist plots a histogram and computes
the first 4 moments of points obtained from an (set of) column(s) from
an ascii table(5NEMO)
. It calculates the mean, dispersion, skewness and
kurtosis. It then plots the data in the form of a histogram, optionally
with a ’best’ gaussian fit and residuals. It can also plot a cumulative histogram
of the data (see also tabtrend(1NEMO)
and tabint(1NEMO)
).
The Median is
also optionally computed, for which an alternate sorting routine can be
selected (if available).
- The TriMean is reported, which is a robust statistic
for the central
- tendency, the weighted average of Median and two quartiles:
(Q1 + 2*Q2 + Q3) / 4.
If multiple columns are used, they are appended to
each other and treated like a single long column. If you need simple statistics
for multiple columns, see tabstat(1NEMO)
instead.
Using the nsigma keyword,
outliers can be removed and statistics re-computed.
The following
parameters are recognized in order; they may be given in any order if the
keyword is also given. Use --help to confirm this man page is up to date.
- in=infile
- input file, in tabular format [no default].
- xcol=column
- column(s)
from which the (X) values are taken. Multiple columns can be given in the
usual nemoinp(3NEMO)
notation for integers. [default 1].
- xmin=x_min
- Minimum
value along X-axis to include [default: autoscaling on minimum of datapoints].
- xmax=x_max
- Maximum value along X-axis to include [default: autoscaling
on maximum of datapoints].
- bins=n_bins
- If one number is given, it is the
number of (equal size) bins between min and max. If more than one number
is given, they are the bin edges (thus one more than the number of bins
need to be supplied). One rule of thumb is that the number of bins be the
square root of the number of values. [default: 16].
- maxcount=count
- Maximum
count value per bin plotted along the Y-axis [default: autoscaling].
- nmax=max_lines
- Maximum number of lines allowed to read from datafile, if the data comes
from a pipe as supposed from file. A regular file will use the number of
lines in the file as default. [Default: 0].
- ylog=t|f
- Take log of Y-axis? This
option is really redundant, since tabmath(1NEMO)
can handle any transformation
[default: f].
- xlab=x-label
- Label along the X-axis [default: value].
- ylab=y-label
- Label along the Y-axis [default: N or log(N)].
- headline=text
- Random verbiage,
will be plotted along right top of plot for identification [default: none].
The left top will contain the filename infile.
- tab=t|f
- Table output? If false,
a plot using your current yapp(5NEMO)
device will be given. If true, a simple
ascii version of a histogram is shown. The default choice of bins=16 keeps
all the information on a simple 80*24 screen. [Default: f]
- gauss=t|f
- If true,
plot output will contain a gaussian fit. The "best" gaussian fit is the
one that has the same mean, dispersion and integrated area as the data.
[Default: t].
- residual=t|f
- If true, the residual (data - fit) will be plotted
as a dashed line. [Default: t].
- cumul=t|f
- Plot a cumulative histogram instead.
If set, the keywords gauss, residual, and ylog are automatically re-set
to false. [Default: f].
- norm=t|f
- Normalize the cumulative histogram to 1.
By default the maxcount is used. [Default: f].
- median=t|f
- Compute median too?
This can be timeconsuming for large numbers of points. [Default: t].
- torben=t|f
- Fast median calculator in case N is large. No sort will be used, but instead
an iterative O(N) method will be used. [Default: f].
- robust=t|f
- Use a robust
estimator to remove outliers before taking statistics again. [Default: f].
- mad=t|f
- Also compute the Mean Absolute Deviation (MAD). [Default: f].
- dual=t|f
- Dual pass over the data, subtracting the mean. This can be important if
the data have a large average value with a small spread around the mean.
Notice that in this mode the mean is subtracted but never added to the
min/max/mean/median etc. Default: f
- nsigma=
- Delete points more than nsigma
times the sigma from the mean. After each point sigma and mean are recomputed.
[Default: none]
- qac=t|f
- If selected, a QAC listing is selected. This gives
the mean, rms, min and max. [Default: f]
- sort=sort_mode
- Default: qsort.
If flogger is enabled additional sorting modes are available: bubble,
heap, insert, merge, quick, shell.
- pyplot=
- If given, it will be the filename
where a template python script that can serve as starting point for more
elaborate plotting. Default: none.
There is no direct way to plot
a particular column from a table while selecting from another column. The
tabmath(1NEMO)
program would need to be used in pipe to select, as given
in the following example:
% tabgen - 1000 2 | tabmath - - selfie=’range(%1,0.0,0.5)’ | tabhist - 2
where column 2 is only used where column 1 is between 0 and 0.5.
There
is a compile option to make tabhist_flogger which enables more sorting
modes.
Points that fall exactly on the boundary of a bin, are added
to the right side. Another way of saying is that a cell includes the left
side, but is open ended on the right side, i.e. [---). Note this behavior is
exactly opposite that of SM’s behavior, which was used to compare results.
tablsqfit(1NEMO)
, tabmath(1NEMO)
, tabstat(1NEMO)
, tabtrend(1NEMO)
,
tabint(1NEMO)
, tabgen(1NEMO)
http://vostat.org
http://www.star.bris.ac.uk/~mbt/stilts/
http://arxiv.org/ps/0807.4820 (choosing the binning for a histogram)
Peter
Teuben
~/src/kernel/tab sources
xx-mar-88 V1.0: created PJT
15-Apr-88 V1.1: higher order moments, Y scale PJT
1-jun-88 V2.0: new name, code same PJT
28-oct-88 V2.0a: updated doc + labels plotting done PJT
13-nov-93 V2.7: added gaussian model + residuals PJT
11-jul-96 V2.8: log scale is now 10-based, not e PJT
12-apr-97 V3.0: added cumulative option PJT
24-apr-98 V3.0a: fix median calculation for restricted range PJT
22-dec-99 V3.1a: optional median, fix N=1 reporting bug PJT
24-jan-00 documentation updated with program PJT
7-jun-01 3.2: added nsigma, corrected man page options PJT
7-may-03 4.0: multiple columns allowed PJT
28-jan-05 5.0: separate xmin/xmax=, added sort=, fix median if nsigma PJT
1-jun-10 6.0: bins= now allowed to have manual edges PJT
22-aug-12 6.2: added torben= option for fast large-N median PJT
16-jan-14 6.4: added mad= PJT
8-jan-2020 7.0: added pyplot= PJT
2-mar-2020 7.1: added norm= PJT
14-nov-2021 7.4: added qac= PJT
Table of Contents