
Name

tabhist - histogram plotter and gaussian fit for tabular data

Synopsis

tabhist in=infile [parameter=value]

Description

tabhist plots a histogram and computes the first 4 moments of points obtained from one or more columns of an ascii table(5NEMO). It calculates the mean, dispersion, skewness and kurtosis. It then plots the data in the form of a histogram, optionally with a "best" gaussian fit and residuals. It can also plot a cumulative histogram of the data (see also tabtrend(1NEMO) and tabint(1NEMO)).
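
To make the four moments concrete, here is a minimal Python sketch of how they could be computed. It is not taken from the tabhist source: the function name `moments`, the normalization by N rather than N-1, and the use of excess kurtosis (0 for a gaussian) are all assumptions made for illustration.

```python
import math

def moments(values):
    """Return (mean, dispersion, skewness, kurtosis) of a sequence of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Central moments, normalized by N (population convention; an assumption).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    sigma = math.sqrt(m2)
    skew = m3 / sigma ** 3
    # Excess kurtosis (assumed convention): a gaussian gives 0.
    kurt = m4 / sigma ** 4 - 3.0
    return mean, sigma, skew, kurt

if __name__ == "__main__":
    print(moments([1, 2, 3, 4, 5]))
```

The sketch needs at least two distinct values, since a zero dispersion would make skewness and kurtosis undefined.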

The Median is also optionally computed, for which an alternate sorting routine can be selected (if available).

The TriMean is also reported. This is a robust statistic for the central tendency: the weighted average of the median and the two quartiles, (Q1 + 2*Q2 + Q3) / 4.
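
The TriMean formula can be sketched in a few lines of Python. The helper name `trimean` is hypothetical, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption; tabhist may define its quartiles differently, which can change the result slightly for small samples.

```python
import statistics

def trimean(values):
    """Weighted average of the median and the two quartiles."""
    # Quartile convention is an assumption (inclusive method).
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4

if __name__ == "__main__":
    # A single large outlier shifts the mean but leaves the TriMean alone.
    data = [1, 2, 3, 4, 100]
    print(statistics.mean(data), trimean(data))
```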

If multiple columns are used, they are appended to each other and treated like a single long column. If you need simple statistics for multiple columns, see tabstat(1NEMO) instead.

Using the nsigma keyword, outliers can be removed and statistics re-computed.
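
One plausible reading of this outlier removal is sketched below in Python: the point farthest from the mean is dropped while it lies more than nsigma times sigma away, and mean and sigma are recomputed after every deletion. The function names and the exact order in which points are removed are assumptions, not the tabhist implementation.

```python
import math

def mean_sigma(values):
    """Mean and dispersion (normalized by N; an assumption)."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, sigma

def clip_outliers(values, nsigma):
    """Remove points farther than nsigma*sigma from the mean, one at a time."""
    kept = list(values)
    while len(kept) > 1:
        mean, sigma = mean_sigma(kept)
        # Candidate for deletion: the point farthest from the current mean.
        worst = max(kept, key=lambda v: abs(v - mean))
        if abs(worst - mean) <= nsigma * sigma:
            break  # nothing left outside the nsigma band
        kept.remove(worst)  # statistics are recomputed on the next pass
    return kept

if __name__ == "__main__":
    print(clip_outliers([1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 2.0, 50.0], 2.0))
```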

Parameters

The following parameters are recognized in order; they may be given in any order if the keyword is also given. Use --help to confirm this man page is up to date.
in=infile
input file, in tabular format [no default].
xcol=column
column(s) from which the (X) values are taken. Multiple columns can be given in the usual nemoinp(3NEMO) notation for integers. [default 1].
xmin=x_min
Minimum value along X-axis to include [default: autoscaling on minimum of datapoints].
xmax=x_max
Maximum value along X-axis to include [default: autoscaling on maximum of datapoints].
bins=n_bins
If one number is given, it is the number of (equal size) bins between min and max. If more than one number is given, they are the bin edges, so one more value than the number of bins must be supplied. One rule of thumb is to set the number of bins to the square root of the number of values. [default: 16].
maxcount=count
Maximum count value per bin plotted along the Y-axis [default: autoscaling].
nmax=max_lines
Maximum number of lines that may be read from the data file when the data come from a pipe rather than a regular file. For a regular file, the number of lines in the file is used by default. [Default: 0].
ylog=t|f
Take log of Y-axis? This option is really redundant, since tabmath(1NEMO) can handle any transformation [default: f].
xlab=x-label
Label along the X-axis [default: value].
ylab=y-label
Label along the Y-axis [default: N or log(N)].
headline=text
Free-form text, plotted at the top right of the plot for identification [default: none]. The top left will contain the filename infile.
tab=t|f
Table output? If false, a plot using your current yapp(5NEMO) device will be given. If true, a simple ascii version of a histogram is shown. The default choice of bins=16 keeps all the information on a simple 80*24 screen. [Default: f]
gauss=t|f
If true, plot output will contain a gaussian fit. The "best" gaussian fit is the one that has the same mean, dispersion and integrated area as the data. [Default: t].
residual=t|f
If true, the residual (data - fit) will be plotted as a dashed line. [Default: t].
cumul=t|f
Plot a cumulative histogram instead. If set, the keywords gauss, residual, and ylog are automatically re-set to false. [Default: f].
norm=t|f
Normalize the cumulative histogram to 1. By default the maxcount is used. [Default: f].
median=t|f
Compute median too? This can be time-consuming for large numbers of points. [Default: t].
torben=t|f
Fast median calculator in case N is large. No sort will be used, but instead an iterative O(N) method will be used. [Default: f].
robust=t|f
Use a robust estimator to remove outliers before taking statistics again. [Default: f].
mad=t|f
Also compute the Mean Absolute Deviation (MAD). [Default: f].
dual=t|f
Dual pass over the data, subtracting the mean. This can be important if the data have a large average value with a small spread around the mean. Note that in this mode the mean is subtracted but never added back to the min/max/mean/median etc. [Default: f]
nsigma=
Delete points that lie more than nsigma times sigma from the mean. After each deleted point, sigma and mean are recomputed. [Default: none]
qac=t|f
If selected, a QAC listing is produced, giving the mean, rms, min and max. [Default: f]
sort=sort_mode
Sorting mode. If flogger is enabled, additional sorting modes are available: bubble, heap, insert, merge, quick, shell. [Default: qsort]
pyplot=
If given, this is the name of the file to which a template Python script is written; the script can serve as a starting point for more elaborate plotting. [Default: none]
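
The gauss= and residual= descriptions above can be illustrated with a short Python sketch. A gaussian with the same mean, dispersion and integrated area as a histogram of N points with bin width w has amplitude N*w / (sigma*sqrt(2*pi)). The function names, and the choice to evaluate the model at the bin centers rather than integrate it over each bin, are assumptions made for illustration only.

```python
import math

def gauss_model(centers, width, npoints, mean, sigma):
    """Model counts at each bin center for a gaussian of equal area."""
    area = npoints * width  # integrated area of a histogram of counts
    norm = area / (sigma * math.sqrt(2.0 * math.pi))
    return [norm * math.exp(-0.5 * ((x - mean) / sigma) ** 2) for x in centers]

def residuals(counts, model):
    """Residual per bin, defined as data - fit."""
    return [c - m for c, m in zip(counts, model)]

if __name__ == "__main__":
    width = 0.1
    centers = [-5.0 + width * (i + 0.5) for i in range(100)]
    model = gauss_model(centers, width, 1000, 0.0, 1.0)
    # The model counts summed over all bins recover (nearly) the N points.
    print(sum(model))
```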

Examples

There is no direct way to plot a particular column from a table while selecting on another column. Instead, tabmath(1NEMO) must be used in a pipe to make the selection, as in the following example:
    % tabgen - 1000 2 | tabmath - - selfie='range(%1,0.0,0.5)' | tabhist - 2
    
where column 2 is only used where column 1 is between 0 and 0.5.

Flogger

There is a compile option to make tabhist_flogger which enables more sorting modes.

Bugs

Points that fall exactly on the boundary of a bin are added to the bin on the right side. In other words, a cell includes its left edge but is open-ended on the right, i.e. [---). Note that this is exactly the opposite of SM's behavior, which was used to compare results.
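
The half-open convention can be sketched in Python as follows. The function names are hypothetical, and the treatment of a point lying exactly on the last edge (dropped here) is an assumption; the man page does not say how tabhist handles that case.

```python
from bisect import bisect_right

def bin_index(x, edges):
    """Index of the half-open bin [left, right) holding x, or None if outside."""
    # A point exactly on the last edge is treated as outside (an assumption).
    if x < edges[0] or x >= edges[-1]:
        return None
    # bisect_right places a value equal to an edge into the bin to its right.
    return bisect_right(edges, x) - 1

def histogram(values, edges):
    """Count values per bin; len(edges) is one more than the number of bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bin_index(v, edges)
        if i is not None:
            counts[i] += 1
    return counts

if __name__ == "__main__":
    # 1.0 and 2.0 sit on interior edges and land in the right-hand bins.
    print(histogram([0.0, 0.5, 1.0, 1.5, 2.0, 3.0], [0, 1, 2, 3]))
```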

See Also

tablsqfit(1NEMO) , tabmath(1NEMO) , tabstat(1NEMO) , tabtrend(1NEMO) , tabint(1NEMO) , tabgen(1NEMO)

http://vostat.org

http://www.star.bris.ac.uk/~mbt/stilts/

http://arxiv.org/ps/0807.4820 (choosing the binning for a histogram)

Author

Peter Teuben

Files


~/src/kernel/tab    sources

Update History


xx-mar-88    V1.0: created              PJT
15-Apr-88    V1.1: higher order moments, Y scale      PJT
1-jun-88    V2.0: new name, code same    PJT
28-oct-88    V2.0a: updated doc + labels plotting done    PJT
13-nov-93    V2.7: added gaussian model + residuals    PJT
11-jul-96    V2.8: log scale is now 10-based, not e    PJT
12-apr-97    V3.0: added cumulative option    PJT
24-apr-98    V3.0a: fix median calculation for restricted range    PJT
22-dec-99    V3.1a: optional median, fix N=1 reporting bug    PJT
24-jan-00    documentation updated with program    PJT
7-jun-01    3.2: added nsigma, corrected man page options    PJT
7-may-03    4.0: multiple columns allowed    PJT
28-jan-05    5.0: separate xmin/xmax=, added sort=, fix median if nsigma     PJT
1-jun-10    6.0: bins= now allowed to have manual edges    PJT
22-aug-12    6.2: added torben= option for fast large-N median    PJT
16-jan-14    6.4: added mad=     PJT
8-jan-2020    7.0: added pyplot=    PJT
2-mar-2020    7.1: added norm=    PJT
14-nov-2021    7.4: added qac=        PJT

