Each spectrum (called a row below) has nchan channels. Each of the nscan scans has 4 rows: the CalOn and CalOff (Hot and Cold, if you wish) for the Sig and the Ref (On and Off, if you wish). The following math is needed to construct a spectrum, thus the total number of operations scales as nscan*nchan*iter.
     Tsys     = Tc * <row1> / <row2 - row1>              i.e.  Tc * <cold> / <hot - cold>
     Ta       = Tsys * ( (row1+row2)/(row3+row4) - 1 )   i.e.  Tsys * (on/off - 1)
     Spectrum = <Ta>                                     (time averaging)
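The per-scan math above can be sketched as follows; this is a simplified illustration in plain Python (lists of floats), not the actual sdmath implementation, and the choice of which rows feed Tsys follows the formulas as written:

```python
def scan_spectrum(row1, row2, row3, row4, tc):
    # One scan: row1=CalOff (cold) and row2=CalOn (hot) of the Sig,
    # row3 and row4 the same pair for the Ref; tc is the cal temperature.
    mean = lambda r: sum(r) / len(r)
    # Tsys = Tc * <row1> / <row2 - row1>, where <> averages over channels
    tsys = tc * mean(row1) / (mean(row2) - mean(row1))
    # Ta = Tsys * ((row1+row2)/(row3+row4) - 1), per channel (on/off - 1)
    return [tsys * ((a + b) / (c + d) - 1.0)
            for a, b, c, d in zip(row1, row2, row3, row4)]

def average(spectra):
    # Spectrum = <Ta>: time-average the per-scan Ta spectra
    nchan = len(spectra[0])
    return [sum(s[i] for s in spectra) / len(spectra) for i in range(nchan)]
```

Looping this over nscan scans, iter times, gives the nscan*nchan*iter operation count quoted above.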
This block of memory, of size 4*nchan*nscan, can be visited iteratively to get a more reliable measure of the CPU usage. It can make a huge difference whether this block fits in one of the CPU caches.
The benchmark here is very basic: it only exercises the math on simulated data, without the need to pass through a fitsio-type library. The file I/O portion of this benchmark reads a large file to get an idea of the I/O overhead, even though many other tools exist to measure disk I/O, from hdparm(1) to iozone(1).
A slightly more realistic benchmark can be found in sdinfo(1NEMO), which reads actual sdfits(5NEMO) files and selects a set of operations.
The standard benchmark loops 10 times over 1000 scans of 100,000 channels (thus 1e9 row operations) in about 2-3 secs. We call this 1 GRop. The standard benchmark was extended to iter=100 to keep system time negligible; this amounts to 10 GRop. CPU times below are user-space seconds [nemobench5 scores listed where available]:
     Xeon E5620 @ 2.40GHz (fourier)    - 115.1  [218]
     AMD EPYC 7302 @ 3.0 GHz (lma)     -  34.1  [675]
     AMD Ryzen AI 9 HX PRO 370         -  23.4  [~1170]
     Ultra 7 155H @ 4.5 GHz (d76)      -  21.8  [~1200]
     M4 air                            -  15.4  [~2000]
To view the impact of OpenMP, here is the performance on an Ultra 7 155H, measured in wall clock seconds with iter=20, as a function of OMP_NUM_THREADS:

     OMP_NUM_THREADS=2 /usr/bin/time sdmath 1000 160000 iter=20

     nscan  nchan      1    2    4    8   12   16   20
     -----  ------   ---  ---  ---  ---  ---  ---  ---
      1000  160000   7.8  5.0  3.9  4.2  3.7  3.6  5.7
      2000   80000   7.8
      4000   40000   7.8
      8000   20000   7.8
     16000   10000   8.0  5.4  4.4  5.3  5.2  9.5  9.9
     -----  ------   ---  ---  ---  ---  ---  ---  ---
Somewhat surprisingly, barely a factor of 2 can be gained by going multi-core on a typical Intel laptop CPU. An AMD CPU fares much better: the Ryzen AI 9 HX PRO 370 keeps lowering the wall clock time all the way up to 12 cores.
To see the effect of the CPU cache, pick cases where nscan*nchan*iter is constant, but the size of the data block varies:

     nscan  nchan  iter    CPU        memory
     1e1    1e1    1e7     1.52sec
     1e1    1e2    1e6     1.42
     1e1    1e3    1e5     1.77
     1e1    1e4    1e4     1.82       3 MB
     1e1    1e5    1e3     2.33       30 MB
     1e1    1e6    1e2     2.52       300 MB
     1e1    1e7    1e1     3.23       3000 MB
     1e2    1e6    1e1     2.69       3000 MB
     1e2    1e5    1e2     2.31       300 MB
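The memory column is consistent with the 4*nscan*nchan data block stored as 8-byte values; the 8-byte (double) element size is an assumption here, not stated in the text. A quick check:

```python
def block_mb(nscan, nchan, bytes_per_value=8):
    # Size of the 4 * nscan * nchan data block in MB (1 MB = 1e6 bytes).
    # The 8-byte (double) element size is an assumption.
    return 4 * nscan * nchan * bytes_per_value / 1e6

# nscan=1e1, nchan=1e4 gives 3.2 MB, quoted as "3 MB" in the table above
```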
A read benchmark can be done with a large series of dummy Plummer models. The example file created here is about 5GB in size:

     mkplummer p1M 1000000 nmodel=100
     /usr/bin/time sdmath in=p1M
     /usr/bin/time sdmath in=p1M maxbuf=1000

The latter resets the buffer 5 times, where the last read is a partial one.
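The idea behind the I/O portion can be mimicked with a generic chunked-read timer; this is a sketch of the technique only (the function name and chunk size are arbitrary, and it shares no code with sdmath):

```python
import time

def read_bench(path, chunk=1 << 20):
    # Sequentially read a file in fixed-size chunks and return
    # (bytes read, throughput in MB/s).  A sketch of the I/O benchmark
    # idea, not the sdmath implementation.
    nbytes = 0
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            nbytes += len(buf)
    dt = time.perf_counter() - t0
    return nbytes, nbytes / 1e6 / dt if dt > 0 else float("inf")
```

As with the maxbuf example above, a chunk size that does not divide the file size leaves the last read partial; the loop handles that by stopping on the first empty read.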
30-apr-2025 Created PJT