Each spectrum (called a row below) has nchan channels. Each scan has 4 rows: the CalOn and CalOff (Hot and Cold, if you wish) for the Sig and the Ref (On and Off, if you wish). The following math is needed to construct a spectrum, so the total number of operations scales as nscan*nchan*iter.
    Tsys     = Tc * <row1> / <row2-row1>                   <cold> / <hot>-<cold>
    Ta       = Tsys * ( (row1+row2)/(row3+row4) - 1 )      on/off - 1
    Spectrum = <Ta>                                        time-averaging
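The three formulas above can be sketched in a few lines of Python. This is only an illustration, not the code sdmath runs. It assumes that the angle brackets in the Tsys line denote an average over channels, and that the brackets in the Spectrum line denote an average over scans. The function names and the `tc` parameter are hypothetical.

```python
def tsys(row1, row2, tc=1.0):
    """Tsys = Tc * <row1> / <row2-row1>; <> assumed to be a mean over channels."""
    n = len(row1)
    mean_cold = sum(row1) / n
    mean_diff = sum(hot - cold for hot, cold in zip(row2, row1)) / n
    return tc * mean_cold / mean_diff


def ta(rows, tc=1.0):
    """Ta = Tsys * ((row1+row2)/(row3+row4) - 1), evaluated per channel."""
    row1, row2, row3, row4 = rows
    t = tsys(row1, row2, tc)
    return [t * ((a + b) / (c + d) - 1.0)
            for a, b, c, d in zip(row1, row2, row3, row4)]


def spectrum(scans, tc=1.0):
    """Spectrum = <Ta>; <> assumed to be a mean over all scans."""
    nscan = len(scans)
    nchan = len(scans[0][0])
    acc = [0.0] * nchan
    for rows in scans:
        for i, value in enumerate(ta(rows, tc)):
            acc[i] += value
    return [value / nscan for value in acc]


if __name__ == "__main__":
    # Two identical scans of 4 channels each (made-up numbers).
    rows = ([1.0] * 4, [2.0] * 4, [0.5] * 4, [1.0] * 4)
    print(spectrum([rows, rows], tc=10.0))
```

Every channel of every row is touched once per pass, which is where the nscan*nchan*iter scaling comes from.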
This block of memory of size 4*nchan*nscan can be iteratively visited, to get a more reliable measure of the CPU usage. It can make a huge difference if this block fits in one of the CPU caches.
The benchmark here is very basic, and only deals with the math with simulated data without the need to pass through a fitsio type library. The file I/O bench portion of this benchmark reads a large file to get an idea of I/O overhead.
A slightly more realistic one can be found in sdinfo(1NEMO) reading actual sdfits(5NEMO) files and selecting a set of operations.
The standard benchmark loops 10 times over 1000 scans of 100,000 channels (thus 1e9 row operations) in about 2-3 secs. We call this 1 GRop.
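The operation count and the size of the data block follow directly from the numbers above. The sketch below works through that arithmetic. It assumes 4 bytes per value (single precision), which is not stated here, and the helper name `benchmark_size` is hypothetical.

```python
def benchmark_size(nscan, nchan, iters, bytes_per_value=4):
    """Return (operations, bytes) for a run.

    operations = nscan * nchan * iter
    bytes      = 4 rows per scan * nchan * nscan * bytes_per_value
    bytes_per_value=4 is an assumption (single-precision floats).
    """
    ops = nscan * nchan * iters
    nbytes = 4 * nchan * nscan * bytes_per_value
    return ops, nbytes


if __name__ == "__main__":
    for iters in (10, 100):
        ops, nbytes = benchmark_size(1000, 100000, iters)
        print(f"iter={iters}: {ops / 1e9:g} GRop, block of {nbytes / 1e9:g} GB")
```

Under that assumption the standard block is far larger than any CPU cache, so the cache effect only shows up for much smaller nscan*nchan.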
The standard benchmark was extended to iter=100 to avoid system time; this amounts to 10 GRop. CPU times are in user-space seconds, with nemobench5 scores listed in brackets where available:
    Xeon E5620 @ 2.40GHz (fourier)    - 115.1  [218]
    AMD EPYC 7302 @ 3.0 GHz (lma)     -  34.1  [675]
    Ultra 7 155H @ 4.5 GHz (d76)      -  21.8  [~1200]
    M4 air                            -  15.4  [~2000]
To view the impact of OpenMP, here is an example of the performance on an Ultra 7 155H, measured in wall clock seconds, with iter=20:
OMP_NUM_THREADS=2 /usr/bin/time sdmath 1000 160000 iter=20
    nscan   nchan     1    2    4    8   12   16   20
    -----  ------   ---  ---  ---  ---  ---  ---  ---
     1000  160000   7.8  5.0  3.9  4.2  3.7  3.6  5.7
     2000   80000   7.8
     4000   40000   7.8
     8000   20000   7.8
    16000   10000   8.0  5.4  4.4  5.3  5.2  9.5  9.9
    -----  ------   ---  ---  ---  ---  ---  ---  ---
Somewhat surprisingly, barely a factor of 2 can be gained by going multi-core.
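The factor of about 2 can be read off the timings by dividing the single-thread time by each multi-thread time. The sketch below does this for the two fully measured rows. It assumes the numeric column headers are OMP_NUM_THREADS values, and the variable and function names are hypothetical.

```python
# Wall clock seconds per thread count, copied from the two complete rows.
ROW_1000_160000 = {1: 7.8, 2: 5.0, 4: 3.9, 8: 4.2, 12: 3.7, 16: 3.6, 20: 5.7}
ROW_16000_10000 = {1: 8.0, 2: 5.4, 4: 4.4, 8: 5.3, 12: 5.2, 16: 9.5, 20: 9.9}


def speedups(times):
    """Speedup relative to the single-thread time, keyed by thread count."""
    base = times[1]
    return {threads: base / t for threads, t in times.items()}


if __name__ == "__main__":
    for label, row in (("1000 x 160000", ROW_1000_160000),
                       ("16000 x 10000", ROW_16000_10000)):
        s = speedups(row)
        best = max(s, key=s.get)
        print(f"{label}: best speedup {s[best]:.2f} at {best} threads")
```

For the second row the timings get worse than the single-thread case at 16 and 20 threads, so adding threads beyond the optimum is counterproductive for short spectra.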
A read benchmark can be done with a large series
of Plummer models. The example file created here is about 5GB in size:
mkplummer p1M 1000000 nmodel=100
/usr/bin/time sdmath in=p1M
/usr/bin/time sdmath in=p1M maxbuf=1000
The latter resets the buffer 5 times, where the last read is a partial
one.
30-apr-2025 Created PJT