It also uses two memory models for holding a snapshot: an array of structs (AOS) and a struct of arrays (SOA). Snapshots are normally stored on disk as SOAs, but most programs transpose them to an AOS, which can be very inefficient.
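To make the two layouts concrete, here is a minimal sketch in C. It is not taken from snapbench.c: the type names body_aos and snap_soa, their fields, and the two scale_mass functions are hypothetical and only illustrate how a mass update touches memory in each model.

```c
#include <assert.h>
#include <stddef.h>

/* AOS: one record per particle (hypothetical layout, for illustration only). */
typedef struct {
    double mass;
    double pos[3];
    double vel[3];
} body_aos;

/* SOA: one contiguous array per quantity (hypothetical layout). */
typedef struct {
    size_t  n;
    double *mass;   /* n values */
    double *pos;    /* 3*n values, or NULL if unused */
    double *vel;    /* 3*n values, or NULL if unused */
} snap_soa;

/* AOS: consecutive masses are sizeof(body_aos) bytes apart in memory. */
void scale_mass_aos(body_aos *b, size_t n, double f)
{
    assert(b != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        b[i].mass *= f;
}

/* SOA: the masses are adjacent, so the loop walks one dense array. */
void scale_mass_soa(snap_soa *s, double f)
{
    assert(s != NULL && (s->mass != NULL || s->n == 0));
    for (size_t i = 0; i < s->n; i++)
        s->mass[i] *= f;
}
```

Both functions compute the same result; only the memory access pattern differs. How much that matters depends on the compiler, the cache, and which quantities a given operation touches.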
A hidden program, snaprun, compares defining simple variables (x,y,z) in a structure with an array pos[3], and also compares looping over pos[] with hardcoding a simpler Euler-type update. Counterintuitively, looping made the code about 4-8 times slower. Simple variables are slightly faster than directly addressing the array. Results vary a bit between processors.
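The three variants being compared can be sketched as follows. This is an assumed illustration, not the code in snaprun.c: the types body_arr and body_xyz and the function names are hypothetical, and the update is a plain position step pos += dt*vel.

```c
#include <assert.h>
#include <stddef.h>

/* Coordinates held in arrays (hypothetical type). */
typedef struct { double pos[3]; double vel[3]; } body_arr;

/* Coordinates held in simple named variables (hypothetical type). */
typedef struct { double x, y, z, vx, vy, vz; } body_xyz;

/* Variant 1: inner loop over the three components of pos[]. */
void step_loop(body_arr *b, size_t n, double dt)
{
    assert(b != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (int k = 0; k < 3; k++)
            b[i].pos[k] += dt * b[i].vel[k];
}

/* Variant 2: same array layout, components addressed directly. */
void step_unrolled(body_arr *b, size_t n, double dt)
{
    assert(b != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        b[i].pos[0] += dt * b[i].vel[0];
        b[i].pos[1] += dt * b[i].vel[1];
        b[i].pos[2] += dt * b[i].vel[2];
    }
}

/* Variant 3: simple variables x,y,z instead of pos[3]. */
void step_xyz(body_xyz *b, size_t n, double dt)
{
    assert(b != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        b[i].x += dt * b[i].vx;
        b[i].y += dt * b[i].vy;
        b[i].z += dt * b[i].vz;
    }
}
```

All three produce identical positions. Whether the inner loop costs anything in a given build depends on the compiler and optimization level, since an optimizer may unroll it, so timings from this sketch need not match the figures quoted above.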
Here is the standard test that is part of NEMO’s "make bench":
make -f $NEMO/src/nbody/trans/Benchfile all
mkplummer p6 1000000
snapbench p6 'mass=1.0001' iter=1000 mode=0
snapbench p6 'mass=1.0001*m' iter=1000 mode=1
snapbench p6 'mass=1.0001' iter=1000 mode=2
snapbench p6 'mass=1.0001' iter=1000 mode=3
with the following example results:
CPU           mode:     0      1      2      3
m2 (air)             0.36   2.89   2.81   0.37
i9-12900K            0.20   4.42   3.30   0.23
i7-1260P             0.30   6.39   5.89   0.37
src/nbody/trans/snapbench.c
src/tutor/nbody/snaprun.c
12-mar-2024 Finally documented PJT