Releases: matthaeusheer/fastcode

PSO: moved global data out of obj functions

01 Jun 08:35

Until now, the objective functions re-initialised some constant data on every call. This is no longer the case: a new function, init_obj_globals(), initialises the constant global data used by the objective functions once, so they no longer need to load it on every call.
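
As a rough illustration of the pattern, the sketch below fills a constant table once and lets the objective function only read it. Only the name init_obj_globals() comes from this release; its signature, the g_shift table, MAX_DIM and shifted_sphere() are assumptions made for the example and may not match the repository.

```c
#include <stddef.h>

#define MAX_DIM 64  /* assumed upper bound on the problem dimension */

/* Constant data shared by the objective functions (hypothetical table). */
static double g_shift[MAX_DIM];
static size_t g_dim = 0;

/* Hypothetical signature: the real init_obj_globals() may differ.
 * Called once before the optimisation loop starts. */
void init_obj_globals(size_t dim) {
    g_dim = dim < MAX_DIM ? dim : MAX_DIM;
    for (size_t i = 0; i < g_dim; ++i) {
        g_shift[i] = 0.5 * (double)i;  /* placeholder constant data */
    }
}

/* Example objective: only reads the prepared table, no per-call setup. */
double shifted_sphere(const double *x, size_t dim) {
    double sum = 0.0;
    for (size_t i = 0; i < dim && i < g_dim; ++i) {
        double d = x[i] - g_shift[i];
        sum += d * d;
    }
    return sum;
}
```

The caller would invoke init_obj_globals() once and then evaluate the objective as often as needed.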

PSO: changed data structures for better locality

01 Jun 08:47

PSO currently uses five separate arrays holding the position, velocity, local best position, fitness, and local best fitness of every particle. Because these arrays are fully independent in memory, entries of different arrays can conflict in cache, even for a single particle: the velocity of a particle might evict its position, for example. When this happens it is costly, because all arrays are traversed with the same stride, so the same conflict recurs for every particle in the swarm.

This patch tries to alleviate the problem by using a single array of structs, where each struct represents one particle and contains all of its data. Every iteration of the main algorithm loop then needs only that struct plus the position of the best particle. Since a particle's data is contiguous in memory, its fields cannot conflict with each other in cache unless the whole particle does not fit into cache.

Moreover, this makes the code much easier to understand.
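
A minimal sketch of such a layout follows. The type name particle_t, the field names, the fixed DIM and the helper functions are all assumptions made for illustration; the structs actually used in the repository may look different.

```c
#include <stddef.h>

#define DIM 4  /* assumed fixed problem dimension for this sketch */

/* One record per particle: everything the main loop needs for a
 * particle sits in one contiguous block of memory. */
typedef struct {
    float position[DIM];
    float velocity[DIM];
    float best_position[DIM];
    float fitness;
    float best_fitness;
} particle_t;

/* Position update that touches a single contiguous record. */
void particle_step(particle_t *p) {
    for (size_t d = 0; d < DIM; ++d) {
        p->position[d] += p->velocity[d];
    }
}

/* The swarm is a plain array of particle records. */
void swarm_step(particle_t *swarm, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        particle_step(&swarm[i]);
    }
}
```

With five independent arrays, the same update would read position[i] and velocity[i] from two unrelated memory regions.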

Base Implementation with Doubles

24 May 10:36

This release is the same as the base release. This means that no real optimization has been performed on the algorithms at this time. The general testing and performance infrastructure is also already in place in this release.

The only difference from the base release is the default data type: it is double here, whereas the base implementation uses float.

PSO: improved rand some more

23 May 20:35

Improved the RNG so that it generates random numbers within a range without any division. This is done using absolute values and multiplication by precomputed inverses.
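
The idea can be sketched in scalar C as follows. This is an assumption about how such a mapping might look, not the repository's code: rand_in_range(), INV_2_POW_31 and the choice of a signed 32-bit raw value are hypothetical, and the real implementation is vectorised.

```c
#include <stdint.h>

/* Precomputed once: inverse of the largest magnitude a signed 32-bit
 * value can have (2^31), so no division is needed per random number. */
static const float INV_2_POW_31 = 1.0f / 2147483648.0f;

/* Map a raw signed 32-bit random value into [lo, hi].
 * Hypothetical helper, shown in scalar form for clarity. */
static float rand_in_range(int32_t raw, float lo, float hi) {
    float f = (float)raw;
    float mag = f < 0.0f ? -f : f;     /* absolute value */
    float unit = mag * INV_2_POW_31;   /* multiply by inverse: [0, 1] */
    return lo + unit * (hi - lo);
}
```

If the same range is used repeatedly, hi - lo can be folded into the precomputed factor as well, which leaves one multiplication and one addition per number.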

PSO: improved interface objective functions

23 May 20:34

Removed the conversion to and from __m256 that the objective function interface previously required. This again significantly reduces the number of loads and stores needed to run the algorithm.

PSO: cache temporal locality

23 May 18:18

Instead of doing:

  • go over full population: update velocity
  • go over full population: update position
  • go over full population: update fitness
  • go over full population: update best fitness
  • go over full population: update best position

We use a cached best position and do:

  • go over full population:
    • update velocity for particle
    • update position for particle
    • update fitness for particle
    • update best fitness for particle
    • update best position for particle

This improves cache temporal locality since:

  • best position, position and velocity are required for velocity update
  • velocity is required for position update
  • position is required for fitness update
  • fitness is required for best fitness update
  • fitness and position are required for best position update

All of this data can then fit into cache for a single particle, so it does not have to be loaded into cache several times per iteration.
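
The fused loop described above could look roughly like the scalar sketch below. The names particle_t, sphere() and pso_step_fused() are hypothetical. The example assumes a minimisation problem, and it leaves out the random factors of the velocity update to stay deterministic.

```c
#include <stddef.h>

#define DIM 2  /* assumed problem dimension for this sketch */

typedef struct {
    float x[DIM], v[DIM], best_x[DIM];
    float fit, best_fit;
} particle_t;

/* Example objective (assumed): sum of squares, to be minimised. */
static float sphere(const float *x) {
    float s = 0.0f;
    for (size_t d = 0; d < DIM; ++d) s += x[d] * x[d];
    return s;
}

/* One iteration: all five updates are done for particle i before moving
 * on to particle i + 1. gbest is the cached best position and stays
 * fixed during the pass. Random factors are omitted in this sketch. */
void pso_step_fused(particle_t *swarm, size_t n, const float *gbest,
                    float w, float c1, float c2) {
    for (size_t i = 0; i < n; ++i) {
        particle_t *p = &swarm[i];
        for (size_t d = 0; d < DIM; ++d) {
            p->v[d] = w * p->v[d]
                    + c1 * (p->best_x[d] - p->x[d])
                    + c2 * (gbest[d] - p->x[d]);   /* velocity */
            p->x[d] += p->v[d];                    /* position */
        }
        p->fit = sphere(p->x);                     /* fitness */
        if (p->fit < p->best_fit) {
            p->best_fit = p->fit;                  /* best fitness */
            for (size_t d = 0; d < DIM; ++d) {
                p->best_x[d] = p->x[d];            /* best position */
            }
        }
    }
}
```

The position, velocity and best position of the particle stay hot in cache across all five steps instead of being reloaded in five separate passes over the population.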

PSO: ported everything to __m256

21 May 14:21

Ported PSO so that all functions take __m256 vectors.

This reduces the number of loads and stores that each part of the algorithm needs in order to run vectorised code. Note that this does not change the interface to the objective function: the __m256 values are still stored into an array and passed to the objective, which in turn loads them back into __m256 form for vectorisation.

Base implementation

23 May 13:32

This release is the base implementation of all algorithms. This means that no real optimization has been performed on the algorithms at this time. The general testing and performance infrastructure is also already in place in this release.

PSO: velocity update vectorisation

18 May 13:46

Vectorised pso_update_velocity() using a custom RNG that produces values between 0 and 1. Again, this still contains a floating-point division, so it is not fully optimised yet.

PSO: rand_init RNG

18 May 13:43

Added a high-performance vectorised RNG based on XORs and shifts and used it in rand_init(). Note that it still performs floating-point divisions, so it is not fully optimised yet; see later releases.
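
For readers unfamiliar with this family of generators, here is a scalar sketch using the classic 32-bit xorshift step with shifts 13, 17 and 5. Whether the repository uses exactly these shift constants is an assumption, as are the function names; the real version operates on vectors. The conversion to [0, 1] deliberately uses a division, as in this release.

```c
#include <stdint.h>

/* Classic 32-bit xorshift step (shift triple 13, 17, 5).
 * The state must be non-zero. Assumed variant, scalar for clarity. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Map to [0, 1] with a floating-point division, which is the part
 * that later releases replace with a multiplication. */
static float rand_unit(uint32_t *state) {
    return (float)xorshift32(state) / 4294967295.0f;
}
```

Each call costs three shifts and three XORs plus the conversion, with no multiplication or table lookup in the generator itself.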