Skip to content

Performance and Lua tables

Jip edited this page Feb 5, 2023 · 26 revisions

Lua tables - memory

Everything written in this article completely depends on your Lua environment. This article is specifically written for the Lua environment of Supreme Commander: Forged Alliance.

A table

A Lua table consists of two parts: an array part and a hash part. This is best explained in chapter 4 of The implementation of Lua 5.0. We'll use the same terminology as the paper and we'll assume you've read at least that chapter.

Some table functions only interact with the array part. We'll summarize a non-exhaustive list:

  • (1) table.insert
  • (2) table.getn
  • (3) table.random
  • (4) table.remove
  • (5) table.sort

Other table functions that operate on both parts:

  • (6) table.getsize
  • (7) table.empty

We mention this because these functions may not act as you'd expect if you do not know the distinction. As an example: table.getn does not take into account elements in the hash part, and may therefore be inaccurate if you expect them to be included.

A comparison

A Lua table is an expensive data structure in comparison to data structures available in other languages. The header consumes 40 bytes, regardless of whether the table has any elements in it. Each element in the array part consumes 8 bytes. Each element in the hash part consumes 20 bytes.

We'll make a comparison to create a perspective on how much more expensive these data structures are. For simplicity we'll start our comparison with structs from C. We'll start with a data structure that represents a vector or a point. Such a data structure has two entries, one for each axis.

// occupies 8 bytes
struct Vector {
  float x;
  float y;
}
-- occupies 40 + 16 = 56 bytes
local vector1 = { 0, 0 }

-- occupies 40 + 80 = 120 bytes
local vector2 = { x = 0, y = 0 }

This initial comparison on a simple data structure shows the memory overhead of Lua tables: in comparison to a struct in c a Lua table can occupy up to 10 times the amount of memory. A consequence of this is that we need to be careful how and when we use tables.

Details on the hash part

Details on a hash operation

Performance tricks

Before you continue it is important to understand that you should only try and optimize code when it meets the following conditions:

  • (1) The code needs to be functional: you can't optimize code that either doesn't exist, or code that is incomplete. Any attempt at optimizing before the code is functional is premature.
  • (2) The (functionality of the) code needs to be testable: you'll introduce significant changes to your code in order to optimize it. This will introduce bugs. It is therefore vital to be able to continuously test your changes as you progress.
  • (3) The (performance criteria of the) code needs to be measurable: no matter how good you (think you) are, if you can't measure it then you're blind. And if you're blind then you may be spending a lot of time on something that is irrelevant.

This is best explained in the explainer of premature performance improvements by xkcd. This article won't help you (1). For (2) you can think in terms of being able to run the sequence you're trying to optimize by pressing a button. And for (3) you can measure time using the global GetSystemTimeSecondsOnlyForProfileUse and you can measure the (complete) size of a table using the function ToBytes as introduced by #4523.

Do not use tables

Often tables are used not because they are strictly required, but because they help us structure our code. We'll use two examples here:

  • Changes related to the function CalculateBallisticAcceleration, as done by #4064

The function allows bombers to properly compute their bomb trajectory. It is a perfect example of this behavior where the author used tables for their semantics: it naturally groups values together. As a few examples:

  • local proj = {pos=projectile:GetPosition(), vel=VMult(Vector(launcher:GetVelocity()), 10)}
  • local dist = {pos=XZDist(proj.pos, target.pos), vel=XZDist(proj.vel, target.vel)}
  • target.tpos = {target.pos[1] + time * target.vel[1], 0, target.pos[3] + time * target.vel[3]}

As a consequence the author is allocating a lot of memory that is not strictly required. And all of this memory is used once, just to throw it away again. It is a good example of table trashing. The refactored code removes these semantics and instead uses more locals. As an example:

local time = distPos / distVel
local targetNewPosX = targetPosX + time * targetVelX
local targetNewPosZ = targetPosZ + time * targetVelZ
local targetNewPosY = GetSurfaceHeight(targetNewPosX, targetNewPosZ)
  • Changes related to the function RotateTowardsEnemy, as done by f1bdf61 and 30cf907

The function allows point defense to rotate towards the nearest threat as they are created. It is another great example of this behavior where the author noticed that the complexity of the function was higher than it could be. In turn, the code is refactored and the result can be found in f1bdf61. Each 'better' threat would run the MakeTarget function with its sole purpose to return a new table. Each table uses 3 hash entries and therefore we're allocating 120 byte tables for each 'better' threat! A few months later the same author noticed this issue and refactored it into 30cf907: instead of allocating a new table for each threat the same table is re-used.

One has to wonder however: why do we need that table at all? The table is not returned and therefore only used within the scope of the function. We might as well declare three locals - like we did with the first example. It would save dozens of hash operations too, which coincidentally is our next topic 😄 !

Cache the hash operation

When tables are required we often end up running the same hashing operations multiple times. These are not cached by Lua - and as we know now the hashing operation is has a price tag. We'll use two examples here:

  • (1) Changes related to the function Heapify as done by #4399.

The function allows us to restore the heap property where values are stored in a specific order. A heap is a data structure that acts like a priority queue. It is used for path finding. These changes are a great example of caching the hash operation as the performance of the code improved by more than 60%! For ease of understanding we'll put the changes next to each other. We highly recommend you to carefully observe them.

No caching Caching
Heapify = function(self)
    local index = 1
    local left = self:ToLeftChild(index)
    local right = self:ToRightChild(index)

    while self.Heap[left] do
        local min = left

        if self.Heap[right] and 
            (self.Heap[right].TotalCosts < self.Heap[left].TotalCosts) 
        then
            min = right
        end
        
        if self.Heap[min].TotalCosts > self.Heap[index].TotalCosts then
            return
        end

        self:Swap(index, min)

        index = min
        left = self:ToLeftChild(index)
        right = self:ToRightChild(index)
    end
end,
Heapify = function(self)
    -- cache of often used table entries
    local heap = self.Heap
    local heap_size = self.HeapSize

    local index = 1
    local left = 2 * index
    local right = 2 * index + 1

    while left <= heap_size do
        local min = left

        if heap[right] and 
            (heap[right].TotalCosts < heap[left].TotalCosts) 
        then
            min = right
        end

        if heap[min].TotalCosts > heap[index].TotalCosts then
            return
        end

        local tmp = heap[min]
        heap[min] = heap[index]
        heap[index] = tmp

        index = min
        left = 2 * index
        right = 2 * index + 1
    end
end,

You may notice that there are other changes too - Zjonn mentioned that primarily caching the hash operations were responsible for the performance improvements. This is a relatively simple change and is safe when:

  • The value you cache is read only
  • The value you cache is a table, as these are passed by reference

Re-using tables

Abusing tables

Clone this wiki locally