Performance and Lua tables
Everything written in this article completely depends on your Lua environment. This article is specifically written for the Lua environment of Supreme Commander: Forged Alliance.
A Lua table consists of two parts: an array part and a hash part. This is best explained in chapter 4 of The implementation of Lua 5.0. We'll use the same terminology as the paper and we'll assume you've read at least that chapter.
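As a brief sketch of the distinction: the interpreter decides where each entry lives, so treat the comments below as an indication rather than a guarantee.

```lua
local t = {}
t[1] = "a"     -- consecutive integer keys end up in the array part
t[2] = "b"
t.key = "c"    -- string keys end up in the hash part
t[100] = "d"   -- a sparse integer key typically ends up in the hash part too
```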
Some table functions only interact with the array part. We'll summarize a non-exhaustive list:

- `table.insert`
- `table.getn`
- `table.random`
- `table.remove`
- `table.sort`
Other table functions operate on both parts:

- `table.getsize`
- `table.empty`
We mention this because these functions may not behave as you'd expect if you are unaware of the distinction. As an example: `table.getn` does not take elements in the hash part into account, and may therefore report fewer elements than you expect.
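To illustrate, a small sketch. It assumes the Forged Alliance environment, which provides `table.getsize` and the logging function `LOG`:

```lua
local t = { "a", "b", "c" } -- three elements in the array part
t.label = "example"         -- one element in the hash part

LOG(table.getn(t))          -- 3: only the array part is counted
LOG(table.getsize(t))       -- 4: both parts are counted
```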
A Lua table is an expensive data structure in comparison to data structures available in other languages. The header consumes 40 bytes, regardless of whether the table has any elements in it. Each element in the array part consumes 8 bytes. Each element in the hash part consumes 20 bytes.
We'll make a comparison to put into perspective how much more expensive these data structures are. For simplicity we'll compare against structs from C, using a data structure that represents a vector or a point. Such a data structure has two entries, one for each axis.
```c
// occupies 8 bytes
struct Vector {
    float x;
    float y;
};
```

```lua
-- occupies 40 + 16 = 56 bytes
local vector1 = { 0, 0 }

-- occupies 40 + 80 = 120 bytes
local vector2 = { x = 0, y = 0 }
```

This initial comparison on a simple data structure shows the memory overhead of Lua tables: in comparison to a struct in C, a Lua table can occupy roughly 7 to 15 times the amount of memory (56 versus 8 bytes for the array variant, 120 versus 8 bytes for the hash variant). A consequence of this is that we need to be careful about how and when we use tables.
Before you continue it is important to understand that you should only try and optimize code when it meets the following conditions:
- (1) The code needs to be functional: you can't optimize code that either doesn't exist, or code that is incomplete. Any attempt at optimizing before the code is functional is premature.
- (2) The (functionality of the) code needs to be testable: you'll introduce significant changes to your code in order to optimize it. This will introduce bugs. It is therefore vital to be able to continuously test your changes as you progress.
- (3) The (performance criteria of the) code needs to be measurable: no matter how good you (think you) are, if you can't measure it then you're blind. And if you're blind then you may be spending a lot of time on something that is irrelevant.
This is best explained in the explainer on premature performance improvements by xkcd. This article won't help you with (1). For (2) you can think in terms of being able to run the sequence you're trying to optimize by pressing a button. And for (3) you can measure time using the global `GetSystemTimeSecondsOnlyForProfileUse` and you can measure the (complete) size of a table using the function `ToBytes`, as introduced by #4523.
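As a sketch of how such a measurement could look. The sequence being measured is a hypothetical placeholder; `GetSystemTimeSecondsOnlyForProfileUse` and `LOG` are part of the game's environment:

```lua
local start = GetSystemTimeSecondsOnlyForProfileUse()

-- run the sequence often enough for the duration to be measurable
for k = 1, 1000 do
    TheSequenceToMeasure() -- hypothetical placeholder
end

local duration = GetSystemTimeSecondsOnlyForProfileUse() - start
LOG("Duration: " .. duration)
```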
Often tables are used not because they are strictly required, but because they help us structure our code. We'll use two examples here:

- (1) Changes related to the function `CalculateBallisticAcceleration`, as done by #4064

The function allows bombers to properly compute their bomb trajectory. It is a perfect example of this behavior where the author used tables for their semantics: it naturally groups values together. As a few examples:

```lua
local proj = {pos=projectile:GetPosition(), vel=VMult(Vector(launcher:GetVelocity()), 10)}
local dist = {pos=XZDist(proj.pos, target.pos), vel=XZDist(proj.vel, target.vel)}
target.tpos = {target.pos[1] + time * target.vel[1], 0, target.pos[3] + time * target.vel[3]}
```
As a consequence the author is allocating a lot of memory that is not strictly required. All of this memory is used once, only to be thrown away again. It is a good example of table trashing. The refactored code removes these semantics and instead uses more locals. As an example:
```lua
local time = distPos / distVel
local targetNewPosX = targetPosX + time * targetVelX
local targetNewPosZ = targetPosZ + time * targetVelZ
local targetNewPosY = GetSurfaceHeight(targetNewPosX, targetNewPosZ)
```

- (2) Changes related to how point defense chooses its target

The function allows point defense to rotate towards the nearest threat as threats are created. It is another great example of this behavior, where the author noticed that the complexity of the function was higher than it needed to be. In turn, the code was refactored and the result can be found in f1bdf61. Each 'better' threat would run the `MakeTarget` function, whose sole purpose is to return a new table. Each table uses 3 hash entries, and therefore we're allocating a 120-byte table for each 'better' threat! A few months later the same author noticed this issue and refactored it in 30cf907: instead of allocating a new table for each threat, the same table is re-used.
One has to wonder, however: why do we need that table at all? The table is not returned and is therefore only used within the scope of the function. We might as well declare three locals, like we did in the first example. It would save dozens of hash operations too, which coincidentally is our next topic 😄!
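The pattern looks roughly like this. The field and variable names are hypothetical, for illustration only:

```lua
-- allocates a 120-byte table per candidate (as measured above),
-- only for it to be discarded again
local target = { Blip = blip, Distance = distance, Priority = priority }

-- the same information as three locals: no allocation, no hash operations
local targetBlip = blip
local targetDistance = distance
local targetPriority = priority
```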
When tables are required we often end up running the same hashing operations multiple times. These are not cached by Lua - and as we know now, the hashing operation has a price tag. We'll use two examples here:
- (1) Changes related to the function `Heapify`, as done by #4399
The function allows us to restore the heap property, where values are stored in a specific order. A heap is a data structure that acts like a priority queue. It is used for path finding. These changes are a great example of caching the hash operation, as the performance of the code improved by more than 60%! For ease of understanding we'll put the changes next to each other. We highly recommend comparing them carefully.
No caching:

```lua
Heapify = function(self)
    local index = 1
    local left = self:ToLeftChild(index)
    local right = self:ToRightChild(index)

    while self.Heap[left] do
        local min = left
        if self.Heap[right] and
            (self.Heap[right].TotalCosts < self.Heap[left].TotalCosts)
        then
            min = right
        end

        if self.Heap[min].TotalCosts > self.Heap[index].TotalCosts then
            return
        end

        self:Swap(index, min)
        index = min
        left = self:ToLeftChild(index)
        right = self:ToRightChild(index)
    end
end,
```

Caching:

```lua
Heapify = function(self)
    -- cache of often used table entries
    local heap = self.Heap
    local heap_size = self.HeapSize

    local index = 1
    local left = 2 * index
    local right = 2 * index + 1

    while left <= heap_size do
        local min = left
        if heap[right] and
            (heap[right].TotalCosts < heap[left].TotalCosts)
        then
            min = right
        end

        if heap[min].TotalCosts > heap[index].TotalCosts then
            return
        end

        local tmp = heap[min]
        heap[min] = heap[index]
        heap[index] = tmp

        index = min
        left = 2 * index
        right = 2 * index + 1
    end
end,
```
You may notice that there are other changes too - Zjonn mentioned that caching the hash operations was primarily responsible for the performance improvement. This is a relatively simple change and it is safe when:
- The value you cache is read-only, or
- The value you cache is a table, as tables are passed by reference
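A minimal sketch of both cases, assuming a `self` with a table field and a number field as in the `Heapify` example:

```lua
-- safe: 'heap' and 'self.Heap' refer to the same table,
-- so writes through the cached local remain visible through self.Heap
local heap = self.Heap
heap[1] = node

-- unsafe: numbers are copied by value,
-- so this update never reaches self.HeapSize
local heap_size = self.HeapSize
heap_size = heap_size + 1
```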