docs/using-gitbase/optimize-queries.md
Lines changed: 19 additions & 0 deletions
@@ -5,6 +5,7 @@ Even though in each release performance improvements are included to make gitbas
There are three ways to optimize a gitbase query:
- Create an index for some parts.
- Make sure the joined tables are squashed.
- Make sure joins that are not squashed are performed in memory.

## Assessing performance bottlenecks

@@ -57,6 +58,24 @@ Some performance issues might not be obvious, but there are a few that really st
- Joins not squashed. If you performed some joins between tables and you see `Join` and `Table` nodes instead of a `SquashedTable` node, the joins were not successfully squashed. This is explained in more detail in the next sections of this document.
- Indexes not used. If you can't see the indexes in your table nodes, those indexes are not being used by the table. This is explained in more detail in the next sections of this document.
- Joins that are not squashed and are not executed in memory. This is explained in more detail in the next sections of this document.

## In-memory joins

There are two modes in which gitbase can execute an inner join:

- Multipass: fully iterates the right side of the join once for each row in the left side. This is very expensive, but avoids having to load one side fully in memory.
- In-memory: loads the whole right side in memory and iterates the left side. Both sides are iterated exactly once, which makes the query much faster, at the cost of potentially requiring a lot of memory.
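The difference between the two modes can be illustrated with a minimal Go sketch. This is not gitbase's implementation: the `row` and `table` types and both join functions are hypothetical, and the `scans` counter only simulates how often a side is read from its source.

```go
package main

import "fmt"

// row is a hypothetical row with a join key and a payload.
type row struct {
	key   int
	value string
}

// table simulates a data source and counts how often it is fully read.
type table struct {
	rows  []row
	scans int
}

// scan returns all rows and records one full read of the source.
func (t *table) scan() []row {
	t.scans++
	return t.rows
}

// multipassJoin re-reads the right side once for every row of the left side.
func multipassJoin(left, right *table) [][2]string {
	var out [][2]string
	for _, l := range left.scan() {
		for _, r := range right.scan() {
			if l.key == r.key {
				out = append(out, [2]string{l.value, r.value})
			}
		}
	}
	return out
}

// inMemoryJoin reads the right side once, keeps a copy in memory,
// and matches every left row against that copy.
func inMemoryJoin(left, right *table) [][2]string {
	cached := append([]row(nil), right.scan()...)
	var out [][2]string
	for _, l := range left.scan() {
		for _, r := range cached {
			if l.key == r.key {
				out = append(out, [2]string{l.value, r.value})
			}
		}
	}
	return out
}

// newTables builds a small left side (3 rows) and right side (2 rows).
func newTables() (*table, *table) {
	left := &table{rows: []row{{1, "a"}, {2, "b"}, {3, "c"}}}
	right := &table{rows: []row{{1, "x"}, {3, "y"}}}
	return left, right
}

func main() {
	left, right := newTables()
	multipassJoin(left, right)
	fmt.Println("multipass right-side scans:", right.scans)

	left, right = newTables()
	inMemoryJoin(left, right)
	fmt.Println("in-memory right-side scans:", right.scans)
}
```

Both functions return the same pairs. In this sketch the only difference is how many times the right side is read from its source, and that the in-memory variant has to hold a full copy of it.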

The default mode is multipass, unless the right side fits in memory (there's a more elaborate explanation about this below).

In-memory joins can be enabled on request, either by setting the `EXPERIMENTAL_IN_MEMORY_JOIN=on` environment variable or by executing `SET inmemory_joins = 1`. The latter only enables them for the current connection.

Even if in-memory joins are not enabled for all queries, an optimization checks whether each join can be performed in memory and, if it can't, switches to multipass mode.
As long as the memory used by the whole gitbase server is under 20% of all available physical memory in the machine (memory used by other processes is not counted), the join is performed in memory. Once this limit is exceeded, multipass mode is used instead.
20% is only the default value: the `MAX_MEMORY_INNER_JOIN` environment variable sets the maximum number of bytes the gitbase server can be using before switching to multipass mode. The limit can also be changed per session using `SET max_memory_joins=<MAX BYTES>`.
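The threshold rule described above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not gitbase's code: `maxMemoryBytes` and `useInMemoryJoin` are hypothetical names, the `override` parameter stands in for a value supplied through `MAX_MEMORY_INNER_JOIN` or `SET max_memory_joins`, and treating usage exactly at the limit as "over" is an assumption.

```go
package main

import "fmt"

// maxMemoryBytes returns the limit under which joins run in memory.
// A non-zero override (hypothetical stand-in for a user-configured
// byte limit) wins; otherwise the limit is 20% of physical memory.
func maxMemoryBytes(totalPhysical, override uint64) uint64 {
	if override > 0 {
		return override
	}
	return totalPhysical / 5 // 20% default
}

// useInMemoryJoin reports whether the server's current memory usage
// is still under the limit. Usage equal to the limit is treated as
// over the limit, which is an assumption of this sketch.
func useInMemoryJoin(serverUsage, limit uint64) bool {
	return serverUsage < limit
}

func main() {
	const gib = 1 << 30
	limit := maxMemoryBytes(16*gib, 0)

	fmt.Println("limit in bytes:", limit)
	fmt.Println("1 GiB used, in memory:", useInMemoryJoin(1*gib, limit))
	fmt.Println("4 GiB used, in memory:", useInMemoryJoin(4*gib, limit))
}
```

The check uses the memory of the whole server rather than the estimated size of the right side, which matches the description above: a join may fall back to multipass mode simply because other queries are already using memory.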

As a rule of thumb, the right side of an inner join should always be the smaller one: that way it is more likely to be executed in memory, and the join will be faster.