Ci #1

Closed
carlosganzerla wants to merge 40 commits into master from ci

@carlosganzerla
Owner

No description provided.

Added a `benchmark.py` script and a `README.md` in the `benchmark` folder.
While writing this script, an overflow bug in the union operation, and
possibly in others, was found and fixed. Overall, **`pg_set` outperforms
all alternatives**.

Added comments to some operations for clarity.

Removed the `-O0` flag from the `Makefile`.

Added a large set test to check that the pg_set operators are correct.
Compared against operators from `intarray` with sets as large as 65k
elements.

Added support for the GIN index (op-class, support functions).
The implementation is straightforward and very similar to the 1-D array
case.

Added new test cases for the GIN index (these mostly check that there are
no inconsistencies between bitmap scans and seq scans).

Refactored the benchmark script to support multiple access methods, making
it easier to test new index types. Added GIN benchmarks where possible.

Added a GiST operator class for `&&`, `@>` and `=`. The GiST
representation is a signature tree combined with an interval for faster
overlap checks. The op-class has a `masklen` option that defines the
number of bits in the signature (called the mask). It is essentially the
same kind of set mask the type itself uses, but with a predetermined
length instead of a dynamic one. Union is straightforward, and penalty is
essentially the Hamming distance with some small tweaks for the known
overlaps (or lack thereof) based on the min/max elements. Picksplit was
largely copied from `intarray`, with some tweaks for the `datum_exists`
flags.
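
The signature-plus-interval idea can be sketched in Python. This is only an illustration, not the C code in this PR: the function names, the `element % masklen` hashing and the tuple layout are assumptions, and the min/max tweaks to the penalty are left out because the description does not specify them.

```python
def signature(elements, masklen):
    """Fold a set of non-negative ints into a masklen-bit signature."""
    sig = 0
    for e in elements:
        sig |= 1 << (e % masklen)  # assumed hashing: element modulo masklen
    return sig


def make_key(elements, masklen):
    """Hypothetical index key: (signature, min element, max element).

    Assumes a non-empty set.
    """
    return (signature(elements, masklen), min(elements), max(elements))


def union(a, b):
    """Union of two keys: OR the signatures, widen the interval."""
    return (a[0] | b[0], min(a[1], b[1]), max(a[2], b[2]))


def may_overlap(key, query):
    """Lossy overlap check: False is definite, True still needs a recheck."""
    # Cheap interval test first: disjoint [min, max] ranges cannot overlap.
    if key[2] < query[1] or query[2] < key[1]:
        return False
    # Then the signature test: no common bit means no common element.
    return (key[0] & query[0]) != 0


def penalty(orig, new):
    """Plain Hamming distance between the two signatures."""
    return bin(orig[0] ^ new[0]).count("1")
```

With `masklen = 8`, the keys for `{1, 5, 9}` and `{20, 30}` are rejected by the interval test alone, without looking at the signatures.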

Removed large_set_test. It'll be replaced with a deterministic test
soon, which will not depend on another extension.

Refactored some macros for reuse.
Re-added to the main test this time.
Added GiST to the benchmark.

Stopped using generators and switched to using dicts.

Running selects from the same type sequentially instead of multi-type
round-robin.
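
The difference between the two orderings can be sketched as follows. The function names and the shape of the dict (type name mapped to a list of queries) are hypothetical and are not taken from `benchmark.py`.

```python
from itertools import chain, zip_longest


def sequential(queries_by_type):
    """Run every query of one type, then move on to the next type."""
    return list(chain.from_iterable(queries_by_type.values()))


def round_robin(queries_by_type):
    """Take one query per type in turn (the ordering that was replaced)."""
    rounds = zip_longest(*queries_by_type.values())
    return [q for rnd in rounds for q in rnd if q is not None]


# Placeholder query labels, for illustration only.
queries = {"pg_set": ["s1", "s2"], "intarray": ["a1", "a2"]}
print(sequential(queries))
print(round_robin(queries))
```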

Improved access method preparation; indexes are now created only when
their respective access method starts being benchmarked.

Improved the implementation and also added previously removed benchmark
cases.
Instead of generating tables from scratch on every benchmark, each
benchmark now copies distributions from data files. The different
distributions can be used to measure performance on different data
scenarios.

Also added automatic benchmark `README.md` generation. Added poetry for
this purpose, to install some deps.

Added the `pg_set_remove` function and its operator counterpart (`-`).

Added hash index support function and op-class

Changed the add-element operator from `||` to `+`

Re-ran the benchmark with the updated operators.

Added `ANALYZE` support by reusing the native array analysis. It takes
advantage of the `array_typanalyze` function of Postgres: it calls the
native array function to analyze set rows and plugs in a special
`fetchfunc` that casts the set to an array, so it does not have to
reinvent the wheel when computing the most common values, most common
elements and element count histogram. Unfortunately, the same is not
possible for the selectivity functions, so they were rewritten, but
optimized for the set case. As sets do not support null values, we can
take the most common elements histogram to calculate selectivities for
overlap and subset operations.
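
One common way to turn most-common-element frequencies into selectivities is to treat elements as independent. The Python sketch below illustrates that idea only. Whether the rewritten functions in this PR use these exact formulas is an assumption, the function names and `default_freq` are hypothetical, and only the overlap (`&&`) and contains (`@>`) directions are shown.

```python
from math import prod


def overlap_selectivity(query_elems, mce_freq, default_freq=0.0):
    """Estimate the fraction of rows sharing at least one element with the query.

    mce_freq maps element -> fraction of rows containing it. Assuming
    independence, a row misses every query element with probability
    prod(1 - f_e), so it overlaps with probability 1 - prod(1 - f_e).
    """
    miss = prod(1.0 - mce_freq.get(e, default_freq) for e in query_elems)
    return 1.0 - miss


def contains_selectivity(query_elems, mce_freq, default_freq=0.0):
    """Estimate the fraction of rows containing every query element: prod(f_e)."""
    return prod(mce_freq.get(e, default_freq) for e in query_elems)
```

Elements missing from the statistics fall back to `default_freq`, which is a placeholder here. A real estimator would need a more careful default.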