Perf: optimize postal_parse_address implementation #49

lmores wants to merge 3 commits into NickCrews:main from
Conversation
@lmores here's what I see when I run those benchmarks, with the following system specs:

============================= test session starts ==============================
platform linux -- Python 3.12.4, pytest-8.2.2, pluggy-1.5.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jimmy/Work/mismo
configfile: pyproject.toml
plugins: anyio-4.4.0, cov-5.0.0, benchmark-4.0.0
collected 24 items / 18 deselected / 6 selected
mismo/lib/geo/tests/test_postal_benchmark.py ...... [100%]
---------------------------------------------------------------------------------------------- benchmark 'nrows=1000000': 6 tests ---------------------------------------------------------------------------------------------
Name (time in s) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_parse[1m-noop] 12.8445 (1.0) 12.9714 (1.0) 12.9009 (1.0) 0.0570 (1.0) 12.9015 (1.0) 0.1046 (1.44) 1;0 0.0775 (1.0) 5 1
test_benchmark_parse[1m-python_only] 13.5212 (1.05) 13.7194 (1.06) 13.5844 (1.05) 0.0782 (1.37) 13.5659 (1.05) 0.0728 (1.0) 1;1 0.0736 (0.95) 5 1
test_benchmark_parse[1m-postal_only] 51.1747 (3.98) 51.4688 (3.97) 51.2672 (3.97) 0.1179 (2.07) 51.2128 (3.97) 0.1199 (1.65) 1;0 0.0195 (0.25) 5 1
test_benchmark_parse[1m-postal_parse_address] 56.0817 (4.37) 56.8488 (4.38) 56.2741 (4.36) 0.3264 (5.73) 56.1196 (4.35) 0.2928 (4.02) 1;1 0.0178 (0.23) 5 1
test_benchmark_parse[1m-postal_parse_address__direct_import] 56.2151 (4.38) 56.4120 (4.35) 56.3279 (4.37) 0.0975 (1.71) 56.3729 (4.37) 0.1837 (2.52) 2;0 0.0178 (0.23) 5 1
test_benchmark_parse[1m-postal_parse_address__initial_impl] 58.3636 (4.54) 58.5571 (4.51) 58.4671 (4.53) 0.0819 (1.44) 58.4389 (4.53) 0.1328 (1.82) 2;0 0.0171 (0.22) 5 1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Legend:
Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
OPS: Operations Per Second, computed as 1 / Mean
================ 6 passed, 18 deselected in 1769.89s (0:29:29) =================

Like you, I also see a significant improvement in the runtime performance when using pytest-benchmark.
Hmm, really curious about this one, I'll dig in once I'm at a computer. Thanks for the effort. First thought: all the timings seem cut almost exactly in half; maybe we're accidentally doing the computation twice in the non-pytest harness?
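The "computing twice" hypothesis is easy to picture: a hand-rolled harness can accidentally materialize the result twice inside the timed region. A hypothetical sketch of the bug pattern, not the actual code in the `benchmarks` directory:

```python
import time

def timed_run_buggy(fn, *args):
    # Bug: the work is executed twice inside the timed region (e.g. executing
    # an expression once and then re-executing it to inspect the result), so
    # the reported time is roughly double the real per-run cost.
    start = time.perf_counter()
    result = fn(*args)
    _ = fn(*args)  # accidental second execution, still inside the timer
    return result, time.perf_counter() - start
```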
Quite sure this is not the case (also, my timings are a bit far from being exactly halved). The differences may be due to the fact that in the non-pytest benchmarks I was using a DuckDB database that writes to disk, not an in-memory one as with pytest-benchmark... however, this is what I get when running the non-pytest benchmarks using a memtable (it doesn't change much, still far from the other results):

At first I thought that the pytest-benchmark library did something as simple as
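One way a repeated-rounds harness can report lower numbers than a single manual run is result caching somewhere in the stack: only the first round pays the full cost, and the mean over 5 rounds drops. A self-contained illustration using `functools.lru_cache` as a stand-in for whatever layer might be caching (this is an assumption for illustration, not a claim about ibis/DuckDB internals):

```python
import functools
import time

@functools.lru_cache(maxsize=None)
def parse_cached(address):
    # Simulate expensive work that is only paid on the first (cold) call.
    time.sleep(0.01)
    return address.upper()

def time_call(fn, arg):
    start = time.perf_counter()
    fn(arg)
    return time.perf_counter() - start

cold = time_call(parse_cached, "123 main st")  # pays the sleep
warm = time_call(parse_cached, "123 main st")  # served from the cache
assert warm < cold
```

If something like this were happening, the manual single-shot timing would resemble the cold call while pytest-benchmark's mean over 5 rounds would be dominated by warm calls.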
Intro
In this PR:
- `mismo.lib.geo.postal_parse_address` has been rewritten using my implementation
- `mismo.lib.geo.tests.test_postal_benchmark.postal_parse_address__direct_import` is the same as the above one, but avoids the optional import of `postal` inside its body (@NickCrews: don't worry, I was just curious whether that import and the inner function declaration generated any relevant overhead... it seems irrelevant, I will remove it before merging)
- `mismo.lib.geo.tests.test_postal_benchmark.postal_parse_address__initial_impl` is the original implementation by @jstammers

In this branch I also temporarily restored the top-level `benchmarks` directory to manually gather execution times.

All benchmarks have been run on the full data set of 1M rows using
`duckdb 0.10.0`, `ibis-framework 9.1.0` and `python 3.12.3`, on the same laptop with no other programs running (except GNOME, etc.).

Results
Results obtained with manual timings are as follows.
On the other hand, pytest-benchmark reports the following.
Values in the Mean column are remarkably lower than those reported manually (the manual timings lie way outside the StdDev or the IQR).
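For reference, the summary columns pytest-benchmark reports can be recomputed from raw per-round timings. A minimal sketch with the standard library (the exact quartile method may differ slightly from pytest-benchmark's; the timing values are placeholders, not the actual benchmark data):

```python
import statistics

def summarize(timings):
    # Recompute the table's columns from a list of per-round timings (seconds).
    timings = sorted(timings)
    n = len(timings)
    q1 = statistics.median(timings[: n // 2])        # lower-half median
    q3 = statistics.median(timings[(n + 1) // 2 :])  # upper-half median
    mean = statistics.mean(timings)
    return {
        "min": timings[0],
        "max": timings[-1],
        "mean": mean,
        "stddev": statistics.stdev(timings),
        "median": statistics.median(timings),
        "iqr": q3 - q1,
        "ops": 1.0 / mean,  # OPS = 1 / Mean, as stated in the legend
    }
```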
Isn't this quite strange? As far as I can tell, `pytest-benchmark` runs each function 5 times: are we caching something that we should not, so that subsequent executions run faster than a single one (as in the manual timings)?

Also, in the pytest benchmarks the difference between
`postal_parse_address` and `postal_only` is more than 5s, which is much bigger than the difference seen in the manual timings (<1s).

Additionally, before gathering the above results I recreated my virtualenv from scratch to match mismo's new requirements (`python>=3.10`, `ibis-framework>=9.1.0`), and the results I get are also quite different from those I reported in PR #48, where I was using `ibis-framework 9.0.0` and `duckdb 1.0.0` (not sure which version of python, though).

Questions