Commit 4f3bcde

Add doc on picking resolvers
Also bump cache up: on `bench`, the `basic` resolver's high water marks are:

- 40MB with no cache, averaging 455µs/line
- 40.7MB with a 200 entries s3fifo, averaging 324µs/line
- 42.4MB with a 2000 entries s3fifo, averaging 191µs/line
- 44.2MB with a 5000 entries s3fifo, averaging 155µs/line
- 47.2MB with a 10000 entries s3fifo, averaging 134µs/line
- 53MB with a 20000 entries s3fifo, averaging 123µs/line

Either 2000 or 5000 seem like pretty good defaults: the gains taper
afterwards while memory use increases sharply. Bump to 2000 to stay on
the conservative side.
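
The taper described in the commit message can be checked with a little arithmetic. The sketch below is not part of the commit: it copies the (memory, time) pairs from the message in the order listed and computes how many µs/line each extra MB buys when moving from one configuration to the next. The names `runs` and `gains` are made up for this illustration.

```python
# (memory high water mark in MB, mean µs per line), copied from the
# commit message in the order listed, starting with "no cache".
runs = [
    (40.0, 455),
    (40.7, 324),
    (42.4, 191),
    (44.2, 155),
    (47.2, 134),
    (53.0, 123),
]

# µs/line saved per extra MB when stepping from one configuration to the next.
gains = [
    (t0 - t1) / (m1 - m0)
    for (m0, t0), (m1, t1) in zip(runs, runs[1:])
]

for (m0, _), (m1, _), gain in zip(runs, runs[1:], gains):
    print(f"{m0:>4} MB -> {m1:>4} MB: {gain:6.1f} µs/line saved per MB")
```

Each step returns less per MB than the one before it, which is consistent with stopping at one of the earlier cache sizes.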
1 parent 4e07493 commit 4f3bcde

5 files changed, 126 additions and 7 deletions

README.rst

Lines changed: 9 additions & 6 deletions
@@ -30,17 +30,20 @@ Just add ``ua-parser`` to your project's dependencies, or run
 
 to install in the current environment.
 
-Installing `google-re2 <https://pypi.org/project/google-re2/>`_ is
-*strongly* recommended as it leads to *significantly* better
-performances. This can be done directly via the ``re2`` optional
-dependency:
+Installing `ua-parser-rs <https://pypi.org/project/ua-parser-rs>`_ or
+`google-re2 <https://pypi.org/project/google-re2/>`_ is *strongly*
+recommended as they yield *significantly* better performance. This
+can be done directly via the ``regex`` and ``re2`` optional
+dependencies respectively:
 
 .. code-block:: sh
 
+    $ pip install 'ua_parser[regex]'
     $ pip install 'ua_parser[re2]'
 
-If ``re2`` is available, ``ua-parser`` will simply use it by default
-instead of the pure-python resolver.
+If either dependency is already available (e.g. because the software
+makes use of re2 for other reasons), ``ua-parser`` will use the
+corresponding resolver automatically.
 
 Quick Start
 -----------

doc/api.rst

Lines changed: 13 additions & 0 deletions
@@ -75,6 +75,19 @@ from user agent strings.
 
     .. warning:: Only available if |re2|_ is installed.
 
+.. class:: ua_parser.regex.Resolver(Matchers)
+
+    An advanced resolver based on |regex|_ and a bespoke implementation
+    of regex prefiltering, by the sibling project `uap-rust
+    <https://github.com/ua-parser/uap-rust>`_.
+
+    Sufficiently fast that a cache may not be necessary, and may even
+    be detrimental at smaller cache sizes.
+
+    .. warning:: Only available if `ua-parser-rs
+                 <https://pypi.org/project/ua-parser-rs/>`_ is
+                 installed.
+
 Eager Matchers
 ''''''''''''''
 

doc/guides.rst

Lines changed: 97 additions & 0 deletions
@@ -129,6 +129,103 @@ from here on::
 :class:`~ua_parser.caching.Local`, which is also caching-related,
 and serves to use thread-local caches rather than a shared cache.
 
+Builtin Resolvers
+=================
+
+.. list-table::
+   :header-rows: 1
+   :stub-columns: 1
+
+   * -
+     - speed
+     - portability
+     - memory use
+     - safety
+   * - ``regex``
+     - great
+     - good
+     - bad
+     - great
+   * - ``re2``
+     - good
+     - bad
+     - good
+     - good
+   * - ``basic``
+     - terrible
+     - great
+     - great
+     - great
+
+``regex``
+---------
+
+The ``regex`` resolver is a bespoke effort as part of the `uap-rust
+<https://github.com/ua-parser/uap-rust>`_ sibling project, built on
+`rust-regex <https://github.com/rust-lang/regex>`_ and `a bespoke
+regex-prefiltering implementation
+<https://github.com/ua-parser/uap-rust/tree/main/regex-filtered>`_.
+It:
+
+- Is the fastest available resolver, usually beating ``re2`` by a
+  significant margin (when that is even available).
+- Is fully controlled by the project, and thus can be built for all
+  interpreters and platforms supported by pyo3 (currently: cpython,
+  pypy, and graalpy, on linux, macos and windows, intel and arm). It is
+  also built as a cpython abi3 wheel and should thus suffer from no
+  compatibility issues with new releases.
+- Is built entirely out of safe rust code, so its safety risks lie
+  entirely in ``regex`` and ``pyo3``.
+- Is a lot more memory intensive than the other resolvers, its biggest
+  drawback, because ``regex`` tends to trade memory for speed (~155MB
+  high water mark on a real-world dataset).
+
+If available, it is the default resolver, without a cache.
+
+``re2``
+-------
+
+The ``re2`` resolver is built atop the widely used `google-re2
+<https://github.com/google/re2>`_ via its built-in Python bindings.
+It:
+
+- Is extremely fast, though around 80% slower than ``regex`` on
+  real-world data.
+- Is only compatible with CPython, and uses pure API wheels, so needs
+  a different release for each cpython version, for each OS, for each
+  architecture.
+- Is built entirely in C++, but by experienced Google developers.
+- Is more memory intensive than the pure-python ``basic`` resolver,
+  but quite slim all things considered (~55MB high water mark on a
+  real-world dataset).
+
+If available, it is the second-preferred resolver, without a cache.
+
+``basic``
+---------
+
+The ``basic`` resolver is a naive linear traversal of all rules, using
+the standard library's ``re``. It:
+
+- Is *extremely* slow, about 10x slower than ``re2`` in cpython, and
+  pypy and graal's regex implementations do *not* like the workload
+  and are behind cpython by a factor of 3~4.
+- Has perfect compatibility, with the caveat above, by virtue of being
+  built entirely out of standard library code.
+- Is basically as safe as Python software can be by virtue of being
+  just Python, with the native code being the standard library's.
+- Is the slimmest resolver at about 40MB.
+
+This is caveated by a hard requirement to use caches, which makes it
+workably faster on real-world datasets (if still nowhere near
+*uncached* ``re2`` or ``regex``) but increases its memory requirement
+significantly: e.g. using "sieve" and a cache size of 20000 on a
+real-world dataset, it is about 4x slower than ``re2`` for about the
+same memory requirements.
+
+It is the fallback and least preferred resolver, with a medium
+(currently 2000 entries) cache by default.
+
 Writing Custom Resolvers
 ========================
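
The guide text in this diff describes the ``basic`` resolver as a linear traversal of all rules with the standard library's ``re``, made workable by a cache. A minimal sketch of that shape follows. It is not ua-parser's code: the three rules, `resolve`, and `cached_resolve` are hypothetical, and `functools.lru_cache` (an LRU policy) stands in for the s3fifo and sieve caches the commit mentions.

```python
import re
from functools import lru_cache

# Hypothetical rules, for illustration only: (compiled pattern, family name).
RULES = [
    (re.compile(r"Firefox/(\d+)"), "Firefox"),
    (re.compile(r"Chrome/(\d+)"), "Chrome"),
    (re.compile(r"Version/(\d+).*Safari/"), "Safari"),
]


def resolve(ua: str):
    """Try every rule in order and return the first match, or None."""
    for pattern, family in RULES:
        m = pattern.search(ua)
        if m:
            return family, m.group(1)
    return None


# Bounded cache in front of the slow traversal. lru_cache is a stand-in:
# it evicts least-recently-used entries, unlike s3fifo or sieve.
cached_resolve = lru_cache(maxsize=2000)(resolve)

ua = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
first = cached_resolve(ua)   # miss: walks the rules
second = cached_resolve(ua)  # hit: served from the cache
info = cached_resolve.cache_info()
```

With a real ruleset the traversal cost grows with the number of rules, so repeated user agents are where the cache pays off, at the price of the extra memory the commit message measures.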

doc/installation.rst

Lines changed: 6 additions & 0 deletions
@@ -35,3 +35,9 @@ if installed, but can also be installed via and alongside ua-parser:
     $ pip install 'ua-parser[yaml]'
     $ pip install 'ua-parser[regex,yaml]'
 
+``yaml`` simply enables the ability to :func:`load yaml rulesets
+<ua_parser.loaders.load_yaml>`.
+
+The other two dependencies enable more efficient resolvers. By
+default, ``ua-parser`` will select the fastest resolver it finds out
+of the available set. For more, see :ref:`builtin resolvers`.

src/ua_parser/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -72,7 +72,7 @@
         (
             RegexResolver,
             Re2Resolver,
-            lambda m: CachingResolver(BasicResolver(m), Cache(200)),
+            lambda m: CachingResolver(BasicResolver(m), Cache(2000)),
         ),
     )
 )
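
The tuple in this hunk lists the resolvers in preference order: ``regex``, then ``re2``, then the cached ``basic`` fallback. The sketch below illustrates that "first available wins" selection in isolation. It is not the library's implementation: `CANDIDATES`, `is_installed`, and `pick_resolver` are hypothetical names, and the module names `ua_parser_rs` and `re2` are assumptions about what each optional dependency installs.

```python
from importlib.util import find_spec
from typing import Callable, Optional, Sequence, Tuple

# Preference order mirroring the tuple in the diff. The module names are
# assumptions; None marks the fallback, which needs no extra dependency.
CANDIDATES: Sequence[Tuple[str, Optional[str]]] = (
    ("regex", "ua_parser_rs"),
    ("re2", "re2"),
    ("basic+cache(2000)", None),
)


def is_installed(module: str) -> bool:
    """True if the top-level `module` can be found, without importing it."""
    return find_spec(module) is not None


def pick_resolver(available: Callable[[str], bool] = is_installed) -> str:
    """Return the label of the first candidate whose module is available."""
    for label, module in CANDIDATES:
        if module is None or available(module):
            return label
    raise RuntimeError("unreachable: the fallback has no dependency")
```

Passing `available` as a parameter keeps the selection logic testable without installing or uninstalling anything, e.g. `pick_resolver(lambda name: name == "re2")`.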
