
Commit 1b2cc9f

Merge pull request #15 from nickyoung-github/namespaces
Namespaces
2 parents 427cefc + ae9f7d9 commit 1b2cc9f

4 files changed, +207 -129 lines changed

README.md

Lines changed: 61 additions & 41 deletions
@@ -2,41 +2,58 @@

 # Table of Contents
 1. [Overview](#Overview)
-2. [Why Not Protobufs ?](#Why-Not-Protobufs)
-3. [No Copy](#No-Copy)
-4. [Supported Types](#Supported-Types)
-5. [Inheritance](#Inheritance)
-6. [Msgpack](#Msgpack)
-7. [Namespaces](#Namespaces)
-8. [Generated Code](#Generated-Code)
+2. [Getting Started](#Getting-Started)
+3. [Why Not Protobufs?](#Why-Not-Protobufs)
+4. [No Copy](#No-Copy)
+5. [Supported Types](#Supported-Types)
+6. [Inheritance](#Inheritance)
+7. [Msgpack](#Msgpack)
+8. [Namespaces](#Namespaces)
+9. [Generated Code](#Generated-Code)
+10. [Other Languages](#Other-Languages)


 ## Overview

-This project has helpers for automatically generating C++ structs and corresponding pybind marshalling code for
-dataclasses and pydantic-based classes.
+Python is the language of choice for finance, data science, etc. Python calling C++ (and, increasingly, Rust) is a
+common pattern, leveraging packages such as
+[pybind11](https://pybind11.readthedocs.io/en/stable/index.html).

-This is achieved via a cmake rule: `pydantic_bind_add_module(<path to module>)`
+A common problem is how best to represent data to be shared between python and C++ code. One would like idiomatic
+representations in each language, and this may be necessary to fully utilise certain python packages. E.g.,
+[FastAPI](https://fastapi.tiangolo.com) is a popular way to create REST services, using OpenAPI definitions derived
+from [pydantic](https://docs.pydantic.dev/latest/) classes. Therefore, a data model authored using pydantic classes,
+or native python dataclasses, from which sensible C++ structs and appropriate marshalling can automatically be
+generated, is desirable.

-Add a module this way and it will be scanned for:
-- dataclasses
-- classes derived from pydantic's BaseModel
-- enums
+This package provides such tools: a cmake rule allows you to generate C++ structs (with msgpack serialisation) and
+corresponding pybind11 bindings.

-For any of these which are encountered, a definition will be added to a .h file, with relative path matching the module
-and [pybind11](https://pybind11.readthedocs.io/en/stable/index.html) code for binding objects added to a
-corresponding .cpp file.
-
-The intended use of this package is for defining behaviour-less data classes, to be shared between python and C++. E.g.,
-a common object model for financial modelling. Further, we want idiomatic classes for each language, not mutants like
-Protobuf-generated python classes.
+Python functions allow you to navigate between the C++ pybind11 objects and the native python objects. There is also an
+option for all python operations to be directed to an owned pybind11 object (see [No Copy](#No-Copy)).

 Note that the typical python developer experience is now somewhat changed, in that it's necessary to build/install
 the project. I personally use JetBrains CLion in place of PyCharm for such projects.

-For an example project please see (the rather nascent) [fin-data-model](https://github.com/nickyoung-github/fin-data-model)
+For an example of the kind of behaviour-less object model this package is intended to support,
+please see (the rather nascent) [fin-data-model](https://github.com/nickyoung-github/fin-data-model)
+
+
+## Getting Started
+
+`pydantic_bind` adds a custom cmake rule: `pydantic_bind_add_package(<package path>)`
+
+This rule will do the following:
+- scan for sub-packages
+- scan each sub-package for all .py files
+- add custom steps for generating .cpp/.h files from any of the following, encountered in the .py files:
+  - dataclasses
+  - classes derived from pydantic's BaseModel
+  - enums

-You can create an instance of the pybind class from your original using `get_pybind_instance()`, e.g.,
+C++ directory and namespace structure will match the python package structure (see [Namespaces](#Namespaces)).
+
+You can create an instance of the pybind11 class from your original using `get_pybind_value()`, e.g.,

 *my_class.py:*

@@ -57,13 +74,13 @@ You can create an instance of the pybind class from your original using `get_pyb
 find_package(Python3 REQUIRED COMPONENTS Interpreter Development)
 find_package(pydantic_bind REQUIRED COMPONENTS HINTS "${Python3_SITELIB}")

-pydantic_bind_add_module(my_class.py)
+pydantic_bind_add_package(my_package)



 *my_util.py*

 from pydantic_bind import get_pybind_value
-from my_class import MyClass
+from my_package.my_class import MyClass

 orig = MyClass(my_int=123, my_string="hello")
 generated = get_pybind_value(orig)
@@ -73,8 +90,8 @@ You can create an instance of the pybind class from your original using `get_pyb

 ## Why Not Protobufs?

-A very good question. Protobufs are frankly a PITA to use: they have poor to no variant support, the generated
-code is ugly and idiosyncratic, they're large and painful to copy around etc.
+I personally find protobufs to be a PITA to use: they have poor to no variant support, the generated code is ugly and
+idiosyncratic, they're large and painful to copy around, etc.

 AVRO is more friendly but generates python classes dynamically, which confuses IDEs like PyCharm. I do think a good
 solution is something like [pydantic_avro](https://github.com/godatadriven/pydantic-avro/tree/main/src/pydantic_avro)
@@ -93,12 +110,12 @@ than holding its own copy.

 Deriving from this `BaseModel` will give you functionality equivalent to pydantic's `BaseModel`. The
 annotations are re-written using `computed_field`, with property getters and setters operating on the generated pybind
-class, which is instantiated behind the scenes in `init`. Note that this will make some operations (especially those
+class, which is instantiated behind the scenes in `__init__`. Note that this will make some operations (especially those
 that access `__dict__`) less efficient. I've also plumbed the computed fields into the JSON schema, so these objects can
 be used with [FastAPI](https://fastapi.tiangolo.com).

 `dataclass` works similarly, adding properties to the dataclass, so that the existing get and set functionality works
-seamlessly in accessing the generated pybind class (also set via a shimmed `__init__`).
+seamlessly in accessing the generated pybind11 class (also set via a shimmed `__init__`).

 Using regular `dataclass` or `BaseModel` as members of classes defined with the pydantic_bind versions is very
 inefficient and not recommended.
@@ -120,7 +137,8 @@ The following python -> C++ mappings are supported (there are likely others I sh
 - pydantic_bind.BaseModel --> struct
 - dataclass --> struct
 - pydantic_bind.dataclass --> struct
-- Enum --> enum
+- Enum --> enum class
+

 ## Inheritance

@@ -141,17 +159,17 @@ project with my rather rudimentary cmake skillz!) Changes include:
 - Support for enums

 A likely future enhancement will be to use [cereal](https://github.com/USCiLab/cereal) and add a msgpack adaptor.
-However, I haven't quite worked out how to do that yet.
+
+The no-copy python objects add `to_msg_pack()` and `from_msg_pack()` (the latter being a class method) to access
+this functionality.



 ## Namespaces

-Currently, the generated C++ code uses a single namespace, corresponding to the top-level package name in python.
-I intend to introduce namespaces which match the python package structure. However, there are likely to be some
-cmake-related foibles, such as not allowing duplicate module names, even if they are in different packages.
+Directory structure and namespaces in the generated C++ match the python package and module names.

-pybind modules are also generated per-module, rather than per-package. This is something I am considering changing,
-but again, some cmake gymnastics will be required.
+cmake requires unique target names and pybind11 requires that the filename (minus the OS-specific qualifiers) matches
+the module name.



 ## Generated Code
@@ -160,7 +178,7 @@ Code is generated into a directory structure underneath `<top level>/generated`.

 Headers are installed to `<top level>/include`.

-Compiled pybind modules are installed into `<original module path>/__pybind__`.
+Compiled pybind11 modules are installed into `<original module path>/__pybind__`.

 For C++ usage, you need only the headers; the compiled code is for pybind/python usage only.

@@ -227,7 +245,7 @@ will generate the following files:
 #include <msgpack/msgpack.h>
 #include <chrono>

-namespace common_object_model
+namespace common_object_model::v1::common
 {
 enum Weekday { MONDAY = 1, TUESDAY = 2, WEDNESDAY = 3, THURSDAY = 4, FRIDAY = 5, SATURDAY = 6, SUNDAY = 7
 };
@@ -320,10 +338,10 @@ will generate the following files:
 #include "foo.h"

 namespace py = pybind11;
-using namespace common_object_model;
+using namespace common_object_model::v1::common;


-PYBIND11_MODULE(foo, m)
+PYBIND11_MODULE(common_object_model_v1_common_foo, m)
 {
 py::enum_<Weekday>(m, "Weekday").value("MONDAY", Weekday::MONDAY)
 .value("TUESDAY", Weekday::TUESDAY)
@@ -371,5 +389,7 @@ will generate the following files:
 }


+## Other Languages

-
+When time allows, I will look at adding support for Rust. There is limited value in generating Java or C# classes;
+calling those VM-based languages in-process from python has never worked well, in my experience.
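To make the new naming rule concrete, here is a minimal sketch of a hypothetical helper, `generated_names` (not part of `pydantic_bind`), that derives the names used in the `foo` example from a dotted python module name: the C++ namespace, the flattened name passed to `PYBIND11_MODULE`, and the `__pybind__` import path computed by the `get_pybind_type` change in `base.py`. It assumes the example module is `common_object_model.v1.common.foo` and that the namespace is the package path without the module itself; both are inferred from the generated code rather than stated in the README.

```python
from itertools import chain
from typing import NamedTuple


class GeneratedNames(NamedTuple):
    cpp_namespace: str       # C++ namespace holding the generated structs/enums
    pybind_module_name: str  # name passed to PYBIND11_MODULE (and the compiled file name)
    pybind_import_path: str  # dotted path python imports the compiled module from


def generated_names(python_module: str) -> GeneratedNames:
    """Hypothetical helper: derive generated C++/pybind11 names from a dotted python module name."""
    parts = python_module.split(".")
    # Assumption: the namespace is the package path, excluding the module itself
    cpp_namespace = "::".join(parts[:-1])
    # One flat name built from the full path, so same-named modules in different packages
    # get distinct cmake targets, and the compiled file name can match the module name
    pybind_module_name = "_".join(parts)
    # Same construction as the new get_pybind_type(): <package>.__pybind__.<flattened name>
    pybind_import_path = ".".join(chain(parts[:-1], ["__pybind__", pybind_module_name]))
    return GeneratedNames(cpp_namespace, pybind_module_name, pybind_import_path)


names = generated_names("common_object_model.v1.common.foo")
print(names.cpp_namespace)       # common_object_model::v1::common
print(names.pybind_module_name)  # common_object_model_v1_common_foo
print(names.pybind_import_path)  # common_object_model.v1.common.__pybind__.common_object_model_v1_common_foo
```

Under the old scheme, hypothetical modules `pkg_a.util` and `pkg_b.util` would both have produced a pybind11 module called `util`; flattening the full path gives `pkg_a_util` and `pkg_b_util`. Joining with `_` is not collision-free in general, though: `a_b.c` and `a.b_c` both flatten to `a_b_c`.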

pydantic_bind/base.py

Lines changed: 14 additions & 21 deletions
@@ -2,6 +2,7 @@
 from enum import Enum, EnumType
 from functools import cache, wraps
 from importlib import import_module
+from itertools import chain
 from pydantic import BaseModel as PydanticBaseModel, ConfigDict, computed_field
 from pydantic.fields import ComputedFieldInfo, FieldInfo
 from pydantic.json_schema import GenerateJsonSchema
@@ -42,9 +43,9 @@ def get_pybind_type(typ: Union[Enum, ModelMetaclass]) -> Union[EnumType, Type]:
 :return: The corresponding, generated pybind type
 """

-module_parts = typ.__module__.split(".")
-module_parts.insert(-1, "__pybind__")
-pybind_module = ".".join(module_parts)
+package_parts = typ.__module__.split(".")
+pybind_module_name = "_".join(package_parts)
+pybind_module = ".".join(chain(package_parts[:-1], ["__pybind__", pybind_module_name]))

 module = sys.modules.get(pybind_module)
 if not module:
@@ -60,21 +61,21 @@ def get_pybind_value(obj):
 :param obj: A dataclass or Pydantic BaseModel-derived object
 :return: The corresponding pybind object
 """
-return _get_pybind_value(obj, type(obj), False)
+return _get_pybind_value(obj, False)


-def _get_pybind_value(obj, typ: Union[Type, EnumType], default_to_self: bool = True):
-if issubclass(typ, EnumType):
-name = obj if isinstance(obj, str) else obj.name
-return get_pybind_type(type(obj)).__entries[name][0]
+def _get_pybind_value(obj, default_to_self: bool = True):
+if isinstance(obj, Enum):
+return get_pybind_type(type(obj)).__entries[obj.name][0]
 elif is_dataclass(obj) or isinstance(obj, PydanticBaseModel):
+typ = type(obj)
 pybind_type = get_pybind_type(typ)
 name_iter = (name for name, _, _ in field_info_iter(typ))

 if hasattr(typ, "__has_pybind_impl__"):
 return pybind_type(**{name: getattr(obj.pybind_impl, name) for name in name_iter})
 else:
-return pybind_type(**{name: _get_pybind_value(getattr(obj, name), typ) for name in name_iter})
+return pybind_type(**{name: _get_pybind_value(getattr(obj, name)) for name in name_iter})
 elif default_to_self:
 return obj
 else:
@@ -148,7 +149,7 @@ def fn(self):

 def _setter(name: str, typ: Union[EnumType, Type]):
 def fn(self, value: Any):
-setattr(self.pybind_impl, name, _get_pybind_value(value, typ))
+setattr(self.pybind_impl, name, _get_pybind_value(value))

 fn.__name__ = name
 fn.__annotations__ = {"value": typ}
@@ -172,7 +173,7 @@ def __new__(

 if annotations:
 # Rewrite annotations as properties, with getters and setters which interact with the attributes
-# on the generated pybind class
+# on the generated pybind_impl class

 properties = {}

@@ -257,12 +258,6 @@ def __init__(self, **kwargs):
 if __pybind_impl__:
 self.__pybind_impl = __pybind_impl__
 else:
-# This replicates some of what happens in the pydantic code:
-# 1. Convert values according to the alias generator
-# 2. Report missing required values
-#
-# Additionally, we convert values to pybind equivalents, where required (enums, for example)
-
 missing_required = []

 for name, field_info in self.model_computed_fields.items():
@@ -279,21 +274,19 @@ def __init__(self, **kwargs):
 kwargs[field_info.alias] = value

 if value != PydanticUndefined:
-kwargs[name] = _get_pybind_value(value, field_info.return_type)
+kwargs[name] = _get_pybind_value(value)

 if missing_required:
 raise RuntimeError(f"Missing required fields: {missing_required}")

-# Now initialise the corresponding pybind type with the converted values ...
-
 pybind_type = get_pybind_type(type(self))
 object.__setattr__(self, "_BaseModel__pybind_impl", pybind_type(**kwargs))

 super().__init__()

+
 @property
 def __dict__(self):
-# This is not super efficient, but does ensure that __eq__, __hash__ work, using the pydantic implementations
 return {name: from_pybind_value(getattr(self, name), typ) for name, typ, _ in field_info_iter(type(self))}

 @__dict__.setter
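The `_get_pybind_value` change replaces `issubclass(typ, EnumType)` with `isinstance(obj, Enum)` and drops the `typ` argument. `EnumType` is the metaclass of enum classes, so an enum class such as `Weekday` is an instance of it rather than a subclass. As far as I can tell, the old test was therefore false for enum values (and in the dataclass branch `typ` was the containing class, not the field's type), so enums would have fallen through to the later branches. The stdlib-only snippet below, with hypothetical `old_check`/`new_check` wrappers around the two tests, illustrates the difference; it does not call `pydantic_bind`.

```python
from enum import Enum

try:
    from enum import EnumType  # Python 3.11+
except ImportError:  # older Pythons only have the EnumMeta spelling
    from enum import EnumMeta as EnumType


class Weekday(Enum):
    # Cut-down version of the Weekday enum from the README example
    MONDAY = 1
    TUESDAY = 2


def old_check(value) -> bool:
    # Old test, as called from get_pybind_value, where typ was type(value)
    return issubclass(type(value), EnumType)


def new_check(value) -> bool:
    # New test: is the value itself an enum member?
    return isinstance(value, Enum)


print(old_check(Weekday.MONDAY))      # False: Weekday is an *instance* of EnumType, not a subclass
print(isinstance(Weekday, EnumType))  # True
print(new_check(Weekday.MONDAY))      # True
print(new_check(1))                   # False: non-enum values fall through to the other branches
```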
