22
33# Table of Contents
441 . [ Overview] ( #Overview )
5- 2 . [ Why Not Protobufs ?] ( #Why-Not-Protobufs )
6- 3 . [ No Copy] ( #No-Copy )
7- 4 . [ Supported Types] ( #Supported-Types )
8- 5 . [ Inheritance] ( #Inheritance )
9- 6 . [ Msgpack] ( #Msgpack )
10- 7 . [ Namespaces] ( #Namespaces )
11- 8 . [ Generated Code] ( #Generated-Code )
5+ 2 . [ Getting Started] ( #Getting-Started )
6+ 3 . [ Why Not Protobufs ?] ( #Why-Not-Protobufs )
7+ 4 . [ No Copy] ( #No-Copy )
8+ 5 . [ Supported Types] ( #Supported-Types )
9+ 6 . [ Inheritance] ( #Inheritance )
10+ 7 . [ Msgpack] ( #Msgpack )
11+ 8 . [ Namespaces] ( #Namespaces )
12+ 9 . [ Generated Code] ( #Generated-Code )
13+ 10 . [ Other Languages] ( #Other-Languages )
1214
1315
1416## Overview
1517
16- This project has helpers for automatically generating C++ structs and corresponding pybind marshalling code for
17- dataclasses and pydantic-based classes.
18+ Python is the language of choice for finance, data science etc. Python calling C++ (and increasingly, Rust) is a
19+ common pattern, leveraging packages such as
20+ [ pybind11] ( https://pybind11.readthedocs.io/en/stable/index.html ) .
1821
19- This is achieved via a cmake rule: ` pydantic_bind_add_module(<path to module>) `
22+ A common problem is how best to represent data to be shared between python and C++ code. One would like idiomatic
23+ representations in each language and this may be necessary to fully utilise certain python packages. E.g.,
24+ [ FastAPI] ( https://fastapi.tiangolo.com ) is a popular way to create REST services, using Open API definitions derived
25+ from [ pydantic] ( https://docs.pydantic.dev/latest/ ) classes. Therefore, a data model authored using pydantic classes,
26+ or native python dataclasses, from which sensible C++ structs and appropriate marshalling can automatically be
27+ generated, is desirable.
2028
21- Add a module this way and it will be scanned for:
22- - dataclasses
23- - classes derived from pydantic's BaseModel
24- - enums
29+ This package provides such tools: a cmake rule allows you to generate C++ structs (with msgpack serialisation) and
30+ corresponding pybind11 bindings.
2531
26- For any of these which are encountered, a definition will be added to a .h file, with relative path matching the module
27- and [ pybind11] ( https://pybind11.readthedocs.io/en/stable/index.html ) code for binding objects added to a
28- corresponding .cpp file.
29-
30- The intended use of this package is for defining behaviour-less data classes, to be shared between python and C++. E.g.,
31- a common object model for financial modelling. Further, we want idiomatic classes for each language, not mutants like
32- Protobuf-generated python classes.
32+ Python functions allow you to navigate between the C++ pybind11 objects and the native python objects. There is also an
33+ option for all python operations to be directed to an owned pybind11 object (see [ No Copy] ( #No-Copy ) ).
3334
3435Note that the typical python developer experience is now somewhat changed, in that it's necessary to build/install
3536the project. I personally use JetBrains CLion, in place of PyCharm for such projects.
3637
37- For an example project please see (the rather nascent) [ fin-data-model] ( https://github.com/nickyoung-github/fin-data-model )
38+ For an example of the kind of behaviour-less object model this package is intended to help,
39+ please see (the rather nascent) [ fin-data-model] ( https://github.com/nickyoung-github/fin-data-model )
40+
41+
42+ ## Getting Started
43+
44+ ` pydantic_bind ` adds a custom cmake rule: ` pydantic_bind_add_package(<package path>) `
45+
46+ This rule will do the following:
47+ - scan for sub-packages
48+ - scan each sub-package for all .py files
49+ - add custom steps for generating .cpp/.h files from any of the following, encountered in the .py files:
50+ - dataclasses
51+ - classes derived from pydantic's BaseModel
52+ - enums
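As a rough illustration of the kind of scan described above, the sketch below finds dataclasses and enums defined in a module using only the standard library. The helper name `find_bindable_classes` and the throwaway `demo` module are hypothetical, and this is not how `pydantic_bind` itself is implemented; a check for pydantic's `BaseModel` is left as a comment so the example runs without third-party packages.

```python
import dataclasses
import enum
import inspect
import types


def find_bindable_classes(module: types.ModuleType) -> dict[str, str]:
    """Map class name -> kind, for classes defined in `module` itself (hypothetical helper)."""
    found = {}
    for name, obj in inspect.getmembers(module, inspect.isclass):
        if obj.__module__ != module.__name__:
            continue  # skip classes that were merely imported into the module
        if issubclass(obj, enum.Enum):
            found[name] = "enum"
        elif dataclasses.is_dataclass(obj):
            found[name] = "dataclass"
        # Assumption: a real scan would also test issubclass(obj, pydantic.BaseModel) here.
    return found


# Build a throwaway module in memory so the example is self-contained.
demo = types.ModuleType("demo")
exec(
    "import dataclasses, enum\n"
    "class Weekday(enum.Enum):\n"
    "    MONDAY = 1\n"
    "@dataclasses.dataclass\n"
    "class MyClass:\n"
    "    my_int: int\n"
    "    my_string: str\n",
    demo.__dict__,
)

print(find_bindable_classes(demo))
```

Filtering on `__module__` keeps the scan from reporting classes that a module only imports, which matters when several modules in a package share types.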
3853
39- You can create an instance of the pybind class from your original using ` get_pybind_instance() ` , e.g.,
54+ C++ directory and namespace structure will match the python package structure (see [ Namespaces] ( #Namespaces ) ).
55+
56+ You can create an instance of the pybind11 class from your original using ` get_pybind_instance() ` , e.g.,
4057
4158* my_class.py:*
4259
@@ -57,13 +74,13 @@ You can create an instance of the pybind class from your original using `get_pyb
5774 find_package(python3 REQUIRED COMPONENTS Interpreter Development)
5875 find_package(pydantic_bind REQUIRED COMPONENTS HINTS "${python3_SITELIB}")
5976
60- pydantic_bind_add_module(my_class.py )
77+ pydantic_bind_add_package(my_package )
6178
6279
6380* my_util.py*
6481
6582 from pydantic_bind import get_pybind_value
66- from my_class import MyClass
83+ from my_package.my_class import MyClass
6784
6885 orig = MyClass(my_int=123, my_string="hello")
6986 generated = get_pybind_value(orig)
@@ -73,8 +90,8 @@ You can create an instance of the pybind class from your original using `get_pyb
7390
7491## Why Not Protobufs?
7592
76- A very good question. Protobufs are frankly a PITA to use: they have poor to no variant support, the generated
77- code is ugly and idiosyncratic, they're large and painful to copy around etc.
93+ I personally find protobufs to be a PITA to use: they have poor to no variant support, the generated code is ugly and
94+ idiosyncratic, they're large and painful to copy around etc.
7895
7996AVRO is more friendly but generates python classes dynamically, which confuses IDEs like PyCharm. I do think a good
8097solution is something like [ pydantic_avro] ( https://github.com/godatadriven/pydantic-avro/tree/main/src/pydantic_avro )
@@ -93,12 +110,12 @@ than holding its own copy.
93110
94111Deriving from this ` BaseModel ` will give you functionality equivalent to pydantic's ` BaseModel ` . The
95112annotations are re-written using ` computed_field ` , with property getters and setters operating on the generated pybind
96- class, which is instantiated behind the scenes in ` init ` . Note that this will make some operations (especially those
113+ class, which is instantiated behind the scenes in ` __init__ ` . Note that this will make some operations (especially those
97114that access `__dict__`) less efficient. I've also plumbed the computed fields into the JSON schema, so these objects can
98115be used with [ FastAPI] ( https://fastapi.tiangolo.com ) .
99116
100117` dataclass ` works similarly, adding properties to the dataclass, so that the existing get and set functionality works
101- seamlessly in accessing the generated pybind class (also set via a shimmed ` __init__ ` ).
118+ seamlessly in accessing the generated pybind11 class (also set via a shimmed ` __init__ ` ).
102119
103120Using regular ` dataclass ` or ` BaseModel ` as members of classes defined with the pydantic_bind versions is very
104121inefficient and not recommended.
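To make the no-copy idea concrete, here is a minimal pure-python sketch of properties that forward every get and set to an owned backing object. The `_Backing` class stands in for a generated pybind11 object, and `_forward` is a hypothetical helper; the actual `pydantic_bind` machinery (`computed_field`, the shimmed `__init__`) is more involved than this.

```python
class _Backing:
    """Stand-in for a generated pybind11 object (hypothetical)."""

    def __init__(self, my_int: int, my_string: str):
        self.my_int = my_int
        self.my_string = my_string


def _forward(name: str) -> property:
    """Build a property that reads and writes `name` on the owned backing object."""

    def getter(self):
        return getattr(self._backing, name)

    def setter(self, value):
        setattr(self._backing, name, value)

    return property(getter, setter)


class MyClass:
    # Each field is a property, so no value is stored on the python object itself.
    my_int = _forward("my_int")
    my_string = _forward("my_string")

    def __init__(self, my_int: int, my_string: str):
        self._backing = _Backing(my_int, my_string)


obj = MyClass(my_int=123, my_string="hello")
obj.my_int = 456  # the write lands on the backing object
print(obj._backing.my_int, list(obj.__dict__))
```

Because the only entry in the instance `__dict__` is the backing object, anything that relies on `__dict__` holding field values has to go through the properties instead, which is one reason such operations are slower.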
@@ -120,7 +137,8 @@ The following python -> C++ mappings are supported (there are likely others I sh
120137- pydantic_bind.BaseModel --> struct
121138- dataclass --> struct
122139- pydantic_bind.dataclass --> struct
123- - Enum --> enum
140+ - Enum --> enum class
141+
124142
125143## Inheritance
126144
@@ -141,17 +159,17 @@ project with my rather rudimentary cmake skillz!) Changes include:
141159- Support for enums
142160
143161A likely future enhancement will be to use [ cereal] ( https://github.com/USCiLab/cereal ) and add a msgpack adaptor.
144- However, I haven't quite worked out how to do that yet.
162+
163+ The no-copy python objects add ` to_msg_pack() ` and ` from_msg_pack() ` (the latter being a class method), to access
164+ this functionality.
145165
146166
147167## Namespaces
148168
149- Currently, the generated C++ code uses a single namespace, corresponding to the top-level package name in python.
150- I intend to introduce namespaces which match the python package structure. However, there are likely to be some
151- cmake-related foibles, such as not allowing duplicate module names, even if they are in different packages.
169+ Directory structure and namespaces in the generated C++ match the python package and module names.
152170
153- pybind modules are also generated per-module, rather than per-package. This is something I am considering changing,
154- but again, some cmake gymnastics will be required.
171+ cmake requires unique target names and pybind11 requires that the filename (minus the OS-specific qualifiers) matches
172+ the module name.
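The naming can be sketched as a small function. The rule below is inferred from the generated-code example in this README (namespace `common_object_model::v1::common`, pybind11 module `common_object_model_v1_common_foo`) and is an assumption about the convention, not code taken from the package; `cpp_names` is a hypothetical helper.

```python
def cpp_names(module_path: str) -> tuple[str, str]:
    """Return (C++ namespace, pybind11 module name) for a dotted python module path.

    Assumed convention: the namespace mirrors the package path, and the pybind11
    module name joins the full dotted path with underscores so it stays unique.
    """
    parts = module_path.split(".")
    namespace = "::".join(parts[:-1])  # packages only, without the module itself
    pybind_module = "_".join(parts)    # full path, flattened into one identifier
    return namespace, pybind_module


print(cpp_names("common_object_model.v1.common.foo"))
```

Flattening the full path into the module name is what lets two modules called `foo` in different packages coexist as distinct cmake targets.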
155173
156174
157175## Generated Code
@@ -160,7 +178,7 @@ Code is generated into a directory structure underneath `<top level>/generated`.
160178
161179Headers are installed to ` <top level>/include ` .
162180
163- Compiled pybind modules are installed into ` <original module path>/__pybind__ ` .
181+ Compiled pybind11 modules are installed into ` <original module path>/__pybind__ ` .
164182
165183For C++ usage, you need only the headers; the compiled code is for pybind/python usage only.
166184
@@ -227,7 +245,7 @@ will generate the following files:
227245 #include <msgpack/msgpack.h>
228246 #include <chrono>
229247
230- namespace common_object_model
248+ namespace common_object_model::v1::common
231249 {
232250 enum Weekday { MONDAY = 1, TUESDAY = 2, WEDNESDAY = 3, THURSDAY = 4, FRIDAY = 5, SATURDAY = 6, SUNDAY = 7
233251 };
@@ -320,10 +338,10 @@ will generate the following files:
320338 #include "foo.h"
321339
322340 namespace py = pybind11;
323- using namespace common_object_model;
341+ using namespace common_object_model::v1::common ;
324342
325343
326- PYBIND11_MODULE(foo , m)
344+ PYBIND11_MODULE(common_object_model_v1_common_foo , m)
327345 {
328346 py::enum_<Weekday>(m, "Weekday").value("MONDAY", Weekday::MONDAY)
329347 .value("TUESDAY", Weekday::TUESDAY)
@@ -371,5 +389,7 @@ will generate the following files:
371389 }
372390
373391
392+ ## Other Languages
374393
375-
394+ When time allows, I will look at adding support for Rust. There is limited value in generating Java or C# classes;
395+ calling those VM-based languages in-process from python has never worked well, in my experience.