dcpy
dcpy is our internal Python package. While we have longer-term hopes of making it publicly available, for now it holds our product-agnostic code, which is used for a variety of purposes. It contains various submodules.
We strive for type safety in our Python code, and an important step toward this is creating classes/objects that represent discrete practical entities. Further, we often want to define these entities in structured YAML or JSON. For this reason, we rely heavily on pydantic. Pydantic's main stated purpose is data validation in Python: it reads in data from JSON or YAML, validates it at parse time, and then (assuming the data was valid) provides type-safe objects to use in code. These classes can also define additional attributes (for example, computed properties), making it easy to go from a JSON definition of a dataset in edm-recipes ({"name": "bpl_libraries", "version": "20240609"}) to an S3 key ("datasets/bpl_libraries/20240609/bpl_libraries.parquet").
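As a minimal sketch of the pattern described above (not the actual dcpy model), the example below parses the JSON definition into a pydantic class and derives the S3 key from it. The class name `Dataset` and the property name `s3_key` are hypothetical and chosen only for illustration. The key layout is copied from the example in the text.

```python
import json

from pydantic import BaseModel, ValidationError


class Dataset(BaseModel):
    """Hypothetical model of a dataset definition (not the real dcpy class)."""

    name: str
    version: str

    @property
    def s3_key(self) -> str:
        # Key layout assumed from the example above:
        # datasets/<name>/<version>/<name>.parquet
        return f"datasets/{self.name}/{self.version}/{self.name}.parquet"


# Valid input: pydantic validates the fields while constructing the object.
raw = '{"name": "bpl_libraries", "version": "20240609"}'
dataset = Dataset(**json.loads(raw))
print(dataset.s3_key)

# Invalid input: a missing required field is rejected at parse time,
# before any downstream code sees the object.
validation_failed = False
try:
    Dataset(**json.loads('{"name": "bpl_libraries"}'))
except ValidationError:
    validation_failed = True
```

Because validation happens at construction, code that receives a `Dataset` can rely on both fields being present and being strings.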
We have models organized by domain: parts of the product lifecycle (plan, builds, packaging), data definitions for connectors to external APIs, or purely conceptual domains (geospatial). As a result, part of the layout of models mirrors the folder structure of the rest of dcpy, while other parts do not map one-to-one.
Outside of class/object methods, no code should live in models. The models submodule is meant to be one of the more "base" submodules of dcpy: it should not depend on any other submodule. With this design, the other submodules of dcpy can know about all of the entities defined in models without creating circular references.
These are meant to be relatively pure utilities