structure
I spent a few weeks thinking about how to structure this library, drawing inspiration from the PyTorch and tinygrad repositories. At a high level, the actual package lives in pyember/ember, which calls functions exposed through Python bindings from pyember/aten for fast computations.
Much of the layout is modeled on those two projects. Very briefly,
- aten/ contains the header and source files for the C++ low-level tensor library, such as basic operations and an autograd engine.
  - aten/src contains all the source files and definitions.
  - aten/bindings contains the pybindings.
  - aten/test contains all the C++ testing modules for aten.
- ember/ contains the actual library, supporting high-level models, objectives, optimizers, dataloaders, and samplers.
  - ember/aten contains the stub files.
  - ember/datasets contains all preprocessing tools, such as datasets/loaders, standardizing, and cross-validation checks.
  - ember/models contains all machine learning models.
  - ember/objectives contains all loss functions and regularizers.
  - ember/optimizers contains all the optimizers/solvers, such as iterative (e.g. SGD), greedy (e.g. decision tree splitting), and one-shot (e.g. least-squares solution).
  - ember/samplers contains all samplers (e.g. MCMC, SGLD).
- examples/ contains example Python scripts for training models.
- tests/ contains the Python testing modules for the ember library.
- docker/ contains Docker images of all the operating systems and architectures I tested ember on. General workflows for setting up the environment on supported machines can be found there.
- setup.py allows you to pip install this as a package.
- run_tests.sh is the main test running script.
For a more detailed explanation, look here.
Aten, short for "a tensor" library (the name comes from PyTorch), is a C++ library that provides low-level functionality for tensors. This includes the basic vector and matrix operations such as addition, scalar/matrix multiplication, dot products, and transpose, which are used everywhere in model training and inference and must be fast.
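To make the list of operations concrete, here is a minimal pure-Python sketch of what each one computes on small nested lists. This is not aten's or ember's API: the function names (add, scale, transpose, dot, matmul) are hypothetical and chosen only for illustration. It is a slow reference for the semantics; the point of aten is to do the same work quickly in C++.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Elementwise sum of two matrices with identical shapes."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("shape mismatch")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c: float, a: Matrix) -> Matrix:
    """Scalar multiplication: every entry multiplied by c."""
    return [[c * x for x in row] for row in a]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("length mismatch")
    return sum(x * y for x, y in zip(u, v))


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product: entry (i, j) is the dot product of row i of a and column j of b."""
    columns = transpose(b)
    if a and columns and len(a[0]) != len(columns[0]):
        raise ValueError("inner dimensions do not match")
    return [[dot(row, col) for col in columns] for row in a]


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(add(a, b))
    print(matmul(a, b))
```

Every function here allocates new Python lists and loops in the interpreter, which is exactly the overhead a compiled tensor library is meant to avoid.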
Let's look at aten/CMakeLists.txt and aten/binding/CMakeLists.txt.
- aten/CMakeLists.txt contains the instructions to generate a Makefile for compiling and linking the aten library. It has an optional argument BUILD_PYTHON_BINDINGS which, when set to ON, generates the .so file through aten/binding/CMakeLists.txt. The executables built from aten/main.cpp are compiled to aten/build/main. The same goes for the test files, which are compiled to aten/build/tests.
- aten/binding/CMakeLists.txt contains the instructions to generate the .so file and saves it to pyember/ember/_C.cpython-312-darwin.so. It must be contained within the Python package directory, since ember cannot access libraries outside of its base directory.
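The file name _C.cpython-312-darwin.so encodes the interpreter version and platform it was built for, so the same file will not be picked up by a different Python version or operating system. The sketch below uses only the standard library to show which file names the running interpreter would accept for an extension module named _C; the helper names candidate_filenames and importable_as are hypothetical and not part of ember.

```python
import importlib.machinery
from typing import List


def candidate_filenames(module_name: str) -> List[str]:
    """File names the import system will try for a compiled extension module.

    EXTENSION_SUFFIXES lists the suffixes this interpreter recognizes,
    for example '.cpython-312-darwin.so' on CPython 3.12 for macOS.
    """
    return [module_name + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES]


def importable_as(module_name: str, filename: str) -> bool:
    """True if `filename` is one of the names this interpreter would look for."""
    return filename in candidate_filenames(module_name)


if __name__ == "__main__":
    # Print the names this interpreter would search for inside the package directory.
    for name in candidate_filenames("_C"):
        print(name)
    # Check the file name mentioned in the text against the running interpreter.
    print(importable_as("_C", "_C.cpython-312-darwin.so"))
```

Running this on a machine other than the one the library was built on is a quick way to see why the .so has to be rebuilt per platform: the final check only prints True when the interpreter's own suffix matches the one baked into the file name.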