This project aims to create a flexible and configurable machine learning testbed platform. The platform will allow users to train, validate, and test machine learning models against various components. This version of the project description outlines a plan for building the platform incrementally, starting with core functionalities and progressively adding more advanced features based on a revised set of user stories.
The primary goal is to develop a robust and extensible platform for ML experimentation. This will be achieved by implementing functionalities step-by-step, ensuring each feature is well-tested and integrated before moving to the next.
The project will be built by implementing user stories sequentially. Each user story represents a small, achievable step towards the overall platform functionality. This approach allows for continuous progress, easier debugging, and clearer understanding of dependencies between features.
- Basic Data Loading: Start with loading simple, built-in datasets.
- Simple Model Definition: Define and train a basic model.
- Core Training Loop: Implement a fundamental training and evaluation process.
- Basic Metrics: Calculate and report essential metrics like loss and accuracy.
- Configuration: Introduce a basic configuration mechanism (e.g., using YAML).
- Extending Functionality: Gradually add support for more complex datasets, models, optimizers, metrics, and training features as defined in the user stories.
- Experiment Tracking: Integrate with tools like Weights & Biases.
- Programming Language: Python
- Machine Learning Framework: PyTorch
- Dependency Management: Poetry
- Experiment Tracking: Weights & Biases (wandb)
- Configuration: YAML files