feat(dataset): dataset as config pilot code#47

Draft
emptymalei wants to merge 7 commits into main from feature/lm/dataset-as-config

@emptymalei
Member

@emptymalei emptymalei commented Mar 28, 2024

Resolves #46
Depends on #75

A pilot study of dataset as configs.

Background

#15
#46

Experiment

This PR implements a small example of how to define a dataset using a YAML file. The example provides a YAML file, datasets/minimal.yaml:

version: 0.1.0
models:
  brownian_motion:
    definition:
      system:
        sigma: 1
        delta_t: 1
      initial_condition:
        x0: 0
    args:
      n_steps: 100

When we run the command

poetry run hamilflow gen datasets/minimal.yaml

we can save the dataset to a specified location.
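
To make the idea concrete, here is a minimal sketch of what a config-driven generator could look like. It is not the code in this PR and not hamilflow's API: the names `generate_brownian_motion`, `GENERATORS`, and `generate_from_config` are hypothetical, the config is written as a plain dict standing in for the parsed YAML (assuming `args` sits next to `definition` under each model), and the Brownian motion is a simple Gaussian random walk built with the standard library.

```python
import math
import random

# Plain dict standing in for the parsed datasets/minimal.yaml.
# Assumption: `args` is a sibling of `definition` under each model.
config = {
    "version": "0.1.0",
    "models": {
        "brownian_motion": {
            "definition": {
                "system": {"sigma": 1, "delta_t": 1},
                "initial_condition": {"x0": 0},
            },
            "args": {"n_steps": 100},
        }
    },
}


def generate_brownian_motion(definition, args, seed=None):
    """Hypothetical generator: 1D random walk with Gaussian increments."""
    rng = random.Random(seed)
    sigma = definition["system"]["sigma"]
    delta_t = definition["system"]["delta_t"]
    x = float(definition["initial_condition"]["x0"])
    rows = [{"t": 0.0, "x": x}]
    # Each step adds an increment drawn from N(0, sigma^2 * delta_t).
    for step in range(1, args["n_steps"]):
        x += rng.gauss(0.0, sigma * math.sqrt(delta_t))
        rows.append({"t": float(step * delta_t), "x": x})
    return rows


# Hypothetical registry mapping model names in the config to generators.
GENERATORS = {"brownian_motion": generate_brownian_motion}


def generate_from_config(config, seed=None):
    """Generate one dataset per model listed under `models`."""
    return {
        name: GENERATORS[name](spec["definition"], spec["args"], seed=seed)
        for name, spec in config["models"].items()
    }


datasets = generate_from_config(config, seed=42)
print(len(datasets["brownian_motion"]))
```

A registry keyed by model name keeps the config free of Python import paths, at the cost of having to register every model explicitly.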

A few things are ignored in this example:

  • Saving. This implementation skips saving the dataset, as that part should be trivial to add.
  • Version check. In principle, we should check whether the version of the package matches the version in the config.
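
The omitted version check could be sketched as follows. This is an illustration only: `parse_version` and `check_version` are hypothetical names, and the compatibility rule (major and minor must match, patch may differ) is an assumption, since the PR does not specify a policy.

```python
def parse_version(version):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of ints."""
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"Invalid version string: {version!r}")
    return tuple(int(p) for p in parts)


def check_version(config_version, package_version):
    """Raise ValueError if the config targets an incompatible package version.

    Assumed policy: major and minor must match; patch may differ.
    """
    cfg = parse_version(config_version)
    pkg = parse_version(package_version)
    if cfg[:2] != pkg[:2]:
        raise ValueError(
            f"Config was written for version {config_version}, "
            f"but the installed package is version {package_version}."
        )


# In practice the installed version could be looked up with
# importlib.metadata.version("hamilflow"); a literal is used here
# so the sketch runs without the package installed.
check_version("0.1.0", "0.1.3")  # compatible under the assumed policy
```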

How to Improve this Minimal Example

  1. YAML supports custom tags (prefixed with !), so we can implement a better way to define a model using custom data types.
  2. This is a CLI example. The same can be done for the Python interface, with something like dataset = generate_dataset(path_to_config).
  3. We should have meaningful validation before generating the data. The validation step checks whether the config works and produces meaningful error messages, e.g., when the version of the package doesn't match the version specified in the config.
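
As a rough illustration of the validation idea in item 3, the sketch below walks a parsed config and collects every problem it finds instead of failing on the first one. The function name `validate_config` and the specific rules (each model needs `definition` and `args`, and `n_steps` must be a positive integer) are assumptions made for this example, not requirements stated in the PR.

```python
REQUIRED_MODEL_KEYS = ("definition", "args")


def validate_config(config):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if "version" not in config:
        errors.append("missing top-level key 'version'")

    models = config.get("models")
    if not isinstance(models, dict) or not models:
        errors.append("'models' must be a non-empty mapping")
        return errors

    for name, spec in models.items():
        if not isinstance(spec, dict):
            errors.append(f"model '{name}': expected a mapping")
            continue
        for key in REQUIRED_MODEL_KEYS:
            if key not in spec:
                errors.append(f"model '{name}': missing key '{key}'")
        args = spec.get("args")
        if isinstance(args, dict):
            n_steps = args.get("n_steps")
            if not isinstance(n_steps, int) or n_steps <= 0:
                errors.append(
                    f"model '{name}': 'args.n_steps' must be a positive integer"
                )
    return errors


broken = {"models": {"brownian_motion": {"args": {"n_steps": 0}}}}
for problem in validate_config(broken):
    print(problem)
```

Returning a list rather than raising lets both the CLI and a Python entry point report all problems in one pass.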

A few questions

  1. Will this work for all models?
  2. Is it worth the hassle at all?

This was referenced Mar 28, 2024
@github-actions
Contributor

github-actions bot commented Jun 2, 2024

PR Preview Action v1.4.7
🚀 Deployed preview to https://kausalflow.github.io/hamilflow/pr-preview/pr-47/
on branch gh-pages at 2024-06-02 19:44 UTC


Development

Successfully merging this pull request may close these issues.

Dataset as Configs

1 participant