Replies: 5 comments 8 replies
-
I understand your concern. I think the thing is that what you're proposing is a big enough change that it would need us to redesign a lot of things, unfortunately, and I don't feel like an overall refactor is what we are looking for at the moment.
-
I like the idea of referencing/including another dataflow file. I see two possible designs:
-
This is something that we already discussed in the context of automated checks and schema specifications. Our plan is that each node specifies some schema information for its outputs to make the output types explicit. Ideally, this would be automatically verified against the actual executable in some way. By providing schema and output information, it will become easier to reuse existing nodes, e.g. from the node hub. For this, each node should be distributed with a node declaration file similar to the one you proposed here. Making the build/run/outputs fields optional in the dataflow config file seems like the logical next step then. If the node already specifies these, there is no need to specify this info again.
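As a rough illustration (this is a hypothetical sketch; the field names and schema notation are assumptions, not a settled format), such a node declaration file with output schema information might look like:

```yaml
# node.yml — hypothetical node declaration file
build: "cargo build --release"
run: "./target/release/my-node"
outputs:
  - image
  - metadata
# possible future extension: per-output schema info,
# so a dataflow referencing this node can be checked automatically
schemas:
  image: "arrow::binary"     # illustrative type annotation
  metadata: "arrow::utf8"
```

A dataflow file that references this node could then omit build/run/outputs, since they are already declared here.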
-
I'm not sure I understand the motivation for the proposed project file. To me, the dataflow specification file already acts as a project file. With proposal 1, we could already include other dataflows, so the "Dora application rely[ing] on another Dora application" would be possible. Then we would not need a dependencies section, would we?
-
Alright, so

```yaml
# the dataflow file
include:
  - dataflow: <file path, url, or git path relative to current dataflow>
    prefix: <optional prefix for the nodes in the dataflow>
    # maybe some mechanism to pass through the env variables
```

```yaml
# the descriptor for the node
build: "<the build command>"
run: "<the run command>"
outputs:
  - output-1
  - output-2
# ...
# more fields for constraints and verifying in the future
```

```yaml
# the dataflow file
- id: node-1
  desc: <the path, url, or git of the node's descriptor>
```

are enough for the requirements and much simpler than the original idea.
-
This discussion is inspired by #1213 and adds more detail about this requirement.

Obviously, the current dataflow YAML describes the process of the whole Dora application crystal clearly. But as the application grows larger, the YAML file will keep expanding and in the end will no longer be readable. Of course, there is a `dataflow-builder` for constructing dataflows with programming languages, but I think it's still limited and not intuitive.

Is the `outputs` field redundant? Comparing with the traditional package paradigm, we find that we need to redeclare a node's outputs every time we use the same node. Could we just place some of the node's declaration in a standalone file?
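To illustrate the duplication (a hypothetical example; node names and paths are made up), the same node used in two dataflows currently has to redeclare its outputs in each file:

```yaml
# dataflow-a.yml
nodes:
  - id: camera
    path: ./camera-node
    outputs:
      - image
      - metadata
```

```yaml
# dataflow-b.yml — the same outputs must be spelled out again
nodes:
  - id: camera
    path: ./camera-node
    outputs:
      - image
      - metadata
```

With a standalone node declaration, both dataflows could simply reference it.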
This is an extension of points 1 and 2.

Currently, nodes can come from various sources: a local path, a URL, or git. Imagine a Dora application relying on another Dora application; there would be many repeated characters just for navigating to the nodes in the dependent application. Introducing a new Project (or Workspace) unit seems like it could mitigate this.

A draft idea about a project descriptor,

and in the dataflow,

The exact content of the `name` field is resolved from the project descriptor file in the ancestor directories of the current working directory, or from a user-customized project descriptor file. (The project descriptor file is, to some degree, like Cargo.toml for Rust.)

Auto-completion while writing a node's inputs field becomes possible, because the outputs of the node are already well defined in the project descriptor.
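As a rough illustration of the idea (all field names here are hypothetical assumptions, not a proposed final format), a project descriptor and its use in a dataflow might look like:

```yaml
# dora-project.yml — hypothetical project descriptor,
# resolved from ancestor directories, like Cargo.toml
name: my-app
nodes:
  - name: camera
    build: "cargo build --release"
    run: ./target/release/camera
    outputs:
      - image
dependencies:
  other-app:
    git: https://example.com/other-app.git
```

```yaml
# the dataflow file only references the node by name
nodes:
  - id: cam
    name: camera   # resolved via the project descriptor
    # inputs could then be auto-completed from the declared outputs
```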