Parse dictionary output #228
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main     #228      +/-   ##
==========================================
+ Coverage   96.18%   96.20%   +0.02%
==========================================
  Files           8        8
  Lines        1440     1450      +10
==========================================
+ Hits         1385     1395      +10
  Misses         55       55
==========================================
```
I think the question is important but it's not really urgent, so I'm gonna make it a draft.
I'm a bit confused -- what is a "job" here?

Maybe I should have called it …
Ah, ok. So do I understand correctly that you really want

```python
def my_workflow(some_inputs):
    ...
    output_dict = get_output(job)
    my_input = get_input_of_another_job(energy=output_dict["energy"])
    ...
```

to be the right way to write things, regardless of whether …?
From the programming point of view, I don't like it either. The problem comes rather from reality: I'm having a hard time telling people to write output parsers that return tuples for multiple outputs, because people usually use a dictionary or a data class rather than a tuple. Since tuples and dictionaries are largely equivalent in this regard, I still think it is useful to allow the user to return a dictionary to mean multiple outputs. Alternatively, I'm also fine with not allowing multiple outputs at all, as @jan-janssen prefers. I find it a rather unintuitive constraint that only tuples are allowed for multiple outputs.
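To make the claimed equivalence concrete, here is a minimal sketch of the two parser styles being compared (the parser names and the `job` attributes are hypothetical, not from the PR):

```python
# Hypothetical output parsers: the dict version carries exactly the same three
# values as the tuple version, but attaches explicit labels to them.
def parse_output_as_tuple(job):
    # Labels are only implied by variable names at the call site.
    return job.positions, job.forces, job.energy


def parse_output_as_dict(job):
    # Labels travel with the values themselves.
    return {
        "positions": job.positions,
        "forces": job.forces,
        "energy": job.energy,
    }
```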
I didn't think we wanted to disallow dictionaries as outputs? I find this totally fine:

```python
def get_output(job_instance):
    ...
    return {"positions": x, "forces": f, "energy": E}


def my_workflow(some_inputs):
    ...
    output_dict = get_output(job)
    my_input = get_input_of_another_job(energy=output_dict["energy"])
    ...
```

It's this that I am scared of:

```python
def get_output(job_instance):
    ...
    return positions, forces, energy


def my_workflow(some_inputs):
    ...
    output_dict = get_output(job)
    my_input = get_input_of_another_job(energy=output_dict["energy"])
    ...
```

I think maybe I see where the drive for this is coming from, since in `pyiron_workflow`:

```python
import pyiron_workflow as pwf


@pwf.as_function_node
def get_output(job_instance):
    ...
    return positions, forces, energy


@pwf.as_macro_node
def my_workflow(some_inputs):
    ...
    output = get_output(job)
    my_input = get_input_of_another_job(energy=output.outputs.energy)
    ...
```

IMO this is fine for … My understanding of the …
Changing tack a little bit, I do rather like the idea of an … So I actually rather like that these changes to … I.e. I like the power being introduced here, I'm just deeply unconvinced of the proposed plan for exploiting it in …
To this end, @samwaseda it may be helpful to me if there were some working example of your intent? The test suite extension here is useful, but doesn't touch the workflow recipe generation. I checked the branch out and tried things for myself:

```python
from semantikon.workflow import workflow


def outputs_dict(x, y):
    return {"x": x, "y": y}


def outputs_tuple(i, j):
    return i, j


def add(obj, other):
    sum = obj + other
    return sum


# @workflow
# def my_macro(x, y, i, j):
#     out_dict = outputs_dict(x, y)
#     out_i, out_j = outputs_tuple(i, j)
#     use_it = add(out_dict["x"], out_i)
#     return use_it


@workflow
def my_macro(x, y, i, j):
    out_dict = outputs_dict(x, y)
    out_i, out_j = outputs_tuple(i, j)
    return out_dict, out_i, out_j


my_macro._semantikon_workflow
```

Since the new functionality is not yet propagated to the workflows, on both main and this branch I get the entry

```
'outputs_dict_0': {'inputs': {'x': {}, 'y': {}},
                   'outputs': {'output': {}},
                   'function': <function __main__.outputs_dict(x, y)>,
                   'type': 'Function'}},
```

I guess the idea here is that this should have …

From there, I would also be supportive of something like the commented-out workflow, where the tuple-based return and dict-based return both work but the syntax for leveraging them is different. However, right now this unsurprisingly gives …

If it did work, I would imagine to see

```
'edges': [
    ...
    ('outputs_dict_0.outputs.x', 'add_0.inputs.obj'),
    ('outputs_tuple_0.outputs.i', 'add_0.inputs.other'),
    ...
```

Now this is a little bit interesting, because it demands that WfMS interpret both tuples and dictionaries as independent outputs when parsing plain functions. I'm OK with this demand, and would like to support it in …
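This is not semantikon's actual parser, just a minimal sketch (the helper name `infer_output_labels` is made up) of how an ast-based pass could treat both return styles as independent, labelled outputs of a plain function:

```python
import ast
import inspect
import textwrap


def infer_output_labels(func):
    """Derive output labels from a plain function's return statement:
    dict-literal keys, tuple element names, or a single generic label."""
    tree = ast.parse(textwrap.dedent(inspect.getsource(func)))
    labels = ["output"]
    for node in ast.walk(tree):
        if isinstance(node, ast.Return):
            if isinstance(node.value, ast.Dict):
                # dict literal: use the string keys as output labels
                labels = [
                    k.value if isinstance(k, ast.Constant) else f"output_{i}"
                    for i, k in enumerate(node.value.keys)
                ]
            elif isinstance(node.value, ast.Tuple):
                # tuple return: fall back to the returned variable names
                labels = [
                    e.id if isinstance(e, ast.Name) else f"output_{i}"
                    for i, e in enumerate(node.value.elts)
                ]
    return labels


def outputs_dict(x, y):
    return {"x": x, "y": y}


def outputs_tuple(i, j):
    return i, j


print(infer_output_labels(outputs_dict))   # ['x', 'y']
print(infer_output_labels(outputs_tuple))  # ['i', 'j']
```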
Multiple outputs per node is a beautiful thing that gives us tonnes of power.
Ah ok, now I understand the confusion, sorry :D. For

```python
def get_output(job_instance):
    ...
    return positions, forces, energy
```

you would have to do

```python
def my_workflow(some_inputs):
    ...
    positions, forces, energy = get_output(job)
    my_input = get_input_of_another_job(energy=energy)
    ...
```
Ahh, ok 😅 Then yes, I quite like the change. My note on our requirements stands, but it's something I think we should be aware of and not something that should stop us moving in this direction: …
Yeah, the more I think about it, the more I feel it's potentially a super dangerous step, because currently the number of outputs can be specified by the number of variables assigned, i.e.:

```python
def f(x, y):
    return x, y


def my_workflow_1(x, y):
    x, y = f(x, y)
    return x, y


def my_workflow_2(x, y):
    z = f(x, y)
    return z
```

So in the first case there are two outputs, and in the second case one. But now if we use a dictionary:

```python
def f(x, y):
    return {"x": x, "y": y}


def my_workflow(x, y):
    d = f(x, y)
    a = g(d["x"])
    b = h(d)
    ...
```

In this case both multiple outputs and a single output must be dealt with simultaneously. I don't think such a case can be represented straightforwardly...
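To spell out the ambiguity, the recipe for this example would presumably need `f_0` to act as a multi-output node and a single-output node at the same time. A hypothetical sketch of the edges, reusing the naming convention from the recipe shown earlier (the input labels of `g` and `h` are invented):

```python
# Hypothetical edge list for the example above -- not generated by semantikon.
edges = [
    ("f_0.outputs.x", "g_0.inputs.arg"),       # d["x"] used alone: f_0 as a multi-output node
    ("f_0.outputs.output", "h_0.inputs.arg"),  # d passed whole: f_0 as a single-output node
]
```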
Indeed, that is a very good point -- if the ast->workflow recipe parser is going to allow …

Having let my own idea sit with me longer, I want to double-down on this: do whatever you want internally, but for the user interface you should only ever take or return dictionaries if they are of a uniform typing, e.g. …

So maybe we don't want to be encouraging this anyhow.
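For what it's worth, a minimal illustration of the uniform-typing distinction drawn above (function names, type hints, and values are illustrative only):

```python
# Uniform typing: one value type, so dict[str, float] describes every entry.
def uniform_outputs() -> dict[str, float]:
    return {"energy": -1.0, "volume": 11.8}


# Non-uniform typing: per-key types are invisible in the signature, which is
# the pattern being discouraged for the user interface.
def mixed_outputs() -> dict[str, object]:
    return {"positions": [[0.0, 0.0, 0.0]], "energy": -1.0}
```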
I started feeling that we should give the possibility of returning `dict`s and data classes, simply because it's the de facto standard for input and output parsers.

Before (this works just as well):

After:

And these two should have the same output keys. This would imply that we would allow something like this:
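(The "Before", "After", and "something like this" code blocks did not survive extraction; judging from the snippets quoted earlier in the thread, they presumably looked roughly like this reconstruction rather than the author's exact code.)

```python
# Before: an output parser returning a tuple (already supported).
def get_output(job):
    ...
    return positions, forces, energy


# After: the same parser returning a dictionary (what this PR parses).
def get_output(job):
    ...
    return {"positions": positions, "forces": forces, "energy": energy}


# "Something like this": a workflow picking a single entry out of the dict.
def my_workflow(some_inputs):
    ...
    output_dict = get_output(job)
    my_input = get_input_of_another_job(energy=output_dict["energy"])
    ...
```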
In this case the parsing will be slightly more complicated, but I guess it's still a reasonable amount of work. What do you think of the overall idea? @jan-janssen @liamhuber

Note: This PR would only parse `get_output`; `my_workflow` as given above would not work yet.