Add __getitem__ and __getattr__ 🧙magic🧙‍♀️ methods for Job #450
Andrew-S-Rosen wants to merge 19 commits into materialsproject:main
Codecov Report
Additional details and impacted files

    @@            Coverage Diff            @@
    ##              main     #450   +/-   ##
    =========================================
      Coverage   100.00%  100.00%
    =========================================
      Files           20       20
      Lines         1521     1523    +2
      Branches       419      419
    =========================================
    + Hits          1521     1523    +2
|
This is a very interesting idea, thanks! I have a couple of comments:

Incompatibility with output attributes

One potential confusion is that you aren't able to do something like:

    job2 = capitalize(job1.hello)

We could add that functionality in, but it then raises an issue with [...] A potential fix would be to rename all of the Job attributes and functions to [...]

Incompatibility with jobs without a dictionary output

Another issue is with jobs that output a Python primitive directly. E.g., your [...]

    @job
    def capitalize(s):
        return s.upper()

    @job
    def decapitalize(s):
        return s.lower()

This change might make users think that they can do:

    job1 = capitalize("hello")
    job2 = decapitalize(job1)
    flow = Flow([job1, job2])

But that will fail since jobs can only accept [...] To enable this behaviour we could make it so that if a [...] I think supporting both attributes and whole input jobs would actually be a really nice feature and make jobflow easier to use.

Flows

Finally, just a note that we could also do something similar with Flows. E.g., this would enable you to write:

    @job
    def add(a, b):
        return a + b

    add_first = add(1, 5)
    add_second = add(add_first, 5)
    flow = Flow([add_first, add_second], output=add_second)
    add_third = add(flow, 5)

Together with #416 we would have a very nice API, e.g.

    @job
    def add(a, b):
        return a + b

    @flow
    def multi_add(a, b):
        add_first = add(1, 5)
        return add(add_first, 5)

    result = add(multi_add(1, 5), 5) |
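To make the "pass a whole job or flow as an argument" idea concrete, here is a minimal sketch using toy stand-ins only. `Ref`, `ToyJob`, `ToyFlow` and `as_reference` are hypothetical names invented for illustration; none of them are jobflow classes or functions, and this is not how jobflow implements argument handling.

```python
class Ref:
    """Toy placeholder for a reference to some job's eventual output."""

    def __init__(self, owner: str) -> None:
        self.owner = owner


class ToyJob:
    """Toy job: it only knows its name and a reference to its output."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.output = Ref(name)


class ToyFlow:
    """Toy flow: a list of jobs plus an optional designated output."""

    def __init__(self, jobs, output=None) -> None:
        self.jobs = list(jobs)
        self.output = output


def as_reference(arg):
    """Hypothetical helper: swap a job or flow for its output, leave the rest alone."""
    # A flow's output may itself be a job, so keep unwrapping until it is neither.
    while isinstance(arg, (ToyJob, ToyFlow)):
        arg = arg.output
    return arg


add_first = ToyJob("add_first")
add_second = ToyJob("add_second")
flow = ToyFlow([add_first, add_second], output=add_second)

print(as_reference(add_first).owner)  # add_first
print(as_reference(flow).owner)       # add_second
print(as_reference(5))                # 5
```

Under this assumption, plain values pass through untouched, so existing call sites that already pass `job.output` would keep working.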
|
@utf: Thanks for expanding on my initial idea and turning this into a great discussion about the API as a whole. I think these are all fantastic. In particular [...]

This was a feature I was actually hoping already existed a few weeks back. I tried it out, and of course it wasn't the case. But I think this would be elegant because I can't really see many use cases for people passing [...]

On a practical note, I am happy to get help on this from anyone reading this message! I probably can't tackle your suggestions imminently, but it's important enough in my opinion that it will be high up on my to-do list. |
|
@Andrew-S-Rosen Thanks for tagging me. These sound like great UX improvements! |
|
FWIW: Going to try to tackle more of @utf's ideas next week! |
|
@utf: Alright, we are getting there! First, a summary and then a question for you.

Summary

I have now added a [...]

I have also made it so that if a [...]

Questions

Question 1

I have the following test:

    from jobflow import Flow, job, run_locally
    from dataclasses import dataclass

    @job
    def make_str(s):
        @dataclass
        class MyClass:
            hello: str = s
        return MyClass

    @job
    def capitalize(s):
        return s.upper()

    job1 = make_str("world")
    job2 = capitalize(job1.hello)
    flow = Flow([job1, job2])
    responses = run_locally(flow, ensure_success=True)
    assert responses[job2.uuid][1].output == "WORLD"

The traceback is as follows:

    Traceback (most recent call last):
      File "C:\Users\asros\github\jobflow\src\jobflow\managers\local.py", line 101, in _run_job
        response = job.run(store=store)
      File "C:\Users\asros\github\jobflow\src\jobflow\core\job.py", line 605, in run
        # if Job was created using the job decorator, then access the original function
      File "C:\Users\asros\github\jobflow\src\jobflow\core\job.py", line 711, in resolve_args
        store,
      File "C:\Users\asros\github\jobflow\src\jobflow\core\reference.py", line 451, in find_and_resolve_references
        resolved_references = resolve_references(
      File "C:\Users\asros\github\jobflow\src\jobflow\core\reference.py", line 346, in resolve_references
        resolved_references[ref] = ref.resolve(
      File "C:\Users\asros\github\jobflow\src\jobflow\core\reference.py", line 178, in resolve
        data = data[attr] if attr_type == "i" else getattr(data, attr)
    AttributeError: 'dict' object has no attribute 'hello'
    2023-10-12 14:45:24,767 INFO Finished executing jobs locally

To my untrained eye, it seems that the output is being serialized into a dictionary format automatically (MSONable-related?) rather than staying a class. Do you have any suggestions?

Question 2

There is a new [...]

Comment

You mentioned earlier that this could result in clashes with "first-class" [...]

Todo
|
|
Awesome, thanks @Andrew-S-Rosen. For question 1, I think this is because the class is defined in the scope of [...]

    from jobflow import Flow, job, run_locally
    from dataclasses import dataclass

    @dataclass
    class MyClass:
        hello: str

    @job
    def make_str(s):
        return MyClass(hello=s)

    @job
    def capitalize(s):
        return s.upper()

    job1 = make_str("world")
    job2 = capitalize(job1.hello)
    flow = Flow([job1, job2])
    responses = run_locally(flow, ensure_success=True)
    assert responses[job2.uuid][1].output == "WORLD" |
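The failing line from the traceback in question 1 can be reproduced in isolation. The sketch below copies only that one line (`data = data[attr] if attr_type == "i" else getattr(data, attr)`); the surrounding `resolve` loop is an assumption made for illustration, and `dataclasses.asdict` is merely a stand-in for "the output came back as a plain dict", not a claim about what jobflow's serialization actually does.

```python
from dataclasses import asdict, dataclass


@dataclass
class MyClass:
    hello: str


def resolve(data, attributes):
    """Walk a path of ("i", key) or ("a", name) steps, one lookup per step."""
    for attr_type, attr in attributes:
        # Same expression as the line shown in the traceback.
        data = data[attr] if attr_type == "i" else getattr(data, attr)
    return data


instance = MyClass(hello="world")
as_plain_dict = asdict(instance)  # stand-in for an output that became a dict

print(resolve(instance, [("a", "hello")]))       # world
print(resolve(as_plain_dict, [("i", "hello")]))  # world

try:
    # Attribute-style access on a dict fails exactly as in the traceback.
    resolve(as_plain_dict, [("a", "hello")])
except AttributeError as exc:
    print(exc)  # 'dict' object has no attribute 'hello'
```

So an attribute-style reference such as `job1.hello` can only resolve if the stored output is still an object with a `hello` attribute when it is read back.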
|
Hello everyone, I think this is definitely a very nice addition! I have one question. It relates to the "real" attributes of the Job object, also raised by @utf, with a potential solution being a renaming to _ or something similar, which hopefully does not clash (someone could still define an output with e.g. output_ as a key, even if that would be rather weird). I am just wondering if anyone has thought of subclassing the Job (and whether it makes sense? maybe it wouldn't even work?). If that is the case, the attributes of these subclasses should also be appended with _ (or any other modification to avoid a clash with the [...]).

Anyway, very nice thinking to improve user friendliness! |
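For anyone following the clash discussion, the underlying Python behaviour can be shown with toy classes. `ToyOutput` and `ToyJob` are hypothetical stand-ins, not jobflow classes; the only thing this sketch relies on is that Python calls `__getattr__` solely when normal attribute lookup fails.

```python
class ToyOutput:
    """Toy stand-in for a job output with arbitrary named fields."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


class ToyJob:
    def __init__(self, name, output):
        self.name = name      # a "real" attribute of the job
        self.output = output

    def __getattr__(self, item):
        # Only reached when normal lookup fails, so real attributes always win.
        if item == "output":
            # Guard against infinite recursion if output has not been set yet.
            raise AttributeError(item)
        return getattr(self.output, item)


job = ToyJob(name="my job", output=ToyOutput(hello="world", name="from output"))

print(job.hello)        # world
print(job.name)         # my job
print(job.output.name)  # from output
```

An output field that shares a name with a real attribute (here `name`) is silently shadowed and stays reachable only through `.output`, which is the clash that the renaming proposal tries to avoid.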
|
Looks like the issue here is [...]

    def __getattr__(self, name: str) -> OutputReference:
        if attr := getattr(self.output, name, None):
            return attr
        raise AttributeError(f"{type(self).__name__} has no attribute {name!r}") |
I was gonna start writing something about this and then I read your comment, @utf. Indeed, it would be really nice to be able to do that as well; otherwise we are stuck with a difference in behavior between jobs and flows, i.e. you can use a job directly without the .output to access its output, but for flows it is different. And this difference is, I believe, worse than the one with [...]

I will try to get back a bit on the [...] |
|
@janosh: Thanks! For your example though, we don't know if [...]

@davidwaroquiers: Yes, I agree. I view the [...] |
|
In full disclosure, I am going to need to put this PR on hold. Anyone who is willing can feel free to expand on it with a follow-up PR! If we start seeing some movement with the [...] |
|
Tagging @vineetbansal since there is some discussion about the [...] |
|
Closed due to #823. |
Summary
Following in @janosh's footsteps, I propose a __getitem__ magic method for Job, such that if you index a Job object by a key (as if it were a dictionary) or an index (as if it were a list), the .output is implicitly accessed for the user and the OutputReference is returned.

Motivation
Consider these two toy functions: [...]
If you want to make a Flow of them, you currently have to do: [...]
However, new users to Jobflow often stumble on this in my experience because they expect to be able to do: [...]
They can now do that without any issues 😄
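A minimal sketch of the proposed behaviour, using toy stand-ins rather than the real classes: the `OutputReference` and `Job` below are simplified, hypothetical re-creations written only for this illustration (the `path` field in particular is an assumption), so the PR diff itself remains the source of truth for the actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class OutputReference:
    """Toy stand-in: records which job and which path into its output."""

    uuid: str
    path: Tuple[Any, ...] = ()

    def __getitem__(self, key: Any) -> "OutputReference":
        # Indexing a reference yields a deeper reference; nothing runs yet.
        return OutputReference(self.uuid, self.path + (key,))


class Job:
    """Toy stand-in for a job whose output is resolved later."""

    def __init__(self, uuid: str) -> None:
        self.uuid = uuid
        self.output = OutputReference(uuid)

    def __getitem__(self, key: Any) -> OutputReference:
        # The proposal: indexing the job is shorthand for indexing job.output.
        return self.output[key]


job1 = Job("job-1")
explicit = job1.output["hello"]  # what users have to write today
implicit = job1["hello"]         # what the magic method allows

print(explicit == implicit)  # True
print(implicit.path)         # ('hello',)
```

Both spellings produce the same reference in this toy model, so the shorthand changes only what the user types, not what is passed to the downstream job.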