Commit 4b1ad23

Authored by jeevb, wild-endeavor, devictr, Future-Outlier, Future Outlier
[Extended Resources] GPU Accelerators (#1843)
* pip through to container Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * move around Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * add asserts Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * delete bad line Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * switch to abc and add support for gpu unpartitioned Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Add Azure-specific headers when uploading to blob storage (#1784) * Add Azure-specific headers when uploading to blob storage Signed-off-by: Victor Delépine <victor.delepine@wayve.ai> * Add comment about HTTP 201 check Signed-off-by: Victor Delépine <victor.delepine@wayve.ai> --------- Signed-off-by: Victor Delépine <victor.delepine@wayve.ai> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Add async delete function in base_agent (#1800) Signed-off-by: Future Outlier <eric901201@gmai.com> Co-authored-by: Future Outlier <eric901201@gmai.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Add support for execution name prefixes (#1803) Signed-off-by: troychiu <y.troychiu@gmail.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Remove ref in output (#1794) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Inherit directly from DataClassJsonMixin instead of using @dataclass_json for improved static type checking (#1801) * Inherit directly from DataClassJsonMixin instead of @dataclass_json for improved static type checking As it says in the dataclasses-json README: https://github.com/lidatong/dataclasses-json/blob/89578cb9ebed290e70dba8946bfdb68ff6746755/README.md?plain=1#L111-L129, we can use 
inheritance for improved static type checking; this one change eliminates something like 467 pyright errors from the flytekit module Signed-off-by: Matthew Hoffman <matthew@protopia.ai> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Async file sensor (#1790) --------- Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Eager workflows to support async workflows (#1579) * Eager workflows to support async workflows Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> * move array node maptask to experimental/__init__.py Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> * clean up docs Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> * clean up Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> * more clean up Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> * docs cleanup Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> * Update test_eager_workflows.py * clean up timeout handling Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> * fix lint Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> --------- Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Enable SecretsManager.get to load and return bytes (#1798) * fix secretsmanager Signed-off-by: Yue Shang <s.yue3074@gmail.com> * fix lint issue Signed-off-by: Yue Shang <s.yue3074@gmail.com> * add doc Signed-off-by: Yue Shang <s.yue3074@gmail.com> * fix github check Signed-off-by: Yue Shang <s.yue3074@gmail.com> --------- Signed-off-by: Yue Shang <s.yue3074@gmail.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Batch upload flyte directory (#1806) * Batch upload flyte directory Signed-off-by: Kevin Su <pingsutw@apache.org> * Update get method Signed-off-by: Kevin Su <pingsutw@apache.org> * Move batch size to type engine Signed-off-by: Kevin Su <pingsutw@apache.org> * comment Signed-off-by: Kevin Su <pingsutw@apache.org> * update 
comment Signed-off-by: Kevin Su <pingsutw@apache.org> * Update flytekit/core/type_engine.py Co-authored-by: Eduardo Apolinario <653394+eapolinario@users.noreply.github.com> * Add test Signed-off-by: Kevin Su <pingsutw@apache.org> --------- Signed-off-by: Kevin Su <pingsutw@apache.org> Co-authored-by: Eduardo Apolinario <653394+eapolinario@users.noreply.github.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Better error messaging for overrides (#1807) - using incorrect type of overrides - using incorrect type for resources - using promises in overrides Signed-off-by: Ketan Umare <ketan.umare@gmail.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Run remote Launchplan from `pyflyte run` (#1785) * Beautified pyflyte run even for every task and workflow - identify a task or a workflow - task or workflow help menus show types and use rich to beautify Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * one more improvement Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * updated Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * updated command Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * Updated Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * updated formatting Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * updated Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * updated Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * bug fixed in types Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * Updated Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * lint Signed-off-by: Kevin Su <pingsutw@apache.org> --------- Signed-off-by: Ketan Umare <ketan.umare@gmail.com> Signed-off-by: Kevin Su <pingsutw@apache.org> Co-authored-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Add is none function (#1757) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Dynamic workflow should not throw nested task warning (#1812) 
Signed-off-by: oliverhu <khu@linkedin.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Add a manual image building GH action (#1816) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * catch abfs protocol in data_persistence.py/get_filesystem and set anon to False (#1813) Signed-off-by: Jan Fiedler <jan.fiedler@kineo.ai> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * None doesnt work Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * unpartitioned selector Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Fix list of annotated structured dataset (#1817) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Support the flytectl config.yaml admin.clientSecretEnvVar option in flytekit (#1819) * Support the flytectl config.yaml admin.clientSecretEnvVar option in flytekit Signed-off-by: Chao-Heng Lee <chaohengstudent@gmail.com> * remove helper of getting env var. Signed-off-by: Chao-Heng Lee <chaohengstudent@gmail.com> * refactor variable name. 
Signed-off-by: Chao-Heng Lee <chaohengstudent@gmail.com> --------- Signed-off-by: Chao-Heng Lee <chaohengstudent@gmail.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Async agent delete function for while loop case (#1802) Signed-off-by: Future Outlier <eric901201@gmai.com> Signed-off-by: Kevin Su <pingsutw@apache.org> Co-authored-by: Future Outlier <eric901201@gmai.com> Co-authored-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * refactor Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * fix docs warnings (#1827) Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Fix extract_task_module (#1829) --------- Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Feat: Add type support for pydantic BaseModels (#1660) Signed-off-by: Adrian Rumpold <a.rumpold@gmail.com> Signed-off-by: Arthur <atte.book@gmail.com> Signed-off-by: wirthual <wirthra@gmail.com> Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * add test for unspecified mig Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * add support for overriding accelerator Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * cleanup Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * move from core to extras Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * fixes Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * fixes Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * fixes Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * cleanup Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Make FlyteRemote slightly more copy/pastable (#1830) Signed-off-by: Katrina Rogan <katroganGH@gmail.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Pyflyte meta inputs (#1823) 
* Re-orgining pyflyte run Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * Pyflyte beautified and simplified Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * fixed unit test Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * Added Launch options Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * lint fix Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * test fix Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * fixing docs failure Signed-off-by: Ketan Umare <ketan.umare@gmail.com> --------- Signed-off-by: Ketan Umare <ketan.umare@gmail.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Use mashumaro to serialize/deserialize dataclass (#1735) Signed-off-by: HH <hhcs9527@gmail.com> Signed-off-by: hhcs9527 <hhcs9527@gmail.com> Signed-off-by: Matthew Hoffman <matthew@protopia.ai> Co-authored-by: Matthew Hoffman <matthew@protopia.ai> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Databricks Agent (#1797) Signed-off-by: Future Outlier <eric901201@gmai.com> Signed-off-by: Kevin Su <pingsutw@apache.org> Co-authored-by: Future Outlier <eric901201@gmai.com> Co-authored-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Prometheus metrics (#1815) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Pyflyte register optionally activates schedule (#1832) * Pyflyte register auto activates schedule Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * comment addressed Signed-off-by: Ketan Umare <ketan.umare@gmail.com> --------- Signed-off-by: Ketan Umare <ketan.umare@gmail.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Remove versions 3.9 and 3.10 (#1831) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Snowflake agent (#1799) Signed-off-by: hhcs9527 <hhcs9527@gmail.com> Signed-off-by: HH <hhcs9527@gmail.com> Signed-off-by: Jeev B 
<jeevb@users.noreply.github.com> * Update agent metric name (#1835) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * MemVerge MMCloud Agent (#1821) Signed-off-by: Edwin Yu <edwinyyyu@gmail.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Add download badges in readme (#1836) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Eager local entrypoint and support for offloaded types (#1833) * implement eager workflow local entrypoint, support offloaded types Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> * wip local entrypoint Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> * add tests Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> * add local entrypoint tests Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> * update eager unit tests, delete test script Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> * clean up tests Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> * update ci Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> * update ci Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> * update ci Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> * update ci Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> * remove push step Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> --------- Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * update requirements and add snowflake agent to api reference (#1838) * update requirements and add snowflake agent to api reference Signed-off-by: Samhita Alla <aallasamhita@gmail.com> * update requirements Signed-off-by: Samhita Alla <aallasamhita@gmail.com> * remove versions Signed-off-by: Samhita Alla <aallasamhita@gmail.com> * remove tensorflow-macos Signed-off-by: Samhita Alla <aallasamhita@gmail.com> * lint Signed-off-by: Samhita Alla <aallasamhita@gmail.com> * 
downgrade sphinxcontrib-youtube package Signed-off-by: Samhita Alla <aallasamhita@gmail.com> --------- Signed-off-by: Samhita Alla <aallasamhita@gmail.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Fix: Make sure decks created in elastic task workers are transferred to parent process (#1837) * Transfer decks created in the worker process to the parent process Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com> * Add test for decks in elastic tasks Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com> * Update plugins/flytekit-kf-pytorch/flytekitplugins/kfpytorch/task.py Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com> * Update plugins/flytekit-kf-pytorch/flytekitplugins/kfpytorch/task.py Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com> --------- Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * add accept grpc (#1841) * add accept grpc Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * unpin setup.py grpc Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Revert "add accept grpc" This reverts commit 2294592. 
Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * default headers interceptor Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * setup.py Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * fixes Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * fmt Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * move prometheus-client import Signed-off-by: Jeev B <jeevb@users.noreply.github.com> --------- Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> Co-authored-by: Jeev B <jeevb@users.noreply.github.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Feat: Enable `flytekit` to authenticate with proxy in front of FlyteAdmin (#1787) * Introduce authenticator engine and make proxy auth work Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Use proxy authed session for client credentials flow Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Don't use authenticator engine but do proxy authentication via existing external command authenticator Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Add docstring to AuthenticationHTTPAdapter Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Address todo in docstring Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Create blank session if none provided Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Create blank session if none provided in get_token Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Refresh proxy creds in session when not existing without triggering 401 Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Add test for get_session Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Move auth helper test into existing module Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Move auth helper test into existing module Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Add test for upgrade_channel_to_proxy_authenticated Signed-off-by: 
Fabio Grätz <fabiogratz@googlemail.com> * Auth helper tests without use of responses package Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Feat: Add plugin for generating GCP IAP ID tokens via external command (#1795) * Add external command plugin to generate id tokens for identity aware proxy Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Retrieve desktop app client secret from gcp secret manager Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Remove comments Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Introduce a command group that allows adding a command to generate service account id tokens later Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Document how to use plugin and deploy Flyte with IAP Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Minor corrections README.md Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> --------- Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> Co-authored-by: Fabio Grätz <fabiogratz@googlemail.com> Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Use proxy auth'ed session for device code auth flow Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Fix token client tests Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Make poll token endpoint test more specific Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Make test_client_creds_authenticator test work and more specific Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Make test_client_creds_authenticator_with_custom_scopes test work and more specific Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Implement subcommand to generate id tokens for service accounts Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com> * Test id token generation from service accounts Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com> * Fix plugin requirements Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com> * Document usage of generate-service-account-id-token 
subcommand Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> * Document alternative ways to obtain service account id tokens Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> --------- Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com> Co-authored-by: Fabio Grätz <fabiogratz@googlemail.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * bump flyteidl Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * make requirements Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * fix failing tests Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * move gpu accelerator to flyteidl.core.Resources Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Use ResourceExtensions for extended resources Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * cleanup Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Switch to using ExtendedResources in TaskTemplate Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * cleanups Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * update flyteidl Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Replace _core_task imports with tasks_pb2 Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * less verbose definitions Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Attempt at less confusing syntax Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Streamline UX Signed-off-by: Jeev B <jeevb@users.noreply.github.com> * Run make fmt Signed-off-by: Jeev B <jeevb@users.noreply.github.com> --------- Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Jeev B <jeevb@users.noreply.github.com> Signed-off-by: Victor Delépine <victor.delepine@wayve.ai> Signed-off-by: Future Outlier <eric901201@gmai.com> Signed-off-by: troychiu <y.troychiu@gmail.com> Signed-off-by: Matthew Hoffman <matthew@protopia.ai> Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> Signed-off-by: Yue Shang 
<s.yue3074@gmail.com> Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Ketan Umare <ketan.umare@gmail.com> Signed-off-by: oliverhu <khu@linkedin.com> Signed-off-by: Jan Fiedler <jan.fiedler@kineo.ai> Signed-off-by: Chao-Heng Lee <chaohengstudent@gmail.com> Signed-off-by: Adrian Rumpold <a.rumpold@gmail.com> Signed-off-by: Arthur <atte.book@gmail.com> Signed-off-by: wirthual <wirthra@gmail.com> Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com> Signed-off-by: Katrina Rogan <katroganGH@gmail.com> Signed-off-by: HH <hhcs9527@gmail.com> Signed-off-by: hhcs9527 <hhcs9527@gmail.com> Signed-off-by: Edwin Yu <edwinyyyu@gmail.com> Signed-off-by: Samhita Alla <aallasamhita@gmail.com> Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com> Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com> Co-authored-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Co-authored-by: Victor Delépine <vctr.delepine@gmail.com> Co-authored-by: Future-Outlier <eric901201@gmail.com> Co-authored-by: Future Outlier <eric901201@gmai.com> Co-authored-by: Yi Chiu <114708546+troychiu@users.noreply.github.com> Co-authored-by: Matthew Hoffman <matthew@protopia.ai> Co-authored-by: Kevin Su <pingsutw@apache.org> Co-authored-by: Niels Bantilan <niels.bantilan@gmail.com> Co-authored-by: Yue Shang <138256885+ysysys3074@users.noreply.github.com> Co-authored-by: Eduardo Apolinario <653394+eapolinario@users.noreply.github.com> Co-authored-by: Ketan Umare <16888709+kumare3@users.noreply.github.com> Co-authored-by: Keqiu Hu <khu@linkedin.com> Co-authored-by: Jan Fiedler <89976021+fiedlerNr9@users.noreply.github.com> Co-authored-by: Chao-Heng Lee <chaohengstudent@gmail.com> Co-authored-by: Samhita Alla <aallasamhita@gmail.com> Co-authored-by: Arthur Böök <49250723+ArthurBook@users.noreply.github.com> Co-authored-by: Katrina Rogan <katroganGH@gmail.com> Co-authored-by: Po Han(Hank) Huang <hhcs9527@gmail.com> Co-authored-by: Edwin Yu 
<92917168+edwinyyyu@users.noreply.github.com> Co-authored-by: Fabio M. Graetz, Ph.D <fabiograetz@googlemail.com> Co-authored-by: Fabio Grätz <fabiogratz@googlemail.com>
1 parent d9ad0e1 commit 4b1ad23

14 files changed: +315 -10 lines changed
flytekit/core/base_task.py

Lines changed: 8 additions & 0 deletions
@@ -24,6 +24,8 @@
 from dataclasses import dataclass
 from typing import Any, Coroutine, Dict, Generic, List, Optional, OrderedDict, Tuple, Type, TypeVar, Union, cast

+from flyteidl.core import tasks_pb2
+
 from flytekit.configuration import SerializationSettings
 from flytekit.core.context_manager import (
     ExecutionParameters,
@@ -344,6 +346,12 @@ def get_config(self, settings: SerializationSettings) -> Optional[Dict[str, str]
         """
         return None

+    def get_extended_resources(self, settings: SerializationSettings) -> Optional[tasks_pb2.ExtendedResources]:
+        """
+        Returns the extended resources to allocate to the task on hosted Flyte.
+        """
+        return None
+
     def local_execution_mode(self) -> ExecutionState.Mode:
         """ """
         return ExecutionState.Mode.LOCAL_TASK_EXECUTION
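
The `get_extended_resources` hook in base_task.py returns `None` by default, which suggests the usual "optional hook" pattern: subclasses with something to request override it, and callers skip the field when nothing is returned. The sketch below illustrates that pattern only. `ExtendedResources`, `BaseTask`, `AcceleratedTask`, and `serialize` are hypothetical stand-ins rather than flytekit or flyteidl APIs, and the `settings` argument is dropped for brevity.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ExtendedResources:
    """Hypothetical stand-in for tasks_pb2.ExtendedResources."""

    gpu_device: str


class BaseTask:
    def get_extended_resources(self) -> Optional[ExtendedResources]:
        # Default hook: a plain task requests nothing beyond CPU and memory.
        return None


class AcceleratedTask(BaseTask):
    def __init__(self, accelerator: Optional[str] = None) -> None:
        self.accelerator = accelerator

    def get_extended_resources(self) -> Optional[ExtendedResources]:
        # Only build a message when an accelerator was actually requested.
        if self.accelerator is None:
            return None
        return ExtendedResources(gpu_device=self.accelerator)


def serialize(task: BaseTask) -> Dict[str, Any]:
    """Hypothetical serializer: emit the field only when the hook returns a value."""
    spec: Dict[str, Any] = {"type": type(task).__name__}
    extended = task.get_extended_resources()
    if extended is not None:
        spec["extended_resources"] = extended
    return spec


plain_spec = serialize(BaseTask())
gpu_spec = serialize(AcceleratedTask("nvidia-tesla-t4"))
```

Returning `None` rather than an empty message keeps "nothing requested" distinguishable from "requested with default values", which is the same distinction the diff draws with `Optional[...]`.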

flytekit/core/node.py

Lines changed: 8 additions & 0 deletions
@@ -4,6 +4,8 @@
 import typing
 from typing import Any, List

+from flyteidl.core import tasks_pb2
+
 from flytekit.core.resources import Resources, convert_resources_to_resource_model
 from flytekit.core.utils import _dnsify
 from flytekit.loggers import logger
@@ -62,6 +64,7 @@ def __init__(
         self._aliases: _workflow_model.Alias = None
         self._outputs = None
         self._resources: typing.Optional[_resources_model] = None
+        self._extended_resources: typing.Optional[tasks_pb2.ExtendedResources] = None

     def runs_before(self, other: Node):
         """
@@ -172,6 +175,11 @@ def with_overrides(self, *args, **kwargs):
             assert_not_promise(v, "container_image")
             self.flyte_entity._container_image = v

+        if "accelerator" in kwargs:
+            v = kwargs["accelerator"]
+            assert_not_promise(v, "accelerator")
+            self._extended_resources = tasks_pb2.ExtendedResources(gpu_accelerator=v.to_flyte_idl())
+
         return self
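
The new `accelerator` branch in `with_overrides` checks that the value is not a promise, converts it to a message, and stores it on the node. A self-contained sketch of that flow follows. `Promise`, `Accelerator`, `Node`, and this `assert_not_promise` are hypothetical stand-ins, with plain dicts in place of protobuf messages. The comment about why promises are rejected is an assumption, not something the diff states.

```python
from typing import Any, Dict, Optional


class Promise:
    """Hypothetical stand-in for a value that only exists at run time."""


class Accelerator:
    """Hypothetical stand-in for an accelerator with a to_flyte_idl-style method."""

    def __init__(self, device: str) -> None:
        self.device = device

    def to_message(self) -> Dict[str, str]:
        return {"device": self.device}


def assert_not_promise(value: Any, name: str) -> None:
    # Assumption: overrides must be concrete when the workflow is built,
    # so run-time placeholders are rejected up front.
    if isinstance(value, Promise):
        raise AssertionError(f"Cannot use a promise in the {name} override")


class Node:
    def __init__(self) -> None:
        self._extended_resources: Optional[Dict[str, Any]] = None

    def with_overrides(self, **kwargs: Any) -> "Node":
        if "accelerator" in kwargs:
            v = kwargs["accelerator"]
            assert_not_promise(v, "accelerator")
            self._extended_resources = {"gpu_accelerator": v.to_message()}
        # Returning self lets callers chain further calls on the node.
        return self


node = Node().with_overrides(accelerator=Accelerator("nvidia-tesla-t4"))

try:
    Node().with_overrides(accelerator=Promise())
    rejected = False
except AssertionError:
    rejected = True
```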

flytekit/core/python_auto_container.py

Lines changed: 15 additions & 0 deletions
@@ -5,6 +5,8 @@
 from abc import ABC
 from typing import Callable, Dict, List, Optional, TypeVar, Union

+from flyteidl.core import tasks_pb2
+
 from flytekit.configuration import ImageConfig, SerializationSettings
 from flytekit.core.base_task import PythonTask, TaskMetadata, TaskResolverMixin
 from flytekit.core.context_manager import FlyteContextManager
@@ -13,6 +15,7 @@
 from flytekit.core.tracked_abc import FlyteTrackedABC
 from flytekit.core.tracker import TrackedInstance, extract_task_module
 from flytekit.core.utils import _get_container_definition, _serialize_pod_spec, timeit
+from flytekit.extras.accelerators import BaseAccelerator
 from flytekit.image_spec.image_spec import ImageBuildEngine, ImageSpec
 from flytekit.loggers import logger
 from flytekit.models import task as _task_model
@@ -44,6 +47,7 @@ def __init__(
         secret_requests: Optional[List[Secret]] = None,
         pod_template: Optional[PodTemplate] = None,
         pod_template_name: Optional[str] = None,
+        accelerator: Optional[BaseAccelerator] = None,
         **kwargs,
     ):
         """
@@ -70,6 +74,7 @@ def __init__(
             - `AWS Parameter store <https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html>`__
         :param pod_template: Custom PodTemplate for this task.
         :param pod_template_name: The name of the existing PodTemplate resource which will be used in this task.
+        :param accelerator: The accelerator to use for this task.
         """
         sec_ctx = None
         if secret_requests:
@@ -110,6 +115,7 @@ def __init__(
             self._get_command_fn = self.get_default_command

         self.pod_template = pod_template
+        self.accelerator = accelerator

     @property
     def task_resolver(self) -> TaskResolverMixin:
@@ -219,6 +225,15 @@ def get_config(self, settings: SerializationSettings) -> Optional[Dict[str, str]
             return {}
         return {_PRIMARY_CONTAINER_NAME_FIELD: self.pod_template.primary_container_name}

+    def get_extended_resources(self, settings: SerializationSettings) -> Optional[tasks_pb2.ExtendedResources]:
+        """
+        Returns the extended resources to allocate to the task on hosted Flyte.
+        """
+        if self.accelerator is None:
+            return None
+
+        return tasks_pb2.ExtendedResources(gpu_accelerator=self.accelerator.to_flyte_idl())
+

 class DefaultTaskResolver(TrackedInstance, TaskResolverMixin):
     """

flytekit/core/task.py

Lines changed: 6 additions & 0 deletions
@@ -8,6 +8,7 @@
 from flytekit.core.python_function_task import PythonFunctionTask
 from flytekit.core.reference_entity import ReferenceEntity, TaskReference
 from flytekit.core.resources import Resources
+from flytekit.extras.accelerators import BaseAccelerator
 from flytekit.image_spec.image_spec import ImageSpec
 from flytekit.models.documentation import Documentation
 from flytekit.models.security import Secret
@@ -102,6 +103,7 @@ def task(
     enable_deck: Optional[bool] = ...,
     pod_template: Optional["PodTemplate"] = ...,
     pod_template_name: Optional[str] = ...,
+    accelerator: Optional[BaseAccelerator] = ...,
 ) -> Callable[[Callable[..., FuncOut]], PythonFunctionTask[T]]:
     ...

@@ -129,6 +131,7 @@ def task(
     enable_deck: Optional[bool] = ...,
     pod_template: Optional["PodTemplate"] = ...,
     pod_template_name: Optional[str] = ...,
+    accelerator: Optional[BaseAccelerator] = ...,
 ) -> Union[PythonFunctionTask[T], Callable[..., FuncOut]]:
     ...

@@ -155,6 +158,7 @@ def task(
     enable_deck: Optional[bool] = None,
     pod_template: Optional["PodTemplate"] = None,
     pod_template_name: Optional[str] = None,
+    accelerator: Optional[BaseAccelerator] = None,
 ) -> Union[Callable[[Callable[..., FuncOut]], PythonFunctionTask[T]], PythonFunctionTask[T], Callable[..., FuncOut]]:
     """
     This is the core decorator to use for any task type in flytekit.
@@ -248,6 +252,7 @@ def foo2():
     :param docs: Documentation about this task
     :param pod_template: Custom PodTemplate for this task.
     :param pod_template_name: The name of the existing PodTemplate resource which will be used in this task.
+    :param accelerator: The accelerator to use for this task.
     """

     def wrapper(fn: Callable[..., Any]) -> PythonFunctionTask[T]:
@@ -277,6 +282,7 @@ def wrapper(fn: Callable[..., Any]) -> PythonFunctionTask[T]:
             docs=docs,
             pod_template=pod_template,
             pod_template_name=pod_template_name,
+            accelerator=accelerator,
         )
         update_wrapper(task_instance, fn)
         return task_instance
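
The task.py hunks thread `accelerator` through both overload stubs, the real signature, and the inner `wrapper`, so the keyword works whether the decorator is used bare or with arguments. The sketch below reproduces that pass-through with a much smaller decorator. `FunctionTask`, the `_fn` parameter name, and the use of a plain string for the accelerator are illustrative assumptions; the real `task` decorator takes many more parameters.

```python
from functools import update_wrapper
from typing import Any, Callable, Optional, Union


class FunctionTask:
    """Hypothetical stand-in for PythonFunctionTask."""

    def __init__(self, fn: Callable[..., Any], accelerator: Optional[str] = None) -> None:
        self.fn = fn
        self.accelerator = accelerator

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self.fn(*args, **kwargs)


def task(
    _fn: Optional[Callable[..., Any]] = None,
    accelerator: Optional[str] = None,
) -> Union[FunctionTask, Callable[[Callable[..., Any]], FunctionTask]]:
    def wrapper(fn: Callable[..., Any]) -> FunctionTask:
        # The keyword captured by the outer call is forwarded to the task object.
        task_instance = FunctionTask(fn, accelerator=accelerator)
        update_wrapper(task_instance, fn)
        return task_instance

    # Bare @task receives the function directly; @task(...) returns the wrapper.
    return wrapper(_fn) if _fn is not None else wrapper


@task
def cpu_step(x: int) -> int:
    return x + 1


@task(accelerator="nvidia-tesla-t4")
def gpu_step(x: int) -> int:
    return x * 2
```

Because the new keyword defaults to `None`, tasks declared before this change are unaffected.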

flytekit/extras/accelerators.py

Lines changed: 90 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,90 @@
1+
import abc
2+
import copy
3+
from typing import ClassVar, Generic, Optional, Type, TypeVar
+
+from flyteidl.core import tasks_pb2
+
+T = TypeVar("T")
+MIG = TypeVar("MIG", bound="MultiInstanceGPUAccelerator")
+
+
+class BaseAccelerator(abc.ABC, Generic[T]):
+    @abc.abstractmethod
+    def to_flyte_idl(self) -> T:
+        ...
+
+
+class GPUAccelerator(BaseAccelerator):
+    def __init__(self, device: str) -> None:
+        self._device = device
+
+    def to_flyte_idl(self) -> tasks_pb2.GPUAccelerator:
+        return tasks_pb2.GPUAccelerator(device=self._device)
+
+
+A10G = GPUAccelerator("nvidia-a10g")
+L4 = GPUAccelerator("nvidia-l4-vws")
+K80 = GPUAccelerator("nvidia-tesla-k80")
+M60 = GPUAccelerator("nvidia-tesla-m60")
+P4 = GPUAccelerator("nvidia-tesla-p4")
+P100 = GPUAccelerator("nvidia-tesla-p100")
+T4 = GPUAccelerator("nvidia-tesla-t4")
+V100 = GPUAccelerator("nvidia-tesla-v100")
+
+
+class MultiInstanceGPUAccelerator(BaseAccelerator):
+    device: ClassVar[str]
+    _partition_size: Optional[str]
+
+    @property
+    def unpartitioned(self: MIG) -> MIG:
+        instance = copy.deepcopy(self)
+        instance._partition_size = None
+        return instance
+
+    @classmethod
+    def partitioned(cls: Type[MIG], partition_size: str) -> MIG:
+        instance = cls()
+        instance._partition_size = partition_size
+        return instance
+
+    def to_flyte_idl(self) -> tasks_pb2.GPUAccelerator:
+        msg = tasks_pb2.GPUAccelerator(device=self.device)
+        if not hasattr(self, "_partition_size"):
+            return msg
+
+        if self._partition_size is None:
+            msg.unpartitioned = True
+        else:
+            msg.partition_size = self._partition_size
+        return msg
+
+
+class _A100_Base(MultiInstanceGPUAccelerator):
+    device = "nvidia-tesla-a100"
+
+
+class _A100(_A100_Base):
+    partition_1g_5gb = _A100_Base.partitioned("1g.5gb")
+    partition_2g_10gb = _A100_Base.partitioned("2g.10gb")
+    partition_3g_20gb = _A100_Base.partitioned("3g.20gb")
+    partition_4g_20gb = _A100_Base.partitioned("4g.20gb")
+    partition_7g_40gb = _A100_Base.partitioned("7g.40gb")
+
+
+A100 = _A100()
+
+
+class _A100_80GB_Base(MultiInstanceGPUAccelerator):
+    device = "nvidia-a100-80gb"
+
+
+class _A100_80GB(_A100_80GB_Base):
+    partition_1g_10gb = _A100_80GB_Base.partitioned("1g.10gb")
+    partition_2g_20gb = _A100_80GB_Base.partitioned("2g.20gb")
+    partition_3g_40gb = _A100_80GB_Base.partitioned("3g.40gb")
+    partition_4g_40gb = _A100_80GB_Base.partitioned("4g.40gb")
+    partition_7g_80gb = _A100_80GB_Base.partitioned("7g.80gb")
+
+
+A100_80GB = _A100_80GB()
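
The multi-instance GPU classes in this file distinguish three states: no partition preference (the `_partition_size` attribute was never set), explicitly unpartitioned (`None`), and a named partition. The following is a minimal, self-contained sketch of that pattern, not the flytekit implementation itself. `FakeGPUAccelerator`, `MultiInstanceGPU`, and `to_message` are hypothetical stand-ins introduced so the example runs without `flyteidl`; the real code builds a `tasks_pb2.GPUAccelerator` message instead.

```python
import copy
from dataclasses import dataclass
from typing import ClassVar, Optional


@dataclass
class FakeGPUAccelerator:
    """Hypothetical stand-in for tasks_pb2.GPUAccelerator (assumption, not the real protobuf)."""

    device: str
    unpartitioned: bool = False
    partition_size: Optional[str] = None


class MultiInstanceGPU:
    device: ClassVar[str]
    # Annotation only: no attribute exists until partitioned()/unpartitioned sets one.
    _partition_size: Optional[str]

    @property
    def unpartitioned(self):
        # Copy so the shared module-level instance is never mutated.
        instance = copy.deepcopy(self)
        instance._partition_size = None
        return instance

    @classmethod
    def partitioned(cls, partition_size: str):
        instance = cls()
        instance._partition_size = partition_size
        return instance

    def to_message(self) -> FakeGPUAccelerator:
        msg = FakeGPUAccelerator(device=self.device)
        if not hasattr(self, "_partition_size"):
            return msg  # no preference expressed
        if self._partition_size is None:
            msg.unpartitioned = True
        else:
            msg.partition_size = self._partition_size
        return msg


class _A100Base(MultiInstanceGPU):
    device = "nvidia-tesla-a100"


class _A100(_A100Base):
    # Built from the base class, so the partition objects do not recurse into _A100.
    partition_1g_5gb = _A100Base.partitioned("1g.5gb")


A100 = _A100()

default_msg = A100.to_message()
whole_msg = A100.unpartitioned.to_message()
slice_msg = A100.partition_1g_5gb.to_message()
```

Using `hasattr` rather than a `None` default is what lets `None` carry the separate meaning "explicitly unpartitioned"; the split into a base class and a subclass appears to exist so the partition attributes can be created while the subclass body is still being defined.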

flytekit/models/core/workflow.py

Lines changed: 13 additions & 3 deletions
@@ -1,6 +1,7 @@
 import datetime
 import typing

+from flyteidl.core import tasks_pb2
 from flyteidl.core import workflow_pb2 as _core_workflow

 from flytekit.models import common as _common
@@ -562,24 +563,33 @@ def from_flyte_idl(cls, pb2_object):


 class TaskNodeOverrides(_common.FlyteIdlEntity):
-    def __init__(self, resources: typing.Optional[Resources] = None):
+    def __init__(
+        self, resources: typing.Optional[Resources], extended_resources: typing.Optional[tasks_pb2.ExtendedResources]
+    ):
         self._resources = resources
+        self._extended_resources = extended_resources

     @property
     def resources(self) -> Resources:
         return self._resources

+    @property
+    def extended_resources(self) -> tasks_pb2.ExtendedResources:
+        return self._extended_resources
+
     def to_flyte_idl(self):
         return _core_workflow.TaskNodeOverrides(
             resources=self.resources.to_flyte_idl() if self.resources is not None else None,
+            extended_resources=self.extended_resources,
         )

     @classmethod
     def from_flyte_idl(cls, pb2_object):
         resources = Resources.from_flyte_idl(pb2_object.resources)
+        extended_resources = pb2_object.extended_resources if pb2_object.HasField("extended_resources") else None
         if bool(resources.requests) or bool(resources.limits):
-            return cls(resources=resources)
-        return cls(resources=None)
+            return cls(resources=resources, extended_resources=extended_resources)
+        return cls(resources=None, extended_resources=extended_resources)


 class TaskNode(_common.FlyteIdlEntity):

flytekit/models/task.py

Lines changed: 13 additions & 0 deletions
@@ -336,6 +336,7 @@ def __init__(
         config=None,
         k8s_pod=None,
         sql=None,
+        extended_resources=None,
     ):
         """
         A task template represents the full set of information necessary to perform a unit of work in the Flyte system.
@@ -359,6 +360,7 @@ def __init__(
             in tandem with the custom.
         :param K8sPod k8s_pod: Alternative to the container used to execute this task.
         :param Sql sql: This is used to execute query in FlytePropeller instead of running container or k8s_pod.
+        :param flyteidl.core.tasks_pb2.ExtendedResources extended_resources: The extended resources to allocate to the task.
         """
         if (
             (container is not None and k8s_pod is not None)
@@ -377,6 +379,7 @@ def __init__(
         self._security_context = security_context
         self._k8s_pod = k8s_pod
         self._sql = sql
+        self._extended_resources = extended_resources

     @property
     def id(self):
@@ -451,6 +454,14 @@ def k8s_pod(self):
     def sql(self):
         return self._sql

+    @property
+    def extended_resources(self):
+        """
+        If not None, the extended resources to allocate to the task.
+        :rtype: flyteidl.core.tasks_pb2.ExtendedResources
+        """
+        return self._extended_resources
+
     def to_flyte_idl(self):
         """
         :rtype: flyteidl.core.tasks_pb2.TaskTemplate
@@ -464,6 +475,7 @@ def to_flyte_idl(self):
             container=self.container.to_flyte_idl() if self.container else None,
             task_type_version=self.task_type_version,
             security_context=self.security_context.to_flyte_idl() if self.security_context else None,
+            extended_resources=self.extended_resources,
             config={k: v for k, v in self.config.items()} if self.config is not None else None,
             k8s_pod=self.k8s_pod.to_flyte_idl() if self.k8s_pod else None,
             sql=self.sql.to_flyte_idl() if self.sql else None,
@@ -487,6 +499,7 @@ def from_flyte_idl(cls, pb2_object):
             security_context=_sec.SecurityContext.from_flyte_idl(pb2_object.security_context)
             if pb2_object.security_context and pb2_object.security_context.ByteSize() > 0
             else None,
+            extended_resources=pb2_object.extended_resources if pb2_object.HasField("extended_resources") else None,
             config={k: v for k, v in pb2_object.config.items()} if pb2_object.config is not None else None,
             k8s_pod=K8sPod.from_flyte_idl(pb2_object.k8s_pod) if pb2_object.HasField("k8s_pod") else None,
             sql=Sql.from_flyte_idl(pb2_object.sql) if pb2_object.HasField("sql") else None,

flytekit/tools/translator.py

Lines changed: 6 additions & 3 deletions
@@ -214,6 +214,7 @@ def get_serializable_task(
         config=entity.get_config(settings),
         k8s_pod=pod,
         sql=entity.get_sql(settings),
+        extended_resources=entity.get_extended_resources(settings),
     )
     if settings.should_fast_serialize() and isinstance(entity, PythonAutoContainerTask):
         entity.reset_command_fn()
@@ -440,7 +441,8 @@ def get_serializable_node(
             upstream_node_ids=[n.id for n in upstream_nodes],
             output_aliases=[],
             task_node=workflow_model.TaskNode(
-                reference_id=task_spec.template.id, overrides=TaskNodeOverrides(resources=entity._resources)
+                reference_id=task_spec.template.id,
+                overrides=TaskNodeOverrides(resources=entity._resources, extended_resources=entity._extended_resources),
             ),
         )
         if entity._aliases:
@@ -516,7 +518,8 @@ def get_serializable_node(
             upstream_node_ids=[n.id for n in upstream_nodes],
             output_aliases=[],
             task_node=workflow_model.TaskNode(
-                reference_id=entity.flyte_entity.id, overrides=TaskNodeOverrides(resources=entity._resources)
+                reference_id=entity.flyte_entity.id,
+                overrides=TaskNodeOverrides(resources=entity._resources, extended_resources=entity._extended_resources),
             ),
         )
     elif isinstance(entity.flyte_entity, FlyteWorkflow):
@@ -565,7 +568,7 @@ def get_serializable_array_node(
     task_spec = get_serializable(entity_mapping, settings, entity, options)
     task_node = workflow_model.TaskNode(
         reference_id=task_spec.template.id,
-        overrides=TaskNodeOverrides(resources=node._resources),
+        overrides=TaskNodeOverrides(resources=node._resources, extended_resources=node._extended_resources),
     )
     node = workflow_model.Node(
         id=entity.name,
