[RFC] Deep Learning Workflows #1582

zachgk · 2022-04-19T00:04:06Z

zachgk
Apr 19, 2022
Maintainer

The goal of this project is to build a tool for workflows of deep learning inference serving involving multiple models. While some customer use cases require only serving a single model, there are also many use cases that involve multiple steps and multiple models. Rather than requiring custom code to support these use cases, a simple configuration system can handle many of them.

Examples

First, let's establish a few motivating examples for the workflows. The first example is a simple workflow involving pre-processing and post-processing:

-> PreProcess -> Model -> PostProcess ->

The next example is models used in sequence. Specifically, we can take the example of pose estimation which tries to find the joints in a human image which can then be used by applications like movie CGI. It first uses an object detection model to find the human and then the pose estimation to find the joints on it:

-> HumanDetection -> PoseEstimation ->

The final motivating example is using a model ensemble. In an ensemble, instead of training one model to solve the problem, it trains multiple and combines their results. This works similarly to having multiple people coming together to make a decision and often gives an easy accuracy boost:

-> PreProcess -> {Model1, Model2, Model3} -> Aggregate -> PostProcess ->

In summary, this tools focuses on specifying workflows involving multiple steps and multiple models. Steps can include both models and raw code. They also are running in both sequence and parallel. These are some of the main design requirements.

Design Goals

Now that we have some examples, let's work on identifying the problem. Many problems that we try to solve are really fragments of a larger problem, and identifying the larger problem can help understand some of the difficulties that we will face.

The problem of this workflow is essentially the same problem as building a programming language. The workflow configuration acts like code and the tool that runs the workflow behaves like a language interpreter/compiler. Given this relationship, many difficulties faced by programming languages will end up showing up in some form for the workflow as well.

As a language, let's start by going through the basic language features. As one of the goals of a workflow is to enable parallelization, it should specifically focus on describing states with their dependencies like in functional programming. This makes it easier for a parallel interpreter to run.

A possible goal is to support conditional statements. Matching more functional approaches, it would likely have else branches that are required rather than optional so that the conditional is guaranteed to have a value. Conditionals can open up additional use cases by expanding what workflows are representable. However, it also requires designing some kinds of boolean support. Most likely, this will be a future feature depending on customer demand.

Next, a specific non-goal is the use of loops or recursion. Including loops and recursion turns it from a straightforward workflow that can be represented with a DAG into a Turing Complete language. It would require much more features to implement and to support development such as IDEs and debuggers. So, this non-goal should greatly simplify the tool.

There are of course costs to this simplification. Many models will not be possible using this tool. One notable example is that it should hurt those featuring a sequence of data where a model needs to be run for each element in the sequence. However, such sequences are still supported within the bounds of a model so it should still work even for most sequence use cases.

Another issue is that some of the gains in terms of ease of implementation come out of ease of use for customers. For an example, let's go back to the ensemble case. If there were an ensemble with say 50 models in it, it would require quite a lot of typing from users as you couldn't loop through each model.

Following from this complication is that when the levels of verbosity get bad enough, users will then manage not the configuration directly but a script used to create the configuration. A similar methodology can be sometimes seen in AWS CloudFormation with the AWS CDK. However, this remains manageable and we can make minor efforts such as supporting JSON input to help ease the use case.

Another way to improve the situation is to support pattern globs. Consider Makefiles which are another example of a non-recursive workflow language. Makefiles feature pattern substitutions to represent many files with only a single description. Adding something similar could help resolve the difficulties with large ensembles described above and other similar situations. This would probably be a future feature.

Outside of the language itself, the goal is to create a reusable standard for deep learning serving systems. Right now, several other systems also have a similar need including Triton Serving, PyTorch Serving, and DJL Serving. While not all details can be shared, it should attempt to be usable for all of these serving systems and future ones as well.

Finally, the addition of cross-language tooling is also a non-goal. The workflow system exists within a hierarchy of languages. It defines a workflow that is built out of models. Each model is build out of NDArray operators defined within a deep learning engine. And, each deep learning engine is built out of C++ code compiled into CPU/GPU instructions.

The major difficulty of having multiple languages is that tooling very rarely works across language boundaries. This includes type systems, static analysis, IDE functions such as "Go To Definition", debuggers, compilers, and performance optimizations. Little to no effort will be planned to attempt to build these cross language tools.

Workflow Design

Medium

One of the first design options is the medium. The three main possibilities are configuration (YAML), DSL, and language embedded.

The planned strategy is to use a configuration in YAML. The equivalent format of JSON can also be supported as well with very little work. The benefits of the YAML is that it is the simplest to implement, it well known which can improve learning speed for new users, and already contains significant IDE support. The downside is that it can be somewhat restrictive on syntax leading to some compromises.

Another option is to write a DSL with a custom syntax and grammar. Unlike the YAML, it would be able to have somewhat cleaner syntax. However, the syntax is sufficient for the expected complexity of programs so this isn't too strong of a benefit. In comparison, it will require much more effort to implement due to including a grammar and more effort to learn the syntax nuances. Overall, it doesn't seem worth it.

A language embedded is when one programming language is used to write another. One example is Apache Flink which uses Java to define the Flink stream. Similarly, Python is used by many deep learning engines to define the models.

This leads to some benefits when defining complex systems. Scripts won't be necessary to define the workflow as the embedding language (say Python) is itself capable of working like that script. However, it then means that all uses of the workflow system only work within the bounds of the embedding language and removes much of the ease of use expected.

Overall, this system is intended only for a medium complexity of workflows that are more than one model but not too many models. An embedding language would be best suited for larger complexity systems instead. And for those circumstances, it might be worth using custom serving code rather than relying on the model server.

Global Configuration

As the system is built in YAML, the overall structure is a configuration object with options similar to:

name: "MyWorkflow"
version: "1.2.0"

# Default model properties based on https://github.com/pytorch/serve/blob/master/docs/workflows.md#workflow-model-properties
minWorkers: 1
maxWorkers: 4
batchSize: 3
maxBatchDelay: 5000
retryAttempts: 3
timeout: 5000

# Defined below
models: ...
functions: ...
workflow: ...

There are a few categories of configuration that are supported. The first is a basic string name and version to differentiate workflows.

After that are some performance properties. These will behave as defaults and can be overridden in individual models.

Models

The models section is used to declare the models that will be used within the workflow. It works such as an external definition as the models are built and trained separately.

Each model will be identified by a local model name. So, the overall section would look like:

models:
  modelA: ...
  modelB: ...
  modelC: ...
  ...

Each model individually would be an object describing how to load the model. The format of this can vary depending the serving implementation.

The simplest case is where the file can be loaded just from a URL. One example is the MMS .mar file or through the DJL model zoo:

models:
  modelA: "https://example.com/path/to/model.mar"
  modelB: "djl://ai.djl.mxnet/ssd/0.0.1/ssd_512_resnet50_v1_voc"

A more advanced case can use an object representing the DJL criteria:

models:
  resnet:
    application: "cv/image_classification"
    engine: "MXNet"
    groupId: "ai.djl.mxnet"
    artifactId: "resnet"
    name: "Resnet"
    translator: "com.package.TranslatorClass"
    ...

External Code

Along with the models, it is useful to be able to import functions written in other programming languages. This can be used for custom preProcessing and postProcessing code along with other glue code necessary to combine the models together.

Here, the format would also likely differ between serving implementations. For DJL-Serving, as it is written in Java it would likely require that the called functions also be in Java and added to the classpath. Then, they could be imported as follows:

functions:
  aggregate: "com.package.Aggregator"

The Aggregator would be a class that is required to have a public no-argument constructor and implement the ServingFunction interface. It might look something like:

public final class Aggregator implements ServingFunction {

  @Override
  public Object run(WorkflowExecutor ex, List<WorkflowArguments> args) {
    ...
  }
}

Here, the run function's arguments include the the WorkflowExecutor which can be used to execute higher models in the case of higher order functions. It also includes the WorkflowArguments which are the inputs to the function.

Along with the ability to write custom functions, it would be useful to also include some number of pre-defined functions to represent common actions. One example would be the map function which applies a function/model across a list. A full list of these pre-defined functions will be produced later.

Workflow

Once all of the external definitions are defined, the actual workflow definition can be created. The workflow consists of a number of value definitions that can be thought of like final/immutable local variables in a function.

There are two important special values: in and out. The in is not defined, but must be used to refer to the input passed into the workflow. The out represents the output of the workflow and must always be defined.

Each definition consists of the result of a function application. Function application is written using an array where the first element of the array is the function/model name and the remaining elements are the arguments. This is similar to LISP. While LISP style functions are not the most common, it is chosen due to the constraints of fitting the definition into JSON/YAML.

Here is an example of the simple example given at the top of including a model, preprocessing, and postprocessing:

workflow:
  # First applies preProcessing to the input
  preProcessed: ["preProcess", "in"]
  
  # Then applies "model" to the result of preProcessing stored in the value "preProcessed"
  inferenced: ["model", "preProcessed"]
  
  # The output saved in the keyword "out" is done by applying postProcessing to the inferenced result
  out: ["postProcess", "inferenced"]

It is also possible to nest function calls by replacing arguments with a list. That means that this operation can be defined on a single line:

workflow:
  out: ["postProcess", ["model", ["preProcess", "in"]]]

To represent parallel operations, the data can be split. Here, each of the three models uses the same result of preProcessing so it is only computed once. Both “preProcess” and “postProcess” are just standard custom functions. Then, the results from the models are aggregated using the custom "aggregate" function and passed to postProcessing. It would also be possible to combine "postProcessing" and "aggregate" or to use a predefined function in place of "aggregate":

workflow:
  preProcessed: ["preProcess", "in"]
  m1: ["model1", "preProcessed"]
  m2: ["model2", "preProcessed"]
  m3: ["model3", "preProcessed"]
  out: ["postProcess", ["aggregate", "m1", "m2", "m3"]]

As a final example, here is one that features a more complicated interaction. The human detection model will find all of the humans in an image. Then, the "splitHumans" function will turn all of them into separate images that can be treated as a list. The "map" will apply the "poseEstimation" model to each of the detected humans in the list.

workflow:
  humans: ["splitHumans", ["humanDetection", "in"]]
  out: ["map", "poseEstimation", "humans"]

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[RFC] Deep Learning Workflows #1582

Uh oh!

{{title}}

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

[RFC] Deep Learning Workflows #1582

Uh oh!

zachgk Apr 19, 2022 Maintainer

Examples

Design Goals

Workflow Design

Medium

Global Configuration

Models

External Code

Workflow

Replies: 0 comments

zachgk
Apr 19, 2022
Maintainer