
[RMP] Support Offline Batch processing of Recs Generation Pipelines #419

@jperez999


Problem:

As a user, I would like to run my Merlin Systems inference pipeline in an offline setting. This would let me produce a set of recommendations for all users, to be served from a data store, an email campaign, etc. It would also let me conduct rigorous testing and better compare behavior against other systems at both the operator and the system level.

Goal:

To do this, I need to be able to run my Merlin Systems inference graph without using Triton or the configs generated for it. This requires a new operator executor class that runs the ops in Python instead of on tritonserver. Execution should behave exactly as it does in the tritonserver setting: each operator should be provided the same inputs and return the same outputs. A minimal sketch of such an executor follows the list below.

  • Run an inference operator graph without tritonserver.
  • Requires no new user-facing API changes.
  • Execute the same graph that would be deployed to tritonserver.
  • Execute in a Python process.
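
As a minimal sketch of what this could look like (every name here, `Node`, `PythonExecutor`, and the callable ops, is invented for illustration, not the real Merlin Systems API):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Node:
    """Illustrative stand-in for a graph node; not the real Merlin classes."""
    name: str
    inputs: List[str]  # names of upstream results this op consumes
    op: Callable[[Dict[str, object]], Dict[str, object]]


class PythonExecutor:
    """Runs an operator graph in-process instead of on tritonserver."""

    def transform(self, nodes: List[Node],
                  request: Dict[str, object]) -> Dict[str, object]:
        results = dict(request)
        pending = list(nodes)
        while pending:
            # A node is ready once all of its named inputs exist; this one
            # loop covers single chains and parallel branches alike.
            ready = [n for n in pending if all(i in results for i in n.inputs)]
            if not ready:
                raise ValueError("cycle or missing input in graph")
            for node in ready:
                # Same contract as the tritonserver runtime: each op gets
                # the same inputs and must return the same outputs.
                results.update(node.op({i: results[i] for i in node.inputs}))
                pending.remove(node)
        return results
```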

Constraints:

  • Use the same Merlin Systems graph/ops that were created for the inference pipeline and that would run on tritonserver.
  • Swap out the operator executor for a Python (non-Triton) version.
  • Allow for all types of graphs, supporting multiple chains and parallel execution of ALL available operators (see the graph sketch below).
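
Continuing the hypothetical sketch above, a diamond-shaped graph (two parallel chains merged by a ranking step) runs with no special casing; all ops here are invented stand-ins:

```python
# Invented ops only: candidate retrieval and a user feature lookup run as
# parallel chains, then merge at a ranking step, as in a multi-stage recs graph.
retrieve = Node("retrieve", ["user_ids"],
                lambda d: {"candidates": [["item0", "item1", "item2"]
                                          for _ in d["user_ids"]]})
features = Node("features", ["user_ids"],
                lambda d: {"features": [len(u) for u in d["user_ids"]]})
rank = Node("rank", ["candidates", "features"],
            lambda d: {"recs": [c[:f] for c, f in
                                zip(d["candidates"], d["features"])]})

# Node order does not matter; readiness is resolved from the named inputs.
out = PythonExecutor().transform([rank, retrieve, features],
                                 {"user_ids": ["u1", "u2"]})
print(out["recs"])  # [['item0', 'item1'], ['item0', 'item1']]
```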

TODO:


### Tasks
- [ ] Create an offline runtime that swaps operators according to usage, e.g. swap the Feast operator for a dataset merge operator.
- [ ] Ensure every operator returns batch-based results, e.g. FAISS should return a batched representation of its inputs: 2 users in should produce a (2, 100) shape, not (200,) (see the shape sketch after this list).
- [ ] Create an offline example from the current multi-stage example in Merlin.
- [ ] Ensure ensemble export does not prevent using non-Triton runtimes later.
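
As a toy illustration of the batch-shape requirement in the second task (plain NumPy, not the real FAISS operator):

```python
import numpy as np

# Hypothetical top-100 candidate lookup for a batch of 2 users.
candidate_ids = np.arange(200)           # flat (200,) loses user boundaries
batched = candidate_ids.reshape(2, 100)  # (2, 100): row i = user i's top-100

assert batched.shape == (2, 100)
assert batched[1][0] == 100              # first candidate for the second user
```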
