
Handle use cases where the producer needs to run on multiple VMs or in batch. #686

@shyam4u

This is a brilliant framework, and I am exploring whether it could be used for my use case. I need some help here. My use case is as follows.

Say I am building a list of Customers that needs to be shipped from the staging environment to Prod, and the Customers need to be grouped by the region they represent. My producer, which retrieves data from the customer source of truth, needs to prepare region/country-specific data files for shipping. Say I have 2 regions, US and CA, so I need to produce 2 data files. The producer processes run in parallel and have to generate the region-specific files on 2 JVMs, since the files would be too large to produce on a single JVM. I would also like to produce 1 snapshot, because the window in which the region files are generated is common to all of them. Say I generated a version V1: my requirement is to have V1 for both US and CA, and on the consumption side I would like to announce a single version V1 that needs to be loaded. In other words, the consumer needs to load both the CA and the US data.

I think we can use Incremental with withNumStatesBetweenSnapshots to make it publish a snapshot only at the beginning and at the end, so that it runs "in batches". But I am not sure how to make certain that I publish just 1 snapshot for both files being generated on the producer side.
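
To make the requirement concrete, here is a minimal sketch in plain Java of the behaviour I am after. It does not use the Hollow API at all; `RegionVersionCoordinator` and `regionCompleted` are hypothetical names made up for illustration, and the sketch runs in a single process, whereas in my case the two region producers would be on separate JVMs. The only point is that version V1 is announced exactly once, and only after every expected region file has been produced.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical coordinator, not part of Hollow: it announces a version
// only after every expected region file for that version is complete.
final class RegionVersionCoordinator {
    private final long version;
    private final Set<String> expectedRegions;
    private final Set<String> completedRegions = new HashSet<>();
    private boolean announced = false;

    RegionVersionCoordinator(long version, Set<String> expectedRegions) {
        this.version = version;
        this.expectedRegions = new HashSet<>(expectedRegions);
    }

    // Called when one region's data file has been fully produced.
    // Returns the version exactly once, when the last region completes;
    // returns Optional.empty() on every other call.
    synchronized Optional<Long> regionCompleted(String region) {
        if (!expectedRegions.contains(region)) {
            throw new IllegalArgumentException("Unexpected region: " + region);
        }
        completedRegions.add(region);
        if (!announced && completedRegions.containsAll(expectedRegions)) {
            announced = true;
            return Optional.of(version);
        }
        return Optional.empty();
    }
}
```

With regions US and CA and version 1, the first completed region would yield no announcement and the second would yield version 1. What I do not know is how to get the equivalent of this from the producer and announcer side of the framework.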
