Infinite Provider Push #4863
2 comments · 9 replies
-
I am struggling to understand what the issue is, but I doubt it will involve the significant changes you suggest. Can you please summarize in 3-4 sentences, with reference to specific code and without going into possible solutions, what the issue is?
-
It looks like you are trying to transform the Connector into a Message Queue. My suggestion would be to use a message queue implementation as the data plane; there are many of them out there, with different features. You can see an example of how this can be achieved in the sample repository.
-
Hey everyone,
I'd like to discuss how we might extend the Connector's capabilities to allow transferring infinite data using the Provider-PUSH flow. This discussion defines the concept and will serve as the basis for the eventual implementation.
Problem
According to the DSP specification: "Data may be finite or non-finite. This applies to either push and pull transfers. Finite data is data that is defined by a finite set, for example, machine learning data or images. After finite data transmission has finished, the TP is completed. Non-finite data is data that is defined by an infinite set or has no specified end, for example, streams or an API endpoint. With non-finite data, a TP will continue indefinitely until either the Consumer or Provider explicitly terminates the transmission." Provider-PUSH transfers transition the Transfer Processes and Data Flow to final states when the process is completed, and therefore do not allow infinite data to be exchanged.
Workarounds
Let's look at the existing workarounds for dealing with infinite data by applying them to the following example:
Company A wants to consume data from Company B, with Company B sending a discrete set of data every Monday. This represents a Provider-PUSH transfer (Company B is the Provider and wants to push data to Company A) that deals with infinite data (new data becomes available every Monday, so there is no determined end).
Use the Consumer-PULL flow
The Consumer-PULL flow allows infinite data to be transferred because EDRs remain valid across multiple data transfers, enabling Company A to request data from Company B every Monday. While this approach is suitable for many use cases, some scenarios require the Provider-PUSH flow due to its asynchronous nature, which overcomes limitations of synchronous protocols such as data size and latency constraints. The Consumer-PULL flow is also impractical when there are multiple Providers (e.g., Company B, C, D, etc.), because Company A would need to request data from each Provider individually. With Provider-PUSH, the Providers could push their data without Company A requesting it, simplifying the process.
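To illustrate why this workaround functions, the EDR issued for a Consumer-PULL transfer can simply be replayed each week. A minimal sketch in plain Java, assuming an EDR endpoint and authorization token obtained beforehand (the values are placeholders, and how you extract them depends on your setup):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class WeeklyPull {
    public static void main(String[] args) throws Exception {
        // Placeholders taken from a previously negotiated EDR.
        var endpoint = "https://provider.example.com/public/data";
        var authToken = "<edr-authorization-token>";

        var request = HttpRequest.newBuilder()
                .uri(URI.create(endpoint))
                // The EDR token remains valid across transfers, so the same
                // request can be repeated every Monday.
                .header("Authorization", authToken)
                .GET()
                .build();

        var response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```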
Transfer Data Chunks Individually
To create the illusion that infinite data is actually finite, we may treat each data chunk individually and transfer it using the existing Provider-PUSH mechanism. The issue with this approach is that the Consumer must request each chunk every Monday, creating a new Transfer Process each time. This increases the overall complexity, as a recurrent step must always be performed to ensure the correct data exchange, and it also adds computing costs, as each new Transfer Process must pass through multiple states throughout its lifecycle.
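To make the recurring overhead concrete, this is roughly what Company A would have to run every Monday against its Management API, spawning a brand-new Transfer Process each time. This is a hedged sketch only: the path, the transferType value, and the payload shape vary across EDC versions, and all ids and hosts are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RequestWeeklyChunk {
    public static void main(String[] args) throws Exception {
        var managementApi = "https://consumer.example.com/management/v3/transferprocesses";

        // One TransferRequest per chunk -- a full Transfer Process lifecycle
        // runs on both Connectors for every Monday delivery.
        var body = """
                {
                  "@context": { "@vocab": "https://w3id.org/edc/v0.0.1/ns/" },
                  "@type": "TransferRequest",
                  "counterPartyAddress": "https://provider.example.com/protocol",
                  "contractId": "<contract-agreement-id>",
                  "protocol": "dataspace-protocol-http",
                  "transferType": "HttpData-PUSH",
                  "dataDestination": {
                    "type": "HttpData",
                    "baseUrl": "https://consumer.example.com/sink"
                  }
                }
                """;

        var request = HttpRequest.newBuilder()
                .uri(URI.create(managementApi))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        var response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```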
Invert The Participants Roles
We could also enable infinite data to be transferred by reversing the participants' roles and leveraging the Consumer-PULL capabilities. From the Connector's perspective, let's treat Company A as the Data Provider and Company B as the Data Consumer, despite Company B holding the desired data. Company A creates an asset with a data address pointing to where the data must be stored, for example an API endpoint expecting a body from an HTTP request, and offers it. Company B negotiates it and starts a Consumer-PULL transfer. If Company B adds the desired data to the body of the HTTP request it sends to Company A's dataplane, this data may be proxied to the data address of the asset, and Company A will be able to store it. Although this only works for HTTP data addresses natively, the dataplane may easily be extended to support any desired technology. However, this is an anti-pattern and should not be used: since the Connector applies the usage policies to Company B instead of Company A, data sovereignty is not ensured.
Proposed Approach
Analyzing the workarounds reveals that the main obstacle to transferring infinite data is that the communication channel between the Connectors is closed after the initial data transfer. If Transfer Processes remain active after exchanging data, they may be reused, allowing the Provider to trigger new transfers as new data becomes available. The following image illustrates the expected behavior for pushing infinite data:
Note that closing Transfer Processes can be achieved by the existing terminating mechanism, so no new development is needed.
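For reference, closing such a long-lived Transfer Process would reuse the terminate endpoint that already exists in the Management API. A minimal sketch, assuming the v3 path and a TerminateTransfer payload (both the path and the payload shape may differ across EDC versions; ids and hosts are placeholders):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TerminateTransferExample {
    public static void main(String[] args) throws Exception {
        var transferProcessId = "<transfer-process-id>"; // placeholder
        var url = "https://connector.example.com/management/v3/transferprocesses/"
                + transferProcessId + "/terminate";

        // Payload shape as of recent EDC versions; verify against your version.
        var body = """
                {
                  "@context": { "@vocab": "https://w3id.org/edc/v0.0.1/ns/" },
                  "@type": "TerminateTransfer",
                  "reason": "no further data expected"
                }
                """;

        var request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        var response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode()); // expect a 2xx on success
    }
}
```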
UC1 - Identify infinite Assets
As a Provider, I want to identify my Assets that provide infinite data, so that their transfer doesn't automatically finalize the Transfer Process.
Asset
- Assets can already receive any user-typed String property, so a new property can mark an Asset as providing infinite data
- TransferProcess creation remains unchanged

DataFlowStartMessage
- A new "keepAlive" property, set at DataFlowStartMessage creation in DataPlaneSignalingFlowController
- JsonObjectFromDataFlowStartMessageTransformer and JsonObjectToDataFlowStartMessageTransformer will be updated accordingly

DataFlow
- Whether a DataFlow should be completed / terminated, or kept open after the data transfer, is determined by the DataFlowStartMessage's "keepAlive", copied onto the DataFlow at creation in DataPlaneManagerImpl
- dataplane-schema.sql must be updated to create a new column when using the SQL store

DataFlowStates
- A new AWAITING state for DataFlows that performed a successful data transfer and are awaiting a trigger to start another data transfer
- DataPlaneManagerImpl will be updated so that if the transfer was successful and "keepAlive" is true, the DataFlow transitions to AWAITING (sketched after this list)
- Instead of the data plane completing the TransferProcess via the Control API, it will remain STARTED
- On failure, the DataFlow and TransferProcesses will be terminated, populating the error details and giving visibility to what went wrong
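A deliberately simplified sketch of the UC1 data-plane decision. These are hypothetical stand-in types, not the actual EDC classes: the "keepAlive" flag and the AWAITING state come from the list above, while the names and fields are assumptions for illustration:

```java
// Hypothetical stand-in types -- not the real EDC classes.
enum DataFlowState { RECEIVED, STARTED, COMPLETED, TERMINATED, AWAITING }

class DataFlow {
    DataFlowState state = DataFlowState.STARTED;
    final boolean keepAlive; // copied from the DataFlowStartMessage at creation

    DataFlow(boolean keepAlive) {
        this.keepAlive = keepAlive;
    }
}

class DataPlaneManagerSketch {
    /** Invoked once a single data transfer has finished. */
    void onTransferFinished(DataFlow flow, boolean success) {
        if (!success) {
            // Failure: terminate and populate the error details for visibility.
            flow.state = DataFlowState.TERMINATED;
        } else if (flow.keepAlive) {
            // Infinite asset: park the flow until the Provider triggers the next
            // transfer; the TransferProcess stays STARTED instead of completing.
            flow.state = DataFlowState.AWAITING;
        } else {
            // Finite asset: unchanged behavior, the flow completes as today.
            flow.state = DataFlowState.COMPLETED;
        }
    }
}
```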
UC2 - Trigger a Transfer Process
As a Provider, I want to trigger the transfer of new data to Consumers using an active Transfer Process, so that partners may receive updates from my data source.
TransferProcessApiV3 / TransferProcessApiV3Controller
- A new endpoint to trigger transfer processes, creating a TriggerTransferCommand that receives the transfer process id

TransferProcessService / TransferProcessServiceImpl
- A new method to handle the TriggerTransferCommand and return a ServiceResult<Void>

TriggerTransferCommandHandler
- A new command handler, registered via TransferProcessCommandExtension
- Validations (sketched after this list): the TransferProcess must be of PROVIDER type, of PUSH flow type, and in STARTED state, and the Asset must be infinite
- The TransferProcess is transitioned to RESUMING, which restarts the DataFlow, creating another data transfer, which is the desired logic
- On the data plane, the new start message updates the existing DataFlow, moving it from AWAITING state to RECEIVED
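And an equally hypothetical sketch of the UC2 validations and transition. The four checks and the RESUMING transition come from the list above; the types and names are placeholders, not the real EDC classes:

```java
// Hypothetical stand-in types -- not the real EDC classes.
enum ParticipantType { PROVIDER, CONSUMER }
enum FlowType { PUSH, PULL }
enum TransferProcessState { STARTED, RESUMING, TERMINATED }

class TransferProcess {
    ParticipantType type;
    FlowType flowType;
    TransferProcessState state;
    boolean assetIsInfinite; // e.g. derived from a user-typed Asset property
}

class TriggerTransferCommandHandlerSketch {
    /** Applies the UC2 validations, then transitions to RESUMING. */
    void handle(TransferProcess process) {
        if (process.type != ParticipantType.PROVIDER) {
            throw new IllegalArgumentException("transfer process must be of PROVIDER type");
        }
        if (process.flowType != FlowType.PUSH) {
            throw new IllegalArgumentException("transfer process must be of PUSH flow type");
        }
        if (process.state != TransferProcessState.STARTED) {
            throw new IllegalArgumentException("transfer process must be in STARTED state");
        }
        if (!process.assetIsInfinite) {
            throw new IllegalArgumentException("asset must be infinite");
        }
        // RESUMING re-runs the flow-start logic: the data plane then moves the
        // existing DataFlow from AWAITING back to RECEIVED, starting a new transfer.
        process.state = TransferProcessState.RESUMING;
    }
}
```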
Future Work
I see some very interesting future topics that may derive from this initial approach to infinite data.
Nevertheless, I think it would make sense to provide an initial discussion on this issue, implement a basis for this concept, and then improve it step-by-step. I'm happy to hear your thoughts on this.