Nested Record Processing #41
snork-alt
started this conversation in
Feature Requests
Replies: 2 comments
-
|
@v3g42 @duonganhthu43 I felt using the WHERE condition between child and parent was getting very complex to implement. See what you think of this, which is easier to implement. Later we could also consider implementing the WHERE |
Beta Was this translation helpful? Give feedback.
0 replies
-
|
@snork-alt We still will filter by the foreign key value I am guessing ? Would this syntax be more appropriate ? |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Let's consider the following example source tables:
users
users_contact
We want to populate the cache with an object containing the following information:
username, fn, ln, countryfromuserstabletype, contactfromusers_contacttableThe final object will look like this:
Query definition and schema registration
In order to obtain the above object, the user should define a new endpoint specifying the following query:
Two new SQL keywords are defined by DOZER SQL:
NESTED_ARR: Allows the user to specify a subquery that will be represented as an array of records in the cacheINDEX BY: Allows the user to specify what field is used for indexing in the cache. In the example above the parent object usesidas a primary key. For the nested query, however, we are using two fields:user_idandid.user_idis used for creating the relationship between the nested object and the parent object. In the source tablesuser_idis a foreign key creating in a relationship between theuser_contactstable and theuserstable.idrepresents the primary key of theusers_contacttableUpon definition of the query Dozer will register a new schema that will look like the following, assigning a new unique id to it
Pipeline
The pipeline will take care of processing any CDC event produced by the two tables above, processing the query and generating
Operationobjects following the schema above. Let's consider an example:New record added in
user_contactsIn the following example, a new email contact is being added for the user john123. Here is the Operation message produced at the end of the pipeline, to be consumed by the cache writer.
In the above examples, only the fields from the
users_contacttable are propagated. The only field propagated from the parent table is theid, which is needed for the correct indexing in the cache. All the field values are positional and matching to the schema previously registered. Theindexfield in the schema definition defines the offset of each fieldCache population
Each nested record in the cache is mapped to a specific key/value entry. For the object provided above the following is the physical cache representation
/u1{username: "john123", fn: "John", ln: "Smith", country: "SG"}/u1/4:c1{type: "phone", contact: "+65 776536 "}/u1/4:c2{type: "email", contact: " [email protected]"}/u1/4:c4{type: "email", contact: "[email protected]"}For the parent object, the key is simply represented by the value of the field indicated in the
INDEX BYkeyword in parent query. For the inner object, the key is represented by the values of the two fields indicated in the 'INDEX BY' keyword in in the nested query. Note that, for every inner object, the value is prefixed with the index of the field it is referring to (4in this schema, as it it shown in the schema definition)Beta Was this translation helpful? Give feedback.
All reactions