Auto-pull Data Algorithm #143
Replies: 10 comments
-
EventId = 3 components
-
export async function jsonToMd5(
  json: Record<string, unknown>,
): Promise<string> {
  // Convert the JSON object to a string
  const jsonString = JSON.stringify(json);
  // Convert the JSON string to a Uint8Array
  const encoder = new TextEncoder();
  const data = encoder.encode(jsonString);
  // Calculate the MD5 digest.
  // Note: "MD5" is not part of the standard Web Crypto API; runtimes
  // such as Deno support it as an extension.
  const hashBuffer = await crypto.subtle.digest("MD5", data);
  // Convert the hash value to a hex string
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map((b) => b.toString(16).padStart(2, "0")).join("");
}
-
Encountered issues
Todo: Debug why we get different hashes for events that should be the same (from FDX).
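One cause worth ruling out when hashes differ for "identical" events: `JSON.stringify` preserves key insertion order, so the same fields received in a different order produce a different string and therefore a different MD5. The helper below is a hypothetical sketch (`stableStringify` is not from this project) showing a key-sorted serialization:

```typescript
// Hypothetical helper: serialize with sorted keys so logically-equal
// objects always produce the same string (and therefore the same hash).
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(stableStringify).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    return (
      "{" +
      Object.keys(obj)
        .sort()
        .map((k) => JSON.stringify(k) + ":" + stableStringify(obj[k]))
        .join(",") +
      "}"
    );
  }
  return JSON.stringify(value);
}

// Two objects with the same fields in a different order:
const a = { eventType: "AR", date: "2025-05-29" };
const b = { date: "2025-05-29", eventType: "AR" };
console.log(JSON.stringify(a) === JSON.stringify(b));   // false
console.log(stableStringify(a) === stableStringify(b)); // true
```

If the provider reorders fields between pulls, this alone would explain duplicate eventIds for the same underlying event.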
-
FDX's data is indeed different from what we expected. The fact that the new data contains fewer records than the old data is another strange issue; see #119 (comment). My guess is that the FDX system is constantly reorganizing its data, but we're not sure, since we don't know what kind of problems they are encountering.
-
To understand why data is deleted during the auto-pull process, we need to analyze both the newly added data and the deleted data. To do this, we need to record the deleted source data. My approach is to mark deleted records with a flag instead of physically removing them. This way, the deleted data can be retained without affecting front-end queries.
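The soft-delete idea can be sketched as follows. The actual column name is elided in the comment above, so `deletedAt` and the record shape are assumptions; the real implementation would run against the database rather than an in-memory array:

```typescript
// Hypothetical soft-delete sketch; `deletedAt` is an assumed column name.
interface EventRecord {
  eventId: string;
  payload: Record<string, unknown>;
  deletedAt: string | null; // null = still visible to front-end queries
}

// Instead of DELETE, stamp the record; the raw source data is retained
// for later analysis of what the provider removed.
function softDelete(records: EventRecord[], eventId: string): void {
  const rec = records.find((r) => r.eventId === eventId);
  if (rec) rec.deletedAt = new Date().toISOString();
}

// Front-end queries filter out soft-deleted rows
// (in SQL terms: WHERE deleted_at IS NULL).
function visibleEvents(records: EventRecord[]): EventRecord[] {
  return records.filter((r) => r.deletedAt === null);
}
```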
-
Let's rethink the logic to compare Set A and Set B
Other Ideas
-
Two data issues (observed and potential)
Design guideline
Side effects
Lessons learned
-
Issue: The auto-pull process frequently deletes database records.
Analysis: At the outset of the design, we assumed:
Through detailed analysis of FedEx data, we confirmed that FedEx frequently makes minor adjustments to historical data, causing slightly different data to generate a new eventId (hash). After discovering that the number of records in the database exceeded that of the data source, our approach was to delete data from the database that no longer exists in the data source. Due to FedEx's frequent data changes, we correspondingly end up frequently deleting data from the database.
Data change example
The following is the data pulled for fdx-881600917035 at different time points:
{
"date": "2025-05-29T01:22:00-07:00",
"eventType": "AR",
"locationId": "OAKH",
"locationType": "FEDEX_FACILITY",
"scanLocation": {
"city": "OAKLAND",
"postalCode": "94621",
"countryCode": "US",
"countryName": "United States",
"residential": false,
"streetLines": {
},
"stateOrProvinceCode": "CA"
},
"derivedStatus": "In transit",
"exceptionCode": "",
"eventDescription": "Arrived at FedEx hub",
"derivedStatusCode": "IT",
"exceptionDescription": ""
}

Data pulled at: 2025-05-30T00:55:01.945Z

{
"date": "2025-05-29T01:22:00-07:00",
"eventType": "AR",
"locationType": "FEDEX_FACILITY",
"scanLocation": {
"city": "OAKLAND",
"postalCode": "94621",
"countryCode": "US",
"countryName": "United States",
"residential": false,
"streetLines": {
},
"stateOrProvinceCode": "CA"
},
"derivedStatus": "In transit",
"exceptionCode": "",
"eventDescription": "Arrived at FedEx hub",
"derivedStatusCode": "IT",
"exceptionDescription": ""
}

The difference between these two JSON payloads is that the latter does not include the locationId field.

Objective Review: The primary goal of this product is to standardize logistics data and provide developers with an easy-to-use API. Specifically, it tracks key events during logistics transportation through status codes. Therefore, we do not present the data source's data as-is: we add status codes to the data, and similarly, we remove data that is meaningless from the perspective of our objectives. In other words, we aim to:
Algorithm Change: Based on the above objectives, we cannot simply convert source data into standardized data on a one-to-one basis. For key information, we need to perform a standardizing transformation before adding it to the database. For invalid data, we need to remove it from the database. Thus, the updated algorithm is described below.
In simple terms, for each trackingID we compare the freshly pulled events with the stored events and reconcile the difference.
-
Algorithm for events update
For a trackingID:
Set (A): Fresh event-related data pulled from the source.
Set (B): Event data already stored in the database.
Goal: find the simplest and most efficient way to do both:
a) Update: (B) = (A)
Outcome: (B) <= (A)
Steps:
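The update step above reduces to two set differences over eventIds. A minimal sketch (function name `diffEventIds` is an assumption, not from the codebase):

```typescript
// Sketch of the update step: make the stored set (B) match the fresh
// pull (A) by computing two set differences over eventIds.
function diffEventIds(a: Set<string>, b: Set<string>) {
  const toInsert = [...a].filter((id) => !b.has(id)); // in A, not in B
  const toDelete = [...b].filter((id) => !a.has(id)); // in B, not in A
  return { toInsert, toDelete };
}

const fresh = new Set(["e1", "e2", "e3"]);  // Set A
const stored = new Set(["e2", "e4"]);       // Set B
const diff = diffEventIds(fresh, stored);
// inserts e1 and e3, deletes e4; afterwards (B) = (A)
console.log(diff);
```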
-
Assume that we have two variables available:
In-memory operations
The following is the logic to check whether the event data has changed.
Db-related operations
-
Main logic
Pulling data from the logistics source every five minutes.
(1) Get incomplete trackingIDs (not delivered)
Note: A trackingID is marked Completed if its status is 3500 or 3007.
Retrieve ongoing trackingIDs and additional information (e.g., phone number) from the entities table. Include all ongoing trackingIDs where completed = false (e.g., fdx-881463876410).
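Step (1) can be sketched as a simple predicate plus a filter. The row shape below is an assumption; in practice this would be a query against the entities table (roughly `SELECT * FROM entities WHERE completed = false`):

```typescript
// Assumed shape of a row in the entities table.
interface EntityRow {
  trackingId: string;
  status: number;
  completed: boolean;
  phone?: string; // additional info carried along for notifications
}

// Status codes 3500 and 3007 mark a shipment as Completed (per the note above).
const isCompletedStatus = (status: number): boolean =>
  status === 3500 || status === 3007;

// Select the trackingIDs that still need to be pulled.
function ongoingEntities(rows: EntityRow[]): EntityRow[] {
  return rows.filter((r) => !r.completed);
}
```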
(2) Iterate through each trackingID
2.1) Based on the operator code, send a request to the data provider (e.g., SF Express or FedEx), receive the response, and convert it to our standard Entity object.
During the process of converting the JSON response to an Entity object, we navigate to the scanEvents section (for FDX) or the routes section (for SFEX), where the data provider stores the list of events. We iterate through each event item in the list, convert it into the Event object we defined, and set the eventId for each Event object. Before appending the newly created Event object to the Entity object, we check whether an event with the same eventId already exists in the Entity. If an event with the same eventId already exists, the newly created Event will be ignored.
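The dedup-on-append check described above can be sketched as follows. The `TrackingEvent`/`TrackingEntity` names are assumptions standing in for the project's Event and Entity objects:

```typescript
// Assumed stand-ins for the project's Event and Entity objects.
interface TrackingEvent {
  eventId: string;
  // ...other standardized fields
}
interface TrackingEntity {
  trackingId: string;
  events: TrackingEvent[];
}

// While iterating the provider's event list (scanEvents for FDX,
// routes for SFEX), ignore any event whose eventId already exists.
function appendEvent(entity: TrackingEntity, event: TrackingEvent): boolean {
  if (entity.events.some((e) => e.eventId === event.eventId)) {
    return false; // duplicate eventId: the new event is ignored
  }
  entity.events.push(event);
  return true;
}
```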
2.2) Retrieve existing eventIds from the database using the trackingID. Example eventId: ev_fdx-881463876410-0cb79ccf0ea0a1906434f88994f6ad45
2.3) Compare the eventIds in the returned Entity object with the existing eventIds in the database to determine whether the Entity object's events have been updated.
2.4) If updated, proceed to update the event records in the database.
Logic to Compare Two Sets of Event IDs
We have two sets of eventIds:
providerEventIds
dbEventIds
If the sizes (counts of eventIds) of these two sets are different, go to step (3).
If the sizes are the same, we need to compare the eventIds further.
It's important to note that an eventId may change when the source modifies the data in the event block, since that changes the hash value.
If both sets contain exactly the same eventIds, do nothing. Otherwise, go to step (3).
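The comparison above can be sketched as a small predicate (the function name is an assumption; `providerEventIds` and `dbEventIds` are the two sets named earlier):

```typescript
// Decide whether the events changed (step 2.3).
function eventsChanged(
  providerEventIds: Set<string>,
  dbEventIds: Set<string>,
): boolean {
  // Different sizes: definitely changed, go straight to the update step.
  if (providerEventIds.size !== dbEventIds.size) return true;
  // Same size: changed only if some id is missing on the other side.
  // (Equal sizes plus full containment implies set equality.)
  for (const id of providerEventIds) {
    if (!dbEventIds.has(id)) return true;
  }
  return false;
}
```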
(3) Update events by comparing 2 sets
3.1) If in Set A but not in B, it’s a new event to add to the database.
3.2) If in Set B but not in A, delete it from the database.
Update Logic
The updateEntity() function accepts two parameters:
Parameter 1: entity.events: the entity object with events, retrieved from the data provider. Set A
Parameter 2: eventIds[], an array containing the saved eventIds. Set B
Step 1: Update entities
Mark the trackingID as Completed if its status is 3500 or 3007.
Step 2: Update events in 2 loops
There are two loops to compare eventIds from sets A and B.
Loop 1: Insert new events
Iterate through event IDs in set A (the newly pulled data).
If an eventId is not found in set B, add the event to the database.
Loop 2: Delete events
Iterate through event IDs in set B (already in the database).
If an eventId is not found in set A, delete it from the database.
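The two loops can be sketched as below. `updateEvents`, `insertEvent`, and `deleteEvent` are stand-in names for the real database calls inside updateEntity():

```typescript
// Sketch of Step 2: two loops reconciling sets A and B.
function updateEvents(
  providerEvents: Map<string, unknown>, // Set A: eventId -> event payload
  dbEventIds: string[],                 // Set B: ids already stored
  insertEvent: (id: string, payload: unknown) => void,
  deleteEvent: (id: string) => void,
): void {
  const dbSet = new Set(dbEventIds);
  // Loop 1: ids in A but not in B are new events to insert.
  for (const [id, payload] of providerEvents) {
    if (!dbSet.has(id)) insertEvent(id, payload);
  }
  // Loop 2: ids in B but not in A no longer exist at the source.
  for (const id of dbSet) {
    if (!providerEvents.has(id)) deleteEvent(id);
  }
}
```

Passing the insert/delete operations in as callbacks keeps the set logic testable without a live database connection.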