Merged
42 commits
0d186e6
Refactor dataset related code
haiqi96 Jun 20, 2025
75ac0ff
further refactor
haiqi96 Jun 20, 2025
bb1e5f4
Linter
haiqi96 Jun 20, 2025
ba7cfe1
A few more fixes
haiqi96 Jun 20, 2025
68454c6
Linter fixes
haiqi96 Jun 20, 2025
c1de746
missing fixes
haiqi96 Jun 20, 2025
d797198
Fix mistake
haiqi96 Jun 20, 2025
8c39e77
actually fixing
haiqi96 Jun 20, 2025
d570ab6
Linter again
haiqi96 Jun 20, 2025
398ab5e
Merge branch 'main' into DatasetRefactor
haiqi96 Jun 25, 2025
7759a7a
Merge remote-tracking branch 'origin/main' into DatasetRefactor
haiqi96 Jun 27, 2025
e6b8cc7
Linter
haiqi96 Jun 27, 2025
7a468c3
Merge branch 'main' into DatasetRefactor
Bill-hbrhbr Jun 29, 2025
1dd1cea
Move default dataset metadata table creation to start_clp
Bill-hbrhbr Jun 29, 2025
a0c3c29
Remove unused import
Bill-hbrhbr Jun 29, 2025
a9bf615
Address review comments
Bill-hbrhbr Jun 30, 2025
fe05f5f
Replace the missing SUFFIX
Bill-hbrhbr Jun 30, 2025
39a9278
Move suffix constants from clp_config to clp_metadata_db_utils local …
Bill-hbrhbr Jun 30, 2025
7124828
Refactor archive_manager.py.
kirkrodrigues Jun 30, 2025
eb80992
Refactor s3_utils.py.
kirkrodrigues Jun 30, 2025
5ed44e7
compression_task.py: Fix typing errors and minor refactoring.
kirkrodrigues Jun 30, 2025
af6b508
compression_scheduler.py: Remove exception swallow which will hide un…
kirkrodrigues Jun 30, 2025
67fb01f
Refactor query_scheduler.py.
kirkrodrigues Jun 30, 2025
d6ad4de
clp_metadata_db_utils.py: Minor refactoring.
kirkrodrigues Jun 30, 2025
ff7d700
clp_metadata_db_utils.py: Rename _generic_get_table_name -> _get_tabl…
kirkrodrigues Jun 30, 2025
7ffc77c
clp_metadata_db_utils.py: Alphabetize new public functions.
kirkrodrigues Jun 30, 2025
0255cbd
clp_metadata_db_utils.py: Reorder public and private functions for co…
kirkrodrigues Jun 30, 2025
1076a3f
initialize-clp-metadata-db.py: Remove changes unrelated to PR.
kirkrodrigues Jun 30, 2025
71c4d82
Move default dataset creation into compression_scheduler so that it r…
kirkrodrigues Jun 30, 2025
6bd9372
Apply suggestions from code review
kirkrodrigues Jul 1, 2025
84df2e2
Merge branch 'main' into DatasetRefactor
kirkrodrigues Jul 1, 2025
983bea1
Remove bug fix that's no longer necessary.
kirkrodrigues Jul 1, 2025
bdb7817
Fix bug where dataset has a default value instead of None when using …
Bill-hbrhbr Jul 1, 2025
a82a267
Correctly feed in the input config dataset names
Bill-hbrhbr Jul 1, 2025
f699496
Remove unnecessary changes
Bill-hbrhbr Jul 1, 2025
90ce0a4
Update the webui to pass the dataset name in the clp-json code path (…
kirkrodrigues Jul 2, 2025
d6f9e5a
Move dataset into the user function
haiqi96 Jul 2, 2025
dc6a706
Merge branch 'DatasetRefactor' of https://github.com/haiqi96/clp_fork…
haiqi96 Jul 2, 2025
76bcb4a
Remove unnecessary f string specifier
haiqi96 Jul 2, 2025
a4e6f83
Apply suggestions from code review
haiqi96 Jul 2, 2025
7b42568
Add import type.
kirkrodrigues Jul 2, 2025
afe43ce
Merge branch 'main' into DatasetRefactor
haiqi96 Jul 3, 2025
Original file line number Diff line number Diff line change
@@ -887,6 +887,7 @@ def start_webui(
client_settings_json_file.write(json.dumps(client_settings_json))

server_settings_json_updates = {
"ClpStorageEngine": clp_config.package.storage_engine,
"SqlDbHost": clp_config.database.host,
"SqlDbPort": clp_config.database.port,
"SqlDbName": clp_config.database.name,
2 changes: 2 additions & 0 deletions components/webui/server/settings.json
@@ -1,4 +1,6 @@
{
"ClpStorageEngine": "clp",

"SqlDbHost": "localhost",
"SqlDbPort": 3306,
"SqlDbName": "clp-db",
9 changes: 9 additions & 0 deletions components/webui/server/src/configConstants.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
// NOTE: These settings are duplicated from components/webui/client/src/config/index.ts, but will be
// removed in the near future.
const CLP_STORAGE_ENGINE_CLP_S = "clp-s";
Member

Is it better to use an enum like

enum CLP_STORAGE_ENGINE {
    CLP = "clp",
    CLP_S = "clp-s",
}

and then compare the values against it?

Member

probably better to move stuff from this file into the typings directory

Member

We have this enum in the client code. This constant is just temporary to keep the webui working in this PR. In #1050, we're going to pass the dataset from the client code, so the logic in the last two commits can be reverted.

Member

Or are you suggesting we duplicate the entire enum in the server code for now?

Member

right i was going to suggest reusing / duplicating this enum: https://github.com/haiqi96/clp_fork/blob/7b42568fa9d335ecbe85f13edc5d3c111b4e8587/components/webui/client/src/config/index.ts

> so the logic in the last two commits can be reverted.

sure. in that case i think we can temporarily leave the constant as-is then

const CLP_DEFAULT_DATASET_NAME = "default";

export {
CLP_DEFAULT_DATASET_NAME,
CLP_STORAGE_ENGINE_CLP_S,
};
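
For reference, the enum alternative discussed in the review thread above could look roughly like the sketch below. It is only an illustration: the names `ClpStorageEngine`, `DEFAULT_DATASET_NAME`, and `getDatasetForEngine` are hypothetical and are not taken from the PR or from the client code, and the two string values are the ones visible in this diff.

```typescript
// Hypothetical sketch: a string enum holding the two storage engine values seen in this PR.
enum ClpStorageEngine {
    CLP = "clp",
    CLP_S = "clp-s",
}

// Assumed to match the value of CLP_DEFAULT_DATASET_NAME in this diff.
const DEFAULT_DATASET_NAME = "default";

/**
 * Picks the dataset to attach to a job config.
 *
 * @param storageEngine The configured storage engine, e.g. the value read from settings.json.
 * @return The default dataset name for the clp-s engine, or null for any other engine.
 */
const getDatasetForEngine = (storageEngine: string): string | null => {
    return ClpStorageEngine.CLP_S === storageEngine ?
        DEFAULT_DATASET_NAME :
        null;
};

console.log(getDatasetForEngine("clp-s")); // prints "default"
console.log(getDatasetForEngine("clp")); // prints null
```

Compared with a standalone constant, an enum keeps both engine values in one place, at the cost of duplicating a definition that, per the thread, already exists in the client code.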
@@ -9,6 +9,10 @@ import {
type SearchResultsMetadataDocument,
} from "../../../../../../common/index.js";
import settings from "../../../../../settings.json" with {type: "json"};
import {
CLP_DEFAULT_DATASET_NAME,
CLP_STORAGE_ENGINE_CLP_S,
} from "../../../../configConstants.js";
import {ErrorSchema} from "../../../schemas/error.js";
import {
QueryJobCreationSchema,
@@ -69,6 +73,9 @@ const plugin: FastifyPluginAsyncTypebox = async (fastify) => {

const args = {
begin_timestamp: timestampBegin,
dataset: CLP_STORAGE_ENGINE_CLP_S === settings.ClpStorageEngine ?
CLP_DEFAULT_DATASET_NAME :
null,
end_timestamp: timestampEnd,
ignore_case: ignoreCase,
max_num_results: SEARCH_MAX_NUM_RESULTS,
8 changes: 8 additions & 0 deletions components/webui/server/src/plugins/DbManager.ts
@@ -9,6 +9,11 @@ import {
ResultSetHeader,
} from "mysql2/promise";

import settings from "../../settings.json";
Contributor Author

You need `with {type: "json"}` on this import.

import {
CLP_DEFAULT_DATASET_NAME,
CLP_STORAGE_ENGINE_CLP_S,
} from "../configConstants.js";
import {Nullable} from "../typings/common.js";
import {
DbManagerOptions,
@@ -125,6 +130,9 @@ class DbManager {
};
} else if (QUERY_JOB_TYPE.EXTRACT_JSON === jobType) {
jobConfig = {
dataset: CLP_STORAGE_ENGINE_CLP_S === settings.ClpStorageEngine ?
CLP_DEFAULT_DATASET_NAME :
null,
archive_id: streamId,
target_chunk_size: targetUncompressedSize,
};
Comment on lines 132 to 138
Contributor

🧹 Nitpick (assertive)

Avoid sending a dataset: null field when it’s not needed

Down-stream scheduler components typically look for the dataset key only when the storage engine is CLP_S. Serialising the field with a null value adds noise and may confuse strict schema validators.

-            jobConfig = {
-                dataset: CLP_STORAGE_ENGINE_CLP_S === settings.ClpStorageEngine ?
-                    CLP_DEFAULT_DATASET_NAME :
-                    null,
-                archive_id: streamId,
-                target_chunk_size: targetUncompressedSize,
-            };
+            jobConfig = {
+                archive_id: streamId,
+                target_chunk_size: targetUncompressedSize,
+                ...(CLP_STORAGE_ENGINE_CLP_S === settings.ClpStorageEngine && {
+                    dataset: CLP_DEFAULT_DATASET_NAME,
+                }),
+            };

This keeps the payload minimal and makes future schema evolution easier.

🤖 Prompt for AI Agents
In components/webui/server/src/plugins/DbManager.ts around lines 132 to 138, the
jobConfig object always includes the dataset field, even when its value is null.
To avoid sending dataset: null, modify the code to only add the dataset field to
jobConfig when settings.ClpStorageEngine equals CLP_STORAGE_ENGINE_CLP_S. This
can be done by conditionally adding the dataset property instead of assigning
null, ensuring the field is omitted entirely when not needed.
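
To make the difference between the two shapes concrete, here is a minimal, self-contained sketch. The `ExtractJsonJobConfig` interface, the two builder functions, and the sample arguments are hypothetical, and `JSON.stringify` is used only to make the payloads visible; how the server actually serializes job configs is not shown in this diff. The sketch uses a ternary inside the spread rather than `&&`, which behaves the same way here.

```typescript
const CLP_STORAGE_ENGINE_CLP_S = "clp-s";
const CLP_DEFAULT_DATASET_NAME = "default";

// Hypothetical shape of the job config, inferred from the fields in the diff.
interface ExtractJsonJobConfig {
    archive_id: string;
    target_chunk_size: number;
    dataset?: string | null;
}

// Variant in the PR: the `dataset` key is always present, possibly with a null value.
const buildWithNull = (
    storageEngine: string,
    streamId: string,
    targetUncompressedSize: number
): ExtractJsonJobConfig => ({
    dataset: CLP_STORAGE_ENGINE_CLP_S === storageEngine ?
        CLP_DEFAULT_DATASET_NAME :
        null,
    archive_id: streamId,
    target_chunk_size: targetUncompressedSize,
});

// Variant from the suggestion: the `dataset` key exists only for the clp-s engine.
const buildWithOmission = (
    storageEngine: string,
    streamId: string,
    targetUncompressedSize: number
): ExtractJsonJobConfig => ({
    archive_id: streamId,
    target_chunk_size: targetUncompressedSize,
    ...(CLP_STORAGE_ENGINE_CLP_S === storageEngine ?
        {dataset: CLP_DEFAULT_DATASET_NAME} :
        {}),
});

console.log(JSON.stringify(buildWithNull("clp", "a1", 1024)));
// prints {"dataset":null,"archive_id":"a1","target_chunk_size":1024}
console.log(JSON.stringify(buildWithOmission("clp", "a1", 1024)));
// prints {"archive_id":"a1","target_chunk_size":1024}
```

Which variant is preferable depends on the consumer: a reader that expects the key to always exist needs the first shape, while a reader that treats a missing key as "no dataset" works with either.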
