-
Notifications
You must be signed in to change notification settings - Fork 83
feat(clp-package): Expose dataset selection in package CLI scripts and web UI. #1050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
## Walkthrough
This change introduces explicit dataset selection and handling throughout the CLP system, affecting the web UI client, server API, and CLI scripts. It removes hardcoded defaults and storage-engine-based dataset logic, replacing them with explicit dataset arguments, state, and validation. New UI components and API endpoints now support dataset filtering and selection.
## Changes
| File(s) / Group | Change Summary |
|---------------------------------------------------------------------------------------------------------|---------------|
| `components/webui/client/package.json` | Added `@tanstack/react-query` and `@tanstack/react-query-devtools` as dependencies. |
| `components/webui/client/src/App.tsx` | Integrated React Query context and devtools into the app. |
| `components/webui/client/src/api/query.ts` | Refactored `submitExtractStreamJob` to use a props object including new `dataset` parameter. |
| `components/webui/client/src/api/search.ts`, `components/webui/server/src/fastify-v2/schemas/search.ts`| Added optional `dataset` property to `QueryJobCreationSchema` type/schema. |
| `components/webui/client/src/api/sql.ts` | Added new `querySql` function for SQL queries via Axios. |
| `components/webui/client/src/pages/IngestPage/Details/index.tsx`,<br>`.../Jobs/index.tsx`,<br>`.../SpaceSavings/index.tsx` | Updated import path for `querySql` to new API module. |
| `components/webui/client/src/pages/IngestPage/sqlConfig.ts` | Removed `querySql` function and its export. |
| `components/webui/client/src/pages/SearchPage/SearchControls/Dataset/DatasetLabel/index.module.css` | Added new CSS module for dataset label styling. |
| `components/webui/client/src/pages/SearchPage/SearchControls/Dataset/DatasetLabel/index.tsx` | Added new `DatasetLabel` React component. |
| `components/webui/client/src/pages/SearchPage/SearchControls/Dataset/index.module.css` | Added new CSS module for dataset selector styling. |
| `components/webui/client/src/pages/SearchPage/SearchControls/Dataset/index.tsx` | Added new `Dataset` React component for dataset selection UI. |
| `components/webui/client/src/pages/SearchPage/SearchControls/Dataset/sql.ts` | Added `fetchDatasetNames` function for retrieving datasets via SQL. |
| `components/webui/client/src/pages/SearchPage/SearchControls/SearchButton/SubmitButton/index.tsx` | Enhanced submission logic to require dataset selection for `CLP_S` storage engine. |
| `components/webui/client/src/pages/SearchPage/SearchControls/index.tsx` | Conditionally renders `Dataset` selector based on storage engine. |
| `components/webui/client/src/pages/SearchPage/SearchResults/SearchResultsTable/Message/LogViewerLink.tsx`| Appends `dataset` to log viewer links from search state. |
| `components/webui/client/src/pages/SearchPage/SearchResults/SearchResultsTimeline/index.tsx` | Handles dataset in timeline zoom and query submission for `CLP_S`. |
| `components/webui/client/src/pages/SearchPage/SearchState/index.tsx` | Extended search state with `cachedDataset` and `selectDataset` fields and update methods. |
| `components/webui/client/src/typings/query.ts` | Added `dataset` property to `ExtractJobSearchParams` type. |
| `components/webui/client/src/ui/QueryStatus.tsx` | Updated `submitExtractStreamJob` call to use new object parameter with `dataset`. |
| `components/webui/server/src/fastify-v2/routes/api/search/index.ts` | `/query` route now directly assigns `dataset` from request body. |
| `components/webui/server/src/plugins/DbManager.ts` | `submitAndWaitForExtractStreamJob` now requires explicit `dataset` parameter. |
| `components/webui/server/src/routes/query.ts` | `/query/extract-stream` POST endpoint requires `dataset` in request body. |
| `components/clp-package-utils/clp_package_utils/scripts/archive_manager.py`,<br>`.../compress.py`,<br>`.../decompress.py`,<br>`.../search.py` | Added optional `--dataset` CLI argument and logic for validation and command construction. |
| `components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py`,<br>`.../compress.py`,<br>`.../decompress.py`,<br>`.../search.py` | Added optional `--dataset` CLI argument, removed storage-engine-based defaulting, and updated internal dataset handling. |
| `components/webui/server/src/configConstants.ts` | Deleted file, removing `CLP_STORAGE_ENGINE_CLP_S` and `CLP_DEFAULT_DATASET_NAME` constants. |
| `components/clp-package-utils/clp_package_utils/general.py` | Renamed `validate_dataset` to `validate_dataset_exists`; added new `validate_dataset_name` function for dataset name validation. |
| `components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py` | Added constants `MYSQL_TABLE_NAME_MAX_LEN` and `TABLE_SUFFIX_MAX_LEN` for dataset name validation. |
| `components/clp-package-utils/clp_package_utils/scripts/start_clp.py` | Removed "ClpStorageEngine" key from server settings JSON update. |
| `components/webui/server/settings.json` | Removed "ClpStorageEngine" setting from JSON configuration. |
## Sequence Diagram(s)
```mermaid
sequenceDiagram
participant User
participant WebUI
participant ServerAPI
participant DbManager
User->>WebUI: Selects dataset (if CLP_S)
User->>WebUI: Submits search or extract job
WebUI->>ServerAPI: POST /query or /query/extract-stream (includes dataset)
ServerAPI->>DbManager: submitAndWaitForExtractStreamJob({dataset, ...})
DbManager-->>ServerAPI: Job result
ServerAPI-->>WebUI: Response
WebUI-->>User: Displays results/statusPossibly related PRs
Suggested reviewers
Learnt from: Bill-hbrhbr Learnt from: Bill-hbrhbr Learnt from: Bill-hbrhbr Learnt from: Bill-hbrhbr Learnt from: Bill-hbrhbr Learnt from: Bill-hbrhbr Learnt from: kirkrodrigues Learnt from: junhaoliao Learnt from: junhaoliao Learnt from: junhaoliao Learnt from: Bill-hbrhbr Learnt from: Bill-hbrhbr Learnt from: junhaoliao Learnt from: junhaoliao Learnt from: LinZhihao-723 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
components/webui/client/package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (26)
components/webui/client/package.json(1 hunks)components/webui/client/src/App.tsx(2 hunks)components/webui/client/src/api/query.ts(1 hunks)components/webui/client/src/api/search.ts(1 hunks)components/webui/client/src/api/sql.ts(1 hunks)components/webui/client/src/pages/IngestPage/Details/index.tsx(1 hunks)components/webui/client/src/pages/IngestPage/Jobs/index.tsx(1 hunks)components/webui/client/src/pages/IngestPage/SpaceSavings/index.tsx(1 hunks)components/webui/client/src/pages/IngestPage/sqlConfig.ts(0 hunks)components/webui/client/src/pages/SearchPage/SearchControls/Dataset/DatasetLabel/index.module.css(1 hunks)components/webui/client/src/pages/SearchPage/SearchControls/Dataset/DatasetLabel/index.tsx(1 hunks)components/webui/client/src/pages/SearchPage/SearchControls/Dataset/index.module.css(1 hunks)components/webui/client/src/pages/SearchPage/SearchControls/Dataset/index.tsx(1 hunks)components/webui/client/src/pages/SearchPage/SearchControls/Dataset/sql.ts(1 hunks)components/webui/client/src/pages/SearchPage/SearchControls/SearchButton/SubmitButton.tsx(0 hunks)components/webui/client/src/pages/SearchPage/SearchControls/SearchButton/SubmitButton/index.tsx(1 hunks)components/webui/client/src/pages/SearchPage/SearchControls/index.tsx(2 hunks)components/webui/client/src/pages/SearchPage/SearchResults/SearchResultsTable/Message/LogViewerLink.tsx(3 hunks)components/webui/client/src/pages/SearchPage/SearchResults/SearchResultsTimeline/index.tsx(3 hunks)components/webui/client/src/pages/SearchPage/SearchState/index.tsx(6 hunks)components/webui/client/src/typings/query.ts(1 hunks)components/webui/client/src/ui/QueryStatus.tsx(2 hunks)components/webui/server/src/fastify-v2/routes/api/search/index.ts(2 hunks)components/webui/server/src/fastify-v2/schemas/search.ts(1 hunks)components/webui/server/src/plugins/DbManager.ts(2 hunks)components/webui/server/src/routes/query.ts(2 hunks)
💤 Files with no reviewable changes (2)
- components/webui/client/src/pages/IngestPage/sqlConfig.ts
- components/webui/client/src/pages/SearchPage/SearchControls/SearchButton/SubmitButton.tsx
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Preferfalse == <expression>rather than!<expression>.
⚙️ Source: CodeRabbit Configuration File
List of files the instruction was applied to:
components/webui/client/src/pages/IngestPage/SpaceSavings/index.tsxcomponents/webui/client/src/pages/IngestPage/Details/index.tsxcomponents/webui/client/src/api/search.tscomponents/webui/server/src/fastify-v2/schemas/search.tscomponents/webui/client/src/App.tsxcomponents/webui/server/src/routes/query.tscomponents/webui/client/src/typings/query.tscomponents/webui/client/src/pages/SearchPage/SearchControls/Dataset/DatasetLabel/index.tsxcomponents/webui/server/src/fastify-v2/routes/api/search/index.tscomponents/webui/client/src/ui/QueryStatus.tsxcomponents/webui/client/src/api/sql.tscomponents/webui/client/src/pages/SearchPage/SearchResults/SearchResultsTimeline/index.tsxcomponents/webui/client/src/pages/SearchPage/SearchControls/Dataset/sql.tscomponents/webui/client/src/pages/SearchPage/SearchResults/SearchResultsTable/Message/LogViewerLink.tsxcomponents/webui/client/src/pages/SearchPage/SearchControls/Dataset/index.tsxcomponents/webui/client/src/pages/IngestPage/Jobs/index.tsxcomponents/webui/client/src/pages/SearchPage/SearchControls/index.tsxcomponents/webui/server/src/plugins/DbManager.tscomponents/webui/client/src/pages/SearchPage/SearchControls/SearchButton/SubmitButton/index.tsxcomponents/webui/client/src/api/query.tscomponents/webui/client/src/pages/SearchPage/SearchState/index.tsx
🧠 Learnings (19)
components/webui/client/src/pages/IngestPage/SpaceSavings/index.tsx (1)
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
components/webui/client/src/pages/IngestPage/Details/index.tsx (2)
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:35-41
Timestamp: 2024-11-19T19:52:43.429Z
Learning: For internal APIs in `components/log-viewer-webui/client/src/api/query.js`, runtime validation of parameters may not be necessary since the APIs are not exposed to end users, and JsDoc type annotations may be sufficient.
components/webui/client/src/api/search.ts (1)
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
components/webui/server/src/fastify-v2/schemas/search.ts (1)
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
components/webui/client/package.json (3)
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
Learnt from: junhaoliao
PR: y-scope/clp#939
File: components/package-template/src/etc/clp-config.yml:64-64
Timestamp: 2025-06-22T04:01:43.409Z
Learning: The new webui architecture uses Fastify with Pino logging instead of the previous Winston-based logging system that was removed during the webui refactoring.
Learnt from: davemarco
PR: y-scope/clp#797
File: components/log-viewer-webui/client/src/components/Layout/MainLayout.tsx:2-5
Timestamp: 2025-04-08T22:32:05.366Z
Learning: In this codebase using React Router v7.4.1, components should be imported directly from "react-router" (e.g., `import { Link, Outlet } from "react-router";`) rather than from "react-router-dom" as was common in previous versions of React Router.
components/webui/client/src/App.tsx (5)
Learnt from: davemarco
PR: y-scope/clp#797
File: components/log-viewer-webui/client/src/components/Layout/MainLayout.tsx:2-5
Timestamp: 2025-04-08T22:32:05.366Z
Learning: In this codebase using React Router v7.4.1, components should be imported directly from "react-router" (e.g., `import { Link, Outlet } from "react-router";`) rather than from "react-router-dom" as was common in previous versions of React Router.
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
Learnt from: davemarco
PR: y-scope/clp#797
File: components/log-viewer-webui/client/src/routes/routes.tsx:1-1
Timestamp: 2025-04-04T15:15:59.432Z
Learning: In React Router v7, `createBrowserRouter` and other router primitives should be imported from `react-router` package, not from `react-router-dom` as in previous versions. This represents a significant change in the library's structure where router primitives are now in the core package.
Learnt from: davemarco
PR: y-scope/clp#797
File: components/log-viewer-webui/client/src/components/Layout/MainLayout.tsx:2-5
Timestamp: 2025-04-08T22:32:05.367Z
Learning: React Router v7.4.1 uses a different package structure than previous versions. Components can be imported directly from "react-router" (e.g., `import { Link, Outlet } from "react-router";`) rather than from "react-router-dom" as was common in previous versions. Alternatively, DOM-specific components can also be imported using deep imports like `import { Link } from "react-router/dom";`.
Learnt from: junhaoliao
PR: y-scope/clp#937
File: components/log-viewer-webui/client/src/AntdApp.tsx:16-24
Timestamp: 2025-05-29T20:33:40.653Z
Learning: In components/log-viewer-webui React codebase: Return type annotations (like `: JSX.Element`) are unnecessary and not preferred for React components in JSX/TSX files.
components/webui/server/src/routes/query.ts (1)
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
components/webui/client/src/typings/query.ts (1)
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
components/webui/client/src/pages/SearchPage/SearchControls/Dataset/DatasetLabel/index.tsx (1)
Learnt from: junhaoliao
PR: y-scope/clp#937
File: components/log-viewer-webui/client/src/AntdApp.tsx:16-24
Timestamp: 2025-05-29T20:33:40.653Z
Learning: In components/log-viewer-webui React codebase: Return type annotations (like `: JSX.Element`) are unnecessary and not preferred for React components in JSX/TSX files.
components/webui/server/src/fastify-v2/routes/api/search/index.ts (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.
components/webui/client/src/ui/QueryStatus.tsx (1)
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
components/webui/client/src/api/sql.ts (1)
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
components/webui/client/src/pages/SearchPage/SearchResults/SearchResultsTimeline/index.tsx (4)
Learnt from: junhaoliao
PR: y-scope/clp#962
File: components/log-viewer-webui/client/src/pages/SearchPage/SearchQueryStatus/Results.tsx:22-26
Timestamp: 2025-06-09T17:15:30.309Z
Learning: When using Zustand stores in React components, always use specific selectors (e.g., `useStore(state => state.property)`) instead of destructuring the entire state object to prevent unnecessary re-renders. This is especially important as state stores grow in complexity.
Learnt from: junhaoliao
PR: y-scope/clp#962
File: components/log-viewer-webui/client/src/pages/SearchPage/SearchQueryStatus/Results.tsx:22-26
Timestamp: 2025-06-09T17:15:30.309Z
Learning: When using Zustand stores in React components, always use specific selectors (e.g., `useStore(state => state.property)`) instead of destructuring the entire state object to prevent unnecessary re-renders. This is especially important as state stores grow in complexity.
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
Learnt from: junhaoliao
PR: y-scope/clp#937
File: components/log-viewer-webui/client/src/AntdApp.tsx:16-24
Timestamp: 2025-05-29T20:33:40.653Z
Learning: In components/log-viewer-webui React codebase: Return type annotations (like `: JSX.Element`) are unnecessary and not preferred for React components in JSX/TSX files.
components/webui/client/src/pages/SearchPage/SearchResults/SearchResultsTable/Message/LogViewerLink.tsx (6)
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
Learnt from: davemarco
PR: y-scope/clp#797
File: components/log-viewer-webui/client/src/components/Layout/MainLayout.tsx:2-5
Timestamp: 2025-04-08T22:32:05.366Z
Learning: In this codebase using React Router v7.4.1, components should be imported directly from "react-router" (e.g., `import { Link, Outlet } from "react-router";`) rather than from "react-router-dom" as was common in previous versions of React Router.
Learnt from: junhaoliao
PR: y-scope/clp#937
File: components/log-viewer-webui/client/src/AntdApp.tsx:16-24
Timestamp: 2025-05-29T20:33:40.653Z
Learning: In components/log-viewer-webui React codebase: Return type annotations (like `: JSX.Element`) are unnecessary and not preferred for React components in JSX/TSX files.
Learnt from: davemarco
PR: y-scope/clp#797
File: components/log-viewer-webui/client/src/components/Layout/MainLayout.tsx:2-5
Timestamp: 2025-04-08T22:32:05.367Z
Learning: React Router v7.4.1 uses a different package structure than previous versions. Components can be imported directly from "react-router" (e.g., `import { Link, Outlet } from "react-router";`) rather than from "react-router-dom" as was common in previous versions. Alternatively, DOM-specific components can also be imported using deep imports like `import { Link } from "react-router/dom";`.
Learnt from: junhaoliao
PR: y-scope/clp#962
File: components/log-viewer-webui/client/src/pages/SearchPage/SearchQueryStatus/Results.tsx:22-26
Timestamp: 2025-06-09T17:15:30.309Z
Learning: When using Zustand stores in React components, always use specific selectors (e.g., `useStore(state => state.property)`) instead of destructuring the entire state object to prevent unnecessary re-renders. This is especially important as state stores grow in complexity.
Learnt from: junhaoliao
PR: y-scope/clp#962
File: components/log-viewer-webui/client/src/pages/SearchPage/SearchQueryStatus/Results.tsx:22-26
Timestamp: 2025-06-09T17:15:30.309Z
Learning: When using Zustand stores in React components, always use specific selectors (e.g., `useStore(state => state.property)`) instead of destructuring the entire state object to prevent unnecessary re-renders. This is especially important as state stores grow in complexity.
components/webui/client/src/pages/SearchPage/SearchControls/Dataset/index.tsx (3)
Learnt from: junhaoliao
PR: y-scope/clp#937
File: components/log-viewer-webui/client/src/AntdApp.tsx:16-24
Timestamp: 2025-05-29T20:33:40.653Z
Learning: In components/log-viewer-webui React codebase: Return type annotations (like `: JSX.Element`) are unnecessary and not preferred for React components in JSX/TSX files.
Learnt from: junhaoliao
PR: y-scope/clp#962
File: components/log-viewer-webui/client/src/pages/SearchPage/SearchQueryStatus/Results.tsx:22-26
Timestamp: 2025-06-09T17:15:30.309Z
Learning: When using Zustand stores in React components, always use specific selectors (e.g., `useStore(state => state.property)`) instead of destructuring the entire state object to prevent unnecessary re-renders. This is especially important as state stores grow in complexity.
Learnt from: junhaoliao
PR: y-scope/clp#962
File: components/log-viewer-webui/client/src/pages/SearchPage/SearchQueryStatus/Results.tsx:22-26
Timestamp: 2025-06-09T17:15:30.309Z
Learning: When using Zustand stores in React components, always use specific selectors (e.g., `useStore(state => state.property)`) instead of destructuring the entire state object to prevent unnecessary re-renders. This is especially important as state stores grow in complexity.
components/webui/client/src/pages/IngestPage/Jobs/index.tsx (1)
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
components/webui/server/src/plugins/DbManager.ts (1)
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
components/webui/client/src/pages/SearchPage/SearchControls/SearchButton/SubmitButton/index.tsx (1)
Learnt from: junhaoliao
PR: y-scope/clp#937
File: components/log-viewer-webui/client/src/AntdApp.tsx:16-24
Timestamp: 2025-05-29T20:33:40.653Z
Learning: In components/log-viewer-webui React codebase: Return type annotations (like `: JSX.Element`) are unnecessary and not preferred for React components in JSX/TSX files.
components/webui/client/src/api/query.ts (2)
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:35-41
Timestamp: 2024-11-19T19:52:43.429Z
Learning: For internal APIs in `components/log-viewer-webui/client/src/api/query.js`, runtime validation of parameters may not be necessary since the APIs are not exposed to end users, and JsDoc type annotations may be sufficient.
🧬 Code Graph Analysis (5)
components/webui/client/src/ui/QueryStatus.tsx (2)
components/webui/client/src/api/query.ts (1)
submitExtractStreamJob(51-51)components/webui/client/src/typings/query.ts (2)
EXTRACT_JOB_TYPE(92-92)QUERY_LOADING_STATE(95-95)
components/webui/client/src/pages/SearchPage/SearchControls/Dataset/sql.ts (1)
components/webui/client/src/api/sql.ts (1)
querySql(14-14)
components/webui/client/src/pages/SearchPage/SearchControls/Dataset/index.tsx (1)
components/webui/client/src/pages/SearchPage/SearchControls/Dataset/sql.ts (1)
fetchDatasetNames(41-41)
components/webui/client/src/pages/SearchPage/SearchControls/SearchButton/SubmitButton/index.tsx (1)
components/webui/client/src/pages/SearchPage/SearchState/index.tsx (1)
SEARCH_STATE_DEFAULT(152-152)
components/webui/client/src/api/query.ts (1)
components/webui/client/src/typings/query.ts (2)
QUERY_JOB_TYPE(94-94)ExtractStreamResp(90-90)
🔇 Additional comments (38)
components/webui/client/src/pages/IngestPage/SpaceSavings/index.tsx (1)
9-9: LGTM! Import path correctly updated.The import path change aligns with the centralization of SQL query logic into the shared API module.
components/webui/client/src/pages/IngestPage/Jobs/index.tsx (1)
9-9: LGTM! Import path correctly updated.The import path change is consistent with the centralization of SQL query logic and follows the same pattern as other IngestPage components.
components/webui/client/src/pages/IngestPage/Details/index.tsx (1)
10-10: LGTM! Import path correctly updated.The import path change completes the consistent refactor across IngestPage components to use the centralized SQL query API.
components/webui/client/src/pages/SearchPage/SearchControls/Dataset/DatasetLabel/index.module.css (1)
1-9: LGTM! Clean CSS styling for dataset label component.The flexbox layout with centre alignment and border radius adjustments is appropriate for a label that visually connects to an adjacent selector element.
components/webui/server/src/fastify-v2/schemas/search.ts (1)
13-13: LGTM! Clean addition of required dataset parameter.The schema extension is consistent with the PR objectives and properly typed using TypeBox's string validation.
components/webui/client/src/api/search.ts (1)
18-18: LGTM! Consistent type definition aligns with server schema.The addition of the required
datasetstring property maintains type consistency between client and server schemas.components/webui/client/src/pages/SearchPage/SearchControls/Dataset/index.module.css (1)
1-13: LGTM! Appropriate styling for dataset selector component.The flexbox layout and width constraints provide consistent UI sizing. The use of
!importantfor Ant Design component overrides is justified when customising third-party component libraries.components/webui/client/src/typings/query.ts (1)
52-52: LGTM! Consistent schema extension for extract job parameters.The addition of the required
datasetparameter using TypeBox's Type.String() maintains consistency with existing schema patterns and ensures type safety for extract operations.components/webui/server/src/routes/query.ts (3)
41-41: LGTM! Dataset parameter correctly added to schema.The dataset parameter is properly defined as a required string in the request body schema.
48-48: LGTM! Dataset parameter correctly extracted from request body.The destructuring follows the established pattern and correctly extracts the dataset field.
61-61: LGTM! Dataset parameter correctly passed to job submission.The dataset is properly included in the parameters passed to
submitAndWaitForExtractStreamJob.components/webui/client/src/App.tsx (3)
3-7: LGTM! React Query imports are correct.The imports from
@tanstack/react-queryand@tanstack/react-query-devtoolsare properly structured.
16-16: LGTM! QueryClient instance created correctly.The QueryClient is instantiated with default configuration, which is appropriate for most use cases.
25-32: LGTM! React Query provider integration is correct.The QueryClientProvider properly wraps the existing providers and the ReactQueryDevtools is appropriately configured with
initialIsOpen={false}.components/webui/client/src/pages/SearchPage/SearchControls/Dataset/DatasetLabel/index.tsx (1)
1-35: LGTM! Component implementation follows established patterns.The DatasetLabel component correctly uses Ant Design theming and Typography. The combination of CSS modules and inline styles allows for both static styling and dynamic theming based on design tokens, which is appropriate for this use case.
components/webui/client/src/pages/SearchPage/SearchResults/SearchResultsTable/Message/LogViewerLink.tsx (3)
11-11: LGTM! Correct import of search store.The useSearchStore import is properly added for accessing the cached dataset state.
36-36: LGTM! Proper use of specific selector.Using
useSearchStore((state) => state.cachedDataset)follows the learned pattern of using specific selectors to prevent unnecessary re-renders.
50-51: LGTM! Dataset parameter correctly added to URL.The dataset parameter is properly encoded and appended to the query string, maintaining consistency with existing URL parameters.
components/webui/server/src/fastify-v2/routes/api/search/index.ts (2)
63-63: LGTM! Dataset parameter correctly destructured from request body.The dataset field is properly extracted alongside other search parameters following the established pattern.
73-73: LGTM! Dataset parameter correctly included in job arguments.The dataset is properly added to the args object that will be passed to the search and aggregation job submission functions.
components/webui/client/src/ui/QueryStatus.tsx (1)
68-79: LGTM! Improved API usage with named parameters.The refactoring from positional arguments to a single object with named properties improves code readability and maintainability. The addition of the
datasetparameter correctly integrates with the broader dataset awareness functionality introduced in this PR.components/webui/client/src/pages/SearchPage/SearchControls/index.tsx (2)
1-5: LGTM! Well-structured imports and component integration.The new imports for storage engine configuration and the Dataset component are properly organized and support the conditional rendering logic.
30-30: LGTM! Appropriate conditional rendering.The conditional rendering of the Dataset component based on the storage engine type ensures the dataset selector is only shown when using CLP_S, which aligns with the PR objectives for exposing dataset functionality.
components/webui/client/src/pages/SearchPage/SearchControls/Dataset/sql.ts (4)
7-7: LGTM! Good use of constants for table names.Using a constant for the table name prevents typos and makes the code more maintainable.
13-15: LGTM! Proper enum usage for column names.The enum provides type safety and prevents magic strings while maintaining clarity about the table structure.
20-25: LGTM! Clean and well-structured SQL query.The query is properly formatted, uses the defined constants, and includes sensible ordering by name for better user experience.
36-39: LGTM! Robust async function implementation.The function properly uses the typed querySql API and correctly extracts the dataset names from the response. The return type is appropriately defined as
Promise<string[]>.components/webui/client/src/pages/SearchPage/SearchResults/SearchResultsTimeline/index.tsx (2)
26-36: LGTM! Excellent refactoring to use specific selectors.The change from destructuring the entire state to using individual selectors follows the established learning from previous PRs and will prevent unnecessary re-renders as the state store grows in complexity.
76-76: LGTM! Correct dataset parameter integration.The dataset parameter is properly included in the query submission, with appropriate fallback to an empty string when no dataset is selected.
components/webui/server/src/plugins/DbManager.ts (2)
101-101: LGTM! Proper method signature and documentation updates.The addition of the
datasetparameter to the method signature and documentation correctly reflects the new functionality for dataset-aware extract stream jobs.Also applies to: 109-109, 115-115
131-131: LGTM! Correct conditional dataset integration.The dataset is appropriately added to the
jobConfigonly forEXTRACT_JSONjob types, which aligns with the conditional logic and maintains backwards compatibility for other job types.components/webui/client/src/pages/SearchPage/SearchControls/Dataset/index.tsx (1)
21-96: Component implementation looks good overall.The component follows React best practices with proper use of React Query, Zustand state management, and Ant Design components. The error handling, loading states, and side effects are well managed.
components/webui/client/src/pages/SearchPage/SearchControls/SearchButton/SubmitButton/index.tsx (3)
25-94: Component follows good React patterns and state management practices.The component correctly uses specific Zustand selectors as recommended in the learnings, implements proper validation logic for dataset selection, and handles disabled states appropriately. The useCallback optimization is well applied.
69-70: Apply coding guidelines for boolean conditions.The null check and boolean expression should follow the coding guidelines format.
- const isNoDatasetsAndClpS = (null === selectDataset) && - (CLP_STORAGE_ENGINES.CLP_S === SETTINGS_STORAGE_ENGINE); + const isNoDatasetsAndClpS = (false == (null !== selectDataset)) && + (CLP_STORAGE_ENGINES.CLP_S === SETTINGS_STORAGE_ENGINE);⛔ Skipped due to learnings
Learnt from: junhaoliao PR: y-scope/clp#988 File: components/log-viewer-webui/client/src/components/QueryBox/InputWithCaseSensitive/CaseSenstiveToggle.tsx:34-37 Timestamp: 2025-06-09T17:48:56.024Z Learning: In the y-scope/clp project, prefer `false == <expression>` rather than `!<expression>` for boolean expressions in TypeScript/JavaScript files, as specified in the coding guidelines.Learnt from: LinZhihao-723 PR: y-scope/clp#882 File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402 Timestamp: 2025-05-07T16:56:35.687Z Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.Learnt from: Bill-hbrhbr PR: y-scope/clp#868 File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:141-144 Timestamp: 2025-05-05T16:32:55.163Z Learning: The column metadata table (created by `_create_column_metadata_table`) is only needed for dataset-specific workflows in `CLP_S` and is obsolete for non-dataset workflows.
41-49: Apply coding guidelines for boolean comparisons.According to the coding guidelines, prefer
false == <expression>rather than!<expression>. The null check should be written more explicitly.- if (CLP_STORAGE_ENGINES.CLP_S === SETTINGS_STORAGE_ENGINE) { - if (null !== selectDataset) { - updateCachedDataset(selectDataset); - } else { - console.error("Cannot submit a clp-s query without a dataset selection."); - - return; - } - } + if (CLP_STORAGE_ENGINES.CLP_S === SETTINGS_STORAGE_ENGINE) { + if (false == (null === selectDataset)) { + updateCachedDataset(selectDataset); + } else { + console.error("Cannot submit a clp-s query without a dataset selection."); + + return; + } + }⛔ Skipped due to learnings
Learnt from: junhaoliao PR: y-scope/clp#988 File: components/log-viewer-webui/client/src/components/QueryBox/InputWithCaseSensitive/CaseSenstiveToggle.tsx:34-37 Timestamp: 2025-06-09T17:48:56.024Z Learning: In the y-scope/clp project, prefer `false == <expression>` rather than `!<expression>` for boolean expressions in TypeScript/JavaScript files, as specified in the coding guidelines.Learnt from: junhaoliao PR: y-scope/clp#962 File: components/log-viewer-webui/client/src/pages/SearchPage/SearchQueryStatus/Results.tsx:22-26 Timestamp: 2025-06-09T17:15:30.309Z Learning: When using Zustand stores in React components, always use specific selectors (e.g., `useStore(state => state.property)`) instead of destructuring the entire state object to prevent unnecessary re-renders. This is especially important as state stores grow in complexity.Learnt from: junhaoliao PR: y-scope/clp#962 File: components/log-viewer-webui/client/src/pages/SearchPage/SearchQueryStatus/Results.tsx:22-26 Timestamp: 2025-06-09T17:15:30.309Z Learning: When using Zustand stores in React components, always use specific selectors (e.g., `useStore(state => state.property)`) instead of destructuring the entire state object to prevent unnecessary re-renders. This is especially important as state stores grow in complexity.Learnt from: junhaoliao PR: y-scope/clp#596 File: components/log-viewer-webui/client/src/api/query.js:16-23 Timestamp: 2024-11-21T15:51:33.203Z Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.Learnt from: junhaoliao PR: y-scope/clp#937 File: components/log-viewer-webui/client/src/AntdApp.tsx:16-24 Timestamp: 2025-05-29T20:33:40.653Z Learning: In components/log-viewer-webui React codebase: Return type annotations (like `: JSX.Element`) are unnecessary and not preferred for React components in JSX/TSX files.Learnt from: LinZhihao-723 PR: y-scope/clp#882 File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402 Timestamp: 2025-05-07T16:56:35.687Z Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.Learnt from: junhaoliao PR: y-scope/clp#596 File: components/log-viewer-webui/client/src/api/query.js:35-41 Timestamp: 2024-11-19T19:52:43.429Z Learning: For internal APIs in `components/log-viewer-webui/client/src/api/query.js`, runtime validation of parameters may not be necessary since the APIs are not exposed to end users, and JsDoc type annotations may be sufficient.Learnt from: Bill-hbrhbr PR: y-scope/clp#1036 File: components/clp-package-utils/clp_package_utils/general.py:564-579 Timestamp: 2025-06-28T07:10:47.237Z Learning: The validate_dataset function in components/clp-package-utils/clp_package_utils/general.py is designed to be called once upon function startup for dataset validation, not repeatedly during execution, making caching optimizations unnecessary.components/webui/client/src/api/query.ts (1)
32-49: Function refactoring to object parameter is well implemented.The change from multiple parameters to a single object parameter improves readability and maintainability. The destructuring in the function signature and the inclusion of the dataset parameter in the request payload are correctly implemented.
components/webui/client/src/pages/SearchPage/SearchState/index.tsx (2)
38-44: Good separation of dataset state concerns.The distinction between
cachedDataset(for submitted queries/extract jobs) andselectDataset(for UI selector) is a well-designed pattern that prevents UI state changes from affecting ongoing operations.
111-149: Zustand store implementation follows best practices.The state management implementation correctly spreads the default state, provides consistent update function naming, and maintains proper immutability patterns. The TypeScript typing ensures type safety throughout the store usage.
components/webui/client/package.json
Outdated
| "zustand": "^5.0.3" | ||
| }, | ||
| "devDependencies": { | ||
| "@tanstack/react-query": "^5.81.5", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move @tanstack/react-query to dependencies.
React Query is used in production code for state management and data fetching, so it should be in dependencies rather than devDependencies. The devtools package correctly belongs in devDependencies.
Apply this diff to fix the dependency placement:
"dependencies": {
"@ant-design/v5-patch-for-react-19": "^1.0.3",
"@emotion/react": "^11.14.0",
"@emotion/styled": "^11.14.0",
"@mui/joy": "^5.0.0-beta.51",
"@sinclair/typebox": "^0.34.25",
+ "@tanstack/react-query": "^5.81.5",
"antd": "^5.24.5",
"axios": "^1.7.9",
"chart.js": "^4.4.9",
"chartjs-adapter-dayjs-4": "^1.0.4",
"chartjs-plugin-zoom": "^2.2.0",
"dayjs": "^1.11.13",
"react": "^19.0.0",
"react-chartjs-2": "^5.3.0",
"react-dom": "^19.0.0",
"react-router": "^7.4.1",
"react-syntax-highlighter": "^15.6.1",
"socket.io-client": "^4.8.1",
"zustand": "^5.0.3"
},
"devDependencies": {
- "@tanstack/react-query": "^5.81.5",
"@tanstack/react-query-devtools": "^5.81.5",📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "@tanstack/react-query": "^5.81.5", | |
| "dependencies": { | |
| "@ant-design/v5-patch-for-react-19": "^1.0.3", | |
| "@emotion/react": "^11.14.0", | |
| "@emotion/styled": "^11.14.0", | |
| "@mui/joy": "^5.0.0-beta.51", | |
| "@sinclair/typebox": "^0.34.25", | |
| + "@tanstack/react-query": "^5.81.5", | |
| "antd": "^5.24.5", | |
| "axios": "^1.7.9", | |
| "chart.js": "^4.4.9", | |
| "chartjs-adapter-dayjs-4": "^1.0.4", | |
| "chartjs-plugin-zoom": "^2.2.0", | |
| "dayjs": "^1.11.13", | |
| "react": "^19.0.0", | |
| "react-chartjs-2": "^5.3.0", | |
| "react-dom": "^19.0.0", | |
| "react-router": "^7.4.1", | |
| "react-syntax-highlighter": "^15.6.1", | |
| "socket.io-client": "^4.8.1", | |
| "zustand": "^5.0.3" | |
| }, | |
| "devDependencies": { | |
| - "@tanstack/react-query": "^5.81.5", | |
| "@tanstack/react-query-devtools": "^5.81.5" | |
| } |
🤖 Prompt for AI Agents
In components/webui/client/package.json at line 36, move the
"@tanstack/react-query" package from devDependencies to dependencies because it
is used in production code for state management and data fetching. Keep the
devtools package in devDependencies as it is only for development use.
| /** | ||
| * Query the SQL server with the queryString. | ||
| * | ||
| * @param queryString | ||
| * @return | ||
| */ | ||
| const querySql = async <T>(queryString: string) => { | ||
| return axios.post<T>("/query/sql", {queryString}); | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Improve JSDoc documentation.
The function implementation is solid, but the JSDoc documentation could be more descriptive about the return type and error handling.
Apply this diff to enhance the documentation:
/**
* Query the SQL server with the queryString.
*
- * @param queryString
- * @return
+ * @param queryString The SQL query string to execute
+ * @return Promise that resolves to an Axios response with typed data
+ * @throws Network errors, server errors, or invalid SQL queries
*/📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /** | |
| * Query the SQL server with the queryString. | |
| * | |
| * @param queryString | |
| * @return | |
| */ | |
| const querySql = async <T>(queryString: string) => { | |
| return axios.post<T>("/query/sql", {queryString}); | |
| }; | |
| /** | |
| * Query the SQL server with the queryString. | |
| * | |
| * @param queryString The SQL query string to execute | |
| * @return Promise that resolves to an Axios response with typed data | |
| * @throws Network errors, server errors, or invalid SQL queries | |
| */ | |
| const querySql = async <T>(queryString: string) => { | |
| return axios.post<T>("/query/sql", {queryString}); | |
| }; |
🤖 Prompt for AI Agents
In components/webui/client/src/api/sql.ts around lines 4 to 12, the JSDoc for
the querySql function lacks detail on the return type and error handling. Update
the JSDoc to specify that the function returns a Promise resolving to the
response data of type T and mention that it may throw an error if the request
fails. This will improve clarity for users of the function.
| if (CLP_STORAGE_ENGINES.CLP_S === SETTINGS_STORAGE_ENGINE) { | ||
| if (null !== selectDataset) { | ||
| updateCachedDataset(selectDataset); | ||
| } else { | ||
| console.error("Cannot submit a clp-s query without a dataset selection."); | ||
|
|
||
| return; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
LGTM! Proper dataset validation for CLP_S storage engine.
The conditional validation ensures that queries cannot be submitted without a dataset selection when using the CLP_S storage engine. The error logging provides clear feedback for debugging purposes.
Consider following the coding guideline by using false === <expression> instead of !<expression>:
- if (null !== selectDataset) {
+ if (false === (null === selectDataset)) {However, the current form is more readable, so this is a minor stylistic preference.
🤖 Prompt for AI Agents
In
components/webui/client/src/pages/SearchPage/SearchResults/SearchResultsTimeline/index.tsx
around lines 65 to 73, the condition checking if selectDataset is not null
currently uses "null !== selectDataset". To align with the coding guideline,
change this to "false === (selectDataset === null)" or an equivalent explicit
comparison instead of using negation. This is a stylistic change to improve
consistency with the codebase's preferred style.
| if (isSuccess) { | ||
| if ("undefined" !== typeof data[0] && null === dataset) { | ||
| updateDataset(data[0]); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Consider using standard undefined check instead of typeof comparison.
The condition "undefined" !== typeof data[0] is unnecessarily verbose. Use the more conventional data[0] !== undefined for better readability.
- if (isSuccess) {
- if ("undefined" !== typeof data[0] && null === dataset) {
- updateDataset(data[0]);
- }
- }
+ if (isSuccess) {
+ if (data[0] !== undefined && null === dataset) {
+ updateDataset(data[0]);
+ }
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (isSuccess) { | |
| if ("undefined" !== typeof data[0] && null === dataset) { | |
| updateDataset(data[0]); | |
| } | |
| } | |
| if (isSuccess) { | |
| if (data[0] !== undefined && null === dataset) { | |
| updateDataset(data[0]); | |
| } | |
| } |
🤖 Prompt for AI Agents
In components/webui/client/src/pages/SearchPage/SearchControls/Dataset/index.tsx
around lines 38 to 42, replace the condition checking if data[0] is defined from
using the verbose typeof comparison ("undefined" !== typeof data[0]) to the
simpler and more readable standard undefined check (data[0] !== undefined).
| {contextHolder} | ||
| <DatasetLabel/> | ||
| <Select | ||
| className={styles["select"] || ""} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Remove redundant fallback in className assignment.
The || "" fallback is unnecessary since styles["select"] will be undefined if the CSS module doesn't have the class, and React handles undefined className gracefully.
- className={styles["select"] || ""}
+ className={styles["select"]}🤖 Prompt for AI Agents
In components/webui/client/src/pages/SearchPage/SearchControls/Dataset/index.tsx
at line 86, remove the redundant fallback `|| ""` from the className assignment.
Change `className={styles["select"] || ""}` to just
`className={styles["select"]}` since React handles undefined className values
without issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🔭 Outside diff range comments (1)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (1)
268-268: Potential SQL injection risk with unvalidated dataset parameter.The dataset parameter is directly concatenated into the table prefix without validation, which could pose a security risk if the dataset contains malicious characters.
While the dataset should be validated before reaching this point (see previous comment), add an additional safety check:
if StorageEngine.CLP_S == storage_engine: + if dataset is None: + logger.error("Dataset is required for CLP_S storage engine.") + return -1 table_prefix = f"{table_prefix}{dataset}_"Also applies to: 339-339
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (12)
components/clp-package-utils/clp_package_utils/general.py(2 hunks)components/clp-package-utils/clp_package_utils/scripts/archive_manager.py(3 hunks)components/clp-package-utils/clp_package_utils/scripts/compress.py(6 hunks)components/clp-package-utils/clp_package_utils/scripts/decompress.py(4 hunks)components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py(8 hunks)components/clp-package-utils/clp_package_utils/scripts/native/compress.py(3 hunks)components/clp-package-utils/clp_package_utils/scripts/native/decompress.py(2 hunks)components/clp-package-utils/clp_package_utils/scripts/native/search.py(9 hunks)components/clp-package-utils/clp_package_utils/scripts/search.py(4 hunks)components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py(2 hunks)components/job-orchestration/job_orchestration/scheduler/job_config.py(3 hunks)components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py(2 hunks)
🧰 Additional context used
🧠 Learnings (12)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (3)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1036
File: components/clp-package-utils/clp_package_utils/general.py:564-579
Timestamp: 2025-06-28T07:10:47.237Z
Learning: The validate_dataset function in components/clp-package-utils/clp_package_utils/general.py is designed to be called once upon function startup for dataset validation, not repeatedly during execution, making caching optimizations unnecessary.
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110
Timestamp: 2024-11-15T16:21:52.122Z
Learning: In `clp_package_utils/scripts/native/del_archives.py`, when deleting archives, the `archive` variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.
components/clp-package-utils/clp_package_utils/scripts/search.py (2)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1036
File: components/clp-package-utils/clp_package_utils/general.py:564-579
Timestamp: 2025-06-28T07:10:47.237Z
Learning: The validate_dataset function in components/clp-package-utils/clp_package_utils/general.py is designed to be called once upon function startup for dataset validation, not repeatedly during execution, making caching optimizations unnecessary.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.
components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py (2)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1036
File: components/clp-package-utils/clp_package_utils/general.py:564-579
Timestamp: 2025-06-28T07:10:47.237Z
Learning: The validate_dataset function in components/clp-package-utils/clp_package_utils/general.py is designed to be called once upon function startup for dataset validation, not repeatedly during execution, making caching optimizations unnecessary.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#868
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:141-144
Timestamp: 2025-05-05T16:32:55.163Z
Learning: The column metadata table (created by `_create_column_metadata_table`) is only needed for dataset-specific workflows in `CLP_S` and is obsolete for non-dataset workflows.
components/clp-package-utils/clp_package_utils/scripts/native/compress.py (2)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
components/clp-package-utils/clp_package_utils/general.py (3)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1036
File: components/clp-package-utils/clp_package_utils/general.py:564-579
Timestamp: 2025-06-28T07:10:47.237Z
Learning: The validate_dataset function in components/clp-package-utils/clp_package_utils/general.py is designed to be called once upon function startup for dataset validation, not repeatedly during execution, making caching optimizations unnecessary.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#868
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:141-144
Timestamp: 2025-05-05T16:32:55.163Z
Learning: The column metadata table (created by `_create_column_metadata_table`) is only needed for dataset-specific workflows in `CLP_S` and is obsolete for non-dataset workflows.
components/job-orchestration/job_orchestration/scheduler/job_config.py (2)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#868
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:141-144
Timestamp: 2025-05-05T16:32:55.163Z
Learning: The column metadata table (created by `_create_column_metadata_table`) is only needed for dataset-specific workflows in `CLP_S` and is obsolete for non-dataset workflows.
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (3)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1036
File: components/clp-package-utils/clp_package_utils/general.py:564-579
Timestamp: 2025-06-28T07:10:47.237Z
Learning: The validate_dataset function in components/clp-package-utils/clp_package_utils/general.py is designed to be called once upon function startup for dataset validation, not repeatedly during execution, making caching optimizations unnecessary.
Learnt from: haiqi96
PR: y-scope/clp#569
File: components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py:380-392
Timestamp: 2024-11-15T20:07:22.256Z
Learning: The current implementation assumes single-threaded execution, so race conditions in functions like `is_target_extracted` in `components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py` are not currently a concern.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#868
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:141-144
Timestamp: 2025-05-05T16:32:55.163Z
Learning: The column metadata table (created by `_create_column_metadata_table`) is only needed for dataset-specific workflows in `CLP_S` and is obsolete for non-dataset workflows.
components/clp-package-utils/clp_package_utils/scripts/decompress.py (5)
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1036
File: components/clp-package-utils/clp_package_utils/general.py:564-579
Timestamp: 2025-06-28T07:10:47.237Z
Learning: The validate_dataset function in components/clp-package-utils/clp_package_utils/general.py is designed to be called once upon function startup for dataset validation, not repeatedly during execution, making caching optimizations unnecessary.
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110
Timestamp: 2024-11-15T16:21:52.122Z
Learning: In `clp_package_utils/scripts/native/del_archives.py`, when deleting archives, the `archive` variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.
components/clp-package-utils/clp_package_utils/scripts/native/decompress.py (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.
components/clp-package-utils/clp_package_utils/scripts/compress.py (3)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1036
File: components/clp-package-utils/clp_package_utils/general.py:564-579
Timestamp: 2025-06-28T07:10:47.237Z
Learning: The validate_dataset function in components/clp-package-utils/clp_package_utils/general.py is designed to be called once upon function startup for dataset validation, not repeatedly during execution, making caching optimizations unnecessary.
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (5)
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110
Timestamp: 2024-11-15T16:21:52.122Z
Learning: In `clp_package_utils/scripts/native/del_archives.py`, when deleting archives, the `archive` variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1036
File: components/clp-package-utils/clp_package_utils/general.py:564-579
Timestamp: 2025-06-28T07:10:47.237Z
Learning: The validate_dataset function in components/clp-package-utils/clp_package_utils/general.py is designed to be called once upon function startup for dataset validation, not repeatedly during execution, making caching optimizations unnecessary.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#868
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:141-144
Timestamp: 2025-05-05T16:32:55.163Z
Learning: The column metadata table (created by `_create_column_metadata_table`) is only needed for dataset-specific workflows in `CLP_S` and is obsolete for non-dataset workflows.
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:90-96
Timestamp: 2024-11-15T16:22:33.635Z
Learning: In the CLP codebase, `table_prefix` is a trusted configuration value and can be safely embedded into SQL queries.
components/clp-package-utils/clp_package_utils/scripts/native/search.py (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.
🧬 Code Graph Analysis (4)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)
components/clp-package-utils/clp_package_utils/general.py (1)
validate_dataset(567-592)
components/clp-package-utils/clp_package_utils/scripts/search.py (1)
components/clp-package-utils/clp_package_utils/general.py (1)
validate_dataset(567-592)
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (3)
components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py (1)
validate_and_cache_dataset(160-179)components/job-orchestration/job_orchestration/scheduler/job_config.py (1)
QueryJobConfig(59-60)components/clp-py-utils/clp_py_utils/clp_config.py (1)
StorageEngine(54-56)
components/clp-package-utils/clp_package_utils/scripts/compress.py (1)
components/clp-package-utils/clp_package_utils/general.py (1)
validate_dataset(567-592)
🪛 Flake8 (7.2.0)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
[error] 168-168: local variable 'dataset' is assigned to but never used
(F841)
components/clp-package-utils/clp_package_utils/scripts/decompress.py
[error] 220-220: undefined name 'validate_dataset'
(F821)
components/clp-package-utils/clp_package_utils/scripts/compress.py
[error] 67-67: undefined name 'Optional'
(F821)
🪛 Pylint (3.3.7)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
[warning] 168-168: Unused variable 'dataset'
(W0612)
components/clp-package-utils/clp_package_utils/general.py
[error] 15-29: Unable to import 'clp_py_utils.clp_config'
(E0401)
[error] 30-30: Unable to import 'clp_py_utils.clp_metadata_db_utils'
(E0401)
[error] 31-36: Unable to import 'clp_py_utils.core'
(E0401)
[error] 37-37: Unable to import 'clp_py_utils.sql_adapter'
(E0401)
components/job-orchestration/job_orchestration/scheduler/job_config.py
[error] 6-6: Unable to import 'clp_py_utils.clp_config'
(E0401)
[convention] 31-31: Missing class docstring
(C0115)
[refactor] 31-31: Too few public methods (0/2)
(R0903)
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py
[error] 44-44: Unable to import 'clp_py_utils.clp_metadata_db_utils'
(E0401)
components/clp-package-utils/clp_package_utils/scripts/decompress.py
[error] 8-13: Unable to import 'clp_py_utils.clp_config'
(E0401)
[warning] 178-181: Use lazy % formatting in logging functions
(W1203)
[warning] 185-185: Use lazy % formatting in logging functions
(W1203)
[error] 220-220: Undefined variable 'validate_dataset'
(E0602)
components/clp-package-utils/clp_package_utils/scripts/compress.py
[error] 67-67: Undefined variable 'Optional'
(E0602)
🔇 Additional comments (27)
components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py (2)
4-4: LGTM: Enhanced typing importsGood addition of Set and Tuple for the new caching function.
160-180: Excellent caching optimization for dataset validationThis function provides an efficient approach to dataset validation by implementing caching to avoid redundant database queries. The implementation correctly:
- Checks the cache first before hitting the database
- Returns both validation status and updated cache for subsequent calls
- Follows consistent error handling patterns
- Uses proper type hints
This optimization will improve performance when multiple dataset validations are needed.
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)
65-70: LGTM: Dataset argument additionThe dataset argument is properly defined with appropriate help text.
components/clp-package-utils/clp_package_utils/scripts/search.py (3)
39-44: LGTM: Dataset argument properly definedThe dataset argument follows the established pattern with appropriate type and help text.
88-88: LGTM: Proper dataset validationThe dataset validation correctly uses the imported
validate_datasetfunction to ensure dataset existence.
119-121: LGTM: Conditional dataset passingThe dataset is correctly passed to the command only when not None, maintaining backwards compatibility.
components/clp-package-utils/clp_package_utils/general.py (2)
10-10: LGTM: Required imports for dataset validationAll necessary imports are correctly added to support the new dataset validation functionality.
Also applies to: 17-17, 19-19, 25-25, 30-30, 37-37
567-592: Well-implemented dataset validation functionThis function provides comprehensive dataset validation with excellent design:
- Storage engine compatibility: Correctly restricts dataset selection to CLP_S engine only
- Default handling: Uses
CLP_DEFAULT_DATASET_NAMEwhen no dataset is specified- Resource management: Proper use of
closing()context managers for database connections- Error handling: Clear error messages for non-existent datasets and incompatible storage engines
- Integration: Leverages the new
validate_and_cache_datasetfunction for optimized validationThe function centralizes dataset validation logic and will be used consistently across CLI scripts.
components/clp-package-utils/clp_package_utils/scripts/native/compress.py (2)
195-200: LGTM: Dataset argument additionThe dataset argument is properly defined following the established pattern.
143-143: LGTM: Dataset integration in input configsThe dataset is correctly added to both
FsInputConfigandS3InputConfigobjects, enabling dataset-aware compression jobs.Note: This script doesn't perform dataset validation, which is appropriate since validation likely occurs at the job orchestration level to avoid duplication.
Also applies to: 158-158
components/clp-package-utils/clp_package_utils/scripts/native/decompress.py (1)
142-142: LGTM! Dataset parameter correctly passed from CLI.The change properly uses the dataset from command line arguments instead of a hardcoded default, aligning with the broader dataset awareness implementation.
components/job-orchestration/job_orchestration/scheduler/job_config.py (1)
6-6: LGTM! Dataset fields properly converted to optional.The changes correctly remove the default dataset import and make dataset fields explicitly optional. This forces explicit dataset specification rather than relying on implicit defaults, which aligns with the PR objectives.
Also applies to: 25-25, 33-33, 60-60
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (2)
44-44: LGTM! Import updated to use caching validation function.The change from
fetch_existing_datasetstovalidate_and_cache_datasetimproves efficiency by caching dataset validation results.
645-663: LGTM! Dataset validation logic correctly implemented.The validation logic properly:
- Checks if dataset is not None before validation
- Uses the new caching validation function
- Updates the table prefix with dataset name
- Handles validation failures by marking jobs as failed
The implementation is consistent with the dataset awareness objectives.
components/clp-package-utils/clp_package_utils/scripts/decompress.py (3)
184-186: LGTM! Proper validation for JSON extraction storage engine.The validation correctly ensures JSON extraction is only allowed with CLP_S storage engine, which aligns with the system requirements.
220-222: Dataset validation logic is correct but missing import.The validation logic is correct, but
validate_datasetis not imported (see import comment above).
283-288: LGTM! Dataset argument properly added to JSON extraction parser.The argument definition correctly provides dataset specification for JSON extraction with an appropriate default value and help text.
components/clp-package-utils/clp_package_utils/scripts/compress.py (4)
23-23: LGTM! validate_dataset import properly added.The import correctly includes the validation function needed for dataset validation.
79-81: LGTM! Dataset argument correctly added to compression command.The conditional logic properly appends the dataset argument only when a dataset is provided.
139-144: LGTM! Dataset CLI argument properly defined.The argument definition provides appropriate dataset specification with correct type and help text.
176-176: LGTM! Dataset validation and usage correctly implemented.The dataset validation and passing to the compression command generator are properly implemented according to the new dataset awareness pattern.
Also applies to: 214-214
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (3)
99-104: LGTM! Dataset CLI argument properly configured.The
--datasetargument is correctly added with appropriate type and help text.
293-295: Path construction logic is correct and consistent.The conditional path construction properly handles both dataset-specific and default scenarios.
Also applies to: 398-398
204-204: Incorrect suggestion: dataset is already validated in archive_manager.pyThe
validate_datasetcall is present inarchive_manager.py:dataset = validate_dataset(clp_config, parsed_args.dataset)Subsequent logic uses the validated
dataset, so adding another validation onparsed_args.datasetisn’t needed.Likely an incorrect or invalid review comment.
components/clp-package-utils/clp_package_utils/scripts/native/search.py (3)
35-35: Dataset parameter integration looks correct.The dataset parameter is properly added to the function signature and correctly passed to
SearchJobConfig.Also applies to: 47-47
118-118: Consistent dataset parameter propagation throughout the call chain.The dataset parameter is correctly passed through all function calls in the search pipeline.
Also applies to: 153-153, 191-191, 206-206, 220-220
302-302: Dataset parameter correctly passed to search function.The parsed dataset argument is properly passed to the
do_searchfunction.
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
Show resolved
Hide resolved
hoophalab
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
webui looks good to me. There are some nitpicks. Someone needs to take a look at the package tools.
| } from "../typings/query"; | ||
|
|
||
|
|
||
| interface submitExtractStreamJobProps { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| interface submitExtractStreamJobProps { | |
| interface SubmitExtractStreamJobProps { |
| <Text | ||
| className={styles["datasetLabel"] || ""} | ||
| style={{ | ||
| backgroundColor: token.colorFillAlter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| backgroundColor: token.colorFillAlter, | |
| backgroundColor: token.colorFillTertiary, |
how about making the bg darker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right, that matches the selector better. So i made darker
| {contextHolder} | ||
| <DatasetLabel/> | ||
| <Select | ||
| className={styles["select"] || ""} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if wrong: do we need || "" when importing from an index.module.css in our project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for now we need it for antd components since it wont take undefined for className. Junhao mentioned that there may be a change to our typescript settings that can avoid this, but for now we need it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds reasonable to me
| if (null !== selectDataset) { | ||
| updateCachedDataset(selectDataset); | ||
| } else { | ||
| console.error("Cannot submit a clp-s query without a dataset selection."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about using message api to show an error message?
(feel free to push back since this branch cannot happen in normal cases)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per your statement this branch cannot happen in normal cases, I don't think we need message api since this branch should be impossible, so user should never see it. I just put the log there for completeness or if a developer breaks something in the future.
davemarco
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed feedback
| {contextHolder} | ||
| <DatasetLabel/> | ||
| <Select | ||
| className={styles["select"] || ""} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for now we need it for antd components since it wont take undefined for className. Junhao mentioned that there may be a change to our typescript settings that can avoid this, but for now we need it
| if (null !== selectDataset) { | ||
| updateCachedDataset(selectDataset); | ||
| } else { | ||
| console.error("Cannot submit a clp-s query without a dataset selection."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per your statement this branch cannot happen in normal cases, I don't think we need message api since this branch should be impossible, so user should never see it. I just put the log there for completeness or if a developer breaks something in the future.
| <Text | ||
| className={styles["datasetLabel"] || ""} | ||
| style={{ | ||
| backgroundColor: token.colorFillAlter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right, that matches the selector better. So i made darker
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
components/webui/server/src/fastify-v2/routes/api/search/index.ts (1)
73-73: Fix ternary expression formatting to comply with ESLint rules.The ternary expression violates the ESLint multiline formatting rule. The logic is correct (converting empty strings to null), but the formatting needs adjustment.
- dataset: ("" === dataset ? null : dataset), + dataset: ("" === dataset + ? null + : dataset),
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (6)
components/clp-package-utils/clp_package_utils/general.py(4 hunks)components/webui/client/src/pages/SearchPage/SearchControls/Dataset/sql.ts(1 hunks)components/webui/client/src/settings.ts(1 hunks)components/webui/server/src/fastify-v2/routes/api/search/index.ts(2 hunks)components/webui/server/src/plugins/DbManager.ts(2 hunks)components/webui/server/src/routes/query.ts(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Preferfalse == <expression>rather than!<expression>.
⚙️ Source: CodeRabbit Configuration File
List of files the instruction was applied to:
components/webui/client/src/settings.tscomponents/webui/server/src/routes/query.tscomponents/webui/client/src/pages/SearchPage/SearchControls/Dataset/sql.tscomponents/webui/server/src/fastify-v2/routes/api/search/index.tscomponents/webui/server/src/plugins/DbManager.ts
🧠 Learnings (6)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.761Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.761Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:188-212
Timestamp: 2025-07-03T12:59:46.638Z
Learning: In CLP schedulers (compression and query), runtime dataset validation is required; the helper validate_and_cache_dataset keeps a local cache and only queries the DB on cache misses because dataset additions are rare.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#868
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:141-144
Timestamp: 2025-05-05T16:32:55.163Z
Learning: The column metadata table (created by `_create_column_metadata_table`) is only needed for dataset-specific workflows in `CLP_S` and is obsolete for non-dataset workflows.
Learnt from: kirkrodrigues
PR: y-scope/clp#881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like `apt-get install -f`) when installing packages to give users more control over their system.
components/webui/server/src/routes/query.ts (1)
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
components/webui/client/src/pages/SearchPage/SearchControls/Dataset/sql.ts (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#868
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:141-144
Timestamp: 2025-05-05T16:32:55.163Z
Learning: The column metadata table (created by `_create_column_metadata_table`) is only needed for dataset-specific workflows in `CLP_S` and is obsolete for non-dataset workflows.
components/webui/server/src/fastify-v2/routes/api/search/index.ts (5)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
Learnt from: junhaoliao
PR: y-scope/clp#988
File: components/log-viewer-webui/client/src/components/QueryBox/InputWithCaseSensitive/CaseSenstiveToggle.tsx:34-37
Timestamp: 2025-06-09T17:48:56.024Z
Learning: In the y-scope/clp project, prefer `false == <expression>` rather than `!<expression>` for boolean expressions in TypeScript/JavaScript files, as specified in the coding guidelines.
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.
components/clp-package-utils/clp_package_utils/general.py (14)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1036
File: components/clp-package-utils/clp_package_utils/general.py:564-579
Timestamp: 2025-06-28T07:10:47.295Z
Learning: The validate_dataset function in components/clp-package-utils/clp_package_utils/general.py is designed to be called once upon function startup for dataset validation, not repeatedly during execution, making caching optimizations unnecessary.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.761Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.761Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:188-212
Timestamp: 2025-07-03T12:59:46.638Z
Learning: In CLP schedulers (compression and query), runtime dataset validation is required; the helper validate_and_cache_dataset keeps a local cache and only queries the DB on cache misses because dataset additions are rare.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#868
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:141-144
Timestamp: 2025-05-05T16:32:55.163Z
Learning: The column metadata table (created by `_create_column_metadata_table`) is only needed for dataset-specific workflows in `CLP_S` and is obsolete for non-dataset workflows.
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:90-96
Timestamp: 2024-11-15T16:22:33.635Z
Learning: In the CLP codebase, `table_prefix` is a trusted configuration value and can be safely embedded into SQL queries.
Learnt from: gibber9809
PR: y-scope/clp#672
File: components/core/src/clp_s/indexer/MySQLIndexStorage.cpp:30-38
Timestamp: 2025-01-23T17:08:55.566Z
Learning: SQL identifiers (table names, column names) in the codebase are currently from trusted sources and directly interpolated into queries. However, the team prefers to implement sanitization for robustness, even for trusted inputs.
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:320-324
Timestamp: 2025-05-05T01:12:18.561Z
Learning: In the CLP codebase, the `m_case_sensitive_search` flag is used only for actual string value comparisons during query evaluation, not for schema key name matching. Schema keys are always compared case-sensitively regardless of this flag's setting.
Learnt from: haiqi96
PR: y-scope/clp#620
File: components/core/src/clp/clo/clo.cpp:198-198
Timestamp: 2024-12-02T22:36:17.847Z
Learning: The existing references to `file_split_id` in the codebase are acceptable and do not need to be changed.
Learnt from: quinntaylormitchell
PR: y-scope/clp#961
File: docs/src/dev-guide/design-clp-structured/single-file-archive-format.md:216-219
Timestamp: 2025-06-18T14:35:20.485Z
Learning: In clp-s documentation, technical abbreviations like "MPT" (Merged Parse Tree) should be defined at first use to improve reader clarity and comprehension.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.
Learnt from: junhaoliao
PR: y-scope/clp#939
File: components/package-template/src/etc/clp-config.yml:64-64
Timestamp: 2025-06-24T20:13:46.758Z
Learning: When users ask CodeRabbit to create an issue after providing suggestions, they want a GitHub issue created with the high-level requirements and context, not specific code implementations.
components/webui/server/src/plugins/DbManager.ts (2)
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.
🧬 Code Graph Analysis (2)
components/webui/client/src/pages/SearchPage/SearchControls/Dataset/sql.ts (2)
components/webui/client/src/settings.ts (1)
settings(34-34)components/webui/client/src/api/sql.ts (1)
querySql(14-14)
components/clp-package-utils/clp_package_utils/general.py (2)
components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py (1)
fetch_existing_datasets(184-196)components/clp-py-utils/clp_py_utils/clp_config.py (1)
Database(80-159)
🪛 GitHub Actions: clp-lint
components/webui/server/src/routes/query.ts
[error] 61-61: ESLint: Expected newline between test and consequent of ternary expression (@stylistic/multiline-ternary)
[error] 61-61: ESLint: Expected newline between consequent and alternate of ternary expression (@stylistic/multiline-ternary)
components/webui/server/src/fastify-v2/routes/api/search/index.ts
[error] 73-73: ESLint: Expected newline between test and consequent of ternary expression (@stylistic/multiline-ternary)
[error] 73-73: ESLint: Expected newline between consequent and alternate of ternary expression (@stylistic/multiline-ternary)
🔇 Additional comments (15)
components/clp-package-utils/clp_package_utils/general.py (4)
5-5: LGTM: Import added for regex validation.The
remodule import is appropriately added to support the regex pattern matching in the newvalidate_dataset_namefunction.
29-33: LGTM: Required constants imported for dataset validation.The imports from
clp_metadata_db_utilsprovide the necessary constants for MySQL table name length validation and the function to fetch existing datasets.
570-570: LGTM: Function renamed for clarity.Renaming
validate_datasettovalidate_dataset_existsbetter describes the function's purpose of validating dataset existence in the metadata database.
588-614: Separator logic validated: no changes needed.The table-naming helper
_get_table_nameexpectsclp_table_prefixto include its own trailing underscore and appends exactly one underscore between the dataset name and the suffix. Subtracting1for that separator is correct—no adjustments required.components/webui/client/src/settings.ts (1)
8-8: LGTM: Dataset table name property added to settings.The addition of
SqlDbClpDatasetsTableNameproperty to the Settings type is correct and aligns with the dataset selection functionality being introduced throughout the codebase.components/webui/server/src/routes/query.ts (2)
41-41: LGTM: Dataset parameter added to request schema.The addition of the
datasetstring parameter to the request body schema correctly extends the API to support dataset selection.
48-48: LGTM: Dataset extracted from request body.The destructuring correctly extracts the
datasetparameter from the request body for use in the handler.components/webui/client/src/pages/SearchPage/SearchControls/Dataset/sql.ts (6)
1-2: LGTM: Appropriate imports for SQL queries and settings.The imports correctly bring in the SQL query function and settings configuration needed for dataset queries.
5-10: LGTM: Well-defined column names enum.The enum provides a clean way to reference dataset table column names and improves maintainability.
12-20: LGTM: Proper SQL query construction using settings.The SQL query correctly uses the configurable table name from settings and implements proper ordering. This addresses previous feedback about using settings.json for table names.
22-24: LGTM: Appropriate interface definition for query results.The
DatasetIteminterface correctly models the expected shape of the query result rows.
26-34: LGTM: Clean implementation of dataset fetching function.The function correctly executes the SQL query and extracts dataset names from the response. The error handling is delegated to the underlying
querySqlfunction, which is appropriate.
36-36: LGTM: Appropriate export of public API.Only the
fetchDatasetNamesfunction is exported, maintaining a clean public interface.components/webui/server/src/fastify-v2/routes/api/search/index.ts (1)
63-63: LGTM: Dataset parameter extracted from request body.The destructuring correctly adds the
datasetparameter to the extracted request body properties for use in the search query.components/webui/server/src/plugins/DbManager.ts (1)
101-135: LGTM! Clean implementation of explicit dataset parameter handling.The addition of the
datasetparameter is well-implemented with proper typing, documentation, and usage. The parameter is correctly used only forEXTRACT_JSONjobs, which aligns with the different requirements of the job types. The explicit parameter approach replaces the previous hardcoded defaults, providing better control and flexibility for dataset selection.
| dataset: ("" === dataset ? | ||
| null : | ||
| dataset), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really a hack. On the client, if we change cachedDataset to default to null, when dataset: null is sent from the client, it gets changed into dataset: "" on the server. I tried a few different things but couldn't find a solution that worked. If we can't find a solution soon, I suggest we open an issue to investigate this properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the new commit. Sorry, but can you run the package again to make sure the change is correct? @kirkrodrigues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we cannot submit "" as a dataset?
Like this new condition added in this commit ("" === dataset ?null) should never be reached ?
hoophalab
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code works, but there are other style changes required. Let's make a release first.
| * @return | ||
| */ | ||
| const Dataset = () => { | ||
| const {refreshInterval} = useIngestStatsStore(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't use a store from another page. Let's merge the code for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderabbitai create an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
@hoophalab I've created a GitHub issue to track the architectural concern about the Dataset component using a store from another page. The issue has been assigned to you and includes the relevant context and references to this PR and comment thread.
🏁 Script executed:
#!/bin/bash
# Create GitHub issue for architectural concern about cross-page store usage
gh issue create \
--title "Refactor Dataset component to avoid cross-page store dependency" \
--body "## Description
The Dataset component in the SearchPage is currently using \`useIngestStatsStore\` from the IngestPage, which creates inappropriate coupling between different pages/components.
## Context
This architectural concern was identified during code review where the Dataset component imports and uses a store from another page, violating good separation of concerns.
## Expected Outcome
The Dataset component should be refactored to either:
- Use a shared/global store for common data like refresh intervals
- Have its own local state management
- Receive necessary data through props
## References
- Original PR: https://github.com/y-scope/clp/pull/1050
- Comment thread: https://github.com/y-scope/clp/pull/1050#discussion_r2189045908
- Reported by: @hoophalab" \
--assignee hoophalab
Length of output: 940
@hoophalab Perfect! I've successfully created GitHub issue #1073 to track the architectural concern about the Dataset component using a cross-page store dependency.
Issue created: #1073
The issue has been assigned to you and includes all the relevant context, expected outcomes, and references to this PR and comment thread.
🐰✨
Thoughtfully documenting architectural improvements for future work
🧬 Code Graph Analysis Results
Relevant snippet from components/webui/client/src/pages/SearchPage/SearchControls/Dataset/sql.ts (line 36):
fetchDatasetNamesThis is the function imported and used as the queryFn in the useQuery hook within index.tsx to fetch the list of dataset names. Understanding its implementation is important to know how datasets are retrieved.
No relevant content from components/core/src/clp_s/ArchiveWriter.hpp (line 146):
The open(ArchiveWriterOption const& option) method in ArchiveWriter.hpp is unrelated to the React component or dataset fetching logic in the provided file. It appears to be part of a C++ backend or core library and does not impact the frontend dataset selector component.
Summary: The only relevant snippet is the fetchDatasetNames function from sql.ts since it directly relates to the dataset fetching logic used in the Dataset React component. The other snippet is unrelated.
🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp#962
File: components/log-viewer-webui/client/src/pages/SearchPage/SearchQueryStatus/Results.tsx:22-26
Timestamp: 2025-06-09T17:15:30.309Z
Learning: When using Zustand stores in React components, always use specific selectors (e.g., `useStore(state => state.property)`) instead of destructuring the entire state object to prevent unnecessary re-renders. This is especially important as state stores grow in complexity.
Learnt from: junhaoliao
PR: y-scope/clp#962
File: components/log-viewer-webui/client/src/pages/SearchPage/SearchQueryStatus/Results.tsx:22-26
Timestamp: 2025-06-09T17:15:30.309Z
Learning: When using Zustand stores in React components, always use specific selectors (e.g., `useStore(state => state.property)`) instead of destructuring the entire state object to prevent unnecessary re-renders. This is especially important as state stores grow in complexity.
Learnt from: junhaoliao
PR: y-scope/clp#937
File: components/log-viewer-webui/client/src/AntdApp.tsx:16-24
Timestamp: 2025-05-29T20:33:40.653Z
Learning: In components/log-viewer-webui React codebase: Return type annotations (like `: JSX.Element`) are unnecessary and not preferred for React components in JSX/TSX files.
Learnt from: junhaoliao
PR: y-scope/clp#939
File: components/package-template/src/etc/clp-config.yml:64-64
Timestamp: 2025-06-24T20:13:46.758Z
Learning: When users ask CodeRabbit to create an issue after providing suggestions, they want a GitHub issue created with the high-level requirements and context, not specific code implementations.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.761Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.761Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.
| dataset: ("" === dataset ? | ||
| null : | ||
| dataset), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the new commit. Sorry, but can you run the package again to make sure the change is correct? @kirkrodrigues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont fully understand all these changes, if they are required let me know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Chaoyue already got rid of them in favour of the correct way of doing things.
|
One issue that manifests in webui but is caused by the package.
|
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain@hoophalab, I'll create a GitHub issue to track this bug. 🏁 Script executed: Length of output: 1174 Perfect! I've successfully created GitHub issue #1075 to track this bug: #1075 The issue includes:
This will help ensure the bug gets proper attention and tracking for resolution. 🧠 Learnings used |
| fastify.post("/query/extract-stream", { | ||
| schema: { | ||
| body: Type.Object({ | ||
| // Type.Null must come before Type.String; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isnt explicity required since the cached dataset cannot be null. But I guess it isnt hurtful to include it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the dataset needs to be null for clp-text flavor, as kirk has already fixed in his commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can change cachedDataset to default to null now, after your fix, @hoophalab.
|
Another user-friendliness issue (not caused by the web UI itself): When submitting a query through the web UI, it actually sends two separate requests to the CLP package — one for the timeline and another for the search results. @coderabbitai can you create an issue on this? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain@hoophalab, I'll create a GitHub issue to track this user-friendliness concern about dual search requests. Perfect! I've created GitHub issue #1076 to track this user-friendliness concern: #1076 The issue documents:
This will help ensure the team can investigate optimizations to improve the user experience around search job management. 🏁 Script executed: Length of output: 1777 Perfect! I've successfully created GitHub issue #1077 to track this user-friendliness concern: #1077 The issue comprehensively documents:
This will help the team investigate and implement improvements to provide a better user experience around search job management and system load perception. 🧠 Learnings used |
junhaoliao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approving to provide write access since @hoophalab has reviewed the code
| return -1 | ||
|
|
||
| storage_engine: StorageEngine = clp_config.package.storage_engine | ||
| dataset = parsed_args.dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the parser used is ir_extraction_parser instead of json_extraction_parser, below error is raised:
junhao@ASUS-X870E:~/workspace/clp/build/clp-package/sbin$ ./decompress.sh i --orig-file-id=4577ad28-f615-4d37-8e7f-cd35dbb2324c 1
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/junhao/workspace/clp/build/clp-package/lib/python3/site-packages/clp_package_utils/scripts/decompress.py", line 321, in <module>
sys.exit(main(sys.argv))
File "/home/junhao/workspace/clp/build/clp-package/lib/python3/site-packages/clp_package_utils/scripts/decompress.py", line 314, in main
return handle_extract_stream_cmd(parsed_args, clp_home, default_config_file_path)
File "/home/junhao/workspace/clp/build/clp-package/lib/python3/site-packages/clp_package_utils/scripts/decompress.py", line 188, in handle_extract_stream_cmd
dataset = parsed_args.dataset
AttributeError: 'Namespace' object has no attribute 'dataset'
@coderabbitai create an issue to track this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
I can see the issue - the --dataset argument was only added to the json_extraction_parser but not to the ir_extraction_parser, yet the handle_extract_stream_cmd function tries to access parsed_args.dataset regardless of which parser was used.
Let me create an issue to track this bug:
@junhaoliao I've created an issue to track this bug. The problem is that the --dataset argument was only added to the json_extraction_parser but not to the ir_extraction_parser, yet the code tries to access parsed_args.dataset in both cases. This needs to be fixed by either adding the --dataset argument to the IR extraction parser or handling the missing attribute gracefully.
🏁 Script executed:
#!/bin/bash
gh issue create \
--title "IR extraction parser missing --dataset argument causes AttributeError" \
--body "## Description
When using the IR extraction parser in the decompress script, an AttributeError is raised because the \`--dataset\` argument was not added to the \`ir_extraction_parser\`, only to the \`json_extraction_parser\`.
## Error
\`\`\`
AttributeError: 'Namespace' object has no attribute 'dataset'
\`\`\`
## Steps to Reproduce
1. Run the decompress script with IR extraction: \`./decompress.sh i --orig-file-id=4577ad28-f615-4d37-8e7f-cd35dbb2324c 1\`
2. The script fails at line 188 in \`handle_extract_stream_cmd\` when trying to access \`parsed_args.dataset\`
## Root Cause
In PR #1050, the \`--dataset\` argument was only added to the \`json_extraction_parser\` but not to the \`ir_extraction_parser\`. However, the \`handle_extract_stream_cmd\` function tries to access \`parsed_args.dataset\` regardless of which parser was used.
## Expected Behavior
The \`ir_extraction_parser\` should also have the \`--dataset\` argument added, or the code should handle cases where the dataset argument is not available.
## Related
- PR: https://github.com/y-scope/clp/pull/1050
- Comment: https://github.com/y-scope/clp/pull/1050#discussion_r2275420916
- Reported by: @junhaoliao" \
--assignee junhaoliao
Length of output: 1343
Perfect! I've successfully created issue #1200 to track this bug. The issue documents:
- Root cause: The
--datasetargument was only added to thejson_extraction_parserbut not to their_extraction_parser - Error details:
AttributeError: 'Namespace' object has no attribute 'dataset'when using IR extraction - Steps to reproduce: Using the decompress script with IR extraction command
- Expected fix: Either add the
--datasetargument to the IR extraction parser or handle the missing attribute gracefully
The issue has been assigned to you as requested and includes backlinks to this PR and comment for proper tracking.
🧬 Code Graph Analysis Results
- File: components/clp-package-utils/clp_package_utils/scripts/decompress.py, lines 257-317
def main(argv):
clp_home = get_clp_home()
default_config_file_path = clp_home / CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH
args_parser = argparse.ArgumentParser(description="Decompresses logs")
args_parser.add_argument(
"--config",
"-c",
default=str(default_config_file_path),
help="CLP configuration file.",
)
command_args_parser = args_parser.add_subparsers(dest="command", required=True)
# File extraction command parser
file_extraction_parser = command_args_parser.add_parser(EXTRACT_FILE_CMD)
file_extraction_parser.add_argument(
"paths", metavar="PATH", nargs="*", help="Files to extract."
)
file_extraction_parser.add_argument(
"-f", "--files-from", help="A file listing all files to extract."
)
file_extraction_parser.add_argument(
"-d", "--extraction-dir", metavar="DIR", default=".", help="Extract files into DIR."
)
# IR extraction command parser
ir_extraction_parser = command_args_parser.add_parser(EXTRACT_IR_CMD)
ir_extraction_parser.add_argument("msg_ix", type=int, help="Message index.")
ir_extraction_parser.add_argument(
"--target-uncompressed-size", type=int, help="Target uncompressed IR size."
)
group = ir_extraction_parser.add_mutually_exclusive_group(required=True)
group.add_argument("--orig-file-id", type=str, help="Original file's ID.")
group.add_argument("--orig-file-path", type=str, help="Original file's path.")
# JSON extraction command parser
json_extraction_parser = command_args_parser.add_parser(EXTRACT_JSON_CMD)
json_extraction_parser.add_argument("archive_id", type=str, help="Archive ID")
json_extraction_parser.add_argument(
"--dataset",
type=str,
default=None,
help="The dataset that the archives belong to.",
)
json_extraction_parser.add_argument(
"--target-chunk-size",
type=int,
help="Target chunk size (B).",
)
parsed_args = args_parser.parse_args(argv[1:])
command = parsed_args.command
if EXTRACT_FILE_CMD == command:
return handle_extract_file_cmd(parsed_args, clp_home, default_config_file_path)
elif command in (EXTRACT_IR_CMD, EXTRACT_JSON_CMD):
return handle_extract_stream_cmd(parsed_args, clp_home, default_config_file_path)
else:
logger.exception(f"Unexpected command: {command}")
return -1
- File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py, lines 273-332
def main(argv):
clp_home = get_clp_home()
default_config_file_path = clp_home / CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH
args_parser = argparse.ArgumentParser(description="Decompresses logs.")
args_parser.add_argument(
"--config",
"-c",
default=str(default_config_file_path),
help="CLP configuration file.",
)
command_args_parser = args_parser.add_subparsers(dest="command", required=True)
# File extraction command parser
file_extraction_parser = command_args_parser.add_parser(EXTRACT_FILE_CMD)
file_extraction_parser.add_argument(
"paths", metavar="PATH", nargs="*", help="Files to extract."
)
file_extraction_parser.add_argument(
"-f", "--files-from", help="A file listing all files to extract."
)
file_extraction_parser.add_argument(
"-d", "--extraction-dir", metavar="DIR", default=".", help="Extract files into DIR."
)
# IR extraction command parser
ir_extraction_parser = command_args_parser.add_parser(EXTRACT_IR_CMD)
ir_extraction_parser.add_argument("msg_ix", type=int, help="Message index.")
ir_extraction_parser.add_argument(
"--target-uncompressed-size", type=int, help="Target uncompressed IR size."
)
group = ir_extraction_parser.add_mutually_exclusive_group(required=True)
group.add_argument("--orig-file-id", type=str, help="Original file's ID.")
group.add_argument("--orig-file-path", type=str, help="Original file's path.")
# JSON extraction command parser
json_extraction_parser = command_args_parser.add_parser(EXTRACT_JSON_CMD)
json_extraction_parser.add_argument("archive_id", type=str, help="Archive ID")
json_extraction_parser.add_argument(
"--dataset",
type=str,
default=None,
help="The dataset that the archives belong to.",
)
json_extraction_parser.add_argument(
"--target-chunk-size", type=int, help="Target chunk size (B)."
)
parsed_args = args_parser.parse_args(argv[1:])
command = parsed_args.command
if EXTRACT_FILE_CMD == command:
return handle_extract_file_cmd(parsed_args, clp_home, default_config_file_path)
elif command in (EXTRACT_IR_CMD, EXTRACT_JSON_CMD):
return handle_extract_stream_cmd(parsed_args, clp_home, default_config_file_path)
else:
logger.exception(f"Unexpected command: {command}")
return -1
- File: components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py, lines 85-241
def main(argv: typing.List[str]) -> int:
clp_home: Path = get_clp_home()
default_config_file_path: Path = clp_home / CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH
# Top-level parser and options
args_parser: argparse.ArgumentParser = argparse.ArgumentParser(
description="View list of archive IDs or delete compressed archives."
)
args_parser.add_argument(
"--config",
"-c",
default=str(default_config_file_path),
help="CLP configuration file.",
)
args_parser.add_argument(
"--dataset",
type=str,
default=None,
help="The dataset that the archives belong to.",
)
# Top-level commands
subparsers: argparse._SubParsersAction[argparse.ArgumentParser] = args_parser.add_subparsers(
dest="subcommand",
required=True,
)
find_parser: argparse.ArgumentParser = subparsers.add_parser(
FIND_COMMAND,
help="List IDs of archives.",
)
del_parser: argparse.ArgumentParser = subparsers.add_parser(
DEL_COMMAND,
help="Delete archives from the database and file system.",
)
# Options for find subcommand
find_parser.add_argument(
BEGIN_TS_ARG,
dest="begin_ts",
type=int,
help="Time-range lower-bound (inclusive) as milliseconds from the UNIX epoch.",
)
find_parser.add_argument(
END_TS_ARG,
dest="end_ts",
type=int,
help="Time-range upper-bound (inclusive) as milliseconds from the UNIX epoch.",
)
# Options for delete subcommand
del_parser.add_argument(
DRY_RUN_ARG,
dest="dry_run",
action="store_true",
help="Preview delete without making changes. Lists errors and files to be deleted.",
)
# Subcommands for delete subcommands
del_subparsers: argparse._SubParsersAction[argparse.ArgumentParser] = del_parser.add_subparsers(
dest="del_subcommand",
required=True,
)
# Delete by ID subcomand
del_id_parser: argparse.ArgumentParser = del_subparsers.add_parser(
DEL_BY_IDS_SUBCOMMAND,
help="Delete archives by ID.",
)
# Delete by ID arguments
del_id_parser.add_argument(
"ids",
nargs="+",
help="List of archive IDs to delete",
)
# Delete by filter subcommand
del_filter_parser: argparse.ArgumentParser = del_subparsers.add_parser(
DEL_BY_FILTER_SUBCOMMAND,
help="Deletes archives that fall within the specified time range.",
)
# Delete by filter arguments
del_filter_parser.add_argument(
"begin_ts",
type=int,
help="Time-range lower-bound (inclusive) as milliseconds from the UNIX epoch.",
)
del_filter_parser.add_argument(
"end_ts",
type=int,
help="Time-range upper-bound (inclusive) as milliseconds from the UNIX epoch.",
)
parsed_args: argparse.Namespace = args_parser.parse_args(argv[1:])
# Validate and load config file
config_file_path: Path = Path(parsed_args.config)
try:
clp_config: CLPConfig = load_config_file(
config_file_path, default_config_file_path, clp_home
)
clp_config.validate_logs_dir()
except:
logger.exception("Failed to load config.")
return -1
database_config: Database = clp_config.database
dataset = parsed_args.dataset
if dataset is not None:
try:
validate_dataset_exists(database_config, dataset)
except Exception as e:
logger.error(e)
return -1
archives_dir: Path = clp_config.archive_output.get_directory()
if not archives_dir.exists():
logger.error("`archive_output.directory` doesn't exist.")
return -1
if FIND_COMMAND == parsed_args.subcommand:
return _find_archives(
archives_dir,
database_config,
dataset,
parsed_args.begin_ts,
parsed_args.end_ts,
)
elif DEL_COMMAND == parsed_args.subcommand:
delete_handler: DeleteHandler
if DEL_BY_IDS_SUBCOMMAND == parsed_args.del_subcommand:
delete_handler: IdDeleteHandler = IdDeleteHandler(parsed_args.ids)
return _delete_archives(
archives_dir,
database_config,
dataset,
delete_handler,
parsed_args.dry_run,
)
elif DEL_BY_FILTER_SUBCOMMAND == parsed_args.del_subcommand:
delete_handler: FilterDeleteHandler = FilterDeleteHandler(
[parsed_args.begin_ts, parsed_args.end_ts]
)
return _delete_archives(
archives_dir,
database_config,
dataset,
delete_handler,
parsed_args.dry_run,
)
else:
logger.error(f"Unsupported subcommand: `{parsed_args.del_subcommand}`.")
return -1
else:
logger.error(f"Unsupported subcommand: `{parsed_args.subcommand}`.")
return -1
✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp#1050
File: components/clp-package-utils/clp_package_utils/scripts/archive_manager.py:178-178
Timestamp: 2025-08-14T05:06:58.977Z
Learning: In the decompress.py script, the --dataset argument was only added to the json_extraction_parser but not to the ir_extraction_parser, causing AttributeError when handle_extract_stream_cmd tries to access parsed_args.dataset for IR extraction commands.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Description
This PR:
main.clp-sto compression task workers. #819datasetfields to input configs of compression and search jobs. #839CLP_Sstorage engine. #831table-nametodataset-namefor consistency; Update the name of the column metadata table name to put the dataset name first. #846table_prefixfrom connection params and table suffix constants. #864and covers the final part of the dataset implementation plan and exposes the full dataset feature to CLP users both via the package CLI and the new WebUI.
clp-package:
clp-package-utils
Add dataset arguments
--datasetto various package scripts‘ command-line interface. If no dataset argument is provided, the default dataset name isdefault.webui
Add dataset selector to new webui. Dataset selector has to query backend to get datasets, and then display them. I also now pass the dataset to the log viewer connector.
The

cachedDatasetis the dataset when query is submitted. We have to do this since the user could change the selector, then click a link, which would result in an error. Instead we cache the dataset when they submitted the query.Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Chores