Refactor Upload Endpoint, Add Validation, and Ensure PVC/MariaDB Parity #47
base: main
…erface, dtype/shape validation
Reviewer's Guide

This PR refactors KServe data models and storage interfaces to enforce async contracts and strict dtype/shape validation, unifies ModelMesh and KServe payload reconciliation in the consumer endpoints, implements a comprehensive data upload endpoint with tag validation and usage metrics, ensures feature parity between the PVC and MariaDB backends, and extends the test suite with integration, async-contract, and end-to-end upload tests.

Class diagram for new and refactored KServe data models

classDiagram
class KServeDataType {
<<enum>>
BOOL
INT8
INT16
INT32
INT64
UINT8
UINT16
UINT32
UINT64
FP16
FP32
FP64
BYTES
}
class KServeData {
+name: str
+shape: List[int]
+datatype: KServeDataType
+parameters: Optional[Dict[str, str]]
+data: List[Any]
+_validate_shape()
+validate_data_matches_type()
}
class KServeInferenceRequest {
+id: Optional[str]
+parameters: Optional[Dict[str, str]]
+inputs: List[KServeData]
+outputs: Optional[List[KServeData]]
}
class KServeInferenceResponse {
+model_name: str
+model_version: Optional[str]
+id: Optional[str]
+parameters: Optional[Dict[str, str]]
+outputs: List[KServeData]
}
KServeInferenceRequest --> KServeData
KServeInferenceResponse --> KServeData
KServeData --> KServeDataType
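The validation rules in the diagram above (`_validate_shape`, `validate_data_matches_type`) can be illustrated with a minimal stdlib-only sketch. It assumes that the element count implied by `shape` must equal the number of flattened values, that integer types are range-checked by bit width, and that BYTES values arrive as strings; it deliberately avoids numpy and pydantic, so it is not the PR's actual implementation. `data` is placed before `parameters` only because dataclass fields with defaults must come last.

```python
from dataclasses import dataclass
from enum import Enum
from math import prod
from typing import Any, Dict, List, Optional, Tuple


class KServeDataType(str, Enum):
    BOOL = "BOOL"
    INT8 = "INT8"
    INT16 = "INT16"
    INT32 = "INT32"
    INT64 = "INT64"
    UINT8 = "UINT8"
    UINT16 = "UINT16"
    UINT32 = "UINT32"
    UINT64 = "UINT64"
    FP16 = "FP16"
    FP32 = "FP32"
    FP64 = "FP64"
    BYTES = "BYTES"


def _flatten(values: List[Any]) -> List[Any]:
    """Flatten arbitrarily nested lists in row-major order."""
    flat: List[Any] = []
    for v in values:
        if isinstance(v, list):
            flat.extend(_flatten(v))
        else:
            flat.append(v)
    return flat


def _int_bounds(dt: KServeDataType) -> Tuple[int, int]:
    """Inclusive (min, max) for an INTn/UINTn type, derived from its bit width."""
    bits = int("".join(ch for ch in dt.value if ch.isdigit()))
    if dt.value.startswith("U"):
        return 0, (1 << bits) - 1
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


@dataclass
class KServeData:
    name: str
    shape: List[int]
    datatype: KServeDataType
    data: List[Any]
    parameters: Optional[Dict[str, str]] = None

    def __post_init__(self) -> None:
        self._validate_shape()
        self.validate_data_matches_type()

    def _validate_shape(self) -> None:
        # The product of the dimensions must equal the number of flattened values.
        expected, actual = prod(self.shape), len(_flatten(self.data))
        if expected != actual:
            raise ValueError(f"shape {self.shape} implies {expected} values, got {actual}")

    def validate_data_matches_type(self) -> None:
        flat = _flatten(self.data)
        dt = KServeDataType(self.datatype)  # accepts the enum or its string value
        if dt is KServeDataType.BOOL:
            ok = all(isinstance(v, bool) for v in flat)
        elif dt is KServeDataType.BYTES:
            # Assumption: BYTES values are transported as (JSON) strings.
            ok = all(isinstance(v, str) for v in flat)
        elif dt.value.startswith(("INT", "UINT")):
            lo, hi = _int_bounds(dt)
            # bool is a subclass of int, so it is excluded explicitly.
            ok = all(isinstance(v, int) and not isinstance(v, bool) and lo <= v <= hi for v in flat)
        else:  # FP16 / FP32 / FP64
            ok = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in flat)
        if not ok:
            raise ValueError(f"{self.name}: data does not match datatype {dt.value}")
```

Running the checks in `__post_init__` means an invalid tensor can never be constructed, which mirrors the "fail at parse time" intent of strict validation.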
Class diagram for refactored StorageInterface and implementations

classDiagram
class StorageInterface {
<<abstract>>
+async dataset_exists(dataset_name: str)
+async list_all_datasets()
+async dataset_rows(dataset_name: str)
+async dataset_shape(dataset_name: str)
+async write_data(dataset_name: str, new_rows, column_names: List[str])
+async read_data(dataset_name: str, start_row: int, n_rows: int)
+async get_original_column_names(dataset_name: str)
+async get_aliased_column_names(dataset_name: str)
+async apply_name_mapping(dataset_name: str, name_mapping: Dict[str, str])
+async delete_dataset(dataset_name: str)
+async persist_partial_payload(payload, payload_id, is_input)
+async get_partial_payload(payload_id, is_input, is_modelmesh)
+async delete_partial_payload(payload_id, is_input)
}
class PVCStorage {
+async dataset_exists(...)
+async list_all_datasets(...)
+async dataset_rows(...)
+async dataset_shape(...)
+async write_data(...)
+async read_data(...)
+async get_original_column_names(...)
+async get_aliased_column_names(...)
+async apply_name_mapping(...)
+async delete_dataset(...)
+async persist_partial_payload(...)
+async get_partial_payload(...)
+async delete_partial_payload(...)
}
class MariaDBStorage {
+async dataset_exists(...)
+async list_all_datasets(...)
+async dataset_rows(...)
+async dataset_shape(...)
+async write_data(...)
+async read_data(...)
+async get_original_column_names(...)
+async get_aliased_column_names(...)
+async apply_name_mapping(...)
+async delete_dataset(...)
+async persist_partial_payload(...)
+async get_partial_payload(...)
+async delete_partial_payload(...)
}
StorageInterface <|-- PVCStorage
StorageInterface <|-- MariaDBStorage
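The async contract in the diagram above can be sketched with an abstract base class whose methods are all `async def`, plus a small check of the kind an async-contract test might perform. This is an illustration under assumptions: it covers only four of the interface's methods, `InMemoryStorage` is a hypothetical backend standing in for `PVCStorage`/`MariaDBStorage`, and `non_async_methods` is a hypothetical helper name.

```python
import asyncio
import inspect
from abc import ABC, abstractmethod
from typing import Dict, List


class StorageInterface(ABC):
    """A subset of the async contract from the class diagram (illustrative only)."""

    @abstractmethod
    async def dataset_exists(self, dataset_name: str) -> bool: ...

    @abstractmethod
    async def dataset_rows(self, dataset_name: str) -> int: ...

    @abstractmethod
    async def write_data(self, dataset_name: str, new_rows: List[list], column_names: List[str]) -> None: ...

    @abstractmethod
    async def read_data(self, dataset_name: str, start_row: int, n_rows: int) -> List[list]: ...


class InMemoryStorage(StorageInterface):
    """Hypothetical backend used only to exercise the contract."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[list]] = {}
        self._columns: Dict[str, List[str]] = {}

    async def dataset_exists(self, dataset_name: str) -> bool:
        return dataset_name in self._rows

    async def dataset_rows(self, dataset_name: str) -> int:
        return len(self._rows.get(dataset_name, []))

    async def write_data(self, dataset_name: str, new_rows: List[list], column_names: List[str]) -> None:
        # The first write fixes the column names; later writes must match them.
        existing = self._columns.setdefault(dataset_name, list(column_names))
        if existing != list(column_names):
            raise ValueError(f"column mismatch for {dataset_name}: {existing} vs {column_names}")
        self._rows.setdefault(dataset_name, []).extend(new_rows)

    async def read_data(self, dataset_name: str, start_row: int, n_rows: int) -> List[list]:
        return self._rows.get(dataset_name, [])[start_row:start_row + n_rows]


def non_async_methods(impl: type, interface: type = StorageInterface) -> List[str]:
    """Names of interface methods that `impl` does not implement as coroutines."""
    return sorted(
        name for name in interface.__abstractmethods__
        if not inspect.iscoroutinefunction(getattr(impl, name))
    )


async def _demo() -> None:
    storage = InMemoryStorage()
    await storage.write_data("model_inputs", [[1, 2], [3, 4]], ["a", "b"])
    print(await storage.dataset_rows("model_inputs"))  # 2


if __name__ == "__main__":
    asyncio.run(_demo())
```

Running the same `non_async_methods` check against every backend class is one cheap way to keep PVC and MariaDB at parity: a synchronous override slips past `ABC` (which only checks that the name exists) but is caught here.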
Hey there - I've reviewed your changes - here's some feedback:
Blocking issues (17 opengrep findings in `src/service/data/storage/maria/maria.py`; each location is listed under Security Issues):
- Detected possible formatted SQL query. Use parameterized queries instead. (6 findings)
- Avoiding SQL string concatenation: untrusted input concatenated with a raw SQL query can result in SQL injection. To execute a raw query safely, use a prepared statement. SQLAlchemy provides TextualSQL to easily use prepared statements with named parameters. For complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM will be a better option. (11 findings)
General comments:
- Avoid brittle error handling in the upload endpoint: rather than checking for “Could not reconcile_kserve” in the exception message, raise and catch a dedicated ReconciliationError (or similar) to cleanly separate reconciliation failures from other HTTPExceptions.
- The module-level global storage interface (`get_global_storage_interface`) can leak state between tests and requests; consider using FastAPI’s dependency injection (e.g. `Depends`) to provide a fresh storage instance per request instead of a mutable global.
- In pvc.py’s `_write_raw_data`, casting all void arrays to a fixed V1024 dtype may silently truncate larger bytes payloads; consider computing the necessary void type size per payload or failing fast with a clear error when the data exceeds the max length.
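The first comment's suggestion (a dedicated exception type instead of matching message text) might look like the sketch below. `ReconciliationError` is the name the reviewer proposes; `reconcile_kserve`, its id-matching rule, the 400 status code, and the `HTTPError` class (a stand-in for `fastapi.HTTPException` so the sketch runs without FastAPI) are all assumptions for illustration.

```python
from typing import Any, Dict, Optional


class ReconciliationError(Exception):
    """Raised when an input/output payload pair cannot be reconciled."""


class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException so this sketch runs without FastAPI installed."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def reconcile_kserve(inputs: Optional[Dict[str, Any]], outputs: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    # Hypothetical reconciliation rule: both halves must be present and share the same id.
    if inputs is None or outputs is None:
        raise ReconciliationError("missing input or output payload")
    if inputs["id"] != outputs["id"]:
        raise ReconciliationError(f"id mismatch: {inputs['id']!r} != {outputs['id']!r}")
    return {"id": inputs["id"], "inputs": inputs["data"], "outputs": outputs["data"]}


def upload(inputs: Optional[Dict[str, Any]], outputs: Optional[Dict[str, Any]]) -> Dict[str, str]:
    try:
        record = reconcile_kserve(inputs, outputs)
    except ReconciliationError as e:
        # Dispatch on the exception type instead of matching text in the message;
        # unrelated errors (e.g. a KeyError from a malformed payload) propagate unchanged.
        raise HTTPError(status_code=400, detail=f"Could not reconcile payloads: {e}") from e
    return {"status": "success", "id": record["id"]}
```

Because the handler catches only `ReconciliationError`, rewording an error message can no longer change which branch runs.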
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Avoid brittle error handling in the upload endpoint: rather than checking for “Could not reconcile_kserve” in the exception message, raise and catch a dedicated ReconciliationError (or similar) to cleanly separate reconciliation failures from other HTTPExceptions.
- The module-level global storage interface (`get_global_storage_interface`) can leak state between tests and requests; consider using FastAPI’s dependency injection (e.g. `Depends`) to provide a fresh storage instance per request instead of a mutable global.
- In pvc.py’s `_write_raw_data`, casting all void arrays to a fixed V1024 dtype may silently truncate larger bytes payloads; consider computing the necessary void type size per payload or failing fast with a clear error when the data exceeds the max length.
## Individual Comments
### Comment 1
<location> `src/service/data/model_data.py:121` </location>
<code_context>
return input_data, output_data, metadata
+ async def get_metadata_as_df(self):
+ _, _, metadata = await self.data(get_input=False, get_output=False)
+ metadata_cols = (await self.column_names())[2]
+ return pd.DataFrame(metadata, columns=metadata_cols)
+
+
</code_context>
<issue_to_address>
get_metadata_as_df assumes metadata and columns are always present and aligned.
Add checks to handle cases where metadata or columns are missing to prevent runtime errors.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
    async def get_metadata_as_df(self):
        _, _, metadata = await self.data(get_input=False, get_output=False)
        metadata_cols = (await self.column_names())[2]
        return pd.DataFrame(metadata, columns=metadata_cols)
=======
    async def get_metadata_as_df(self):
        _, _, metadata = await self.data(get_input=False, get_output=False)
        metadata_cols = (await self.column_names())[2]
        # Use len() rather than truthiness: `not metadata` raises for multi-element numpy arrays.
        if metadata is None or metadata_cols is None or len(metadata) == 0 or len(metadata_cols) == 0:
            logger.warning("Metadata or metadata columns missing; returning empty DataFrame.")
            return pd.DataFrame()
        if not all(isinstance(row, (list, tuple, np.ndarray)) for row in metadata):
            logger.warning("Metadata format is invalid; returning empty DataFrame.")
            return pd.DataFrame()
        # Every row must have exactly one value per metadata column.
        if not all(len(row) == len(metadata_cols) for row in metadata):
            logger.warning("Metadata rows and columns are not aligned; returning empty DataFrame.")
            return pd.DataFrame()
        return pd.DataFrame(metadata, columns=metadata_cols)
>>>>>>> REPLACE
</suggested_fix>
### Comment 2
<location> `tests/service/data/test_payload_reconciliation_pvc.py:86` </location>
<code_context>
+ async def _test_full_reconciliation(self):
</code_context>
<issue_to_address>
Full reconciliation test covers input/output persistence and cleanup.
Consider adding a test case for corrupted or invalid payloads to verify error handling.
</issue_to_address>
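A corrupted-payload test of the kind Comment 2 asks for could follow the pattern below. It is a sketch under assumptions: `InMemoryPayloadStore` is a hypothetical stand-in for the partial-payload methods of `StorageInterface`, payloads are assumed to be stored as JSON bytes, and the decision to surface corruption as `ValueError` is illustrative rather than the PR's behaviour.

```python
import json
import unittest
from typing import Dict, Optional, Tuple


class InMemoryPayloadStore:
    """Hypothetical stand-in for the partial-payload methods of StorageInterface."""

    def __init__(self) -> None:
        self._blobs: Dict[Tuple[str, bool], bytes] = {}

    async def persist_partial_payload(self, payload: dict, payload_id: str, is_input: bool) -> None:
        self._blobs[(payload_id, is_input)] = json.dumps(payload).encode()

    async def get_partial_payload(self, payload_id: str, is_input: bool) -> Optional[dict]:
        blob = self._blobs.get((payload_id, is_input))
        if blob is None:
            return None
        try:
            return json.loads(blob)
        except (UnicodeDecodeError, json.JSONDecodeError) as e:
            raise ValueError(f"corrupted partial payload {payload_id!r}") from e


class TestCorruptedPayloads(unittest.IsolatedAsyncioTestCase):
    async def asyncSetUp(self) -> None:
        self.storage = InMemoryPayloadStore()

    async def test_round_trip(self) -> None:
        await self.storage.persist_partial_payload({"id": "r1"}, "r1", is_input=True)
        self.assertEqual(await self.storage.get_partial_payload("r1", is_input=True), {"id": "r1"})

    async def test_corrupted_blob_raises(self) -> None:
        # Simulate storage corruption by writing truncated JSON directly.
        self.storage._blobs[("r2", True)] = b'{"id": "r2", "inputs": ['
        with self.assertRaises(ValueError):
            await self.storage.get_partial_payload("r2", is_input=True)

    async def test_missing_payload_returns_none(self) -> None:
        self.assertIsNone(await self.storage.get_partial_payload("absent", is_input=False))
```

Run it with `python -m unittest`; the same three cases could be pointed at the real PVC and MariaDB backends to check that both report corruption the same way.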
## Security Issues
### Issue 1
<location> `src/service/data/storage/maria/maria.py:67` </location>
<issue_to_address>
**security (python.lang.security.audit.formatted-sql-query):** Detected possible formatted SQL query. Use parameterized queries instead.
*Source: opengrep*
</issue_to_address>
### Issue 2
<location> `src/service/data/storage/maria/maria.py:67` </location>
<issue_to_address>
**security (python.sqlalchemy.security.sqlalchemy-execute-raw-query):** Avoiding SQL string concatenation: untrusted input concatenated with a raw SQL query can result in SQL injection. To execute a raw query safely, use a prepared statement. SQLAlchemy provides TextualSQL to easily use prepared statements with named parameters. For complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM will be a better option.
*Source: opengrep*
</issue_to_address>
### Issue 3
<location> `src/service/data/storage/maria/maria.py:123` </location>
<issue_to_address>
**security (python.lang.security.audit.formatted-sql-query):** Detected possible formatted SQL query. Use parameterized queries instead.
*Source: opengrep*
</issue_to_address>
### Issue 4
<location> `src/service/data/storage/maria/maria.py:123` </location>
<issue_to_address>
**security (python.sqlalchemy.security.sqlalchemy-execute-raw-query):** Avoiding SQL string concatenation: untrusted input concatenated with a raw SQL query can result in SQL injection. To execute a raw query safely, use a prepared statement. SQLAlchemy provides TextualSQL to easily use prepared statements with named parameters. For complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM will be a better option.
*Source: opengrep*
</issue_to_address>
### Issue 5
<location> `src/service/data/storage/maria/maria.py:151` </location>
<issue_to_address>
**security (python.lang.security.audit.formatted-sql-query):** Detected possible formatted SQL query. Use parameterized queries instead.
*Source: opengrep*
</issue_to_address>
### Issue 6
<location> `src/service/data/storage/maria/maria.py:151` </location>
<issue_to_address>
**security (python.sqlalchemy.security.sqlalchemy-execute-raw-query):** Avoiding SQL string concatenation: untrusted input concatenated with a raw SQL query can result in SQL injection. To execute a raw query safely, use a prepared statement. SQLAlchemy provides TextualSQL to easily use prepared statements with named parameters. For complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM will be a better option.
*Source: opengrep*
</issue_to_address>
### Issue 7
<location> `src/service/data/storage/maria/maria.py:282` </location>
<issue_to_address>
**security (python.sqlalchemy.security.sqlalchemy-execute-raw-query):** Avoiding SQL string concatenation: untrusted input concatenated with a raw SQL query can result in SQL injection. To execute a raw query safely, use a prepared statement. SQLAlchemy provides TextualSQL to easily use prepared statements with named parameters. For complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM will be a better option.
*Source: opengrep*
</issue_to_address>
### Issue 8
<location> `src/service/data/storage/maria/maria.py:349` </location>
<issue_to_address>
**security (python.sqlalchemy.security.sqlalchemy-execute-raw-query):** Avoiding SQL string concatenation: untrusted input concatenated with a raw SQL query can result in SQL injection. To execute a raw query safely, use a prepared statement. SQLAlchemy provides TextualSQL to easily use prepared statements with named parameters. For complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM will be a better option.
*Source: opengrep*
</issue_to_address>
### Issue 9
<location> `src/service/data/storage/maria/maria.py:357` </location>
<issue_to_address>
**security (python.sqlalchemy.security.sqlalchemy-execute-raw-query):** Avoiding SQL string concatenation: untrusted input concatenated with a raw SQL query can result in SQL injection. To execute a raw query safely, use a prepared statement. SQLAlchemy provides TextualSQL to easily use prepared statements with named parameters. For complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM will be a better option.
*Source: opengrep*
</issue_to_address>
### Issue 10
<location> `src/service/data/storage/maria/maria.py:372` </location>
<issue_to_address>
**security (python.sqlalchemy.security.sqlalchemy-execute-raw-query):** Avoiding SQL string concatenation: untrusted input concatenated with a raw SQL query can result in SQL injection. To execute a raw query safely, use a prepared statement. SQLAlchemy provides TextualSQL to easily use prepared statements with named parameters. For complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM will be a better option.
*Source: opengrep*
</issue_to_address>
### Issue 11
<location> `src/service/data/storage/maria/maria.py:382` </location>
<issue_to_address>
**security (python.sqlalchemy.security.sqlalchemy-execute-raw-query):** Avoiding SQL string concatenation: untrusted input concatenated with a raw SQL query can result in SQL injection. To execute a raw query safely, use a prepared statement. SQLAlchemy provides TextualSQL to easily use prepared statements with named parameters. For complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM will be a better option.
*Source: opengrep*
</issue_to_address>
### Issue 12
<location> `src/service/data/storage/maria/maria.py:383` </location>
<issue_to_address>
**security (python.lang.security.audit.formatted-sql-query):** Detected possible formatted SQL query. Use parameterized queries instead.
*Source: opengrep*
</issue_to_address>
### Issue 13
<location> `src/service/data/storage/maria/maria.py:383` </location>
<issue_to_address>
**security (python.sqlalchemy.security.sqlalchemy-execute-raw-query):** Avoiding SQL string concatenation: untrusted input concatenated with a raw SQL query can result in SQL injection. To execute a raw query safely, use a prepared statement. SQLAlchemy provides TextualSQL to easily use prepared statements with named parameters. For complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM will be a better option.
*Source: opengrep*
</issue_to_address>
### Issue 14
<location> `src/service/data/storage/maria/maria.py:395` </location>
<issue_to_address>
**security (python.lang.security.audit.formatted-sql-query):** Detected possible formatted SQL query. Use parameterized queries instead.
*Source: opengrep*
</issue_to_address>
### Issue 15
<location> `src/service/data/storage/maria/maria.py:395` </location>
<issue_to_address>
**security (python.sqlalchemy.security.sqlalchemy-execute-raw-query):** Avoiding SQL string concatenation: untrusted input concatenated with a raw SQL query can result in SQL injection. To execute a raw query safely, use a prepared statement. SQLAlchemy provides TextualSQL to easily use prepared statements with named parameters. For complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM will be a better option.
*Source: opengrep*
</issue_to_address>
### Issue 16
<location> `src/service/data/storage/maria/maria.py:396` </location>
<issue_to_address>
**security (python.lang.security.audit.formatted-sql-query):** Detected possible formatted SQL query. Use parameterized queries instead.
*Source: opengrep*
</issue_to_address>
### Issue 17
<location> `src/service/data/storage/maria/maria.py:396` </location>
<issue_to_address>
**security (python.sqlalchemy.security.sqlalchemy-execute-raw-query):** Avoiding SQL string concatenation: untrusted input concatenated with a raw SQL query can result in SQL injection. To execute a raw query safely, use a prepared statement. SQLAlchemy provides TextualSQL to easily use prepared statements with named parameters. For complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM will be a better option.
*Source: opengrep*
</issue_to_address>
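Most of the flagged lines interpolate table names such as `self.dataset_reference_table`. Identifiers cannot be bound as query parameters, so one common approach is to validate them against a strict allowlist before interpolation and to bind every value through placeholders. The sketch below uses the stdlib `sqlite3` module only so that it runs standalone (SQLite accepts backtick-quoted identifiers for MySQL compatibility); the regex, helper names, and two-column schema are assumptions, and the MariaDB connector's placeholder style should be checked before porting. Note that opengrep may still flag the f-string; the allowlist is what makes the interpolation safe if those names are ever derived from user input.

```python
import re
import sqlite3
from typing import List

# Conservative allowlist for table names (an assumption; adjust to the naming scheme actually used).
_SAFE_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def quote_identifier(name: str) -> str:
    """Validate a table name and wrap it in backticks.

    Identifiers cannot be bound as query parameters, so they are checked
    against an allowlist before being interpolated into the SQL text.
    """
    if not _SAFE_IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe table name: {name!r}")
    return f"`{name}`"


def create_reference_table(cursor: sqlite3.Cursor, table: str) -> None:
    cursor.execute(
        f"CREATE TABLE IF NOT EXISTS {quote_identifier(table)} "
        "(dataset_name VARCHAR(255), n_rows BIGINT)"
    )


def register_dataset(cursor: sqlite3.Cursor, table: str, dataset_name: str, n_rows: int) -> None:
    # Values are bound through placeholders, never formatted into the SQL string.
    cursor.execute(
        f"INSERT INTO {quote_identifier(table)} (dataset_name, n_rows) VALUES (?, ?)",
        (dataset_name, n_rows),
    )


def list_all_datasets(cursor: sqlite3.Cursor, table: str) -> List[str]:
    cursor.execute(f"SELECT dataset_name FROM {quote_identifier(table)}")
    return [row[0] for row in cursor.fetchall()]
```

A dataset name containing quotes or SQL fragments is stored verbatim as data, while a malicious table name is rejected before any SQL is built.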
async def get_metadata_as_df(self):
    _, _, metadata = await self.data(get_input=False, get_output=False)
    metadata_cols = (await self.column_names())[2]
    return pd.DataFrame(metadata, columns=metadata_cols)
suggestion: get_metadata_as_df assumes metadata and columns are always present and aligned.
Add checks to handle cases where metadata or columns are missing to prevent runtime errors.
Suggested change:
async def get_metadata_as_df(self):
    _, _, metadata = await self.data(get_input=False, get_output=False)
    metadata_cols = (await self.column_names())[2]
    # Use len() rather than truthiness: `not metadata` raises for multi-element numpy arrays.
    if metadata is None or metadata_cols is None or len(metadata) == 0 or len(metadata_cols) == 0:
        logger.warning("Metadata or metadata columns missing; returning empty DataFrame.")
        return pd.DataFrame()
    if not all(isinstance(row, (list, tuple, np.ndarray)) for row in metadata):
        logger.warning("Metadata format is invalid; returning empty DataFrame.")
        return pd.DataFrame()
    # Every row must have exactly one value per metadata column.
    if not all(len(row) == len(metadata_cols) for row in metadata):
        logger.warning("Metadata rows and columns are not aligned; returning empty DataFrame.")
        return pd.DataFrame()
    return pd.DataFrame(metadata, columns=metadata_cols)
output_payload = await self.storage.get_partial_payload(
    self.request_id, is_input=False, is_modelmesh=True
)
self.assertIsNone(output_payload)
suggestion (testing): Full reconciliation test covers input/output persistence and cleanup.
Consider adding a test case for corrupted or invalid payloads to verify error handling.
with self.connection_manager as (conn, cursor):
    cursor.execute(f"CREATE TABLE IF NOT EXISTS `{self.dataset_reference_table}` (table_idx BIGINT AUTO_INCREMENT, dataset_name varchar(255), metadata JSON, n_rows BIGINT, PRIMARY KEY (table_idx))")
    cursor.execute(f"CREATE TABLE IF NOT EXISTS `{self.partial_payload_table}` (payload_id varchar(255), is_input BOOLEAN, payload_data LONGBLOB)")
security (python.lang.security.audit.formatted-sql-query): Detected possible formatted SQL query. Use parameterized queries instead.
Source: opengrep
with self.connection_manager as (conn, cursor):
    cursor.execute(f"CREATE TABLE IF NOT EXISTS `{self.dataset_reference_table}` (table_idx BIGINT AUTO_INCREMENT, dataset_name varchar(255), metadata JSON, n_rows BIGINT, PRIMARY KEY (table_idx))")
    cursor.execute(f"CREATE TABLE IF NOT EXISTS `{self.partial_payload_table}` (payload_id varchar(255), is_input BOOLEAN, payload_data LONGBLOB)")
security (python.sqlalchemy.security.sqlalchemy-execute-raw-query): Avoiding SQL string concatenation: untrusted input concatenated with a raw SQL query can result in SQL injection. To execute a raw query safely, use a prepared statement. SQLAlchemy provides TextualSQL to easily use prepared statements with named parameters. For complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM will be a better option.
Source: opengrep
"""
def _list_all_datasets_sync(self):
    with self.connection_manager as (conn, cursor):
        cursor.execute(f"SELECT dataset_name FROM `{self.dataset_reference_table}`")
security (python.lang.security.audit.formatted-sql-query): Detected possible formatted SQL query. Use parameterized queries instead.
Source: opengrep
from src.service.data.modelmesh_parser import PartialPayload
from src.service.data.storage.maria.maria import MariaDBStorage
from src.service.data.storage.pvc import PVCStorage
from tests.service.data.test_payload_reconciliation_pvc import TestPayloadReconciliation
issue (code-quality): Don't import test modules. (dont-import-test-modules)
Explanation
Don't import test modules. Tests should be self-contained and should not depend on each other. If a helper function is used by multiple tests, define it in a helper module instead of importing one test from another.
try:
    np.array(flat, dtype=np_dtype)
except (ValueError, TypeError) as e:
    raise ValueError(f"Data cannot be cast to {self.datatype}: {e}")
suggestion (code-quality): Explicitly raise from a previous error (raise-from-previous-error)
Suggested change:
raise ValueError(f"Data cannot be cast to {self.datatype}: {e}") from e
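The effect of the `raise-from-previous-error` suggestion can be seen in a small stdlib sketch. `cast_all` is a hypothetical stand-in that uses `float()` instead of the numpy cast in the snippet above. Without `from e`, Python still chains the exceptions implicitly through `__context__` ("During handling of the above exception, another exception occurred"); with `from e`, the original error is recorded as the explicit `__cause__`.

```python
from typing import Any, List


def cast_all(values: List[Any], datatype: str = "FP32") -> List[float]:
    """Hypothetical stand-in for the numpy cast in the snippet above."""
    try:
        return [float(v) for v in values]
    except (ValueError, TypeError) as e:
        # `from e` sets __cause__, so the traceback shows the original error as the direct cause.
        raise ValueError(f"Data cannot be cast to {datatype}: {e}") from e
```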
async def upload_data(payload: ModelInferJointPayload):
    """Upload a batch of model data to TrustyAI."""
async def upload(payload: UploadPayload) -> Dict[str, str]:
    """Upload model data"""
issue (code-quality): We've found these issues:
- Use named expression to simplify assignment and conditional (use-named-expression)
- Explicitly raise from a previous error (raise-from-previous-error)
    else:
        row = [i * 2 + j + output_offset for j in range(n_output_cols)]
    output_data.append(row)
payload = {
issue (code-quality): Inline variable that is immediately returned (inline-immediately-returned-variable)
for row_idx in range(n_rows):
    row = [row_idx * 2 + col_idx for col_idx in range(n_output_cols)]
    output_data.append(row)
payload = {
issue (code-quality): Inline variable that is immediately returned (inline-immediately-returned-variable)
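The `inline-immediately-returned-variable` comments above apply to test helpers that build a `payload` dict and then return it. A minimal sketch of the inlined form follows; `make_output_payload` and its keys (`model_name`, `outputs`) are hypothetical and do not reflect the real upload schema. Building the rows with a comprehension also removes the separate `append` loop.

```python
from typing import Any, Dict, List


def make_output_payload(n_rows: int, n_output_cols: int, model_name: str = "example-model") -> Dict[str, Any]:
    """Hypothetical test helper; the payload keys are illustrative only."""
    output_data: List[List[int]] = [
        [row_idx * 2 + col_idx for col_idx in range(n_output_cols)]
        for row_idx in range(n_rows)
    ]
    # Return the literal directly instead of assigning it to `payload` and returning that.
    return {"model_name": model_name, "outputs": output_data}
```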
Summary
This PR extends the work on data upload and storage integration. It addresses the following Jira tasks:
- `StorageInterface` functions are consistently async

Changes
1. Data Model & Validation
- `KServeDataType` enum and `K_SERVE_NUMPY_DTYPES` mapping to enforce consistent dtype handling.
- `KServeData` with strict shape validation and type value validation (BOOL, unsigned ints, BYTES as JSON strings).

2. Storage Interface Consistency
- `StorageInterface` methods.

4. Testing
- `scripts/test_upload_endpoint.sh`) to validate the upload endpoint across multiple edge cases.

Summary by Sourcery
Unify and async-ify the storage layer across PVC and MariaDB, introduce strict KServe data validation, refactor consumer reconciliation logic, and add a new /data/upload endpoint with tag validation for consistent data ingestion.
New Features:
Bug Fixes:
Enhancements:
Build:
Tests: