feat: implement comprehensive dataset management functionality#29

Merged
Patrick-Ehimen merged 3 commits into main from feat/dataset-management
Oct 27, 2025
Conversation

Owner

@Patrick-Ehimen Patrick-Ehimen commented Oct 22, 2025

Description

This PR implements comprehensive dataset management functionality for the Lighthouse AI integration system. The feature enables AI agents to create, manage, and organize collections of files with metadata, versioning, and access control through the MCP protocol.

Key Features

SDK Wrapper Enhancements

  • createDataset: Upload multiple files and create dataset with metadata and progress tracking
  • updateDataset: Add/remove files and update dataset properties with version management
  • getDataset: Retrieve detailed dataset information by ID
  • listDatasets: List all datasets with pagination support
  • deleteDataset: Remove dataset and optionally associated files
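The lifecycle these wrapper methods describe can be sketched with a small in-memory stand-in (method names follow the bullet list above; the real SDK's signatures, return types, and ID scheme may differ):

```typescript
// Hypothetical in-memory stand-in for the SDK wrapper's dataset methods.
// This is an illustrative sketch, not the repo's implementation.
interface DatasetInfo {
  id: string;
  name: string;
  files: string[];
  version: string;
  createdAt: Date;
  updatedAt: Date;
}

class InMemoryDatasets {
  private store = new Map<string, DatasetInfo>();

  createDataset(filePaths: string[], options: { name: string }): DatasetInfo {
    const info: DatasetInfo = {
      id: `ds-${this.store.size + 1}`, // assumed ID scheme, for illustration only
      name: options.name,
      files: [...filePaths],
      version: "1.0.0",
      createdAt: new Date(),
      updatedAt: new Date(),
    };
    this.store.set(info.id, info);
    return info;
  }

  getDataset(id: string): DatasetInfo | undefined {
    return this.store.get(id);
  }

  listDatasets(limit = 10, offset = 0): { datasets: DatasetInfo[]; total: number; hasMore: boolean } {
    const all = [...this.store.values()];
    return {
      datasets: all.slice(offset, offset + limit),
      total: all.length,
      hasMore: offset + limit < all.length,
    };
  }

  deleteDataset(id: string): void {
    this.store.delete(id);
  }
}
```

The real wrapper additionally handles uploads, encryption, metadata, and progress callbacks, which are elided here.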

New MCP Tools

  • lighthouse_create_dataset: Create datasets with comprehensive file validation
  • lighthouse_list_datasets: List datasets with pagination and filtering options
  • lighthouse_get_dataset: Retrieve detailed dataset information and file lists
  • lighthouse_update_dataset: Update existing datasets with new files and metadata
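MCP tools are exposed to agents via a name plus a JSON-schema input description. An illustrative (hypothetical) definition for the create tool might look like the following; the repo's actual MCPToolDefinition shape and field names are authoritative:

```typescript
// Sketch of a tool definition for lighthouse_create_dataset.
// Field names (inputSchema, required, minItems) follow common JSON Schema
// conventions and are assumptions about the repo's types.
const createDatasetDefinition = {
  name: "lighthouse_create_dataset",
  description: "Create a dataset from a list of file paths with optional metadata",
  inputSchema: {
    type: "object",
    properties: {
      filePaths: { type: "array", items: { type: "string" }, minItems: 1 },
      name: { type: "string" },
      description: { type: "string" },
      encrypt: { type: "boolean" },
      tags: { type: "array", items: { type: "string" } },
    },
    required: ["filePaths", "name"],
  },
};
```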

Service Layer Improvements

  • Extended ILighthouseService interface with dataset methods
  • Implemented dataset storage in both real and mock Lighthouse services
  • Added dataset caching and automatic version management
  • Integrated with existing file upload/download infrastructure
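As a hedged sketch of the "automatic version management" bullet, a service could bump a semver-style patch component on every update; the actual versioning scheme used by the service is not spelled out in this PR description:

```typescript
// Hypothetical helper: increment the patch component of a "major.minor.patch"
// dataset version string on each update.
function bumpDatasetVersion(version: string): string {
  const [major, minor, patch] = version.split(".").map(Number);
  if ([major, minor, patch].some(Number.isNaN)) {
    throw new Error(`Invalid version string: ${version}`);
  }
  return `${major}.${minor}.${patch + 1}`;
}
```

Centralizing this in one helper is also what the review below suggests, since the PR generates version strings inline in multiple methods.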

The implementation maintains backward compatibility while adding powerful new dataset management capabilities for AI agents.

Summary by Sourcery

Implement comprehensive dataset management support across the SDK, service layer, and MCP tools, enabling AI agents to manage collections of files with rich metadata, pagination, versioning, and access control.

New Features:

  • Introduce SDK methods to create, update, retrieve, list, and delete datasets with metadata, versioning, encryption, and progress tracking.
  • Implement corresponding dataset management APIs in ILighthouseService and its real and mock implementations with caching and automatic version handling.
  • Add new MCP tools (lighthouse_create_dataset, lighthouse_list_datasets, lighthouse_get_dataset, lighthouse_update_dataset) and register them in the MCP server for dataset operations.

Enhancements:

  • Extend DTOs and types with DatasetOptions, DatasetInfo, and ListDatasetsResponse for structured dataset handling
  • Integrate dataset workflows with existing file upload/download infrastructure while preserving backward compatibility


sourcery-ai bot commented Oct 22, 2025

Reviewer's Guide

This PR introduces end-to-end dataset management support by extending the SDK, service layer, and MCP server tooling. The SDK wrapper gains create/update/get/list/delete dataset methods with progress tracking and versioning; the real and mock LighthouseService implementations are updated with dataset storage, caching, and conversion between SDK and service models; new MCP tools are registered and implemented for dataset operations; and corresponding types and test scaffolding are added.

Sequence diagram for dataset creation via MCP tool and service

```mermaid
sequenceDiagram
    actor User
    participant MCPServer
    participant LighthouseCreateDatasetTool
    participant ILighthouseService
    participant LighthouseAISDK
    User->>MCPServer: Request to create dataset
    MCPServer->>LighthouseCreateDatasetTool: execute(args)
    LighthouseCreateDatasetTool->>ILighthouseService: createDataset(params)
    ILighthouseService->>LighthouseAISDK: createDataset(filePaths, options)
    LighthouseAISDK-->>ILighthouseService: DatasetInfo
    ILighthouseService-->>LighthouseCreateDatasetTool: Dataset
    LighthouseCreateDatasetTool-->>MCPServer: ToolResult
    MCPServer-->>User: Dataset creation result
```

Sequence diagram for updating a dataset via MCP tool and service

```mermaid
sequenceDiagram
    actor User
    participant MCPServer
    participant LighthouseUpdateDatasetTool
    participant ILighthouseService
    participant LighthouseAISDK
    User->>MCPServer: Request to update dataset
    MCPServer->>LighthouseUpdateDatasetTool: execute(args)
    LighthouseUpdateDatasetTool->>ILighthouseService: updateDataset(params)
    ILighthouseService->>LighthouseAISDK: updateDataset(datasetId, options)
    LighthouseAISDK-->>ILighthouseService: DatasetInfo
    ILighthouseService-->>LighthouseUpdateDatasetTool: Dataset
    LighthouseUpdateDatasetTool-->>MCPServer: ToolResult
    MCPServer-->>User: Dataset update result
```

Class diagram for new and updated dataset types

```mermaid
classDiagram
    class DatasetOptions {
      +string name
      +string description
      +boolean encrypt
      +Record<string, any> metadata
      +string[] tags
      +onProgress(progress: ProgressInfo)
    }
    class DatasetInfo {
      +string id
      +string name
      +string description
      +string[] files
      +string version
      +Date createdAt
      +Date updatedAt
      +boolean encrypted
      +Record<string, any> metadata
      +string[] tags
      +number totalSize
      +number fileCount
    }
    class ListDatasetsResponse {
      +DatasetInfo[] datasets
      +number total
      +boolean hasMore
      +string cursor
    }
    DatasetOptions --> ProgressInfo
    ListDatasetsResponse --> DatasetInfo
```
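The class diagram above maps naturally onto TypeScript interfaces along these lines (a sketch; which fields are optional, and the shape of ProgressInfo, are assumptions):

```typescript
// Assumed shape of the progress callback payload.
interface ProgressInfo {
  loaded: number;
  total: number;
}

interface DatasetOptions {
  name: string;
  description?: string;
  encrypt?: boolean;
  metadata?: Record<string, any>;
  tags?: string[];
  onProgress?: (progress: ProgressInfo) => void;
}

interface DatasetInfo {
  id: string;
  name: string;
  description: string;
  files: string[];
  version: string;
  createdAt: Date;
  updatedAt: Date;
  encrypted: boolean;
  metadata?: Record<string, any>;
  tags?: string[];
  totalSize: number;
  fileCount: number;
}

interface ListDatasetsResponse {
  datasets: DatasetInfo[];
  total: number;
  hasMore: boolean;
  cursor?: string;
}

// Example value demonstrating the response shape.
const exampleResponse: ListDatasetsResponse = {
  datasets: [],
  total: 0,
  hasMore: false,
};
```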

Class diagram for ILighthouseService interface (updated)

```mermaid
classDiagram
    class ILighthouseService {
      +createDataset(params): Promise<Dataset>
      +updateDataset(params): Promise<Dataset>
      +getDataset(datasetId): Promise<Dataset | undefined>
      +listDatasets(params): Promise<DatasetsListResult>
      +deleteDataset(datasetId, deleteFiles): Promise<void>
      +clear(): void
    }
    class Dataset {
      +string id
      +string name
      +string description
      +UploadResult[] files
      +Record<string, any> metadata
      +string version
      +Date createdAt
      +Date updatedAt
      +boolean encrypted
      +AccessCondition[] accessConditions
    }
    class DatasetsListResult {
      +Dataset[] datasets
      +number total
      +boolean hasMore
    }
    ILighthouseService --> Dataset
    ILighthouseService --> DatasetsListResult
    DatasetsListResult --> Dataset
```

Class diagram for LighthouseAISDK dataset management methods

```mermaid
classDiagram
    class LighthouseAISDK {
      +createDataset(filePaths, options): Promise<DatasetInfo>
      +updateDataset(datasetId, options): Promise<DatasetInfo>
      +getDataset(datasetId): Promise<DatasetInfo>
      +listDatasets(limit, offset): Promise<ListDatasetsResponse>
      +deleteDataset(datasetId, deleteFiles): Promise<void>
    }
    LighthouseAISDK --> DatasetInfo
    LighthouseAISDK --> ListDatasetsResponse
```

Class diagram for new MCP dataset tools

```mermaid
classDiagram
    class LighthouseCreateDatasetTool {
      +execute(args): Promise<ProgressAwareToolResult>
      +static getDefinition(): MCPToolDefinition
    }
    class LighthouseListDatasetsTool {
      +execute(args): Promise<ProgressAwareToolResult>
      +static getDefinition(): MCPToolDefinition
    }
    class LighthouseGetDatasetTool {
      +execute(args): Promise<ProgressAwareToolResult>
      +static getDefinition(): MCPToolDefinition
    }
    class LighthouseUpdateDatasetTool {
      +execute(args): Promise<ProgressAwareToolResult>
      +static getDefinition(): MCPToolDefinition
    }
    LighthouseCreateDatasetTool --> MCPToolDefinition
    LighthouseListDatasetsTool --> MCPToolDefinition
    LighthouseGetDatasetTool --> MCPToolDefinition
    LighthouseUpdateDatasetTool --> MCPToolDefinition
```

File-Level Changes

Extend SDK wrapper with dataset lifecycle methods
  • Added createDataset with file uploads, metadata, progress tracking and error handling
  • Implemented updateDataset, getDataset, listDatasets and deleteDataset methods
  • Introduced DatasetOptions, DatasetInfo and ListDatasetsResponse types
  Files: packages/sdk-wrapper/src/LighthouseAISDK.ts, packages/sdk-wrapper/src/types.ts, packages/sdk-wrapper/src/index.ts

Augment service layer for dataset support
  • Extended the ILighthouseService interface with dataset methods
  • Implemented create/update/get/list/delete in LighthouseService with caching
  • Added corresponding datasetStore logic to MockLighthouseService
  Files: apps/mcp-server/src/services/ILighthouseService.ts, apps/mcp-server/src/services/LighthouseService.ts, apps/mcp-server/src/services/MockLighthouseService.ts

Register new dataset tools in MCP server
  • Imported and registered create/list/get/update dataset tools in server.ts
  • Updated the tools index to export the dataset tool classes
  • Updated getAllToolDefinitions to include the new definitions
  Files: apps/mcp-server/src/server.ts, apps/mcp-server/src/tools/index.ts

Implement MCP tools for dataset operations
  • Created LighthouseCreateDatasetTool with parameter validation and execution
  • Created LighthouseUpdateDatasetTool with update logic and response formatting
  • Added LighthouseListDatasetsTool and LighthouseGetDatasetTool classes
  Files: apps/mcp-server/src/tools/LighthouseCreateDatasetTool.ts, apps/mcp-server/src/tools/LighthouseUpdateDatasetTool.ts, apps/mcp-server/src/tools/LighthouseListDatasetsTool.ts, apps/mcp-server/src/tools/LighthouseGetDatasetTool.ts

Add test scaffolding for dataset tools
  • Added dataset-tools.test.ts to set up future unit tests for the tool classes
  Files: apps/mcp-server/src/__tests__/dataset-tools.test.ts

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.


@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • There’s a lot of duplicated logic mapping SDK’s DatasetInfo to your internal Dataset type across the service layer—consider extracting a shared converter to reduce repetition and prevent drift.
  • I noticed you registered create, list, get, and update dataset tools but did not add a deleteDataset tool to the registry—adding that would round out the full CRUD support.
  • You’re generating dataset IDs and version strings inline in multiple methods; pulling that into a shared utility or factory would keep things consistent and make future changes easier.
## Individual Comments

### Comment 1
<location> `apps/mcp-server/src/services/LighthouseService.ts:399-401` </location>
<code_context>
+        id: datasetInfo.id,
+        name: datasetInfo.name,
+        description: datasetInfo.description || "",
+        files: datasetInfo.files.map((hash) => ({
+          cid: hash,
+          size: 0, // Would need to fetch individual file info
+          encrypted: datasetInfo.encrypted,
+          accessConditions: params.accessConditions,
</code_context>

<issue_to_address>
**suggestion:** File size is set to 0 for all dataset files, which may mislead consumers.

If file size is needed, fetch or calculate it during dataset creation. Otherwise, indicate in documentation or types that size is unavailable.

Suggested implementation:

```typescript
        files: datasetInfo.files.map((hash) => ({
          cid: hash,
          // File size is currently unavailable; set to 0 as a placeholder.
          // Update this logic to fetch actual file size if/when available.
          size: 0,
          encrypted: datasetInfo.encrypted,
          accessConditions: params.accessConditions,
          tags: params.tags,
          uploadedAt: datasetInfo.createdAt,
          originalPath: "",
          hash: hash,
        })),

```

If you have a type definition for the file object (e.g., `DatasetFile`), update its documentation to clarify that `size` may be unavailable and is set to 0 as a placeholder. For example:

```typescript
// In types/DatasetFile.ts or similar
/**
 * File size in bytes. May be 0 if unavailable.
 */
size: number;
```

If you later implement logic to fetch the actual file size, replace the placeholder with the real value.
</issue_to_address>

### Comment 2
<location> `apps/mcp-server/src/services/LighthouseService.ts:403` </location>
<code_context>
+          cid: hash,
+          size: 0, // Would need to fetch individual file info
+          encrypted: datasetInfo.encrypted,
+          accessConditions: params.accessConditions,
+          tags: params.tags,
+          uploadedAt: datasetInfo.createdAt,
</code_context>

<issue_to_address>
**suggestion:** Access conditions are passed to the Dataset object but not handled in updateDataset.

If updates to access conditions are needed, add support in updateDataset; otherwise, document that access conditions cannot be changed after creation.

Suggested implementation:

```typescript
/**
 * Updates a dataset.
 * Note: accessConditions can be updated via this function.
 */
async function updateDataset(datasetId: string, updateParams: Partial<Dataset>) {
  // Fetch the existing dataset
  const dataset = await getDatasetById(datasetId);

  // Update allowed fields
  if (updateParams.name !== undefined) dataset.name = updateParams.name;
  if (updateParams.description !== undefined) dataset.description = updateParams.description;
  if (updateParams.tags !== undefined) dataset.tags = updateParams.tags;
  if (updateParams.metadata !== undefined) dataset.metadata = updateParams.metadata;

  // Support updating accessConditions
  if (updateParams.accessConditions !== undefined) {
    dataset.accessConditions = updateParams.accessConditions;
    // If files also need accessConditions updated, update them as well
    if (dataset.files) {
      dataset.files = dataset.files.map(file => ({
        ...file,
        accessConditions: updateParams.accessConditions
      }));
    }
  }

  // Save the updated dataset
  await saveDataset(dataset);
  return dataset;
}

```

```typescript
/**
 * Updates a dataset.
 * Note: accessConditions cannot be changed after creation.
 */
async function updateDataset(datasetId: string, updateParams: Partial<Dataset>) {
  // Fetch the existing dataset
  const dataset = await getDatasetById(datasetId);

  // Update allowed fields
  if (updateParams.name !== undefined) dataset.name = updateParams.name;
  if (updateParams.description !== undefined) dataset.description = updateParams.description;
  if (updateParams.tags !== undefined) dataset.tags = updateParams.tags;
  if (updateParams.metadata !== undefined) dataset.metadata = updateParams.metadata;

  // Do NOT allow updating accessConditions after creation
  // If updateParams.accessConditions is present, ignore it and document this behavior

  // Save the updated dataset
  await saveDataset(dataset);
  return dataset;
}

```

If you choose Option 1, ensure that the `Dataset` type/interface includes `accessConditions` as an updatable field.  
If you choose Option 2, you may want to throw an error or log a warning if `updateParams.accessConditions` is provided, to make the restriction explicit.
</issue_to_address>

### Comment 3
<location> `apps/mcp-server/src/services/MockLighthouseService.ts:479-481` </location>
<code_context>
+
+      // Optionally delete associated files
+      if (deleteFiles) {
+        for (const file of dataset.files) {
+          this.fileStore.delete(file.cid);
+          this.currentStorageSize -= file.size;
+        }
+      }
</code_context>

<issue_to_address>
**issue (bug_risk):** Deleting files from fileStore may result in negative currentStorageSize.

Add a check to ensure currentStorageSize does not drop below zero when deleting files, accounting for possible inaccuracies in file sizes or repeated deletions.
</issue_to_address>
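A minimal guard for this, assuming a numeric running total as in the mock service, is to clamp the subtraction at zero (a sketch; names mirror the review comment, not the repo's code):

```typescript
// Hypothetical helper: compute the new accounted storage size after a file
// deletion, clamping at zero so repeated or inaccurate deletions cannot
// drive the running total negative.
function applyFileDeletion(currentStorageSize: number, fileSize: number): number {
  return Math.max(0, currentStorageSize - fileSize);
}
```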

### Comment 4
<location> `apps/mcp-server/src/tools/LighthouseCreateDatasetTool.ts:128` </location>
<code_context>
+      return "filePaths is required and must be a non-empty array";
+    }
+
+    if (params.filePaths.length > 50) {
+      return "Maximum 50 files allowed per dataset";
+    }
</code_context>

<issue_to_address>
**suggestion:** Maximum file count per dataset is hardcoded to 50.

Consider moving this limit to a constant or configuration file if it may change in the future.

Suggested implementation:

```typescript
    const MAX_FILES_PER_DATASET = 50;

    if (params.filePaths.length > MAX_FILES_PER_DATASET) {
      return `Maximum ${MAX_FILES_PER_DATASET} files allowed per dataset`;
    }

```

If you have a central configuration file for such constants, consider moving `MAX_FILES_PER_DATASET` there and importing it instead of defining it locally.
</issue_to_address>

### Comment 5
<location> `apps/mcp-server/src/tools/LighthouseListDatasetsTool.ts:136-152` </location>
<code_context>
+          description: dataset.description,
+          version: dataset.version,
+          fileCount: dataset.files.length,
+          totalSize: dataset.files.reduce((sum, file) => sum + file.size, 0),
+          encrypted: dataset.encrypted,
+          tags: dataset.metadata?.keywords || [],
</code_context>

<issue_to_address>
**suggestion:** Total size calculation may be inaccurate if file sizes are not set.

If file sizes are missing or zero, either exclude totalSize from the response or clearly document its limitations.

```suggestion
        datasets: response.datasets.map((dataset) => {
          // Check if all files have a valid size
          const allFilesHaveValidSize = dataset.files.every(
            (file) => typeof file.size === 'number' && file.size > 0
          );

          const datasetResponse: any = {
            id: dataset.id,
            name: dataset.name,
            description: dataset.description,
            version: dataset.version,
            fileCount: dataset.files.length,
            encrypted: dataset.encrypted,
            tags: dataset.metadata?.keywords || [],
            createdAt: dataset.createdAt.toISOString(),
            updatedAt: dataset.updatedAt.toISOString(),
            metadata: {
              author: dataset.metadata?.author,
              license: dataset.metadata?.license,
              category: dataset.metadata?.category,
            },
          };

          // Only include totalSize if all files have a valid size
          if (allFilesHaveValidSize) {
            // Note: totalSize is only accurate if all file sizes are set and non-zero
            datasetResponse.totalSize = dataset.files.reduce((sum, file) => sum + file.size, 0);
          }

          return datasetResponse;
        }),
```
</issue_to_address>

### Comment 6
<location> `apps/mcp-server/src/__tests__/dataset-tools.test.ts:4` </location>
<code_context>
+/**
+ * Dataset Tools Tests
+ */
+
+import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
+import { Logger } from "@lighthouse-tooling/shared";
</code_context>

<issue_to_address>
**issue (testing):** No test cases found for dataset management tools.

Please add thorough unit and integration tests for all new dataset management tools and service methods, covering core functionality, edge cases, and error handling, especially for file operations, metadata, and access control.
</issue_to_address>


@Patrick-Ehimen Patrick-Ehimen merged commit 6e8811a into main Oct 27, 2025
1 check passed