
v2 file processing db schema#13328

Open
eeee0717 wants to merge 6 commits into CherryHQ:v2 from eeee0717:v2-fileprocessing-db

Conversation

@eeee0717
Collaborator

@eeee0717 eeee0717 commented Mar 9, 2026

@EurFelux Regarding the earlier discussion about different features having different endpoints: I checked paddle's documentation, and there is now asynchronous parsing with a single unified URL for multiple models: https://paddleocr.aistudio-app.com/api/v2/ocr/jobs. You only need to change the model to call a different parsing model. So far I haven't seen any other provider that uses different URLs for different endpoints.

@eeee0717 eeee0717 requested a review from a team March 9, 2026 07:57
@eeee0717 eeee0717 requested a review from 0xfullex as a code owner March 9, 2026 07:57
@eeee0717 eeee0717 requested a review from DeJeune March 9, 2026 07:58
@eeee0717 eeee0717 changed the title feat: add file processing db schema and handler v2 file processing db schema Mar 9, 2026
@eeee0717 eeee0717 added this to the v2.0.0 milestone Mar 9, 2026
@eeee0717 eeee0717 added the v2 label Mar 9, 2026
@EurFelux
Collaborator

EurFelux commented Mar 9, 2026


Should we consider compatibility with paddle's old version?



@eeee0717
Collaborator Author

eeee0717 commented Mar 9, 2026

Should we consider compatibility with paddle's old version?


The keys are consistent between the new and old versions. When migrating, you only need to migrate the keys; there's no need to migrate the URL.



@EurFelux
Collaborator

EurFelux commented Mar 9, 2026


I mean the old version of the paddleOCR service probably still uses different URLs for different services, right? Of course, we can also only support the new version, which would be more convenient for design.



@eeee0717
Collaborator Author

eeee0717 commented Mar 9, 2026

I mean the old version of the paddleOCR service probably still uses different URLs for different services, right? Of course, we can also only support the new version, which would be more convenient for design.


This doesn't affect anything, the key is the same. Previously, URLs were used to distinguish which model was being used, but now models are distinguished by modelId.



@eeee0717
Collaborator Author

eeee0717 commented Mar 12, 2026


@EurFelux Are there any other places in this design that need to be changed?



@DeJeune
Collaborator

DeJeune commented Mar 12, 2026


This needs some cleanup; it looks like there are a lot of merge commits.



@EurFelux
Collaborator

EurFelux commented Mar 12, 2026


The issue with commit 8934fbf is that it introduced many unrelated changes.



Collaborator

@DeJeune DeJeune left a comment


Review Summary

Well-structured PR that establishes the file processing data layer for v2. The schema design (templates + overrides), migration logic, and service layer are well thought out. Good test coverage across all new modules.

Critical / Bug (1)

  • updateProcessor validation order: getPresetById() is called after preferenceService.set(), meaning invalid processor IDs corrupt preferences before the error is thrown.

Significant (2)

  • Runtime parse at import time: void FileProcessorTemplatesSchema.parse(PRESETS_FILE_PROCESSORS) will crash the app at startup if validation fails. There's already a test for this; consider making the import-time assertion softer or at least documenting the intent.
  • Merged schema strict mode mismatch: FileProcessorMergedSchema inherits strict capability schemas from the template, but mergeProcessorConfig spreads override fields that aren't in the strict schema. This would fail re-validation.

Minor / Nit (3)

  • Empty options: {} on every merge. mergeProcessorOverrides always creates an empty options object even when neither side has options.
  • Metadata schema inconsistency: FileProcessorMetadataSchema (z.never() values) vs CapabilityMetadataSchema (z.unknown() values) is confusing without explanation.
  • Dual source of truth — Plain TS types in preferenceTypes.ts duplicate Zod schemas in file-processing.ts; consider deriving one from the other.

Positives

  • Clean separation of concerns: templates (read-only presets) vs overrides (user config) vs merged (API responses)
  • Thorough migration logic with good edge case handling (paddleocr special-casing, preset default skipping, empty pruning)
  • Solid test coverage for migration, service, and schema validation
  • Good use of discriminated unions for capability types
  • CI is green ✅

Comment on lines +73 to +82
public async updateProcessor(id: FileProcessorId, updates: FileProcessorOverride): Promise<FileProcessorMerged> {
  const overrides = this.getOverrides()
  const nextOverrides: FileProcessorOverrides = {
    ...overrides,
    [id]: mergeProcessorOverrides(overrides[id], updates)
  }

  this.getPresetById(id)

  await preferenceService.set('file_processing.overrides', nextOverrides)
Collaborator


Bug: getPresetById(id) is called after preferenceService.set(). If the processor ID doesn't exist in presets, the preference is already written with invalid data before the notFound error is thrown.

Move the validation before the write:

public async updateProcessor(id: FileProcessorId, updates: FileProcessorOverride): Promise<FileProcessorMerged> {
  this.getPresetById(id) // validate first
  const overrides = this.getOverrides()
  const nextOverrides: FileProcessorOverrides = {
    ...overrides,
    [id]: mergeProcessorOverrides(overrides[id], updates)
  }

  await preferenceService.set('file_processing.overrides', nextOverrides)
  // ...
}

Comment on lines +35 to +47
function mergeProcessorOverrides(
  current?: FileProcessorOverride,
  updates?: FileProcessorOverride
): FileProcessorOverride {
  return {
    ...current,
    ...updates,
    capabilities: mergeCapabilityOverrides(current?.capabilities, updates?.capabilities),
    options: {
      ...current?.options,
      ...updates?.options
    }
  }
Collaborator


Nit: mergeProcessorOverrides always produces options: {} even when both current and updates have no options, since spreading two undefineds into an object yields {}. This means every update will persist an empty options field unnecessarily.

Consider guarding:

function mergeProcessorOverrides(
  current?: FileProcessorOverride,
  updates?: FileProcessorOverride
): FileProcessorOverride {
  const mergedOptions = current?.options || updates?.options
    ? { ...current?.options, ...updates?.options }
    : undefined

  return {
    ...current,
    ...updates,
    capabilities: mergeCapabilityOverrides(current?.capabilities, updates?.capabilities),
    ...(mergedOptions !== undefined && { options: mergedOptions })
  }
}

export type FileProcessorInput = FeatureCapability['inputs'][number]

/**
* Output type
Collaborator


Nit: FileProcessorMetadataSchema uses z.record(z.string(), z.never()) which only allows {}, while CapabilityMetadataSchema on line 113 uses z.record(z.string(), z.unknown()) which allows anything. This inconsistency is confusing — the comment says "reserved for future use" but one schema is locked down and the other is wide open. Consider aligning them or adding a brief comment explaining the intentional difference.

}
] as const satisfies readonly FileProcessorTemplate[]

void FileProcessorTemplatesSchema.parse(PRESETS_FILE_PROCESSORS)
Collaborator


Significant: This void FileProcessorTemplatesSchema.parse(...) runs at module import time and will crash the entire app at startup if validation fails. While the intent (catching preset data bugs early) is good, this is risky in production. Consider:

  1. Wrapping in a try/catch that logs the error but doesn't crash, or
  2. Moving this to a test-only assertion (there's already a test for this: 'validates built-in presets'), or
  3. At minimum, adding a comment explaining the intentional crash-on-import behavior so future contributors don't accidentally break startup.
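
A rough sketch of option 1: validate at import time but log instead of crashing. The stub schema and preset data here are hypothetical stand-ins for the PR's real FileProcessorTemplatesSchema and PRESETS_FILE_PROCESSORS:

```typescript
// Hypothetical stand-in for the real Zod schema: parse() throws on invalid data.
const FileProcessorTemplatesSchemaStub = {
  parse(data: unknown): unknown {
    if (!Array.isArray(data)) throw new Error('presets must be an array')
    return data
  }
}

// Hypothetical preset data standing in for PRESETS_FILE_PROCESSORS.
const PRESETS_STUB: unknown = [{ id: 'tesseract' }]

// Validate presets at import time, but log the failure instead of crashing the app.
let presetValidationError: Error | undefined
try {
  FileProcessorTemplatesSchemaStub.parse(PRESETS_STUB)
} catch (error) {
  presetValidationError = error as Error
  console.error('Built-in preset validation failed:', error)
}
```

This keeps the early signal (the error still surfaces in logs and in the existing test) without turning a bad preset into a startup crash.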

Comment on lines +259 to +261
output: 'markdown',
apiHost: 'https://paddleocr.aistudio-app.com/',
modelId: 'PaddleOCR-VL-1.5',
Collaborator


Significant: The FileProcessorMergedSchema extends FileProcessorTemplateSchema which inherits the .strict() constraint. However, the merged type adds apiKeys and options via .extend() — that's fine. But the capabilities array still uses the strict FeatureCapabilitySchema from the template.

When mergeProcessorConfig in FileProcessingService spreads override fields into capabilities:

capabilities: preset.capabilities.map((capability) => ({
  ...capability,
  ...override?.capabilities?.[capability.feature]
}))

The spread can add extra fields (from CapabilityOverride) that aren't defined in the strict capability schema. This means the merged result would fail if re-validated against FileProcessorMergedSchema. Since you're not re-validating the merged output, this works at runtime, but it's a type-level inconsistency worth noting. Consider either:

  • Relaxing the capability schema in the merged variant, or
  • Explicitly picking only known fields from the override when merging
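
A minimal sketch of the second option, explicit field picking. The types here are hypothetical stand-ins for the PR's CapabilityOverride and capability shapes:

```typescript
// Hypothetical stand-ins for the PR's override and capability types.
type CapabilityOverride = {
  apiHost?: string
  modelId?: string
  metadata?: Record<string, unknown>
}

type Capability = {
  feature: string
  apiHost: string
  modelId: string
}

// Copy each known field explicitly instead of spreading the whole override,
// so extra override fields can never leak into the strict capability shape.
function applyCapabilityOverride(capability: Capability, override?: CapabilityOverride): Capability {
  return {
    ...capability,
    ...(override?.apiHost !== undefined && { apiHost: override.apiHost }),
    ...(override?.modelId !== undefined && { modelId: override.modelId })
  }
}
```

With this shape, the merged capability remains a valid instance of the strict schema by construction.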

Comment on lines +108 to +145
export const FILE_PROCESSOR_TYPES = ['api', 'builtin'] as const

export type FileProcessorType = (typeof FILE_PROCESSOR_TYPES)[number]

export const FILE_PROCESSOR_FEATURES = ['text_extraction', 'markdown_conversion'] as const

export type FileProcessorFeature = (typeof FILE_PROCESSOR_FEATURES)[number]

export const FILE_PROCESSOR_IDS = [
  'tesseract',
  'system',
  'paddleocr',
  'ovocr',
  'mineru',
  'doc2x',
  'mistral',
  'open-mineru'
] as const

export type FileProcessorId = (typeof FILE_PROCESSOR_IDS)[number]

export type FileProcessorOptions = Record<string, unknown>

export type CapabilityOverride = {
  apiHost?: string
  modelId?: string
  metadata?: Record<string, unknown>
}

export type FileProcessorCapabilityOverrides = Partial<Record<FileProcessorFeature, CapabilityOverride>>

export type FileProcessorOverride = {
  apiKeys?: string[]
  capabilities?: FileProcessorCapabilityOverrides
  options?: FileProcessorOptions
}

export type FileProcessorOverrides = Partial<Record<FileProcessorId, FileProcessorOverride>>
Collaborator


Nit: Agreeing with @EurFelux's earlier comment: since there are already Zod schemas defined in file-processing.ts for these same types (e.g., FileProcessorOverrideSchema, CapabilityOverrideSchema), having plain TypeScript type definitions here creates a dual-source-of-truth risk. If either side drifts, the types will silently diverge. Consider deriving these types from the Zod schemas using `z.infer` (or vice versa) to keep them in sync, similar to how `FileProcessorTemplate` is already derived.

