@quanru
Collaborator

@quanru quanru commented Jan 15, 2026

Summary

This PR refactors screenshot handling to integrate a StorageProvider abstraction and directory-based reports. The changes are incremental: all existing methods on ExecutionDump and GroupedActionDump are preserved, while ScreenshotItem moves to an async API (see Breaking Changes).

Key Changes

1. StorageProvider Abstraction (Stage 1)

  • Added StorageProvider interface for screenshot storage
  • Implemented MemoryStorage (default, in-memory)
  • Implemented FileStorage (file-based for Node.js)
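
A minimal sketch of what such an abstraction could look like. The method names (store, retrieve, remove) and the ID scheme are assumptions made for illustration; they are not taken from packages/core/src/storage/provider.ts.

```typescript
// Hypothetical provider shape; the real interface may use different names.
interface StorageProvider {
  store(base64: string): Promise<string>; // resolves to a screenshot ID
  retrieve(id: string): Promise<string>; // resolves to the base64 data
  remove(id: string): Promise<void>;
}

// In-memory implementation backed by a Map, in the spirit of MemoryStorage.
class MemoryStorage implements StorageProvider {
  private readonly items = new Map<string, string>();
  private nextId = 0;

  async store(base64: string): Promise<string> {
    const id = `shot-${this.nextId++}`;
    this.items.set(id, base64);
    return id;
  }

  async retrieve(id: string): Promise<string> {
    const data = this.items.get(id);
    if (data === undefined) {
      throw new Error(`MemoryStorage: Data not found for ${id}`);
    }
    return data;
  }

  async remove(id: string): Promise<void> {
    this.items.delete(id);
  }
}
```

Because every method returns a Promise, a file-based provider can implement the same interface without changing its callers.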

2. Async ScreenshotItem (Stage 2)

BREAKING CHANGES:

  • ScreenshotItem.create() is now async → Promise<ScreenshotItem>
  • ScreenshotItem.getData() is now async → Promise<string>
  • Serialization format changed: string → { $screenshot: "id" }

New methods:

  • migrateTo(): Migrate between storage providers
  • restore(): Deserialize from { $screenshot: id }
  • isSerialized(): Check if value is serialized format
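
The new format can be illustrated with a small sketch. ScreenshotRef is a hypothetical stand-in for ScreenshotItem, and the signatures below are assumptions; only the { $screenshot: "id" } shape comes from this PR.

```typescript
// Placeholder shape named in this PR.
type SerializedScreenshot = { $screenshot: string };

// Type guard: true only for objects carrying a string `$screenshot` ID.
function isSerialized(value: unknown): value is SerializedScreenshot {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { $screenshot?: unknown }).$screenshot === "string"
  );
}

// Hypothetical stand-in for ScreenshotItem: serializes to an ID, not to base64.
class ScreenshotRef {
  readonly id: string;

  constructor(id: string) {
    this.id = id;
  }

  toSerializable(): SerializedScreenshot {
    return { $screenshot: this.id };
  }

  // Rebuilds a reference from the placeholder; data would be fetched later by ID.
  static restore(value: SerializedScreenshot): ScreenshotRef {
    return new ScreenshotRef(value.$screenshot);
  }
}
```

A legacy base64 string fails the guard, so a caller can tell the two formats apart before deciding how to restore a value.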

3. ExecutionDump Extension (Stage 3)

Incrementally extended with new methods while preserving all existing ones:

  • collectScreenshots(): Collect all screenshot items
  • toSerializableFormat(): Async conversion with new format

4. GroupedActionDump Extension (Stage 4)

Incrementally extended with new methods while preserving all existing ones:

  • collectAllScreenshots(): Collect screenshots from all executions
  • toHTML(): Generate HTML with serialized screenshots
  • writeToDirectory(): Write report as directory with separate PNG files

Added ReportWriter utility class for flexible report generation.

5. Agent Integration (Stage 5)

  • Added AgentOpt.useDirectoryReport option (default: false)
  • Added AgentOpt.storageProvider option
  • Updated writeOutActionDumps() to support both report modes
  • Used fire-and-forget pattern for async operations

6. Async Call Fixes (Stage 6)

Fixed all ScreenshotItem.create() and getData() calls across:

  • 8 source files in @midscene/core
  • 6 test files (including complete rewrite of screenshot-item.test.ts)

7. Integration Package Fixes (Stage 7)

Fixed async API compatibility issues in downstream packages:

@midscene/web-integration:

  • Modified WebUIContext type to make screenshot optional
  • Added lazy loading support for ScreenshotItem in StaticPage
  • Added screenshotBase64 field for synchronous initialization
  • Fixed async getData() calls with await

@midscene/visualizer:

  • Made generateAnimationScripts() and allScriptsFromDump() async
  • Added await to all getData() calls (7 locations in replay-scripts.ts)
  • Converted forEach loops to for...of loops to support await
  • Used React hooks (useState + useEffect) for async data loading in components
  • Fixed async calls in usePlaygroundExecution hook
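
The forEach to for...of conversion matters because forEach does not wait for async callbacks. The sketch below shows the difference; Shot is a hypothetical interface exposing only getData(), not the real ScreenshotItem.

```typescript
// Hypothetical minimal interface; only getData() is needed for the sketch.
interface Shot {
  getData(): Promise<string>;
}

// Pitfall: forEach discards the promise returned by each async callback, so
// nothing has been pushed yet when the function takes its snapshot.
async function collectWithForEach(shots: Shot[]): Promise<string[]> {
  const out: string[] = [];
  shots.forEach(async (shot) => {
    out.push(await shot.getData());
  });
  return out.slice(); // copy taken at return time to make the problem visible
}

// Fix: for...of lets each await finish before the next iteration starts.
async function collectWithForOf(shots: Shot[]): Promise<string[]> {
  const out: string[] = [];
  for (const shot of shots) {
    out.push(await shot.getData());
  }
  return out;
}
```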

Test fixes:

  • Added await to fakeService() call in task-runner tests
  • Made test beforeEach async where needed
  • Added await to all ScreenshotItem.create() and getData() calls in tests

Architecture Decisions

  1. Incremental Extension: Extended existing classes instead of replacing them
  2. Backward Compatibility: All original methods preserved
  3. Avoided Unnecessary Changes: No import style modifications or refactoring
  4. Dynamic Imports: Used await import() to avoid bundling Node modules
  5. Fire-and-Forget Pattern: Used void promise.catch() for non-blocking async
  6. Optional Screenshot Field: Made screenshot optional in WebUIContext to support both formats
  7. React Hooks for Async: Used useState + useEffect pattern for async data in React components
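
A sketch of the fire-and-forget pattern from decision 5. writeReport, onTaskUpdate as written here, and reportErrors are hypothetical names; the point is only that a synchronous callback starts the async work, marks the promise as intentionally unawaited with void, and still handles rejection.

```typescript
// Collects error messages so the sketch is observable; real code would log them.
const reportErrors: string[] = [];

// Hypothetical async writer standing in for an async report write.
async function writeReport(shouldFail: boolean): Promise<void> {
  if (shouldFail) {
    throw new Error("disk full");
  }
}

// Synchronous callback: starts the write without awaiting it. The .catch()
// keeps a failed write from surfacing as an unhandled rejection.
function onTaskUpdate(shouldFail: boolean): void {
  void writeReport(shouldFail).catch((err: unknown) => {
    reportErrors.push(err instanceof Error ? err.message : String(err));
  });
}
```

The trade-off is that the caller never learns whether the write finished, so this pattern only fits operations whose failure is safe to log and ignore.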

Test Results

✅ All builds passing
✅ @midscene/core: All tests passing
✅ @midscene/web: All tests passing
✅ @midscene/visualizer: Build successful

Breaking Changes

  • ScreenshotItem.create() requires await
  • ScreenshotItem.getData() requires await
  • Serialization format changed (handled by new deserializer)

Migration Guide

For existing code using ScreenshotItem:

// Before
const item = ScreenshotItem.create(base64);
const data = item.getData();

// After
const item = await ScreenshotItem.create(base64);
const data = await item.getData();

For React components using screenshot data:

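// Before
const screenshotBase64 = screenshot.getData();

// After - use React hooks (useState and useEffect imported from 'react')
const [screenshotBase64, setScreenshotBase64] = useState<string>('');
useEffect(() => {
  let cancelled = false; // skip stale results after unmount or a screenshot change
  screenshot.getData().then((data) => {
    if (!cancelled) setScreenshotBase64(data);
  });
  return () => { cancelled = true; };
}, [screenshot]);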

Commits Summary

Core implementation (9 commits):

  1. Add StorageProvider abstraction
  2. Make ScreenshotItem async with provider support
  3. Extend ExecutionDump with serialization methods
  4. Extend GroupedActionDump with directory report support
  5. Integrate directory reports into Agent
  6. Fix async calls in @midscene/core source files
  7. Fix async calls in @midscene/core tests
  8. Commit message translation to English
  9. Linter fixes

Integration fixes (4 commits):

  1. Fix @midscene/web-integration async API compatibility
  2. Fix @midscene/visualizer async getData() calls
  3. Fix visualizer for loop syntax
  4. Fix test async calls

Total: 13 commits, 27 files changed, 840 additions, 183 deletions

Related Issues

This PR continues the work from the feat/local-imgs branch, redone as a clean, incremental series on top of main.

@netlify

netlify bot commented Jan 15, 2026

Deploy Preview for midscene failed.

🔨 Latest commit: 726b3d6
🔍 Latest deploy log: https://app.netlify.com/projects/midscene/deploys/6969f675ae8462000848fdbb

@quanru quanru force-pushed the refact/integration-v2 branch 3 times, most recently from d11a1c6 to a579af4 Compare January 15, 2026 14:54
quanru added 15 commits January 16, 2026 16:16
Added storage abstraction layer to support different screenshot storage
strategies (memory vs file system).

New files:
- packages/core/src/storage/provider.ts - StorageProvider interface
- packages/core/src/storage/memory.ts - In-memory storage implementation
- packages/core/src/storage/file.ts - File-based storage for Node.js
- packages/core/src/storage/index.ts - Exports

Updated:
- packages/core/src/index.ts - Export StorageProvider types

This is the foundation for async ScreenshotItem and directory-based
reports in subsequent commits.
…ovider support

BREAKING CHANGE: ScreenshotItem.create() is now async and returns Promise<ScreenshotItem>
BREAKING CHANGE: ScreenshotItem.getData() is now async and returns Promise<string>
BREAKING CHANGE: toSerializable() now returns { $screenshot: string } instead of string

Changes:
- Add StorageProvider support (defaults to MemoryStorage)
- Convert create() to async static method
- Convert getData() to async method for on-demand loading
- Add migrateTo() method for changing storage providers
- Update serialization format to { $screenshot: "id" }
- Add isSerialized() to replace isSerializedData()
- Add restore() static method for deserialization

Note: This commit introduces compilation errors in:
- agent.ts: Missing await on ScreenshotItem.create()
- agent/utils.ts: Missing await on ScreenshotItem.create()
- types.ts: Using old API (isSerializedData, fromSerializedData)

These will be fixed in subsequent commits (Stage 6).
…tion

Extended ExecutionDump class in types.ts with new methods while preserving
all existing functionality for backward compatibility.

New methods added:
- collectScreenshots(): Collect all ScreenshotItem instances from tasks
- toSerializableFormat(): Convert to format with { $screenshot: id } placeholders

Updated:
- reviverForDumpDeserialization: Support new ScreenshotItem serialization format
  ({ $screenshot: "id" } instead of base64 string)
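
A reviver that supports the new format could look roughly like the sketch below. It is not the actual reviverForDumpDeserialization; ScreenshotRef and reviveScreenshots are hypothetical names, and only the standard JSON.parse reviver hook is relied on.

```typescript
// Hypothetical stand-in for ScreenshotItem: holds only the ID after parsing.
class ScreenshotRef {
  readonly id: string;

  constructor(id: string) {
    this.id = id;
  }
}

// JSON.parse reviver: swaps { $screenshot: "id" } placeholders for
// ScreenshotRef instances and leaves every other value untouched.
function reviveScreenshots(_key: string, value: unknown): unknown {
  if (typeof value === "object" && value !== null) {
    const id = (value as { $screenshot?: unknown }).$screenshot;
    if (typeof id === "string") {
      return new ScreenshotRef(id);
    }
  }
  return value;
}

const json = '{"tasks":[{"name":"tap","screenshot":{"$screenshot":"id1"}}]}';
const dump = JSON.parse(json, reviveScreenshots);
```

After parsing, dump.tasks[0].screenshot is a ScreenshotRef with id "id1", and the image data can be loaded on demand from the storage provider.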

All original methods preserved:
- constructor(), serialize(), toJSON(), fromSerializedString(), fromJSON()

Note: Compilation errors in agent.ts, ai-model/*.ts are expected and will be
fixed in Stage 6 when all async getData() calls are updated.
…ort support

Extended GroupedActionDump class in types.ts with new methods while preserving
all existing functionality for backward compatibility.

New properties added:
- private _storageProvider: StorageProvider for screenshot storage
- storageProvider getter

New methods added:
- collectAllScreenshots(): Collect screenshots from all executions
- toHTML(): Convert to HTML format with serialized screenshots
- writeToDirectory(): Write report to directory with separate PNG files

New files:
- packages/core/src/report-writer.ts: ReportWriter utility class

Updated constructor:
- Now accepts optional storageProvider parameter

All original methods preserved:
- constructor(), serialize(), toJSON(), fromSerializedString(), fromJSON()

The writeToDirectory() method creates a directory structure:
  outputDir/
    ├── index.html
    └── screenshots/
        ├── {id1}.png
        ├── {id2}.png
        └── ...

This enables directory-based reports that reduce memory usage by storing
screenshots as separate files instead of embedded base64 in HTML.
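
The layout above can be sketched as a pure planning step that maps screenshot IDs to relative paths. planDirectoryReport is a hypothetical helper, not part of ReportWriter, and it leaves out the actual disk writes.

```typescript
// Maps a report to the relative paths described above. Contents stay as
// strings here; a real writer would decode the base64 payload to PNG bytes.
function planDirectoryReport(
  html: string,
  screenshots: Map<string, string> // screenshot ID -> base64, optionally a data URL
): Map<string, string> {
  const files = new Map<string, string>();
  files.set("index.html", html);
  screenshots.forEach((data, id) => {
    // Drop a leading "data:image/png;base64," style prefix if present.
    const base64 = data.replace(/^data:image\/[a-z+.-]+;base64,/i, "");
    files.set(`screenshots/${id}.png`, base64);
  });
  return files;
}
```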
…ntOpt

Extended AgentOpt interface with two new options:

1. useDirectoryReport: boolean (default: false)
   - Enables directory-based report format with separate PNG files
   - Reduces memory usage by avoiding base64 embedding
   - Requires HTTP server to view (CORS restrictions with file://)

2. storageProvider: StorageProvider (optional)
   - Allows custom screenshot storage implementation
   - Defaults to MemoryStorage if not specified

These options will be used in Agent class to support directory-based
reports in the next commit.

Stage 5 completed: Integrated useDirectoryReport functionality in Agent class

Changes:
1. resetDump() - Pass storageProvider to GroupedActionDump constructor
2. writeOutActionDumps() - Made async, supports two report modes:
   - useDirectoryReport=true: Calls dump.writeToDirectory() for directory format
   - useDirectoryReport=false: Uses traditional writeLogFile() for single file
3. onTaskUpdate callback (line 400) - Use fire-and-forget pattern for async call
4. recordToReport() method (line 1423) - Use await for async call

Note:
- Expected compilation errors exist (missing await on ScreenshotItem.create() and getData())
- These will be fixed in Stage 6

Stage 6 completed: Fixed all ScreenshotItem async calls

Modified source files (8 files):
- agent/agent.ts: Added await to 3 create() and 2 getData() calls
- agent/utils.ts: Added await to create()
- service/index.ts: Added await to getData()
- ai-model/ui-tars-planning.ts: Added await to getData()
- ai-model/llm-planning.ts: Added await to getData()
- ai-model/inspect.ts: Added await to 3 getData() calls

Modified test files (6 files):
- tests/utils.ts: Made fakeService async
- tests/unit-test/tasks-null-data.test.ts: Made helper functions async
- tests/unit-test/task-runner/index.test.ts: Made helper functions async
- tests/unit-test/screenshot-item.test.ts: Complete rewrite for new API
- tests/unit-test/bbox-locate-cache.test.ts: Made helper functions async
- tests/unit-test/aiaction-cacheable.test.ts: Used mockImplementation

All compilation errors should be fixed.

Fixed incorrect import of GroupedActionDump: it should be imported from './types', not './dump'.

Modified WebUIContext type and getContext() to support both screenshot formats:
- screenshot (ScreenshotItem) - new format
- screenshotBase64 (string) - legacy format

Changes:
1. WebUIContext: Made screenshot field optional using Omit<UIContext, 'screenshot'>
2. getContext(): Lazily create ScreenshotItem from screenshotBase64 if needed
3. bin.ts: Use screenshotBase64 field for empty screenshot
4. mcp-tools.ts: Keep createTemporaryDevice() synchronous with screenshotBase64
5. static-page.ts: Added await to screenshot.getData()

This maintains backward compatibility while supporting the new async ScreenshotItem API.
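
The lazy creation in getContext() could be sketched as follows. The ScreenshotItem class here is a simplified stand-in, and ensureScreenshot is a hypothetical helper name; only the two field names come from this commit.

```typescript
// Simplified stand-in for the async ScreenshotItem API.
class ScreenshotItem {
  private readonly base64: string;

  private constructor(base64: string) {
    this.base64 = base64;
  }

  static async create(base64: string): Promise<ScreenshotItem> {
    return new ScreenshotItem(base64);
  }

  async getData(): Promise<string> {
    return this.base64;
  }
}

// Both fields optional: `screenshot` is the new format, `screenshotBase64` the legacy one.
interface WebUIContext {
  screenshot?: ScreenshotItem;
  screenshotBase64?: string;
}

// Creates the ScreenshotItem on first use and caches it on the context.
async function ensureScreenshot(ctx: WebUIContext): Promise<ScreenshotItem> {
  if (!ctx.screenshot) {
    if (ctx.screenshotBase64 === undefined) {
      throw new Error("context has neither screenshot nor screenshotBase64");
    }
    ctx.screenshot = await ScreenshotItem.create(ctx.screenshotBase64);
  }
  return ctx.screenshot;
}
```

This keeps synchronous construction paths possible, since they only need to set the string field, while async consumers always receive a ScreenshotItem.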
- Make generateAnimationScripts async and return Promise
- Add await to all getData() calls in replay-scripts.ts
- Make allScriptsFromDump async with Promise return type
- Convert forEach to for...of loop to support await
- Use useState + useEffect for async data in blackboard
- Add await to allScriptsFromDump calls in usePlaygroundExecution

- Add await to fakeService call in task-runner test
- Make beforeEach async in freeze-context test
- Add await to ScreenshotItem.create calls
- Add await to getData call in test assertion

- JSON.stringify dumpData before passing to generateDumpScriptTag
- Fixes 'html.replace is not a function' error when writing directory reports

- Add inlineScreenshots option to toSerializableFormat
- In directory report mode, embed base64 directly in dump data
- Generate image script tags with base64 for fallback
- Fixes 'MemoryStorage: Data not found' error in reports
@quanru quanru force-pushed the refact/integration-v2 branch from 1447e3c to de8b96c Compare January 16, 2026 08:18
- Replace all .base64 property access with await getData()
- Add await to ScreenshotItem.create() calls
- Remove SerializedScreenshotItem export (no longer exists)
- Fixes in files:
  - agent/agent.ts (3 locations)
  - ai-model/auto-glm/planning.ts
  - ai-model/inspect.ts (3 locations)
  - ai-model/llm-planning.ts
  - ai-model/ui-tars-planning.ts
  - service/index.ts
  - index.ts (remove type export)
- Fix WebUIContext type: make screenshot optional using Omit
- Add lazy loading of ScreenshotItem in getContext()
- Replace .base64 with await getData() in screenshotBase64()
- Cast return type as UIContext after ensuring screenshot exists