@lhuanyu (Contributor) commented Jan 18, 2026

No description provided.

Copilot AI review requested due to automatic review settings January 18, 2026 16:05

netlify bot commented Jan 18, 2026

Deploy Preview for midscene ready!

Name Link
🔨 Latest commit 035885a
🔍 Latest deploy log https://app.netlify.com/projects/midscene/deploys/696d04c902b55400085320ca
😎 Deploy Preview https://deploy-preview-1810--midscene.netlify.app

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces batch element location capabilities and a short-term memory system to optimize repeated UI operations. The implementation adds locateMulti and locateAll methods to locate multiple elements efficiently, along with a comprehensive short-memory system that caches element positions for fast repeated actions.

Changes:

  • Added locateMulti and locateAll APIs for batch element location with shared AI service calls
  • Implemented ShortMemoryManager with token-based caching for rapid consecutive UI interactions
  • Integrated short-memory support into AI planning prompts and action execution pipeline

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 16 comments.

| File | Description |
| --- | --- |
| packages/core/src/types.ts | Added new result types for batch locate operations; made ExecutorContext optional in DeviceAction interface |
| packages/core/src/service/index.ts | Implemented locateMulti and locateAll service methods with AI model integration |
| packages/core/src/device/index.ts | Updated defineAction signature to make context parameter optional |
| packages/core/src/ai-model/prompt/short-memory.ts | New prompt instructions for short-memory feature with usage examples |
| packages/core/src/ai-model/prompt/llm-planning.ts | Integrated short-memory instructions into planning prompts |
| packages/core/src/ai-model/prompt/llm-locator-multi.ts | New prompt template for multi-element location |
| packages/core/src/ai-model/prompt/llm-locator-all.ts | New prompt template for locating all matching elements |
| packages/core/src/ai-model/inspect.ts | Implemented AiLocateMultiElements and AiLocateAllElements functions |
| packages/core/src/ai-model/index.ts | Exported new AI locate functions |
| packages/core/src/agent/ui-utils.ts | Added display formatting for LocateMulti task parameters |
| packages/core/src/agent/task-builder.ts | Added task handlers for LocateMulti and LocateAll operations |
| packages/core/src/agent/short-memory.ts | Complete short-memory implementation with manager, actions, and warmup logic |
| packages/core/src/agent/agent.ts | Integrated ShortMemoryManager; added aiLocateAll method with short-memory support |

};
});
}

Copilot AI Jan 18, 2026

The batch processing in locateAll uses a hardcoded BATCH_SIZE = 10 without any documentation explaining why this specific value was chosen. Consider adding a comment explaining the rationale (e.g., API rate limits, memory constraints, or performance testing results) or making this configurable through the options parameter.

Suggested change
// Batch size is intentionally kept small to balance latency and model/token usage.
// A value of 10 has shown good performance in typical scenarios without risking
// hitting API rate limits or excessive memory usage when locating many elements.
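
One way to make the batch size configurable, as suggested above, is to move the chunking into a small helper that takes the size as a parameter. This is a minimal sketch, not the PR's code: `toBatches` and `DEFAULT_LOCATE_BATCH_SIZE` are hypothetical names, and only the value 10 comes from the diff.

```typescript
// Hypothetical constant; the value 10 mirrors BATCH_SIZE in the PR.
const DEFAULT_LOCATE_BATCH_SIZE = 10;

// Hypothetical helper: split a list of prompts into fixed-size batches.
function toBatches<T>(
  items: readonly T[],
  batchSize: number = DEFAULT_LOCATE_BATCH_SIZE,
): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError(
      `batchSize must be a positive integer, got ${batchSize}`,
    );
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

A caller could then iterate over `toBatches(prompts, opt?.batchSize)` and pass each batch to `runBatch`, assuming a `batchSize` field were added to the options.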

Copilot uses AI. Check for mistakes.
Comment on lines +1459 to +1477
      console.warn(
        `aiLocateAll(useShortMemory=true) failed to locate center for token: ${token}`,
      );
    }
  }
} else {
  const baseToken =
    typeof prompts === 'string' ? prompts : prompts.prompt;
  for (let i = 0; i < results.length; i += 1) {
    const center = results[i]?.center;
    if (Array.isArray(center) && center.length === 2) {
      const token = `${baseToken}#${i + 1}`;
      points[token] = [Number(center[0]), Number(center[1])];
    } else {
      console.warn(
        `aiLocateAll(useShortMemory=true) failed to locate center for token: ${baseToken}#${i + 1}`,
      );
    }
  }

Copilot AI Jan 18, 2026

Similar to the short-memory.ts file, this code also uses console.warn for logging (lines 1459-1461 and 1472-1475) instead of the proper logging mechanism from '@midscene/shared/logger'. Consider using consistent logging throughout the codebase.
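
The "store center or warn" loop appears several times in the diff, so one option is a shared helper that receives the warning sink as a parameter. The sketch below is an assumption-laden illustration: `storeCenters`, `TokenPoint`, and `WarnFn` are hypothetical names, and the `warn` callback stands in for whatever the project's logger exposes, which is not shown in this conversation.

```typescript
type TokenPoint = [number, number];
type WarnFn = (message: string) => void;

// Hypothetical helper: copy valid centers into `points`, report the rest
// through the injected `warn` function instead of calling console.warn.
function storeCenters(
  results: Array<{ center?: unknown } | null | undefined>,
  tokenFor: (index: number) => string,
  points: Record<string, TokenPoint>,
  warn: WarnFn,
): void {
  results.forEach((result, index) => {
    const token = tokenFor(index);
    const center = result?.center;
    if (Array.isArray(center) && center.length === 2) {
      points[token] = [Number(center[0]), Number(center[1])];
    } else {
      warn(`failed to locate center for token: ${token}`);
    }
  });
}
```

Injecting `warn` keeps the helper independent of a specific logging package and makes the warning path easy to assert in tests.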

- call: (param: TParam, context: ExecutorContext) => Promise<TReturn> | TReturn;
+ call: (
+   param: TParam,
+   context?: ExecutorContext,

Copilot AI Jan 18, 2026

Making the context parameter optional with context?: ExecutorContext could be a breaking change for implementations that expect it to be required. This change appears in the DeviceAction interface which may be implemented by external consumers. Consider whether this should remain required to maintain API stability, or document this as a breaking change if intentional.

Suggested change
- context?: ExecutorContext,
+ context: ExecutorContext,
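
If the optional signature is kept, actions that actually need the context would have to narrow it before use. The following is a rough sketch of that pattern under stated assumptions: `ExecutorContextLike`, `requireContext`, and `describeTask` are hypothetical and only illustrate the narrowing; they are not part of the PR or of Midscene's API.

```typescript
// Hypothetical stand-in for the real ExecutorContext type.
interface ExecutorContextLike {
  taskId: string;
}

// Hypothetical guard: fail with a clear message when context is missing.
function requireContext<T>(context: T | undefined, actionName: string): T {
  if (context === undefined) {
    throw new Error(
      `${actionName}: executor context is required but was not provided`,
    );
  }
  return context;
}

// An action written against the optional signature narrows before use.
function describeTask(
  param: { label: string },
  context?: ExecutorContextLike,
): string {
  const ctx = requireContext(context, 'describeTask');
  return `${param.label} (task ${ctx.taskId})`;
}
```

The guard turns a silent `undefined` dereference into an explicit error, at the cost of a runtime check that the required signature would have enforced at compile time.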

Comment on lines +375 to +380
const defaultIntervalMs =
  param.tokens.length >= 12 ? 200 : param.tokens.length >= 6 ? 150 : 80;
const intervalMs =
  typeof param.intervalMs === 'number'
    ? param.intervalMs
    : defaultIntervalMs;

Copilot AI Jan 18, 2026

The TapWithShortMemory action computes an adaptive default interval from the token count (lines 375-380), but the thresholds (12 or more tokens: 200 ms; 6 or more tokens: 150 ms; otherwise 80 ms) are magic numbers with no explanation. Consider documenting the rationale for these values or extracting them into named constants with explanatory comments.
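
As an illustration of the named-constants option, the thresholds could be pulled out as below. The numeric values are the ones in the diff; the constant names and the `resolveTapIntervalMs` function are hypothetical, and the rationale for the values remains for the author to document.

```typescript
// Values copied from the diff; names are hypothetical.
const TAP_INTERVAL_SHORT_MS = 80;
const TAP_INTERVAL_MEDIUM_MS = 150;
const TAP_INTERVAL_LONG_MS = 200;
const MEDIUM_SEQUENCE_MIN_TOKENS = 6;
const LONG_SEQUENCE_MIN_TOKENS = 12;

// An explicit intervalMs always wins; otherwise longer sequences
// get a longer delay between taps.
function resolveTapIntervalMs(tokenCount: number, intervalMs?: number): number {
  if (typeof intervalMs === 'number') {
    return intervalMs;
  }
  if (tokenCount >= LONG_SEQUENCE_MIN_TOKENS) {
    return TAP_INTERVAL_LONG_MS;
  }
  if (tokenCount >= MEDIUM_SEQUENCE_MIN_TOKENS) {
    return TAP_INTERVAL_MEDIUM_MS;
  }
  return TAP_INTERVAL_SHORT_MS;
}
```

Keeping the selection in a pure function also makes the boundaries (5 vs 6 tokens, 11 vs 12 tokens) straightforward to unit test.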

Comment on lines +1278 to +1336
if (!r) {
return {
rect: undefined,
center: undefined,
} as any;
}
return {
rect: r.rect,
center: r.center,
dpr: dprValue,
};
});
}

const BATCH_SIZE = 10;
const results: any[] = [];

const runBatch = async (batchPrompts: TUserPrompt[]) => {
const detailedParams = batchPrompts.map((p) =>
buildDetailedLocateParam(p, locateOpt),
);

const plan = {
type: 'LocateMulti',
param: detailedParams,
thought: '',
};

const defaultIntentModelConfig =
this.modelConfigManager.getModelConfig('default');
const modelConfigForPlanning =
this.modelConfigManager.getModelConfig('planning');

// Use runPlans to leverage TaskExecutor's pipeline (dump, log, etc)
// Note: we are passing a custom plan type 'LocateMulti', which must be supported by TaskBuilder
const { output } = await this.taskExecutor.runPlans(
`Locate ${batchPrompts.length} elements`,
[plan],
modelConfigForPlanning,
defaultIntentModelConfig,
);

return output;
};

for (let i = 0; i < prompts.length; i += BATCH_SIZE) {
const batch = prompts.slice(i, i + BATCH_SIZE);
const batchResults = await runBatch(batch);
results.push(...batchResults);
}

const dprValue = await (this.interface.size() as any).dpr;
return results.map((r) => {
if (!r) {
return {
rect: undefined,
center: undefined,
} as any;
}

Copilot AI Jan 18, 2026

In the return statements for handling null results (lines 1278-1283 and 1332-1336), the return type is cast as any which bypasses type checking. This creates a type safety gap where the function promises to return objects with optional rect/center properties but the actual structure might not match. Consider using a proper type guard or explicit type that matches the return signature.
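
Because every field of the result is optional in the signature used elsewhere in the PR (`{ rect?: Rect; center?: [number, number]; dpr?: number }`), the empty case can be expressed without a cast. This is a sketch only: `RectLike`, `LocateAllItem`, and `toLocateAllItem` are hypothetical names, and `RectLike` is a simplified assumption based on the `left/top/width/height` fields seen in the diff.

```typescript
// Simplified assumption of the Rect shape used in the diff.
interface RectLike {
  left: number;
  top: number;
  width: number;
  height: number;
}

interface LocateAllItem {
  rect?: RectLike;
  center?: [number, number];
  dpr?: number;
}

// Hypothetical mapper: no `as any` is needed because an empty object
// already satisfies a type whose fields are all optional.
function toLocateAllItem(
  r: { rect: RectLike; center: [number, number] } | null | undefined,
  dpr: number | undefined,
): LocateAllItem {
  if (!r) {
    return {};
  }
  return { rect: r.rect, center: r.center, dpr };
}
```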

Comment on lines +248 to +278
let errorLog: string | undefined;
if (parseResult.errors?.length) {
errorLog = `failed to locate elements: \n${parseResult.errors.join('\n')}`;
}

const matchedElements = parseResult.elements.filter(
(e): e is LocateResultElement => e !== null,
);

const dumpData: PartialServiceDumpFromSDK = {
type: 'locate',
userQuery: {
element: JSON.stringify(queryPrompts),
},
matchedElement: matchedElements,
data: null,
taskInfo,
deepThink: false,
error: errorLog,
};

const dump = createServiceDump(dumpData);

if (errorLog) {
throw new ServiceError(errorLog, dump);
}

return {
results: parseResult.elements,
dump,
};

Copilot AI Jan 18, 2026

The error handling is inconsistent between locateMulti and locateAll. In locateMulti, when parseResult.errors is non-empty, the code throws a ServiceError (lines 271-272). Because the throw happens only after the dump is constructed, the successfully matched elements end up in the dump attached to the error but are never returned to the caller. Consider whether partial results should be exposed explicitly on the ServiceError, or whether the function should fail earlier, before processing any results.
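
If partial results are meant to reach the caller, one possible shape is an error type that carries them explicitly. The sketch below is illustrative rather than the project's design: `PartialLocateError` and `finalizeLocate` are hypothetical, and the real `ServiceError` constructor is only known from the diff to accept a message and a dump.

```typescript
// Hypothetical error type that exposes whatever was located before failure.
class PartialLocateError<T> extends Error {
  readonly partialResults: Array<T | null>;

  constructor(message: string, partialResults: Array<T | null>) {
    super(message);
    this.name = 'PartialLocateError';
    this.partialResults = partialResults;
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical final step: either return all elements or throw with them.
function finalizeLocate<T>(
  elements: Array<T | null>,
  errors: string[] = [],
): Array<T | null> {
  if (errors.length > 0) {
    throw new PartialLocateError(
      `failed to locate elements: \n${errors.join('\n')}`,
      elements,
    );
  }
  return elements;
}
```

A caller that tolerates misses could catch `PartialLocateError` and continue with `partialResults`, while strict callers simply let the error propagate.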

Comment on lines +672 to +707
console.warn(
`WarmupShortMemory failed to locate center for token: ${token}`,
);
}
}
}

const allTargets = normalizedTargets.filter(
(target) => target.mode === 'all',
);
for (const target of allTargets) {
const results = uiContext
? (
await deps.service.locateAll(
buildDetailedLocateParam(target.prompt) || {
prompt: target.prompt,
},
{ context: uiContext },
modelConfigForDefaultIntent,
)
).results
: await deps.locateAll(target.prompt, {
freezeContext: false,
});
for (let i = 0; i < results.length; i += 1) {
const center = results[i]?.center;
const token = `${target.prompt}#${i + 1}`;
if (Array.isArray(center) && center.length === 2) {
existingPoints[token] = [Number(center[0]), Number(center[1])];
} else {
console.warn(
`WarmupShortMemory failed to locate center for token: ${token}`,
);
}
}
}

Copilot AI Jan 18, 2026

The warmupShortMemory function uses console.warn for logging failed locate operations (lines 672-674 and 702-704). This is inconsistent with the codebase pattern which appears to use a logger from '@midscene/shared/logger'. Consider using the proper logging mechanism for consistency.

results.push(...batchResults);
}

const dprValue = await (this.interface.size() as any).dpr;

Copilot AI Jan 18, 2026

Duplicate type cast issue: same as line 1276, this line also uses as any to access the dpr property. This pattern is repeated, suggesting a systemic type definition issue that should be addressed at the interface level.
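
Addressing this at the interface level could look roughly like the following, where the size type declares `dpr` so no cast is needed. The names `SizeWithDpr`, `SizedInterface`, and `readDpr` are hypothetical, and the actual return type of `size()` in the codebase is not visible in this conversation. The sketch also awaits `size()` before reading the property, which matters whenever `size()` returns a promise, since a promise object itself has no `dpr` field.

```typescript
// Hypothetical size type that declares dpr explicitly.
interface SizeWithDpr {
  width: number;
  height: number;
  dpr?: number;
}

// Hypothetical minimal view of the device interface.
interface SizedInterface {
  size(): Promise<SizeWithDpr> | SizeWithDpr;
}

// Await the size first, then read dpr from the resolved value.
async function readDpr(device: SizedInterface): Promise<number | undefined> {
  const size = await device.size();
  return size.dpr;
}
```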

Comment on lines +1 to +715
import type { AbstractInterface } from '@/device';
import type { ModelConfigManager } from '@midscene/shared/env';
import { assert } from '@midscene/shared/utils';
import { z } from 'zod';
import { defineAction } from '../device';
import type {
DeviceAction,
LocateOption,
LocateResultElement,
Rect,
TUserPrompt,
UIContext,
} from '../index';
import type { Service } from '../index';
import { buildDetailedLocateParam } from '../yaml/index';

export type ShortMemoryTokenPoint = [number, number];

export type ShortMemoryOpt = LocateOption & {
/**
* Max concurrent locate requests.
* Default: Infinity (fire all in parallel).
*/
concurrency?: number;
/**
* Freeze UI context once, then reuse it for all locate calls.
* Default: true.
*/
freezeContext?: boolean;
/**
* If true, store the locate results into a short-term in-memory map
* so later ai()/aiAction() can reuse it to do fast continuous taps.
*/
useShortMemory?: boolean;
/**
* If true, clear existing short memory before saving.
* Default: true when useShortMemory is true.
*/
clearShortMemory?: boolean;
};

export class ShortMemoryManager {
private points: Record<string, ShortMemoryTokenPoint> = {};

constructor(
private deps: {
interfaceInstance: AbstractInterface;
service: Service;
modelConfigManager: ModelConfigManager;
locateAll: (
prompts: TUserPrompt | TUserPrompt[],
opt?: LocateOption & { freezeContext?: boolean },
) => Promise<
Array<{ rect?: Rect; center?: [number, number]; dpr?: number }>
>;
getFrozenUIContext: () => UIContext | undefined;
freezePageContext: () => Promise<void>;
unfreezePageContext: () => Promise<void>;
},
) {}

getPoints(): Record<string, ShortMemoryTokenPoint> {
return this.points;
}

setPoints(points: Record<string, ShortMemoryTokenPoint>) {
this.points = { ...points };
}

clearPoints() {
this.points = {};
}

getPointCount(): number {
return Object.keys(this.points).length;
}

buildAiContext(): string | undefined {
return buildShortMemoryAiContext({
points: this.points,
});
}

getActionSpace(): DeviceAction[] {
return [
defineActionTapWithShortMemory({
interfaceInstance: this.deps.interfaceInstance,
getPoints: () => this.points,
}),
defineActionClearShortMemory({
clearPoints: () => this.clearPoints(),
getPointCount: () => this.getPointCount(),
}),
defineActionInputWithShortMemory({
interfaceInstance: this.deps.interfaceInstance,
getPoints: () => this.points,
}),
defineActionHoverWithShortMemory({
interfaceInstance: this.deps.interfaceInstance,
getPoints: () => this.points,
}),
defineActionRightClickWithShortMemory({
interfaceInstance: this.deps.interfaceInstance,
getPoints: () => this.points,
}),
defineActionDoubleClickWithShortMemory({
interfaceInstance: this.deps.interfaceInstance,
getPoints: () => this.points,
}),
defineActionWarmupShortMemory({
warmupShortMemory: (param, context) =>
warmupShortMemory(param, context, {
service: this.deps.service,
modelConfigManager: this.deps.modelConfigManager,
locateAll: this.deps.locateAll,
getFrozenUIContext: this.deps.getFrozenUIContext,
freezePageContext: this.deps.freezePageContext,
unfreezePageContext: this.deps.unfreezePageContext,
getPoints: () => this.points,
setPoints: (points) => this.setPoints(points),
}),
getPointCount: () => this.getPointCount(),
}),
];
}
}

const warmupShortMemoryTargetSchema = z.union([
z.string().describe('Target element description'),
z
.object({
prompt: z.string().describe('Target element description'),
mode: z
.enum(['single', 'all'])
.optional()
.describe(
'single = locate one match; all = locate all matches on screen',
),
})
.describe('Target element with explicit matching mode'),
]);

export const warmupShortMemoryParamSchema = z.object({
targets: z
.array(warmupShortMemoryTargetSchema)
.min(1)
.describe('Targets to pre-locate and cache into short-term memory'),
freezeContext: z
.boolean()
.optional()
.describe('Freeze UI context during warmup. Default is true.'),
clearShortMemory: z
.boolean()
.optional()
.describe('Clear existing short memory before saving. Default is true.'),
});

export type WarmupShortMemoryParam = z.infer<
typeof warmupShortMemoryParamSchema
>;

export const tapWithShortMemoryParamSchema = z.object({
tokens: z
.array(z.string())
.min(1)
.describe(
'Tokens to tap in order. Example for dial keypad: ["1","3",...].',
),
intervalMs: z
.number()
.int()
.min(0)
.optional()
.describe(
'Delay between taps in ms. Default is adaptive. Use >=1000ms if each tap triggers a network request or the app misses taps.',
),
strict: z
.boolean()
.optional()
.describe(
'If true, throw error when any token is missing. Default is true.',
),
});

export type TapWithShortMemoryParam = z.infer<
typeof tapWithShortMemoryParamSchema
>;

export const clearShortMemoryParamSchema = z.object({
reason: z
.string()
.optional()
.describe('Why ShortMemory is being cleared (optional).'),
});

export type ClearShortMemoryParam = z.infer<typeof clearShortMemoryParamSchema>;

export const hoverWithShortMemoryParamSchema = z.object({
token: z
.string()
.describe('Token for the target (must exist in ShortMemory).'),
});

export type HoverWithShortMemoryParam = z.infer<
typeof hoverWithShortMemoryParamSchema
>;

export const rightClickWithShortMemoryParamSchema = z.object({
token: z
.string()
.describe('Token for the target (must exist in ShortMemory).'),
});

export type RightClickWithShortMemoryParam = z.infer<
typeof rightClickWithShortMemoryParamSchema
>;

export const doubleClickWithShortMemoryParamSchema = z.object({
token: z
.string()
.describe('Token for the target (must exist in ShortMemory).'),
});

export type DoubleClickWithShortMemoryParam = z.infer<
typeof doubleClickWithShortMemoryParamSchema
>;

export const inputWithShortMemoryParamSchema = z.object({
token: z
.string()
.describe('Token for the input field (must exist in ShortMemory).'),
value: z
.union([z.string(), z.number()])
.transform((val) => String(val))
.describe(
'The text to input. Provide the final content for replace/append modes, or an empty string when using clear mode to remove existing text.',
),
mode: z
.enum(['replace', 'clear', 'append'])
.default('replace')
.optional()
.describe(
'Input mode: "replace" (default) - clear the field and input the value; "append" - append the value to existing content; "clear" - clear the field without inputting new text.',
),
});

export type InputWithShortMemoryParam = z.infer<
typeof inputWithShortMemoryParamSchema
>;

export type WarmupShortMemoryDeps = {
service: Service;
modelConfigManager: ModelConfigManager;
locateAll: (
prompts: TUserPrompt | TUserPrompt[],
opt?: LocateOption & { freezeContext?: boolean },
) => Promise<Array<{ rect?: Rect; center?: [number, number]; dpr?: number }>>;
getFrozenUIContext: () => UIContext | undefined;
freezePageContext: () => Promise<void>;
unfreezePageContext: () => Promise<void>;
getPoints: () => Record<string, ShortMemoryTokenPoint>;
setPoints: (points: Record<string, ShortMemoryTokenPoint>) => void;
};

export function buildShortMemoryAiContext(options: {
points: Record<string, ShortMemoryTokenPoint>;
}): string | undefined {
const tokens = Object.keys(options.points);
if (tokens.length === 0) {
return undefined;
}

const baseTokenGroups = new Map<string, number[]>();
const plainTokens: string[] = [];
for (const token of tokens) {
const match = token.match(/^(.*)#(\d+)$/);
if (match) {
const base = match[1];
const index = Number(match[2]);
const list = baseTokenGroups.get(base) || [];
list.push(index);
baseTokenGroups.set(base, list);
} else {
plainTokens.push(token);
}
}

const groupSummaries = Array.from(baseTokenGroups.entries()).map(
([base, indices]) => {
const sorted = indices.sort((a, b) => a - b);
const first = sorted[0];
const last = sorted[sorted.length - 1];
const range = first === last ? `#${first}` : `#${first}..#${last}`;
return `${base}${range} (count ${sorted.length})`;
},
);
const allTokensLine = tokens.join(', ');

return [
'[ShortMemory] I have pre-located and cached a set of targets (valid only for the current screen/context).',
groupSummaries.length
? `Token groups: ${groupSummaries.join('; ')}`
: undefined,
plainTokens.length
? `Ungrouped tokens: ${plainTokens.join(', ')}`
: undefined,
`All tokens (full list): ${allTokensLine}`,
'Tokens are literal identifiers for elements already located in the current context. Reuse them verbatim and do not invent new ones.',
'If tokens share a base name (e.g. "Follow#1", "Follow#2"), they are multiple matches of the same target type.',
'If the instruction requests "all"/"every" of a target type, include ALL matching tokens rather than picking a single one.',
'For network-bound flows (e.g. each tap triggers a request), set TapWithShortMemory.intervalMs >= 1000ms; otherwise choose a reasonable interval.',
'When the instruction can be satisfied by these tokens, prefer token-based actions (TapWithShortMemory / InputWithShortMemory) over re-locating.',
]
.filter(Boolean)
.join('\n');
}

export function defineActionWarmupShortMemory(options: {
warmupShortMemory: (
param: WarmupShortMemoryParam,
context?: { uiContext?: UIContext },
) => Promise<void>;
getPointCount: () => number;
}): DeviceAction<WarmupShortMemoryParam, { count: number }> {
return defineAction<typeof warmupShortMemoryParamSchema>({
name: 'WarmupShortMemory',
description:
'Proactively locate and cache potential targets into short-term memory for later actions on the current screen.',
interfaceAlias: 'warmupShortMemory',
paramSchema: warmupShortMemoryParamSchema,
call: async (param, context) => {
await options.warmupShortMemory(param, {
uiContext: context?.uiContext,
});
return {
count: options.getPointCount(),
};
},
delayAfterRunner: 0,
});
}

export function defineActionClearShortMemory(options: {
clearPoints: () => void;
getPointCount: () => number;
}): DeviceAction<ClearShortMemoryParam, { cleared: number }> {
return defineAction<typeof clearShortMemoryParamSchema>({
name: 'ClearShortMemory',
description:
'Clear short-term memory tokens when the UI context has changed or tokens are no longer valid.',
interfaceAlias: 'clearShortMemory',
paramSchema: clearShortMemoryParamSchema,
call: async () => {
const cleared = options.getPointCount();
options.clearPoints();
return {
cleared,
};
},
delayAfterRunner: 0,
});
}

export function defineActionTapWithShortMemory(options: {
interfaceInstance: AbstractInterface;
getPoints: () => Record<string, ShortMemoryTokenPoint>;
}): DeviceAction<TapWithShortMemoryParam, { count: number }> {
return defineAction<typeof tapWithShortMemoryParamSchema>({
name: 'TapWithShortMemory',
description:
'Tap a sequence of pre-located tokens (no locate step). Use this for fast continuous taps like dialer keypad. Tokens must be preloaded by the caller.',
interfaceAlias: 'aiTapWithShortMemory',
paramSchema: tapWithShortMemoryParamSchema,
call: async (param, context) => {
const defaultIntervalMs =
param.tokens.length >= 12 ? 200 : param.tokens.length >= 6 ? 150 : 80;
const intervalMs =
typeof param.intervalMs === 'number'
? param.intervalMs
: defaultIntervalMs;
const strict = param.strict !== undefined ? param.strict : true;

const tapAction = options.interfaceInstance
.actionSpace()
.find((action) => action.name === 'Tap');
assert(tapAction, 'Tap action not found in interface action space');

const points = options.getPoints();
for (const token of param.tokens) {
const point = points[token];
if (!point) {
if (strict) {
throw new Error(
`TapWithShortMemory: token not found: ${token}. Did you preload token->point mapping?`,
);
}
continue;
}

const locate = {
center: [point[0], point[1]] as [number, number],
rect: {
left: point[0],
top: point[1],
width: 1,
height: 1,
},
} as LocateResultElement;

await tapAction.call({ locate }, context);

if (intervalMs > 0) {
await new Promise((r) => setTimeout(r, intervalMs));
}
}

return {
count: param.tokens.length,
};
},
delayAfterRunner: 0,
});
}

export function defineActionInputWithShortMemory(options: {
interfaceInstance: AbstractInterface;
getPoints: () => Record<string, ShortMemoryTokenPoint>;
}): DeviceAction<InputWithShortMemoryParam, { token: string }> {
return defineAction<typeof inputWithShortMemoryParamSchema>({
name: 'InputWithShortMemory',
description:
'Input text into a pre-located field token (no locate step). Tokens must be preloaded by the caller.',
interfaceAlias: 'aiInputWithShortMemory',
paramSchema: inputWithShortMemoryParamSchema,
call: async (param, context) => {
const inputAction = options.interfaceInstance
.actionSpace()
.find((action) => action.name === 'Input');
assert(inputAction, 'Input action not found in interface action space');

const points = options.getPoints();
const point = points[param.token];
if (!point) {
throw new Error(
`InputWithShortMemory: token not found: ${param.token}. Did you preload token->point mapping?`,
);
}

const locate = {
center: [point[0], point[1]] as [number, number],
rect: {
left: point[0],
top: point[1],
width: 1,
height: 1,
},
} as LocateResultElement;

await inputAction.call(
{
locate,
value: param.value,
mode: param.mode,
},
context,
);

return {
token: param.token,
};
},
delayAfterRunner: 0,
});
}

export function defineActionHoverWithShortMemory(options: {
interfaceInstance: AbstractInterface;
getPoints: () => Record<string, ShortMemoryTokenPoint>;
}): DeviceAction<HoverWithShortMemoryParam, { token: string }> {
return defineAction<typeof hoverWithShortMemoryParamSchema>({
name: 'HoverWithShortMemory',
description:
'Hover on a pre-located token (no locate step). Tokens must be preloaded by the caller.',
interfaceAlias: 'aiHoverWithShortMemory',
paramSchema: hoverWithShortMemoryParamSchema,
call: async (param, context) => {
const hoverAction = options.interfaceInstance
.actionSpace()
.find((action) => action.name === 'Hover');
assert(hoverAction, 'Hover action not found in interface action space');

const points = options.getPoints();
const point = points[param.token];
if (!point) {
throw new Error(
`HoverWithShortMemory: token not found: ${param.token}. Did you preload token->point mapping?`,
);
}

const locate = {
center: [point[0], point[1]] as [number, number],
rect: {
left: point[0],
top: point[1],
width: 1,
height: 1,
},
} as LocateResultElement;

await hoverAction.call({ locate }, context);

return {
token: param.token,
};
},
delayAfterRunner: 0,
});
}

export function defineActionRightClickWithShortMemory(options: {
interfaceInstance: AbstractInterface;
getPoints: () => Record<string, ShortMemoryTokenPoint>;
}): DeviceAction<RightClickWithShortMemoryParam, { token: string }> {
return defineAction<typeof rightClickWithShortMemoryParamSchema>({
name: 'RightClickWithShortMemory',
description:
'Right click a pre-located token (no locate step). Tokens must be preloaded by the caller.',
interfaceAlias: 'aiRightClickWithShortMemory',
paramSchema: rightClickWithShortMemoryParamSchema,
call: async (param, context) => {
const rightClickAction = options.interfaceInstance
.actionSpace()
.find((action) => action.name === 'RightClick');
assert(
rightClickAction,
'RightClick action not found in interface action space',
);

const points = options.getPoints();
const point = points[param.token];
if (!point) {
throw new Error(
`RightClickWithShortMemory: token not found: ${param.token}. Did you preload token->point mapping?`,
);
}

const locate = {
center: [point[0], point[1]] as [number, number],
rect: {
left: point[0],
top: point[1],
width: 1,
height: 1,
},
} as LocateResultElement;

await rightClickAction.call({ locate }, context);

return {
token: param.token,
};
},
delayAfterRunner: 0,
});
}

export function defineActionDoubleClickWithShortMemory(options: {
interfaceInstance: AbstractInterface;
getPoints: () => Record<string, ShortMemoryTokenPoint>;
}): DeviceAction<DoubleClickWithShortMemoryParam, { token: string }> {
return defineAction<typeof doubleClickWithShortMemoryParamSchema>({
name: 'DoubleClickWithShortMemory',
description:
'Double click a pre-located token (no locate step). Tokens must be preloaded by the caller.',
interfaceAlias: 'aiDoubleClickWithShortMemory',
paramSchema: doubleClickWithShortMemoryParamSchema,
call: async (param, context) => {
const doubleClickAction = options.interfaceInstance
.actionSpace()
.find((action) => action.name === 'DoubleClick');
assert(
doubleClickAction,
'DoubleClick action not found in interface action space',
);

const points = options.getPoints();
const point = points[param.token];
if (!point) {
throw new Error(
`DoubleClickWithShortMemory: token not found: ${param.token}. Did you preload token->point mapping?`,
);
}

const locate = {
center: [point[0], point[1]] as [number, number],
rect: {
left: point[0],
top: point[1],
width: 1,
height: 1,
},
} as LocateResultElement;

await doubleClickAction.call({ locate }, context);

return {
token: param.token,
};
},
delayAfterRunner: 0,
});
}

export async function warmupShortMemory(
param: WarmupShortMemoryParam,
options: { uiContext?: UIContext } | undefined,
deps: WarmupShortMemoryDeps,
): Promise<void> {
const targets = param.targets || [];
const freezeContext = param.freezeContext ?? true;
const clearShortMemory = param.clearShortMemory ?? true;
const uiContext = options?.uiContext;
const shouldFreezeContext = freezeContext && !uiContext;
const modelConfigForDefaultIntent =
deps.modelConfigManager.getModelConfig('default');

if (targets.length === 0) {
return;
}

const existingPoints = clearShortMemory ? {} : { ...deps.getPoints() };

const normalizedTargets = targets.map((target) =>
typeof target === 'string'
? { prompt: target, mode: 'single' as const }
: {
prompt: target.prompt,
mode: target.mode ?? 'single',
},
);

const alreadyFrozen = Boolean(deps.getFrozenUIContext());
if (shouldFreezeContext && !alreadyFrozen) {
await deps.freezePageContext();
}

try {
const singleTargets = normalizedTargets.filter(
(target) => target.mode !== 'all',
);
if (singleTargets.length) {
const prompts = singleTargets.map((target) => target.prompt);
const results = uiContext
? (
await deps.service.locateMulti(
prompts.map(
(prompt) => buildDetailedLocateParam(prompt) || { prompt },
),
{ context: uiContext },
modelConfigForDefaultIntent,
)
).results
: await deps.locateAll(prompts, {
freezeContext: false,
});
for (let i = 0; i < prompts.length; i += 1) {
const token = prompts[i];
const center = results[i]?.center;
if (Array.isArray(center) && center.length === 2) {
existingPoints[token] = [Number(center[0]), Number(center[1])];
} else {
console.warn(
`WarmupShortMemory failed to locate center for token: ${token}`,
);
}
}
}

const allTargets = normalizedTargets.filter(
(target) => target.mode === 'all',
);
for (const target of allTargets) {
const results = uiContext
? (
await deps.service.locateAll(
buildDetailedLocateParam(target.prompt) || {
prompt: target.prompt,
},
{ context: uiContext },
modelConfigForDefaultIntent,
)
).results
: await deps.locateAll(target.prompt, {
freezeContext: false,
});
for (let i = 0; i < results.length; i += 1) {
const center = results[i]?.center;
const token = `${target.prompt}#${i + 1}`;
if (Array.isArray(center) && center.length === 2) {
existingPoints[token] = [Number(center[0]), Number(center[1])];
} else {
console.warn(
`WarmupShortMemory failed to locate center for token: ${token}`,
);
}
}
}
} finally {
if (shouldFreezeContext && !alreadyFrozen) {
await deps.unfreezePageContext();
}
}

deps.setPoints(existingPoints);
}

Copilot AI Jan 18, 2026

The entire short memory implementation (715 lines of new code) lacks test coverage. This includes critical functionality like ShortMemoryManager, token management, warmup logic, and various action handlers. Given the complexity and the repository's comprehensive test suite, this module should have extensive unit tests covering token storage/retrieval, action definitions, error cases, and edge scenarios like frozen contexts and concurrent operations.
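
A reasonable starting point for coverage is the pure token-grouping logic, since it needs no device or model. The function below mirrors the grouping step of `buildShortMemoryAiContext` from the diff (same regular expression and range format), extracted as a hypothetical helper named `summarizeTokenGroups` so it can be exercised in isolation; the extraction itself is an assumption, not something the PR does.

```typescript
// Hypothetical extraction of the grouping step in buildShortMemoryAiContext.
function summarizeTokenGroups(tokens: string[]): {
  groups: string[];
  plain: string[];
} {
  const grouped = new Map<string, number[]>();
  const plain: string[] = [];
  for (const token of tokens) {
    // "Follow#2" -> base "Follow", index 2; tokens without "#<n>" stay plain.
    const match = token.match(/^(.*)#(\d+)$/);
    if (match) {
      const list = grouped.get(match[1]) ?? [];
      list.push(Number(match[2]));
      grouped.set(match[1], list);
    } else {
      plain.push(token);
    }
  }
  const groups = Array.from(grouped.entries()).map(([base, indices]) => {
    const sorted = [...indices].sort((a, b) => a - b);
    const first = sorted[0];
    const last = sorted[sorted.length - 1];
    const range = first === last ? `#${first}` : `#${first}..#${last}`;
    return `${base}${range} (count ${sorted.length})`;
  });
  return { groups, plain };
}
```

Cases worth asserting include out-of-order indices (`['Follow#2', 'Follow#1']` should summarize as `Follow#1..#2 (count 2)`), a single indexed token, plain tokens such as `Submit`, and an empty list.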

// Convert value to string to ensure consistency
const stringValue = typeof value === 'number' ? String(value) : value;

const { useShortMemory, ...optWithoutShortMemory } = (opt || {}) as {

Copilot AI Jan 18, 2026

This use of variable 'opt' always evaluates to true.

Suggested change
- const { useShortMemory, ...optWithoutShortMemory } = (opt || {}) as {
+ const { useShortMemory, ...optWithoutShortMemory } = opt as {
