feat(workflow,llm,playwright,puppeteer): add longPress and swipe for playwright and puppeteer #1036

Open: wants to merge 4 commits into base: main
20 changes: 20 additions & 0 deletions packages/core/src/ai-model/common.ts
@@ -5,8 +5,10 @@ import type {
MidsceneYamlFlowItem,
PlanningAction,
PlanningActionParamInputOrKeyPress,
PlanningActionParamLongPress,
PlanningActionParamScroll,
PlanningActionParamSleep,
PlanningActionParamSwipe,
Rect,
Size,
} from '@/types';
@@ -363,6 +365,24 @@ export function buildYamlFlowFromPlans(
scrollType: param.scrollType,
distance: param.distance,
});
} else if (type === 'Swipe') {
const param = plan.param as PlanningActionParamSwipe;
flow.push({
aiSwipe: null,
locate,
from: param.from,
to: param.to,
duration: param.duration,
direction: param.direction,
swipeType: param.swipeType,
distance: param.distance,
});
} else if (type === 'LongPress') {
const param = plan.param as PlanningActionParamLongPress;
flow.push({
aiLongPress: locate!,
duration: param.duration,
});
} else if (type === 'Sleep') {
const param = plan.param as PlanningActionParamSleep;
flow.push({
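The two new branches map a planned `Swipe` or `LongPress` action onto a YAML flow item. A minimal, self-contained sketch of that mapping follows. `Plan`, `FlowItem`, `locatePrompt`, and `planToFlowItem` are hypothetical stand-ins for illustration; the real code lives inside `buildYamlFlowFromPlans`, derives `locate` elsewhere, and uses the types from `@/types`.

```typescript
// Local stand-ins for the types added in this PR (assumed shapes, for illustration only).
type Point = { x: number; y: number };

interface SwipeParam {
  from?: Point;
  to?: Point;
  duration?: number;
  direction: 'down' | 'up' | 'right' | 'left';
  swipeType: 'once' | 'untilBottom' | 'untilTop' | 'untilRight' | 'untilLeft';
  distance?: number;
}

interface LongPressParam {
  duration?: number;
}

// Hypothetical simplified plan: `locatePrompt` stands in for the resolved locate prompt.
type Plan =
  | { type: 'Swipe'; locatePrompt?: string; param: SwipeParam }
  | { type: 'LongPress'; locatePrompt: string; param: LongPressParam };

type FlowItem =
  | ({ aiSwipe: null; locate?: string } & SwipeParam)
  | ({ aiLongPress: string } & LongPressParam);

function planToFlowItem(plan: Plan): FlowItem {
  if (plan.type === 'Swipe') {
    // Swipe keeps `aiSwipe: null`; the optional target goes into `locate`.
    return {
      aiSwipe: null,
      locate: plan.locatePrompt,
      from: plan.param.from,
      to: plan.param.to,
      duration: plan.param.duration,
      direction: plan.param.direction,
      swipeType: plan.param.swipeType,
      distance: plan.param.distance,
    };
  }
  // LongPress always needs a target, so the prompt itself is the value of `aiLongPress`.
  return { aiLongPress: plan.locatePrompt, duration: plan.param.duration };
}

console.log(
  planToFlowItem({
    type: 'LongPress',
    locatePrompt: 'the profile avatar',
    param: { duration: 800 },
  }),
);
```

Note the asymmetry this mirrors from the diff: `Swipe` tolerates a missing target (page-level swipe), while the `LongPress` branch asserts a target with `locate!`.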
74 changes: 71 additions & 3 deletions packages/core/src/ai-model/prompt/llm-planning.ts
@@ -25,7 +25,7 @@ Target: User will give you a screenshot, an instruction and some previous logs i

Restriction:
- Don't give extra actions or plans beyond the instruction. ONLY plan for what the instruction requires. For example, don't try to submit the form if the instruction is only to fill something.
-- Always give ONLY ONE action in \`log\` field (or null if no action should be done), instead of multiple actions. Supported actions are Tap, Hover, Input, KeyboardPress, Scroll${pageType === 'android' ? ', AndroidBackButton, AndroidHomeButton, AndroidRecentAppsButton, AndroidLongPress, AndroidPull.' : '.'}
+- Always give ONLY ONE action in \`log\` field (or null if no action should be done), instead of multiple actions. Supported actions are Tap, Hover, Input, KeyboardPress, Scroll, LongPress, Swipe${pageType === 'android' ? ', AndroidBackButton, AndroidHomeButton, AndroidRecentAppsButton, AndroidLongPress, AndroidPull.' : '.'}
- Don't repeat actions in the previous logs.
- Bbox is the bounding box of the element to be located. It's an array of 4 numbers, representing ${bboxDescription(vlMode)}.

@@ -35,6 +35,8 @@ Supporting actions:
- Hover: { type: "Hover", ${vlLocateParam} }
- Input: { type: "Input", ${vlLocateParam}, param: { value: string } } // Replace the input field with a new value. \`value\` is the final that should be filled in the input box. No matter what modifications are required, just provide the final value to replace the existing input value. Giving a blank string means clear the input field.
- KeyboardPress: { type: "KeyboardPress", param: { value: string } }
- LongPress: { type: "LongPress", ${vlLocateParam}, param: { duration?: number(ms) } }
- Swipe: { type: "Swipe", ${vlLocateParam} | null, param: { from?: { x: number, y: number }, to?: { x: number, y: number }, duration?: number(ms), direction: 'down' | 'up' | 'right' | 'left'(default), swipeType: 'once' (default) | 'untilBottom' | 'untilTop' | 'untilRight' | 'untilLeft', distance?: number } } // locate is the element to swipe. For a page-level swipe, set \`locate\` to \`null\`.
- Scroll: { type: "Scroll", ${vlLocateParam} | null, param: { direction: 'down'(default) | 'up' | 'right' | 'left', scrollType: 'once' (default) | 'untilBottom' | 'untilTop' | 'untilRight' | 'untilLeft', distance: null | number }} // locate is the element to scroll. If it's a page scroll, put \`null\` in the \`locate\` field.
${
pageType === 'android'
@@ -95,7 +97,7 @@ You are a versatile professional in software UI automation. Your outstanding con
## Workflow

1. Receive the screenshot, element description of screenshot(if any), user's instruction and previous logs.
-2. Decompose the user's task into a sequence of actions, and place it in the \`actions\` field. There are different types of actions (Tap / Hover / Input / KeyboardPress / Scroll / FalsyConditionStatement / Sleep ${pageType === 'android' ? '/ AndroidBackButton / AndroidHomeButton / AndroidRecentAppsButton / AndroidLongPress / AndroidPull' : ''}). The "About the action" section below will give you more details.
+2. Decompose the user's task into a sequence of actions, and place it in the \`actions\` field. There are different types of actions (Tap / Hover / Input / KeyboardPress / Scroll / LongPress / Swipe / FalsyConditionStatement / Sleep ${pageType === 'android' ? '/ AndroidBackButton / AndroidHomeButton / AndroidRecentAppsButton / AndroidLongPress / AndroidPull' : ''}). The "About the action" section below will give you more details.
3. Precisely locate the target element if it's already shown in the screenshot, put the location info in the \`locate\` field of the action.
4. If some target elements is not shown in the screenshot, consider the user's instruction is not feasible on this page. Follow the next steps.
5. Consider whether the user's instruction will be accomplished after all the actions
@@ -149,6 +151,31 @@ Each action has a \`type\` and corresponding \`param\`. To be detailed:
* use this action when the conditional statement talked about in the instruction is falsy.
- type: 'Sleep'
* {{ param: {{ timeMs: number }} }}
- type: 'LongPress', trigger a long press on the screen at specified coordinates
* {{ ${llmLocateParam}, param: {{ duration?: number(ms) }} }}
- type: 'Swipe', trigger a swipe gesture from one point to another on the screen
* {{
${llmLocateParam},
param: {{
from?: {{ x: number, y: number }},
to?: {{ x: number, y: number }},
duration?: number(ms),
direction?: 'down' | 'up' | 'right' | 'left',
swipeType?: 'once' | 'untilBottom' | 'untilTop' | 'untilRight' | 'untilLeft',
distance?: number
}}
}}
* To swipe a specific element, put the element at the center of the region in the \`locate\` field.
For a page-level swipe, set \`locate\` to \`null\`.
* If the user specifies an element, use its center as the \`from\` position.
If \`from\` and \`to\` are specified, use them directly.
* If \`to\` cannot be extracted, only extract \`direction\` and \`swipeType\` and \`distance\` for later use.
* Default values:
- \`direction\`: \`'left'\`
- \`swipeType\`: \`'untilLeft'\`
* \`swipeType\` describes swipe behavior:
- 'once'
- 'untilBottom', 'untilTop', 'untilRight', 'untilLeft'
${
pageType === 'android'
? `- type: 'AndroidBackButton', trigger the system "back" operation on Android devices
@@ -284,7 +311,7 @@ export const planSchema: ResponseFormatJSONSchema = {
type: {
type: 'string',
description:
-'Type of action, one of "Tap", "RightClick", "Hover" , "Input", "KeyboardPress", "Scroll", "ExpectedFalsyCondition", "Sleep", "AndroidBackButton", "AndroidHomeButton", "AndroidRecentAppsButton", "AndroidLongPress"',
+'Type of action, one of "Tap", "RightClick", "Hover" , "Input", "KeyboardPress", "Scroll", "Swipe", "LongPress", "ExpectedFalsyCondition", "Sleep", "AndroidBackButton", "AndroidHomeButton", "AndroidRecentAppsButton", "AndroidLongPress"',
},
param: {
anyOf: [
@@ -311,6 +338,47 @@ export const planSchema: ResponseFormatJSONSchema = {
required: ['direction', 'scrollType', 'distance'],
additionalProperties: false,
},
{
type: 'object',
properties: {
duration: { type: ['number', 'string'] },
from: {
anyOf: [
{
type: 'object',
properties: {
x: { type: ['number', 'string'] },
y: { type: ['number', 'string'] },
},
required: ['x', 'y'],
additionalProperties: false,
},
{ type: 'null' }
]
},
to: {
anyOf: [
{
type: 'object',
properties: {
x: { type: ['number', 'string'] },
y: { type: ['number', 'string'] },
},
required: ['x', 'y'],
additionalProperties: false,
},
{ type: 'null' }
]
}
},
additionalProperties: false
},

{
type: 'object',
properties: { duration: { type: ['number', 'string'] } },
additionalProperties: false,
},
{
type: 'object',
properties: { reason: { type: 'string' } },
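The schema entries added above accept either `number` or `string` for `duration` and for the `x`/`y` coordinates, and allow `null` for `from` and `to`, whereas `PlanningActionParamSwipe` in `types.ts` declares plain numbers. One way a consumer could bridge that gap is sketched below. `toNumber`, `toPoint`, and `normalizeSwipeParam` are hypothetical helpers written for this illustration; they are not part of the PR.

```typescript
// Hypothetical helpers (not in the PR): coerce schema-shaped values into numbers.
type RawNumber = number | string;
type RawPoint = { x: RawNumber; y: RawNumber };
type Point = { x: number; y: number };

interface RawSwipeParam {
  from?: RawPoint | null;
  to?: RawPoint | null;
  duration?: RawNumber;
}

function toNumber(value: RawNumber, field: string): number {
  // Treat blank strings as invalid; Number('') would otherwise silently yield 0.
  const n =
    typeof value === 'number' ? value : value.trim() === '' ? NaN : Number(value);
  if (!Number.isFinite(n)) {
    throw new Error(`${field} is not numeric: ${JSON.stringify(value)}`);
  }
  return n;
}

function toPoint(p: RawPoint | null | undefined, field: string): Point | undefined {
  // The schema allows null; the TypeScript type uses an absent property instead.
  if (p == null) return undefined;
  return { x: toNumber(p.x, `${field}.x`), y: toNumber(p.y, `${field}.y`) };
}

function normalizeSwipeParam(raw: RawSwipeParam): {
  from?: Point;
  to?: Point;
  duration?: number;
} {
  return {
    from: toPoint(raw.from, 'from'),
    to: toPoint(raw.to, 'to'),
    duration:
      raw.duration === undefined ? undefined : toNumber(raw.duration, 'duration'),
  };
}

console.log(
  normalizeSwipeParam({ from: { x: '10', y: 20 }, to: null, duration: '500' }),
);
```

Failing loudly on a non-numeric string (for example `"500ms"`) is a design choice of this sketch; a real executor might prefer to fall back to a default duration.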
21 changes: 20 additions & 1 deletion packages/core/src/types.ts
@@ -265,6 +265,8 @@ export interface PlanningAction<ParamType = any> {
| 'Input'
| 'KeyboardPress'
| 'Scroll'
| 'Swipe'
| 'LongPress'
| 'Error'
| 'ExpectedFalsyCondition'
| 'Assert'
@@ -309,6 +311,23 @@ export interface PlanningActionParamInputOrKeyPress {

export type PlanningActionParamScroll = scrollParam;

export interface PlanningActionParamLongPress {
duration?: number;
}
export interface PlanningActionParamSwipe {
from?: {
x: number;
y: number;
};
to?: {
x: number;
y: number;
};
duration?: number;
direction: 'down' | 'up' | 'right' | 'left';
swipeType: 'once' | 'untilBottom' | 'untilTop' | 'untilRight' | 'untilLeft';
distance?: number;
}
export interface PlanningActionParamAssert {
assertion: TUserPrompt;
}
@@ -510,7 +529,7 @@ export type ExecutionTaskInsightAssertion =
ExecutionTask<ExecutionTaskInsightAssertionApply>;

/*
-task - action (i.e. interact)
+task - action (i.e. interact)
*/
export type ExecutionTaskActionApply<ActionParam = any> = ExecutionTaskApply<
'Action',
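`from` and `to` are optional in `PlanningActionParamSwipe`, and the planning prompt in this PR tells the model to fall back to `direction`, `swipeType`, and `distance` when `to` cannot be extracted. The sketch below shows how an executor might derive the end point for a single (`'once'`) swipe under that rule. `resolveSwipe`, the 300 px default distance, and the convention that `'left'` moves the pointer toward smaller `x` are assumptions made for illustration; none of them is confirmed by the diff.

```typescript
type Point = { x: number; y: number };
type Direction = 'down' | 'up' | 'right' | 'left';

// Subset of PlanningActionParamSwipe relevant to end-point resolution.
interface SwipeParam {
  from?: Point;
  to?: Point;
  direction: Direction;
  distance?: number;
}

// Assumption: a default distance in pixels when the plan gives none.
const DEFAULT_DISTANCE = 300;

// Assumption: screen coordinates, with x growing rightward and y growing downward.
const UNIT: Record<Direction, Point> = {
  left: { x: -1, y: 0 },
  right: { x: 1, y: 0 },
  up: { x: 0, y: -1 },
  down: { x: 0, y: 1 },
};

function resolveSwipe(
  param: SwipeParam,
  elementCenter: Point,
): { from: Point; to: Point } {
  // Prefer explicit coordinates; otherwise start from the located element's center.
  const from = param.from ?? elementCenter;
  if (param.to) return { from, to: param.to };

  // No explicit end point: move `distance` pixels along `direction`.
  const d = param.distance ?? DEFAULT_DISTANCE;
  const u = UNIT[param.direction];
  return { from, to: { x: from.x + u.x * d, y: from.y + u.y * d } };
}

console.log(resolveSwipe({ direction: 'left', distance: 100 }, { x: 200, y: 50 }));
```

The two prompts in `llm-planning.ts` state the `swipeType` default differently (`'once'` in one, `'untilLeft'` in the other), so this sketch deliberately leaves `swipeType` handling out.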
16 changes: 11 additions & 5 deletions packages/core/src/yaml.ts
@@ -1,4 +1,4 @@
-import type { PlanningActionParamScroll, Rect, TUserPrompt } from './types';
+import type { PlanningActionParamScroll, Rect, TUserPrompt, PlanningActionParamLongPress, PlanningActionParamSwipe } from './types';
import type { BaseElement, UIContext } from './types';

export interface LocateOption {
@@ -155,13 +155,17 @@ export interface MidsceneYamlFlowItemAIKeyboardPress extends LocateOption {
locate?: TUserPrompt; // where to press, optional
}

-export interface MidsceneYamlFlowItemAIScroll
-extends LocateOption,
-PlanningActionParamScroll {
+export interface MidsceneYamlFlowItemAIScroll extends LocateOption, PlanningActionParamScroll {
aiScroll: null;
locate?: TUserPrompt; // which area to scroll, optional
}

export interface MidsceneYamlFlowItemAILongPress extends LocateOption, PlanningActionParamLongPress {
aiLongPress: TUserPrompt;
}
export interface MidsceneYamlFlowItemAISwipe extends LocateOption, PlanningActionParamSwipe{
aiSwipe: null;
locate?: TUserPrompt; // where to swipe, optional
}
export interface MidsceneYamlFlowItemEvaluateJavaScript {
javascript: string;
name?: string;
@@ -187,6 +191,8 @@ export type MidsceneYamlFlowItem =
| MidsceneYamlFlowItemAIInput
| MidsceneYamlFlowItemAIKeyboardPress
| MidsceneYamlFlowItemAIScroll
| MidsceneYamlFlowItemAISwipe
| MidsceneYamlFlowItemAILongPress
| MidsceneYamlFlowItemSleep
| MidsceneYamlFlowItemLogScreenshot;
