Commit dea0f83

Merge pull request #450 from Integration-Automation/dev
Release: perception lane part 2 (change localize, icon classify, element proposal) v218-v220
2 parents 2ce7a6d + bb1d31e commit dea0f83

21 files changed

Lines changed: 1080 additions & 0 deletions

WHATS_NEW.md

Lines changed: 18 additions & 0 deletions
@@ -2,6 +2,24 @@
 
 ## What's new (2026-06-26)
 
+### Template-Free Element Proposal (Pixels to Elements)
+
+Get a clean numbered element list straight from the screen when there's no accessibility tree. Full reference: [`docs/source/Eng/doc/new_features/v220_features_doc.rst`](docs/source/Eng/doc/new_features/v220_features_doc.rst).
+
+- **`propose_elements` / `tag_kinds`** (`AC_propose_elements`, `AC_tag_kinds`): Set-of-Marks, `observation` and the grounding helpers all assume you already have element boxes, but a game, a custom-drawn app or a remote desktop has no accessibility tree. `propose_elements` builds that top-of-funnel list from pixels: detect widget boxes (closed-edge blobs via Canny + morphology + `connected_boxes`) and text boxes (`text_regions.find_text_regions`), fuse them (the `element_parse` `ocr > icon` priority *is* the "drop widget-that-is-really-text" cross-check) and return them in reading order, each tagged `text` or `widget`. `tag_kinds` is the pure labeller. cv2 is imported lazily; the labeller is fully testable. Seventh and final feature of the ROUND-15 perception lane. No `PySide6`.
+
+### Classify a Widget from Its Pixel Shape
+
+Tell a checkbox from a radio button from a text field, using pixels alone and no model. Full reference: [`docs/source/Eng/doc/new_features/v219_features_doc.rst`](docs/source/Eng/doc/new_features/v219_features_doc.rst).
+
+- **`classify_widget` / `box_features` / `classify_icon`** (`AC_classify_widget`, `AC_classify_icon`): Set-of-Marks and element proposers return *boxes* but not *what each box is*; `form_fields.checkbox_state` reads a box already known to be a checkbox, so the gap is the typing step before it. `box_features` extracts `{aspect, fill, edge_density, circularity}` for a box; `classify_widget` is the pure heuristic classifier (round→radio, wide-rounded→toggle, square-sparse→checkbox, wide-hollow→text_field, wide-filled→button, else icon); `classify_icon` composes them. The classifier is pure and fully testable; cv2/numpy are imported lazily so the module stays importable. Sixth feature of the ROUND-15 perception lane. No `PySide6`.
+
+### Localize a Change to the Elements That Changed
+
+Turn a raw screen diff into "element 3 changed" by scoring a list of element boxes. Full reference: [`docs/source/Eng/doc/new_features/v218_features_doc.rst`](docs/source/Eng/doc/new_features/v218_features_doc.rst).
+
+- **`localize_changes` / `rank_changes`** (`AC_localize_changes`, `AC_rank_changes`): existing diffs answer *where* pixels changed (`motion_regions`, `perceptual_diff`, `ssim_changed_regions` → raw pixel regions) or which *accessibility* elements differ (`element_diff`, needs metadata), but not "given a frame diff **and a list of element boxes**, which of *those* changed?". `localize_changes` diffs a reference against the current screen and scores each supplied element box by its mean per-pixel change; `rank_changes` is the pure ranker that flags `changed` (score ≥ `threshold`) and sorts most-changed first. Pairs with `set_of_marks`/accessibility boxes to give a per-element "what changed" feedback signal after a click. cv2/numpy are imported lazily; ranking is pure and fully testable. Fifth feature of the ROUND-15 perception lane. No `PySide6`.
+
 ### Theme-Invariant Matching (Light Template, Dark Mode)
 
 Find a button captured in light mode even after the app switches to dark mode. Full reference: [`docs/source/Eng/doc/new_features/v217_features_doc.rst`](docs/source/Eng/doc/new_features/v217_features_doc.rst).
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+Localize a Change to the Elements That Changed
+==============================================
+
+The existing diffs answer "*where* did pixels change" (``motion_regions``,
+``perceptual_diff``, ``ssim_changed_regions`` return raw pixel regions) or "which
+*accessibility* elements differ" (``element_diff``, needs a11y metadata). The
+missing middle is: given a frame diff **and a list of element boxes**, which of
+*those* elements changed? ``change_localize`` scores each supplied box by how
+much it changed and ranks them.
+
+* :func:`rank_changes` (pure): take ``[{box, score}]`` and mark each box
+  ``changed`` (score at or above ``threshold``), sorted most-changed first.
+* :func:`localize_changes`: diff a reference against the current screen, score
+  each element box by its mean pixel change, and rank them.
+
+``cv2`` / ``numpy`` are imported lazily (the module stays importable without
+them) and the loaders reuse :mod:`visual_match`. The ranking is pure and fully
+testable. Imports no ``PySide6``.
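As a rough illustration of the pure ranking step described above, the sketch below flags and sorts pre-scored boxes. It is not the library's implementation: the name ``rank_changes_sketch`` and the default ``threshold=0.1`` are assumptions made for this example. Only the ``{box, score, changed}`` shape, the at-or-above threshold rule and the most-changed-first order come from the text.

```python
from typing import Any, Dict, List


def rank_changes_sketch(scored_boxes: List[Dict[str, Any]],
                        threshold: float = 0.1) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for a pure ranker (not the library's code).

    Marks each box ``changed`` when its score is at or above ``threshold``
    and returns the entries sorted with the highest score first.
    """
    ranked = [
        {
            "box": list(item["box"]),
            "score": float(item["score"]),
            "changed": float(item["score"]) >= threshold,
        }
        for item in scored_boxes
    ]
    # Stable sort: ties keep their input order.
    ranked.sort(key=lambda entry: entry["score"], reverse=True)
    return ranked
```

Because the function touches no image data, it can be unit-tested with plain dictionaries, which is presumably why the ranking is kept separate from the pixel diff.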
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import localize_changes, rank_changes, mark_elements
+
+    boxes = [mark["bbox"] for mark in mark_elements(elements)]
+
+    # After an action, which of those elements actually changed?
+    changed = localize_changes("before.png", boxes, current="after.png")
+    for entry in changed:
+        if entry["changed"]:
+            print("element changed:", entry["box"], entry["score"])
+
+    # Or rank pre-computed scores yourself:
+    rank_changes([{"box": [0, 0, 40, 20], "score": 0.6}], threshold=0.1)
+
+``localize_changes`` returns ``[{box, score, changed}]`` sorted most-changed
+first, where ``score`` is the box's mean per-pixel change (0..1). It pairs with
+``set_of_marks`` / accessibility element boxes to turn a raw screen diff into a
+per-element "what changed" signal: an agent feedback channel after a click.
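To make the "mean per-pixel change (0..1)" score concrete, here is a minimal pure-Python sketch. It assumes grayscale frames stored as nested lists of 0..255 values, boxes given as ``[x, y, w, h]``, and an absolute-difference metric; the helper name ``box_change_score`` is hypothetical, and the library itself works on cv2 / numpy images with a diff that the text does not spell out.

```python
from typing import List, Sequence

Gray = List[List[int]]  # rows of 0..255 intensities (assumed representation)


def box_change_score(before: Gray, after: Gray, box: Sequence[int]) -> float:
    """Mean absolute per-pixel difference inside ``box``, scaled to 0..1.

    Hypothetical helper for illustration; ``box`` is ``[x, y, w, h]``.
    """
    x, y, w, h = box
    total = 0
    count = 0
    for row in range(y, y + h):
        for col in range(x, x + w):
            total += abs(before[row][col] - after[row][col])
            count += 1
    # Dividing by 255 maps the mean difference into the 0..1 range.
    return total / (255.0 * count) if count else 0.0
```

A box that flips entirely from black to white scores 1.0, an untouched box scores 0.0, and a large box containing a small change is diluted toward 0, which is why per-element boxes give a sharper signal than one full-screen diff.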
+
+Executor commands
+-----------------
+
+Two commands: ``AC_localize_changes`` (``reference`` + ``boxes`` JSON list +
+``current`` / ``threshold`` / ``region`` → ``{changes}``) and
+``AC_rank_changes`` (``scored_boxes`` JSON list + ``threshold`` →
+``{changes}``, pure). Both are also available as matching read-only ``ac_*``
+MCP tools and as Script Builder commands under **Image**.
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+Classify a Widget from Its Pixel Shape
+======================================
+
+Set-of-Marks and element proposers hand back *boxes*, but not *what each box is*.
+``form_fields.checkbox_state`` already reads a box known to be a checkbox; the
+gap is the typing step before it: is this box a checkbox, a radio button, a push
+button, a text field or a toggle? ``icon_classify`` answers that from cheap
+geometric features (no model).
+
+* :func:`box_features`: extract ``{aspect, fill, edge_density, circularity}``
+  for a box region (the objective measurements).
+* :func:`classify_widget` (pure): map a feature dict to a widget type by
+  documented heuristics.
+* :func:`classify_icon`: compose the two, turning a box into ``{type, features}``.
+
+``classify_widget`` is pure and fully testable; ``box_features`` imports cv2 /
+numpy lazily (the module stays importable without them) and reuses
+:func:`visual_match._to_gray`. Imports no ``PySide6``.
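The text names the features but not their formulas, so the following is only one plausible reading, written in pure Python over a binary mask. The helper names (``aspect_and_fill``, ``circularity``), the mask representation and the definitions themselves (width over height, foreground fraction, and the standard isoperimetric ratio 4πA/P²) are assumptions; ``edge_density`` is left out because its definition cannot be inferred from the source.

```python
import math
from typing import Dict, List, Sequence

Mask = List[List[int]]  # 1 = foreground (ink), 0 = background (assumed)


def aspect_and_fill(mask: Mask, box: Sequence[int]) -> Dict[str, float]:
    """Width/height ratio and foreground fraction of ``box`` = [x, y, w, h]."""
    x, y, w, h = box
    ink = sum(mask[row][col]
              for row in range(y, y + h)
              for col in range(x, x + w))
    return {"aspect": w / h, "fill": ink / (w * h)}


def circularity(area: float, perimeter: float) -> float:
    """Isoperimetric ratio 4*pi*A / P**2: 1.0 for a circle, lower otherwise."""
    return 4.0 * math.pi * area / (perimeter ** 2) if perimeter else 0.0
```

Under these definitions a hollow outline has a low ``fill``, and a square contour has a circularity of π/4 (about 0.785), so a threshold between that value and 1.0 would separate squares from circles.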
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import classify_icon, classify_widget
+
+    # From a screenshot + a box:
+    classify_icon("dialog.png", [120, 80, 16, 16])
+    # {'type': 'checkbox', 'features': {'aspect': 1.0, 'fill': 0.12, ...}}
+
+    # From features you already have:
+    classify_widget({"aspect": 1.0, "circularity": 0.9, "fill": 0.4})  # 'radio'
+
+The heuristics: a round box (aspect ≈ 1, high circularity) is a ``radio``; a wide
+rounded box is a ``toggle``; a near-square sparse box is a ``checkbox``; a wide
+hollow box is a ``text_field``; a wide filled box is a ``button``; anything else
+is an ``icon``. Tune by reading ``features`` and applying your own rules where
+the defaults misfire; the measurements are the durable part.
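The rule order above can be sketched as a small decision chain. Every numeric cut-off below is invented for illustration, since the source gives the rules only qualitatively, and ``classify_widget_sketch`` is a hypothetical name rather than the shipped function.

```python
from typing import Dict


def classify_widget_sketch(features: Dict[str, float]) -> str:
    """Map a feature dict to a widget type (assumed thresholds, not the library's)."""
    aspect = features.get("aspect", 1.0)
    fill = features.get("fill", 0.0)
    circ = features.get("circularity", 0.0)

    near_square = 0.8 <= aspect <= 1.25   # assumed band for "aspect ≈ 1"
    wide = aspect >= 1.5                  # assumed cut-off for "wide"
    sparse = fill <= 0.3                  # assumed cut-off for "sparse/hollow"

    if near_square and circ >= 0.8:
        return "radio"
    if wide and circ >= 0.6:
        return "toggle"
    if near_square and sparse:
        return "checkbox"
    if wide and sparse:
        return "text_field"
    if wide:
        return "button"
    return "icon"
```

Writing your own chain like this over the ``features`` dict is the kind of tuning the paragraph above suggests when the defaults misfire on a particular UI.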
+
+Executor commands
+-----------------
+
+Two commands: ``AC_classify_widget`` (``features`` JSON object → ``{type}``,
+pure) and ``AC_classify_icon`` (``source`` image + ``box`` ``[x, y, w, h]`` →
+``{type, features}``). Both are also available as matching read-only ``ac_*``
+MCP tools and as Script Builder commands under **Image**.
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+Template-Free Element Proposal (Pixels to Elements)
+===================================================
+
+Set-of-Marks, ``observation`` and the grounding helpers all assume you already
+have a list of element boxes, but on a screen the framework doesn't model
+(a game, a custom-drawn app, a remote desktop) there is no accessibility tree to
+provide one. ``element_proposal`` builds that top-of-funnel list from pixels:
+detect candidate *widget* boxes (closed-edge blobs) and *text* boxes
+(:func:`text_regions.find_text_regions`), fuse them, dropping widget boxes that
+are really just text, and return them in reading order, each tagged ``text`` or
+``widget``.
+
+* :func:`propose_elements`: the full pixel-to-elements pipeline.
+* :func:`tag_kinds` (pure): label fused boxes ``text`` / ``widget`` by source and
+  keep their reading-order ``index``.
+
+The fusion / cross-check / ordering reuse :mod:`element_parse` (the ``ocr`` >
+``icon`` source priority *is* the "drop widget-that-is-really-text" check), and
+the text detection reuses :mod:`text_regions`. ``cv2`` is imported lazily so the
+module stays importable; :func:`tag_kinds` is pure and fully testable. Imports no
+``PySide6``.
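The "``ocr`` > ``icon``" priority can be pictured as an IoU-gated merge: every text box is kept, and a widget box survives only if it does not overlap a text box too strongly. The sketch below is an assumption about how such a fusion could look, not the code in ``element_parse``; ``iou``, ``fuse_boxes`` and the ``0.5`` default are hypothetical.

```python
from typing import Any, Dict, List, Sequence


def iou(a: Sequence[int], b: Sequence[int]) -> float:
    """Intersection over union of two ``[x, y, w, h]`` boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def fuse_boxes(text_boxes: List[Sequence[int]],
               widget_boxes: List[Sequence[int]],
               iou_threshold: float = 0.5) -> List[Dict[str, Any]]:
    """Keep all text boxes; drop widget boxes that overlap any text box."""
    fused = [{"box": list(box), "source": "ocr"} for box in text_boxes]
    for widget in widget_boxes:
        # A widget that largely coincides with text is "really text".
        if all(iou(widget, text) < iou_threshold for text in text_boxes):
            fused.append({"box": list(widget), "source": "icon"})
    return fused
```

In this reading, a lower ``iou_threshold`` drops more widget candidates and a higher one keeps more of them alongside the text they overlap.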
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import propose_elements, mark_elements
+
+    # No accessibility tree? Propose elements straight from the screen:
+    elements = propose_elements(min_area=120)
+    # [{'box': [x, y, w, h], 'kind': 'widget', 'index': 0}, ...]
+
+    # Feed them to Set-of-Marks like any other element list:
+    marks = mark_elements(elements)
+
+``propose_elements`` returns ``[{box, kind, index}]`` in reading order, where
+``kind`` is ``text`` or ``widget``. It is the missing top-of-funnel for the
+agent stack on un-modelled UIs: pixels in, a clean numbered element list out,
+ready for marking, observation or grounding. Tune ``min_area`` for the smallest
+control you care about and ``iou_threshold`` for how aggressively overlapping
+text and widget boxes are merged.
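For readers wondering what "reading order" and the ``text`` / ``widget`` tags might look like in code, here is a hedged sketch that combines both steps. In the library the ordering is delegated to ``element_parse`` and ``tag_kinds`` only labels; the function name ``tag_kinds_sketch``, the ``source`` key and the ``row_height`` bucketing are assumptions made for this example.

```python
from typing import Any, Dict, List


def tag_kinds_sketch(fused: List[Dict[str, Any]],
                     row_height: int = 10) -> List[Dict[str, Any]]:
    """Order boxes top-to-bottom, then left-to-right, and label each one.

    Boxes whose top edges fall in the same ``row_height`` band are treated
    as one row. This is a crude bucketing: two boxes straddling a band
    boundary end up in different rows.
    """
    def reading_key(entry: Dict[str, Any]):
        x, y, _w, _h = entry["box"]
        return (y // row_height, x)

    ordered = sorted(fused, key=reading_key)
    return [
        {
            "box": list(entry["box"]),
            "kind": "text" if entry["source"] == "ocr" else "widget",
            "index": index,
        }
        for index, entry in enumerate(ordered)
    ]
```

The output has the same ``{box, kind, index}`` shape that the paragraph above describes for ``propose_elements``.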
+
+Executor commands
+-----------------
+
+``AC_propose_elements`` (``region`` ``[x, y, w, h]`` / ``min_area`` /
+``iou_threshold`` → ``{elements}``) runs the full pipeline on the screen, and
+``AC_tag_kinds`` (``elements`` JSON list → ``{elements}``, pure) labels a
+pre-fused list. Both are also available as matching read-only ``ac_*`` MCP
+tools and as Script Builder commands under **Image**.
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+Attribute a Change to the Elements That Actually Changed
+============================================================
+
+The existing diffs answer "*where* did pixels change" (``motion_regions``, ``perceptual_diff``,
+``ssim_changed_regions`` return raw pixel regions) or "which *accessibility* elements differ" (``element_diff``, needs a11y metadata).
+The missing middle is: given a frame diff **and a list of element boxes**, which of *those* elements changed? ``change_localize``
+scores each supplied box by how much it changed and ranks them.
+
+* :func:`rank_changes` (pure function): takes ``[{box, score}]`` and marks each box ``changed``
+  (score at or above ``threshold``), sorted most-changed first.
+* :func:`localize_changes`: diffs a reference image against the current screen, scores each element box by its mean pixel change, then ranks them.
+
+``cv2`` / ``numpy`` are imported lazily (the module can be imported without them), and the loaders reuse :mod:`visual_match`.
+The ranking is a pure function and fully testable. Imports no ``PySide6``.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import localize_changes, rank_changes, mark_elements
+
+    boxes = [mark["bbox"] for mark in mark_elements(elements)]
+
+    # After an action, which of those elements actually changed?
+    changed = localize_changes("before.png", boxes, current="after.png")
+    for entry in changed:
+        if entry["changed"]:
+            print("element changed:", entry["box"], entry["score"])
+
+    # Or rank pre-computed scores yourself:
+    rank_changes([{"box": [0, 0, 40, 20], "score": 0.6}], threshold=0.1)
+
+``localize_changes`` returns ``[{box, score, changed}]`` sorted most-changed first, where ``score`` is the box's mean
+per-pixel change (0..1). It pairs with ``set_of_marks`` / accessibility element boxes to turn a raw screen diff into a per-element
+"what changed" signal: an agent feedback channel after a click.
+
+Executor commands
+-----------------
+
+``AC_localize_changes`` (``reference`` plus a ``boxes`` JSON list plus ``current`` /
+``threshold`` / ``region`` → ``{changes}``) and ``AC_rank_changes`` (a ``scored_boxes`` JSON list plus
+``threshold`` → ``{changes}``, pure function). Both are available as the corresponding read-only ``ac_*`` MCP tools and Script Builder commands
+(under the **Image** category).
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
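+Classify a Control from Its Pixel Shape
+========================================
+
+Set-of-Marks and element proposers return *boxes* but do not tell you *what each box is*. ``form_fields.checkbox_state``
+can already read a box known to be a checkbox; what is missing is the classification step before it: is this box a checkbox, a radio button, a button,
+a text field or a toggle? ``icon_classify`` answers that from low-cost geometric features (no model needed).
+
+* :func:`box_features`: extracts ``{aspect, fill, edge_density, circularity}`` for a box region (the objective measurements).
+* :func:`classify_widget` (pure function): maps a feature dict to a widget type using documented heuristics.
+* :func:`classify_icon`: composes the two, turning a box into ``{type, features}``.
+
+``classify_widget`` is a pure function and fully testable; ``box_features`` imports cv2 / numpy lazily (the module can be imported without them)
+and reuses :func:`visual_match._to_gray`. Imports no ``PySide6``.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import classify_icon, classify_widget
+
+    # From a screenshot + a box:
+    classify_icon("dialog.png", [120, 80, 16, 16])
+    # {'type': 'checkbox', 'features': {'aspect': 1.0, 'fill': 0.12, ...}}
+
+    # From features you already have:
+    classify_widget({"aspect": 1.0, "circularity": 0.9, "fill": 0.4})  # 'radio'
+
+The heuristics: a round box (aspect ≈ 1, high circularity) is a ``radio``; wide and rounded is a ``toggle``;
+near-square and sparse is a ``checkbox``; wide and hollow is a ``text_field``; wide and filled is a ``button``; anything else is an ``icon``.
+Where the defaults misfire, read ``features`` and apply your own rules to tune the result; the measurements are the durable part.
+
+Executor commands
+-----------------
+
+``AC_classify_widget`` (``features`` JSON object → ``{type}``, pure function) and
+``AC_classify_icon`` (``source`` image + ``box`` ``[x, y, w, h]`` → ``{type, features}``).
+Both are available as the corresponding read-only ``ac_*`` MCP tools and Script Builder commands (under the **Image** category).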
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
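+Template-Free Element Proposal (Pixels to Elements)
+===================================================
+
+Set-of-Marks, ``observation`` and the grounding helpers all assume you already have a list of element boxes, but on a screen the framework
+cannot model (a game, a custom-drawn app, a remote desktop) there is no accessibility tree to provide one. ``element_proposal`` builds
+that top-of-funnel list from pixels: it detects candidate *widget* boxes (closed-edge blobs) and *text* boxes
+(:func:`text_regions.find_text_regions`), fuses the two, dropping widget boxes that are really just text,
+and returns them in reading order, each tagged ``text`` or ``widget``.
+
+* :func:`propose_elements`: the full pixel-to-elements pipeline.
+* :func:`tag_kinds` (pure function): labels fused boxes ``text`` / ``widget`` by source and keeps their reading-order ``index``.
+
+The fusion / cross-check / ordering reuse :mod:`element_parse` (the ``ocr`` > ``icon`` source priority *is* the "drop widgets that are
+really text" check), and the text detection reuses :mod:`text_regions`. ``cv2`` is imported lazily, so the module stays importable;
+:func:`tag_kinds` is a pure function and fully testable. Imports no ``PySide6``.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import propose_elements, mark_elements
+
+    # No accessibility tree? Propose elements straight from the screen:
+    elements = propose_elements(min_area=120)
+    # [{'box': [x, y, w, h], 'kind': 'widget', 'index': 0}, ...]
+
+    # Feed them to Set-of-Marks like any other element list:
+    marks = mark_elements(elements)
+
+``propose_elements`` returns ``[{box, kind, index}]`` in reading order, where ``kind`` is ``text`` or ``widget``.
+It is the top-of-funnel that the agent stack lacks on un-modelled UIs: pixels in, a clean numbered element list out, ready for marking, observation
+or grounding. Use ``min_area`` to set the smallest control you care about and ``iou_threshold`` to set how aggressively overlapping text and widget
+boxes are merged.
+
+Executor commands
+-----------------
+
+``AC_propose_elements`` (``region`` ``[x, y, w, h]`` / ``min_area`` /
+``iou_threshold`` → ``{elements}``) runs the full pipeline on the screen, while ``AC_tag_kinds``
+(``elements`` JSON list → ``{elements}``, pure function) labels a pre-fused list. Both are available as the corresponding read-only
+``ac_*`` MCP tools and Script Builder commands (under the **Image** category).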

je_auto_control/__init__.py

Lines changed: 11 additions & 0 deletions
@@ -143,6 +143,14 @@
 )
 # Theme-invariant matching so a light template matches dark mode
 from je_auto_control.utils.theme_normalize import match_theme, normalize_theme
+# Attribute a screen change to the specific element boxes that changed
+from je_auto_control.utils.change_localize import localize_changes, rank_changes
+# Classify what kind of widget a box is from its pixel shape
+from je_auto_control.utils.icon_classify import (
+    box_features, classify_icon, classify_widget,
+)
+# Propose a clean element list from raw pixels (template-free)
+from je_auto_control.utils.element_proposal import propose_elements, tag_kinds
 # Rich clipboard formats - RTF + CSV/TSV codecs and Windows get / set
 from je_auto_control.utils.clipboard_rich_formats import (
     build_rtf, csv_to_rows, get_clipboard_csv, get_clipboard_rtf, rows_to_csv,
@@ -1771,6 +1779,9 @@ def start_autocontrol_gui(*args, **kwargs):
     "place_labels", "label_color",
     "grade_contrast", "dominant_pair", "region_contrast",
     "normalize_theme", "match_theme",
+    "localize_changes", "rank_changes",
+    "classify_widget", "box_features", "classify_icon",
+    "propose_elements", "tag_kinds",
     "build_rtf", "rtf_to_text", "rows_to_csv", "csv_to_rows",
     "set_clipboard_rtf", "get_clipboard_rtf",
     "set_clipboard_csv", "get_clipboard_csv",
