|
| 1 | +Template-Free Element Proposal (Pixels to Elements) |
| 2 | +=================================================== |
| 3 | + |
| 4 | +Set-of-Marks, ``observation`` and the grounding helpers all assume you already |
| 5 | +have a list of element boxes — but on a screen the framework doesn't model |
| 6 | +(a game, a custom-drawn app, a remote desktop) there is no accessibility tree to |
| 7 | +provide one. ``element_proposal`` builds that top-of-funnel list from pixels: |
| 8 | +detect candidate *widget* boxes (closed-edge blobs) and *text* boxes |
| 9 | +(:func:`text_regions.find_text_regions`), fuse them — dropping widget boxes that |
| 10 | +are really just text — and return them in reading order, each tagged ``text`` or |
| 11 | +``widget``. |
| 12 | + |
| 13 | +* :func:`propose_elements` — the full pixel-to-elements pipeline. |
| 14 | +* :func:`tag_kinds` — pure: label fused boxes ``text`` / ``widget`` by source and |
| 15 | + keep their reading-order ``index``. |
| 16 | + |
| 17 | +The fusion / cross-check / ordering reuse :mod:`element_parse` — the ``ocr`` > |
| 18 | +``icon`` source priority *is* the "drop widget-that-is-really-text" check — and |
| 19 | +the text detection reuses :mod:`text_regions`. ``cv2`` is imported lazily so the |
| 20 | +module stays importable; :func:`tag_kinds` is pure and fully testable. Imports no |
| 21 | +``PySide6``. |
| 22 | + |
| 23 | +Headless API |
| 24 | +------------ |
| 25 | + |
| 26 | +.. code-block:: python |
| 27 | +
|
| 28 | + from je_auto_control import propose_elements, mark_elements |
| 29 | +
|
| 30 | + # No accessibility tree? Propose elements straight from the screen: |
| 31 | + elements = propose_elements(min_area=120) |
| 32 | + # [{'box': [x, y, w, h], 'kind': 'widget', 'index': 0}, ...] |
| 33 | +
|
| 34 | + # Feed them to Set-of-Marks like any other element list: |
| 35 | + marks = mark_elements(elements) |
| 36 | +
|
| 37 | +``propose_elements`` returns ``[{box, kind, index}]`` in reading order, where |
| 38 | +``kind`` is ``text`` or ``widget``. It is the missing top-of-funnel for the |
| 39 | +agent stack on un-modelled UIs: pixels in, a clean numbered element list out, |
| 40 | +ready for marking, observation or grounding. Tune ``min_area`` for the smallest |
| 41 | +control you care about and ``iou_threshold`` for how aggressively overlapping |
| 42 | +text and widget boxes are merged. |
| 43 | + |
| 44 | +Executor commands |
| 45 | +----------------- |
| 46 | + |
| 47 | +``AC_propose_elements`` (``region`` ``[x, y, w, h]`` / ``min_area`` / |
| 48 | +``iou_threshold`` → ``{elements}``) runs the full pipeline on the screen, and |
| 49 | +``AC_tag_kinds`` (``elements`` JSON list → ``{elements}``, pure) labels a |
| 50 | +pre-fused list. They are the matching read-only ``ac_*`` MCP tools and Script |
| 51 | +Builder commands under **Image**. |
0 commit comments