Image Detection Bot

Automate mouse/keyboard actions by detecting on-screen images. This repo ships a PyQt6 GUI to manage templates, create sequences of actions, and configure a visual failsafe that can interrupt a run and be tested on demand. Built for personal automation, UI testing, and tinkering.

Features

  • GUI Tabs: Sequences, Failsafe, Templates, Template Tester, and Recorder
  • Template Management: Create, preview, capture from screen, or load from file
  • Sequence Editor: Add steps that find a template and then run actions
  • Action Types: click, right_click, double_click, move, move_to, type, key_press, wait, scroll, click_and_hold, play_recording
  • Regions & Randomization:
    • Step-level search_region for finding templates
    • Click actions can target a random point in a selected region
    • Move-To actions can use a selected region with optional random movement
  • Failsafe System: Enable a template-based trigger, define a separate sequence, and test it with a button
  • Template Tester: Live preview of template matching with confidence meter and optional search region
  • Overlay Previews: One-click visual overlays to preview current search regions and click points (Show Region, Show Click buttons across editors).
  • Recorder Tab: Record global mouse/keyboard input with timing, delete events, save named recordings to recordings/, preview overlays without executing, and play recordings as actions in sequences/groups/failsafe.
  • Hotkeys & Status: F8 to stop; status bar updates; mouse position tracker
  • Config Persistence: Reads/writes config.json and keeps template paths relative when possible
  • Break Settings: Set a maximum runtime cap (hours/minutes/seconds); live timer shows Elapsed / Max; cap overrides loop settings

Architecture

  • Desktop GUI (PyQt6): author templates, sequences, groups, failsafe; run and observe step progress.
  • Web Server: lightweight HTTP server with editors and a live MJPEG preview; mirrors most GUI features.
  • IPC Bridge: the web server writes commands to ipc_command.json; the GUI polls and acts (run/stop/reload/etc.).
  • Launcher (Python/Tk + PowerShell): unified start/stop/open; can edit and save server_config.json before launch.
  • Configuration: config.json stores templates, sequences, failsafe, groups, schedules, break settings.
  • Assets: images/, failsafe_images/, web/static/.
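
The file-based IPC bridge described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the payload shape (`{"command": "run", ...}`) and the helper names `write_command` and `poll_command` are assumptions; only the file name `ipc_command.json` and the write/poll/delete flow come from this README.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional

IPC_FILE = "ipc_command.json"  # file name taken from this README


def write_command(folder: Path, payload: dict) -> None:
    """Server side: write the command atomically so a poller never reads a partial file."""
    fd, tmp_name = tempfile.mkstemp(dir=folder, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)
    # os.replace is atomic on the same filesystem.
    os.replace(tmp_name, folder / IPC_FILE)


def poll_command(folder: Path) -> Optional[dict]:
    """GUI side: return the pending command, if any, and delete the file after reading."""
    path = folder / IPC_FILE
    try:
        with open(path, "r", encoding="utf-8") as fh:
            payload = json.load(fh)
    except FileNotFoundError:
        return None  # nothing queued
    path.unlink()
    return payload
```

In a GUI, `poll_command` would typically be called from a timer; writing through a temporary file and renaming avoids the poller reading a half-written command.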

Recent Updates (Web + GUI)

  • Overlay previews in the GUI

    • Added a transparent, non-interactive overlay window used to preview coordinates.
    • New buttons: Show Region in Step Editor, Action Editor, and Failsafe sidebar; Show Click in Action Editor.
    • Previews draw a red rectangle for regions and a crosshair marker for click points; overlays auto-close after a short timeout.
  • Action editor layout polish (GUI)

    • Compact action rows with a two-line layout; conditions moved below main params.
    • Dropdowns auto-size to content; inputs standardized to 24px height for readability.
    • Advanced-only fields moved behind the Advanced… dialog to save horizontal space.
    • Duplicate controls removed from inline rows when already available under Advanced….
  • Template reload robustness and debug toggle fix

    • Template list is repopulated in the UI first; each template loads into the bot with per-item error handling. UI no longer clears if a single template fails.
    • Fixed a NameError when toggling branch debug by reading templates from self.config within the handler; the toggle no longer disrupts UI refresh.
  • Configuration safety and startup stability

    • Guarded auto‑save during startup with a suppression flag to prevent writing an empty config.json while UI constructs.
    • Wrapped early calls like toggle_failsafe_ui so they don’t override the saved configuration before load completes.
    • Prevented template tab auto‑switch during config reloads; GUI now defaults to the Sequences tab at startup and after reload.
  • Sequence editor robustness

    • Rebuilds a fresh SequenceEditor whenever selection changes to avoid stale state and disappearing steps.
    • Added a _rebuilding_sequence_editor guard so config mirrors don’t write empty steps during transitions.
    • Preserves existing steps if an editor snapshot returns empty.
    • Added SequenceEditor.update_groups(...) so Group Call dropdowns stay in sync with current group names.
  • Groups management

    • Fixed a bug where deleted groups reappeared: prevented update_config_from_ui from writing stale group_editor_widget contents back if the group was removed; cleared editor references after deletion.
  • Failsafe improvements

    • Web → GUI template sync: desktop now reads both failsafe.template_name and failsafe.template and preserves selection during combo refresh. Server writes both keys for full symmetry.
    • Web editor now supports adding normal steps (not only Group Calls) and “Use Preview as Failsafe Region”.
    • Action editor mirrors Sequences behavior: shows only relevant fields per action type (wait → seconds, move_to/drag → X/Y/duration/random/region, keypress → key/modifiers, scroll → pixels, click → button/clicks).
    • Random movement: web Failsafe editor saves random and random_region so GUI runs move‑to in random mode correctly.
  • Web UX and auth

    • Added top‑nav links for a consistent flow between Dashboard, Sequences, Groups, Controls, Schedules, Templates, Failsafe.
    • “Remember token” checkbox added to all pages (Dashboard/Controls/Sequences/Groups/Templates/Failsafe/Schedules); tokens sync via localStorage and auto‑populate across pages.
    • Sequences editor: added a “Group” dropdown next to “Add Group Call”; loads groups before rendering; always displays the selected group even if not yet in the global list.
  • Defaults and quality of life

    • New steps (GUI and Web) carry sensible defaults: confidence, detection_strategy, step_loops, monitor, and a default actions list.
    • Injects a global search_region into new steps (when present) for faster authoring.
    • Group selectors default to the first available group for quick adds.
  • Templates (Web)

    • Added Live Screen Preview to Templates with Monitor and Size controls, plus a “Using Phone” toggle to enable touch selection and disable scroll while dragging.
    • Click‑drag (or touch‑drag) a selection box over the preview; coordinates are mapped to natural pixels regardless of preview size or zoom.
    • New “Save Selection as Template” workflow: captures the selected region from the chosen monitor and saves it to images/<name>.png, then auto‑registers it in config.json and shows the preview.
    • Backed by GET /api/monitors and GET /stream.mjpeg; capture endpoint POST /api/templates/capture stores the image and updates templates.

Requirements

  • Python 3.8+ (Windows batch launcher checks 3.8+)
  • opencv-python, pyautogui, numpy, Pillow, PyQt6, pyqt6-tools, darkdetect, mss, psutil, pynput
  • Notes:
    • pynput enables global mouse/keyboard recording for the Recorder tab.
    • If you don’t need recording, you can skip pynput.
  • Install via requirements.txt

Setup

  • Install dependencies:
    pip install -r requirements.txt
    
  • Configuration files:
    • config.json — main app configuration (created/updated by GUI/web server)
    • server_config.json — web server settings (bind, port, token, mjpeg_fps, backup_retention)
    • Backups — stored in backup configs/ with retention (backup_retention default 20)

Installation

  • Clone or download this repository
  • Install dependencies:
    pip install -r requirements.txt
    
  • Launch the GUI:
    • Cross-platform:
      python bot_gui.py
      
    • Windows convenience launcher:
      launch_gui.bat
      

Running the Web Server

  • Start the server (requires the same Python environment):
    python web_server.py
    
  • Open http://localhost:8765/ in your browser.
  • Auth: the server reads server_config.json for token; pages include a “Remember token” checkbox that persists your token across all editors via localStorage.
  • Live preview and region capture require the mss package. If the stream or capture returns blank, ensure mss is installed and accessible in your Python environment.
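
As a small illustration of the token handling above, the sketch below builds a portal URL from `server_config.json`. It assumes the JSON keys are literally named `port` and `token` (matching the setting names listed in this README) and that the default port is 8765; `portal_url` is a hypothetical helper, not part of the project.

```python
import json
from pathlib import Path
from urllib.parse import urlencode


def portal_url(config_path: Path, page: str = "") -> str:
    """Build a localhost URL for a web page, appending ?token=... when a token is set."""
    cfg = json.loads(config_path.read_text(encoding="utf-8"))
    port = int(cfg.get("port", 8765))  # assumed default, taken from the URL shown above
    token = cfg.get("token") or ""
    url = f"http://localhost:{port}/{page.lstrip('/')}"
    if token:
        url += "?" + urlencode({"token": token})
    return url
```

For example, `portal_url(Path("server_config.json"), "static/sequences.html")` would yield the Sequences editor URL with the token appended.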

Python Launcher

  • Start the interactive launcher:
    python launcher.py
    
  • Choose to launch the GUI, the Web Server, or both.
  • If launching the Web Server, you can either reuse the last server_config.json or edit settings (bind, port, token, mjpeg_fps, backup_retention) in the launcher and save them.
  • On Windows, you can alternatively use launch.bat (PowerShell wrapper) for a menu-driven experience.

Launcher Details

  • launcher.py (Tkinter GUI)
    • Options: Launch GUI, Launch Web Server, or Both
    • “Use last settings” or “Edit and save new settings” for the web server
    • Buttons: Start, Open Web Portal (opens http://localhost:<port>/?token=...), Stop GUI, Stop Web Server, Exit
    • Compiled behavior: launcher runs without a console; GUI (ImageDetectionBot.exe) and server (WebServer.exe) run with console windows for logs.
  • Batch wrappers
    • run_launcher.bat: starts launcher.py (prefers venv\Scripts\python.exe, falls back to python)
    • launch.bat: invokes launch.ps1 (menu-driven console)

Building (PyInstaller)

You can build the whole project in one go using the provided spec:

pip install pyinstaller
pyinstaller ProjectBundle.spec

This creates a dist/ProjectBundle/ folder containing:

  • Launcher.exe — Tkinter launcher to start GUI and/or Web Server
  • ImageDetectionBot.exe — the desktop GUI
  • WebServer.exe — the HTTP server for the web editors

Static assets (web pages, images, failsafe images) and configs (server_config.json, config.json) are included in the bundle.

Notes:

  • If PyInstaller misses dependencies for your environment, add them to hiddenimports in ProjectBundle.spec.
  • For one‑file (--onefile) builds, prefer building each binary separately; multi‑exe onefile is not supported.

Single Binary with Console Logs (GUI)

If you want a single executable that shows the GUI and keeps a visible console for debug logs:

pyinstaller SingleBotConsole.spec

This outputs dist/ImageDetectionBotConsole.exe, which launches the GUI and shows a console window with runtime logs. Assets and configs are included. Use this when you prefer seeing logs directly without opening the log file.

Quick Start

  • Templates tab:
    • Click Add Template, set a name and image path
    • Use “Save Selection as Template” on the web Templates page: choose a monitor, drag to select a region on the live preview, enter a name, then save.
    • Alternatively, use GUI capture or Load Image to pick a file
    • Preview auto-updates
  • Sequences tab:
    • Add a sequence, then add steps
    • For each step: set find template, required, timeout, optional confidence
    • Add actions like Click, Move, Move-To, Type, Key Press, Wait, Scroll, Play Recording
    • For Click: optionally select a region to click randomly inside
    • For Move-To: optionally enable random and select a region to move within
    • Use Show Region or Show Click to quickly visualize the current settings before running.
    • Run the selected sequence; press F8 to stop
  • Template Tester tab:
    • Pick a template, optionally select region, start live preview to see confidence value
  • Failsafe tab:
    • Enable failsafe, choose a template and confidence, optionally set a search region
    • Build a separate “failsafe sequence” of steps
    • Use Show Region to preview the configured failsafe region overlay.
    • Click Test Failsafe to run only the failsafe sequence

Screenshots

Screenshots illustrating the GUI and Web editors are stored under docs/screenshots/.

Suggested filenames (drop your PNGs in that folder):

  • GUI
    • gui_sequences.png: Sequences tab with steps and actions
    • gui_failsafe.png: Failsafe tab with settings and sequence
    • gui_templates.png: Templates tab with preview and actions
  • Web
    • web_dashboard.png: Dashboard page with preview and nav
    • web_sequences.png: Sequences editor (steps + actions + preview)
    • web_failsafe.png: Failsafe editor (settings + steps + preview)
    • web_groups.png: Groups editor (list + steps + nested actions)
    • web_schedules.png: Schedules page (rows with Enabled/Sequence/Time)

Web UI (Editors & Controls)

The project includes a lightweight web server with browser-based editors that mirror most GUI features. Open pages via http://localhost:8765/static/... and append ?token=YOURTOKEN if auth is enabled.

  • Pages

    • Sequences (/static/sequences.html): sequence list, per-step editor, live preview
    • Failsafe (/static/failsafe.html): failsafe settings and sequence editor, live preview
    • Groups (/static/groups.html): group management (sequence-like collections), nested actions, live preview
    • Controls (/static/control.html): run/stop, non-required-wait toggle, “Run Group”
    • Templates (/static/templates.html): templates list, path editor, live screen preview with region selection and “Save Selection as Template”
  • Fully implemented in web editors

    • Per-step header controls (Sequences): find template, required, confidence, timeout, monitor, Step Loops, Detection Strategy, Min Inliers, Ratio, RANSAC, Select Search Region
    • Per-action controls (Sequences/Failsafe): type, button, clicks, x/y, duration, random, seconds, pixels, key, modifiers, Select Region, Set Random Region
    • “Add Action” palette with sensible defaults (click, right/double click, move/move_to/drag, type, wait, scroll, click_and_hold)
    • Group Call steps: dropdown picker and save in Sequences and Failsafe; adds { call_group: "GroupName" } step
    • Groups editor: CRUD for groups; per-step header (find, required, timeout, save, reorder/delete); nested actions with same controls as sequences/failsafe; “Add Action” palette
    • Live Preview on all editors (Sequences/Failsafe/Groups) with monitor selection; click‑drag selection draws a box and maps to natural image coordinates
    • Live Preview on Templates with monitor selection; click‑drag selection and “Save Selection as Template” to capture and register a new template
    • Preview “Size” dropdown (640/800/1024/1280) on all editors; region mapping remains accurate regardless of browser zoom or selected size
  • Partially implemented / known gaps

    • Insert‑at‑index for new steps is supported in Sequences via “Add Step Here” (internally appends then reorders)
    • Label polish and defaults can be tuned based on your workflow
  • Groups editor now exposes advanced step fields

    • Per-step monitor selection with “Use Global” and specific monitor indexes
    • Step Loops configuration
    • Detection Strategy: default (template) or feature
    • Feature matcher params: Min Inliers, Ratio, RANSAC
  • Web API endpoints (selected)

    • Sequences
      • GET /api/sequences → list names
      • GET /api/sequences/:name → sequence details
      • PUT /api/sequences/:name → update metadata (loop, loop_count, rename)
      • POST /api/sequences/:name/steps → append step
      • PUT /api/sequences/:name/steps/:idx → update step
      • DELETE /api/sequences/:name/steps/:idx → delete step
      • POST /api/sequences/:name/steps/reorder → move a step
      • POST /api/sequences/:name/steps/:idx/actions → append action
      • PUT /api/sequences/:name/steps/:idx/actions/:aidx → update action
      • DELETE /api/sequences/:name/steps/:idx/actions/:aidx → delete action
      • POST /api/sequences/:name/steps/:idx/actions/reorder → move an action
    • Templates
      • GET /api/templates → list names, paths, existence
      • GET /api/templates/:name → template detail (path, exists)
      • PUT /api/templates/:name → update path for a template
      • DELETE /api/templates/:name → delete template mapping
      • GET /api/template-image?name=<name> → serve the template image
      • POST /api/templates → add a template mapping { name, path }
      • POST /api/templates/capture → capture a selected screen region and save to images/<name>.png (or provided path), then update the templates map
    • Preview & Monitors
      • GET /api/monitors → list monitor bounds and indexes
      • GET /stream.mjpeg?monitor=<index> → MJPEG stream of selected monitor (PNG frames)
    • Failsafe
      • GET /api/failsafe → settings
      • PUT /api/failsafe → update settings
      • GET /api/failsafe/sequence → list steps
      • POST /api/failsafe/sequence → append step
      • PUT /api/failsafe/sequence/:idx → update step
      • DELETE /api/failsafe/sequence/:idx → delete step
      • POST /api/failsafe/sequence/reorder → move step
      • POST /api/failsafe/sequence/:idx/actions → append action
      • PUT /api/failsafe/sequence/:idx/actions/:aidx → update action
      • DELETE /api/failsafe/sequence/:idx/actions/:aidx → delete action
      • POST /api/failsafe/sequence/:idx/actions/reorder → move action
    • Groups
      • GET /api/groups → list group names
      • GET /api/groups/:name → group details (supports dict‑ and list‑based storage)
      • POST /api/groups → create
      • PUT /api/groups/:name → update (rename, steps, loop, loop_count)
      • DELETE /api/groups/:name → delete
      • POST /api/groups/:name/steps → append step
      • PUT /api/groups/:name/steps/:idx → update step
      • DELETE /api/groups/:name/steps/:idx → delete step
      • POST /api/groups/:name/steps/reorder → move step
      • POST /api/groups/:name/steps/:idx/actions → append action
      • PUT /api/groups/:name/steps/:idx/actions/:aidx → update action
      • DELETE /api/groups/:name/steps/:idx/actions/:aidx → delete action
      • POST /api/groups/:name/steps/:idx/actions/reorder → move action
    • Controls
      • POST /api/run-options → set non‑required‑wait (IPC to GUI)
      • POST /api/run → run sequence by name (IPC to GUI)
      • POST /api/run-group → create a temporary __RunGroup__ sequence from a group and run it (IPC)
      • POST /api/stop → stop current run (IPC)
    • Monitors & stream
      • GET /api/monitors → JSON list of monitors (index, width, height)
      • GET /stream.mjpeg?monitor=...&token=... → MJPEG stream for previews
  • Region selection accuracy

    • The web editors compute coordinates inside the actual image content box (object-fit: contain) rather than the element bounds. This keeps natural pixel coordinates stable across preview size changes and browser zoom.
    • The overlay box is drawn in the pane at paneOffset + contentOffset + contentCoord, so the visual selection always matches the drawn image.
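
The coordinate mapping described under "Region selection accuracy" can be written out as plain arithmetic. The sketch below is a Python rendering under stated assumptions: the image is centered inside the element (the default for object-fit: contain), and the function names are illustrative rather than taken from the web editors.

```python
def to_natural(px, py, elem_w, elem_h, nat_w, nat_h):
    """Map a point in element coordinates to natural image pixels (object-fit: contain)."""
    # The image is scaled uniformly to fit, then centered inside the element.
    scale = min(elem_w / nat_w, elem_h / nat_h)
    content_w, content_h = nat_w * scale, nat_h * scale
    off_x, off_y = (elem_w - content_w) / 2, (elem_h - content_h) / 2
    # Remove the letterbox offset, undo the scale, and clamp to the image bounds.
    nx = min(max((px - off_x) / scale, 0), nat_w)
    ny = min(max((py - off_y) / scale, 0), nat_h)
    return round(nx), round(ny)


def selection_to_region(x0, y0, x1, y1, elem_w, elem_h, nat_w, nat_h):
    """Convert a drag selection (two corners) to [x, y, width, height] in natural pixels."""
    ax, ay = to_natural(min(x0, x1), min(y0, y1), elem_w, elem_h, nat_w, nat_h)
    bx, by = to_natural(max(x0, x1), max(y0, y1), elem_w, elem_h, nat_w, nat_h)
    return [ax, ay, bx - ax, by - ay]
```

Because the result depends only on the ratio between the element size and the natural image size, changing the preview size or the browser zoom does not change the mapped coordinates.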

Tips

  • After web edits, the server writes config.json, creates time‑stamped backups in the backup configs/ folder (e.g., config.backup.176257XXXX.json), prunes older ones according to backup_retention in server_config.json, and triggers an IPC reload so the desktop GUI reflects changes without restart.
  • If you prefer GUI editing, you can mix and match; web and desktop stay in sync.
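
The backup-and-prune behavior mentioned in the first tip could look roughly like this. It is a sketch only: the file-name pattern follows the example above, a Unix-seconds timestamp is assumed, and `backup_and_prune` and its `stamp` parameter are hypothetical (the parameter exists to make the function easy to test).

```python
import shutil
import time
from pathlib import Path


def _stamp_of(path: Path) -> int:
    # "config.backup.<stamp>.json" -> <stamp>
    return int(path.name.split(".")[2])


def backup_and_prune(config_path: Path, backup_dir: Path, retention: int = 20, stamp=None) -> Path:
    """Copy config.json to a time-stamped backup, then keep only the newest `retention` files."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = int(time.time()) if stamp is None else stamp
    target = backup_dir / f"config.backup.{stamp}.json"
    shutil.copy2(config_path, target)
    backups = sorted(backup_dir.glob("config.backup.*.json"), key=_stamp_of)
    excess = len(backups) - max(retention, 1)  # always keep at least one backup
    for old in backups[:max(excess, 0)]:
        old.unlink()
    return target
```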

Web–Desktop Sync Details

  • Failsafe template name
    • Web saves failsafe.template_name and the server also writes failsafe.template for backward compatibility.
    • Desktop reads either key and preserves selection when repopulating the combo; preview updates immediately.
  • Sequences and Failsafe actions
    • Random movement: set random=true and a random_region in the action; desktop honors random move_to and random region clicks.
    • Action editors only expose fields relevant to the chosen type; saves mirror exactly what the desktop expects.
  • Group Call steps
    • Web Sequences and Failsafe editors both support Group Call steps; Sequences has a top‑level “Group” dropdown for quick adds.
    • Editors load groups before rendering and ensure step selections are visible even if the group name wasn’t in the list yet.

Configuration File

The app reads/writes config.json in the script directory. Template paths are converted to relative paths when possible.
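
One way to implement "relative when possible" is sketched below. The helper name `to_config_path` and the rule that paths outside the script directory stay absolute are assumptions; the README only states that paths are converted to relative when possible.

```python
import os


def to_config_path(path: str, base_dir: str) -> str:
    """Return `path` relative to `base_dir` when it lives under it, else the absolute path."""
    abs_path = os.path.abspath(path)
    try:
        rel = os.path.relpath(abs_path, base_dir)
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return abs_path
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        return abs_path  # outside the base directory: keep it absolute
    return rel.replace(os.sep, "/")  # forward slashes, as in "images/Template.png"
```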

Top-level structure:

{
  "templates": {
    "TemplateName": "images/Template.png"
  },
  "sequences": [
    {
      "name": "Example",
      "steps": [
        {
          "find": "TemplateName",
          "required": true,
          "confidence": 0.8,
          "timeout": 10,
          "search_region": [x, y, width, height],
          "actions": [
            {"type": "move_to", "duration": 0.5},
            {"type": "click"},
            {"type": "type", "text": "hello"},
            {"type": "key_press", "key": "enter"},
            {"type": "wait", "seconds": 0.5},
            {"type": "scroll", "pixels": -300},
            {"type": "click_and_hold", "duration": 1.0}
          ]
        }
      ],
      "loop": false,
      "loop_count": 1
    }
  ],
  "failsafe": {
    "enabled": true,
    "template": "TemplateName",
    "confidence": 0.8,
    "region": [x, y, width, height],
    "sequence": [
      {
        "find": "AnotherTemplate",
        "required": true,
        "timeout": 10,
        "actions": [{"type": "click"}]
      }
    ]
  },
  "break_settings": {
    "enabled": true,
    "max_runtime_seconds": 3600
  }
}

Action dictionary fields (as used across sequences and failsafe steps):

  • Common: type
  • Mouse actions:
    • button (left/right/middle), clicks (int)
    • x, y (for absolute move), duration
    • region (for click randomization), random, random_region (for random move-to)
  • Keyboard: text (type), key (key_press)
  • Timing/scroll: seconds (wait), pixels (scroll)

How Matching and Actions Work

  • Template matching uses OpenCV (cv2.matchTemplate). Confidence threshold is adjustable per step.
  • Feature matching supports multiple detectors and safe fallbacks:
    • Detectors: ORB (fast), AKAZE (scale-robust), SIFT (strong features; requires opencv-contrib-python).
    • Pipeline: KNN + Lowe’s ratio → RANSAC homography → sanity check (area ratio) → center of detected polygon.
    • Fallbacks: If the selected detector fails, the bot automatically tries AKAZE, then SIFT. If all detectors fail, a multi-scale template match runs.
    • Provide strategy: "feature" and optional min_inliers, ratio_thresh, ransac_thresh per step.
  • If a step has no find template, its actions run directly.
  • move_to can target the detected position or a random point inside a selected region.
  • click defaults to current mouse location unless force_move is used internally for region clicks.
  • click_and_hold acts at the bot’s current position—usually set by a prior move/move_to.
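
The detector fallback order described above can be illustrated with a small, library-free sketch. Each matcher is a hypothetical callable that returns a center point or None; the real detectors (ORB, AKAZE, SIFT, multi-scale template match) would be plugged in by the caller. The function name and the decision to treat exceptions as misses are assumptions.

```python
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[int, int]
Matcher = Callable[[], Optional[Point]]


def locate_with_fallbacks(
    matchers: Sequence[Tuple[str, Matcher]]
) -> Tuple[Optional[str], Optional[Point]]:
    """Try each named matcher in order and return the first one that finds a center."""
    for name, matcher in matchers:
        try:
            center = matcher()
        except Exception:
            # Assumption: a detector error (for example SIFT being unavailable)
            # counts as a miss, so the next strategy gets a chance.
            center = None
        if center is not None:
            return name, center
    return None, None
```

A caller would pass something like `[("ORB", orb_match), ("AKAZE", akaze_match), ("SIFT", sift_match), ("multi_scale_template", template_match)]`, where each callable is supplied by the application.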

Actions and Inputs (GUI & Web)

  • click: button, clicks; optional random + random_region for random click inside region
  • move: absolute x, y, duration; fallback to detected position in a step when started via web (MOVE without x,y treated like MOVE_TO)
  • move_to: detected template position or explicit x, y + duration; optional random + random_region
  • type: text
  • key_press: key, optional modifiers
  • wait: seconds
  • scroll: pixels
  • click_and_hold: duration at current position

Note (GUI): To keep action rows compact, tuning/optional fields live under Advanced…. Examples include Random-in-Region controls for move_to, per‑action Jitter (px), and Delay Jitter (ms).
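
To make the random and random_region fields concrete, here is a hedged sketch of how a target point might be resolved for a move_to or click action. The precedence shown (random region first, then explicit x/y, then the detected position) is an assumption, as are the function names.

```python
import random


def random_point_in_region(region, rng=random):
    """Pick a uniformly random pixel inside [x, y, width, height]."""
    x, y, w, h = region
    if w <= 0 or h <= 0:
        raise ValueError("region must have positive width and height")
    return rng.randint(x, x + w - 1), rng.randint(y, y + h - 1)


def resolve_target(action, detected, rng=random):
    """Choose the point an action should act on (assumed precedence, see lead-in)."""
    if action.get("random") and action.get("random_region"):
        return random_point_in_region(action["random_region"], rng)
    if "x" in action and "y" in action:
        return action["x"], action["y"]
    return detected  # fall back to the template position found by the step
```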

Multi-Monitor & Regions

  • Toolbar Monitor Selector:
    • Choose All Monitors or a specific screen; status bar shows the active region.
    • Capture dialogs (Templates tab) respect the selection.
  • Per-Step Monitor Override:
    • In the Step Editor, set the step's Monitor; this constrains detection to that screen when no search_region is set.
    • monitor persists in config.json as null, "ALL", or [x, y, w, h].
  • Region Selection:
    • Select Search Region opens an overlay; if a monitor is chosen, overlay is restricted to that screen.
    • Regions and monitors can be combined; region takes precedence over monitor.
  • Runtime Capture:
    • Uses per-monitor QScreen.grabWindow(...) with devicePixelRatio() for DPI-aware capture.
    • For full desktop, frames are stitched from each monitor according to virtual desktop coordinates.
    • Fallbacks: PIL.ImageGrab.grab(bbox=...) or pyautogui.screenshot(region=...) when needed.
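
The precedence rules in this section (search region over per-step monitor, per-step monitor over the toolbar selection) can be summarized in a few lines. This is a sketch: treating a null step monitor as "use the toolbar selection" is an assumption, and `resolve_capture_rect` is a hypothetical name.

```python
def resolve_capture_rect(search_region, step_monitor, global_monitor, virtual_desktop):
    """Return the [x, y, w, h] rectangle a step should capture.

    search_region  : step-level region or None
    step_monitor   : None, "ALL", or [x, y, w, h] (as persisted in config.json)
    global_monitor : toolbar selection, "ALL" or [x, y, w, h]
    virtual_desktop: bounding rectangle of all monitors
    """
    if search_region:
        return list(search_region)  # region takes precedence over monitor
    # Assumption: a null per-step monitor defers to the toolbar selection.
    monitor = step_monitor if step_monitor is not None else global_monitor
    if monitor is None or monitor == "ALL":
        return list(virtual_desktop)
    return list(monitor)
```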

Failsafe Behavior

  • The worker periodically checks the configured failsafe template (check_failsafe_trigger).
  • When detected, the failsafe sequence executes (execute_failsafe_sequence).
  • The Failsafe tab’s Test button runs only the failsafe sequence, using the current GUI configuration.

Break Settings

  • Use the Break Settings tab to set a maximum runtime using hours, minutes, and seconds.
  • When enabled, the cap applies to total runtime and overrides any sequence loop count.
  • The status bar shows a live clock: Elapsed: Hh Mm Ss / Max: Hh Mm Ss.
  • The bot stops automatically when elapsed ≥ max runtime.
  • These values persist to config.json:
    • break_settings.enabled: boolean
    • break_settings.max_runtime_seconds: integer seconds
  • Optional final sequence on break:
    • Toggle: Run final sequence when time is hit and choose the sequence from the dropdown.
    • Behavior: when max runtime is reached, the primary run ends and the selected final sequence is launched once. The final sequence ignores the runtime cap and runs to completion.
    • Live refresh: the dropdown updates immediately when you add, delete, rename, or duplicate sequences — no restart required.
    • Config fields:
      • break_settings.run_final_after_break: boolean
      • break_settings.final_sequence_name: string (sequence name) or "(none)"
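
The runtime-cap arithmetic above is simple enough to sketch directly. The helper names are illustrative; the config keys and the status text format come from this section, and treating a cap of 0 as "no enforcement" follows the note in Scheduler Notes.

```python
def to_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Convert the hours/minutes/seconds inputs to max_runtime_seconds."""
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds: float) -> str:
    hours, rem = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}h {minutes}m {seconds}s"


def status_text(elapsed: float, max_seconds: int) -> str:
    return f"Elapsed: {format_hms(elapsed)} / Max: {format_hms(max_seconds)}"


def cap_reached(break_settings: dict, elapsed: float) -> bool:
    """True when the cap is enabled, non-zero, and elapsed >= max runtime."""
    cap = break_settings.get("max_runtime_seconds", 0)
    return bool(break_settings.get("enabled")) and cap > 0 and elapsed >= cap
```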

Scheduled Sequences

  • Use the Scheduled Sequences tab to start specific sequences automatically at a given time each day.
  • Each schedule row has:
    • Enabled: whether the schedule is active
    • Sequence: the sequence to run
    • Time: daily start time (hh:mm AM/PM; legacy 24-hour HH:mm is also accepted)
  • Behavior toggles:
    • Queue if busy: if a run is already in progress at the scheduled time, starts the scheduled sequence as soon as the current run completes.
    • Preempt if busy: stops the current run immediately and starts the scheduled sequence.
    • Resume previous: when preempting, resumes the original run after the scheduled sequence finishes.
    • Queue and Preempt are mutually exclusive when saved; if Preempt is enabled, Queue is saved as off.
  • Behavior:
    • At the scheduled time, the app starts the selected sequence once per day.
    • If another run is already in progress and neither Queue if busy nor Preempt if busy is enabled, the scheduled run is skipped.
    • Scheduled runs ignore the max runtime cap (they run to completion unless stopped).
    • The scheduler checks every 30 seconds.
  • Persistence:
    • scheduled_sequences: array of schedule objects persisted to config.json, each with:
      • enabled: boolean
      • sequence_name: string
      • time: hh:mm AM/PM (12-hour); loader accepts legacy HH:mm.
      • queue_if_busy: boolean
      • preempt_if_busy: boolean
      • resume_previous: boolean
      • last_run_date: YYYY-MM-DD (used to ensure only one run per day)
  • Tips:
    • Make sure your sequence exists and is valid before scheduling.
    • You can add/remove schedule rows anytime; changes are saved with the configuration.
    • New schedule rows default to Enabled = off — toggle on to activate.
    • The status bar shows Next: <Sequence> @ hh:mm AM/PM and updates immediately when you edit schedule rows.
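
A schedule entry's time field and the once-per-day rule could be handled roughly as below. The field names come from the persistence list above; matching on the exact hour and minute is an assumption about how "due" is decided, and both function names are hypothetical.

```python
from datetime import datetime
from datetime import time as dtime


def parse_schedule_time(text: str) -> dtime:
    """Accept 'hh:mm AM/PM' first, then legacy 24-hour 'HH:mm'."""
    for fmt in ("%I:%M %p", "%H:%M"):
        try:
            return datetime.strptime(text.strip(), fmt).time()
        except ValueError:
            continue
    raise ValueError(f"unrecognized schedule time: {text!r}")


def is_due(schedule: dict, now: datetime) -> bool:
    """True when an enabled schedule matches the current minute and has not run today."""
    if not schedule.get("enabled"):
        return False
    if schedule.get("last_run_date") == now.date().isoformat():
        return False  # already ran today
    target = parse_schedule_time(schedule["time"])
    # Assumption: the scheduler fires only during the scheduled minute.
    return (now.hour, now.minute) == (target.hour, target.minute)
```

With a 30-second check interval, a minute-level comparison is evaluated at least once per scheduled minute; last_run_date then prevents a second start on the next check.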

Toolbar Shortcuts

  • The top toolbar includes quick actions for configuration:
    • New: create a fresh configuration
    • Open: load an existing config.json
    • Save: write current configuration
    • Save As: save configuration to a new file
    • These mirror the File menu and make switching configs faster.

Scheduler Notes

  • The scheduler runs only while the app is open.
  • Time parsing supports both hh:mm AM/PM and HH:mm.
  • When a run is active at the scheduled time:
    • With Queue if busy on, the scheduled run starts right after the active run completes.
    • With Preempt if busy on, the current run stops and the scheduled run starts immediately; if Resume previous is on, the original run resumes automatically after the scheduled run completes.
    • With neither on, the scheduled run is skipped for that minute.
  • Tips for verifying the max runtime cap (see Break Settings):
    • Verify with a small cap (e.g., 0h 0m 10s).
    • Ensure Enable Max Runtime is checked; a cap of 0 disables enforcement.
    • Check bot_debug.log for entries like Max runtime reached (...).
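
The busy-time rules in these notes reduce to a small decision table. The sketch below uses the persisted field names (queue_if_busy, preempt_if_busy, resume_previous); the returned labels and function names are made up for illustration.

```python
def busy_policy(schedule: dict) -> str:
    """Decide what to do when a run is already active at the scheduled time."""
    if schedule.get("preempt_if_busy"):
        # Preempt wins over Queue; optionally resume the interrupted run afterwards.
        return "preempt_then_resume" if schedule.get("resume_previous") else "preempt"
    if schedule.get("queue_if_busy"):
        return "queue"
    return "skip"


def normalize_for_save(schedule: dict) -> dict:
    """Queue and Preempt are mutually exclusive when saved: Preempt turns Queue off."""
    saved = dict(schedule)
    if saved.get("preempt_if_busy"):
        saved["queue_if_busy"] = False
    return saved
```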

Logging & Debugging

  • Runtime logs: bot_debug.log
  • On failed matches, screenshots and templates may be saved under a debug/ folder next to the script
  • Press F8 to stop sequences; the status bar shows progress and messages
  • Action metrics: the engine records per‑action timing results (type, success, elapsed seconds). Entries are summarized in bot_debug.log and retained in memory for recent actions.
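
Per-action timing with a bounded in-memory history might be recorded along these lines. This is an assumption-laden sketch: the class name, the 200-entry default, and the context-manager interface are all hypothetical; only the recorded fields (type, success, elapsed seconds) come from the bullet above.

```python
import time
from collections import deque
from contextlib import contextmanager


class ActionMetrics:
    """Keep the most recent action timings in memory."""

    def __init__(self, max_entries: int = 200):
        self.entries = deque(maxlen=max_entries)  # old entries drop off automatically

    @contextmanager
    def track(self, action_type: str):
        start = time.perf_counter()
        success = True
        try:
            yield
        except Exception:
            success = False
            raise  # the caller still sees the failure
        finally:
            self.entries.append({
                "type": action_type,
                "success": success,
                "elapsed": time.perf_counter() - start,
            })
```

Usage would look like `with metrics.track("click"): do_click()`, where `do_click` stands in for whatever executes the action.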

Template Tester (Preview)

  • Strategy Dropdown: Default (template) or Feature (scale/rotation).
  • Parameters: Min Inliers, Ratio, RANSAC thresholds.
  • Visualizer controls:
    • Detector: choose ORB, AKAZE, or SIFT for the feature preview.
    • Show Keypoints: toggle overlay of scene keypoints for visual debugging (off by default for performance).
  • Capture Backend:
    • Options: Auto (best), MSS, QScreen.
    • Auto prefers MSS when available, otherwise uses QScreen.
    • An availability indicator shows whether the selected backend is usable.
  • Debug Panel shows:
    • Target screen index, geometry, local capture rect
    • Frame size and bytes-per-line, DPI ratio, pixmap state
    • Backend used (MSS/QScreen) and fallback path (PIL bbox, pyautogui region, stitched full desktop)
  • Metrics: Inliers, Matches, Confidence, RANSAC reprojection error.

Monitor Info

  • Toolbar button opens a dialog listing monitors: index, geometry, and DPI ratio.
  • Capture All stitches a preview from all monitors to validate layout.

Tips for Reliable Matching

  • Use small, high-contrast templates of the exact UI you want to detect
  • Avoid scale/rotation changes; template matching works best when the on-screen element is the same size as the template
  • Consider using search_region to narrow detection for speed and accuracy
  • For multi-monitor setups, prefer per-step monitor selection or regions on the target screen.
  • Use Feature strategy for rotated/scaled UI elements; increase min_inliers or adjust ratio_thresh when noisy.
  • For random click/move regions, ensure coordinates are on-screen and correct

Known Limitations

  • Template matching is sensitive to scaling and rotation
  • Feature matching adds overhead; tune thresholds for your scene.
  • Some capture backends may behave differently under extreme DPI or exotic layouts; robust fallbacks are implemented.
  • Some GUI interactions are evolving; if you hit issues (e.g., editing failsafe steps), check logs and report
  • Not all advanced scenarios are fully implemented; contributions are welcome

Contributing

  • Issues and PRs are appreciated
  • Keep changes focused and documented
  • Please avoid using this tool for anything that breaks app/game ToS

License

This project is for personal use; choose an appropriate license before any public release.

Compiled Mode and Web-Start Notes

  • IPC and compiled mode
    • Web server writes ipc_command.json to the executable folder when compiled; GUI reads and deletes it after handling.
    • Web server serves static pages from web/static; compiled builds also fall back to _internal/web/static.
    • Launcher’s “Open Web Portal” targets http://localhost:<port>/?token=... to avoid 0.0.0.0 in browsers.
  • Web-start runs
    • The GUI applies break_settings.enabled and break_settings.max_runtime_seconds from config.json when sequences are started via web IPC.
    • Status bar shows Max runtime: Hh Mm Ss and live elapsed.
    • On cap hit, the run stops and runs the configured final sequence once (if set).

Troubleshooting

  • Web portal 404 on root (/?token=...) in compiled build:
    • Launch WebServer.exe from dist/ProjectBundle; compiled server serves from web/static and _internal/web/static.
  • Launcher restarts itself instead of starting GUI/server:
    • Use Launcher.exe from dist/ProjectBundle (same folder as ImageDetectionBot.exe and WebServer.exe).
  • Web-start doesn’t show current step in GUI:
    • Expected behavior: on a web-started run, the GUI switches to the Sequences tab, selects the active sequence, and brings its window to the foreground.
  • “MOVE action requires x and y coordinates” after web-start:
    • MOVE without x,y uses the detected position in the step (treated like MOVE_TO) to keep actions running.
  • Random movement not saved in web failsafe:
    • Web editors save random and random_region; compiled GUI honors random move_to and random-region clicks.
  • MJPEG preview shows net::ERR_ABORTED occasionally:
    • That’s a reconnect artifact; it does not affect saving, IPC, or UI updates.
  • Templates list appears but bot fails to load a template:
    • The UI is designed to stay populated even if individual templates fail to load into the bot.
    • Check bot_debug.log for the failing template path; ensure the image exists at the resolved absolute path.
    • Use the Templates tab to update the path or re-capture and save the template.

Recorder and Playback Notes

  • Recorder tab:
    • Start Recording to capture global mouse moves, clicks, scrolls, and key presses (requires pynput).
    • Stop to end capture, then optionally delete selected events.
    • Save… prompts for a friendly name; stores recordings/<name>.json with events and name.
    • Preview Overlay replays visually without performing any input.
    • Playback executes the recording now.
    • Use Play Recording in any step editor to run a saved recording; pick from the dropdown that lists files in recordings/.
    • Pause/Resume: temporarily pause capture during a session; resuming preserves timing and continues appending events.
    • Add Marker: insert labeled markers into the timeline to annotate moments.
    • Library Bar: browse and manage saved recordings — Refresh list, Open into the table, Play immediately, Rename, Delete.
  • Play Recording action fields:
    • Recording: select from the recordings/ folder via dropdown.
    • Speed: multiplier to compress or expand event intervals (e.g., 2.0 runs twice as fast).
    • Start Transition (s): optional smooth pre‑roll move to the first recorded position (set to 0.0 for auto distance‑based duration).
  • Humanization:
    • Per-action Jitter (px): small random offset applied to target positions for click, move, move_to, right_click, double_click, click_and_hold.
    • Per-action Delay Jitter (ms): random small delay inserted before action execution to simulate human variability.
  • Implementation details:
    • Speed scales per‑event intervals; internal pyautogui.PAUSE is disabled during playback to avoid unintended delays.
    • Start Transition uses either the configured duration or an auto duration based on cursor distance and speed.
    • Playback and action results are logged to bot_debug.log for traceability.
  • Advanced (GUI): Jitter (px) and Delay Jitter (ms) are available via Advanced… and intentionally hidden from the main row to save space.
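
The Speed multiplier and the two jitter settings can be illustrated with a short sketch. The uniform distributions and the function names are assumptions; the README only states that jitter is a small random offset or delay and that Speed scales per-event intervals.

```python
import random


def scale_intervals(intervals, speed: float):
    """Divide each recorded gap by the speed multiplier (2.0 plays twice as fast)."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return [dt / speed for dt in intervals]


def apply_jitter(x: int, y: int, jitter_px: int, rng=random):
    """Offset a target by up to +/- jitter_px on each axis (assumed uniform)."""
    if jitter_px <= 0:
        return x, y
    return x + rng.randint(-jitter_px, jitter_px), y + rng.randint(-jitter_px, jitter_px)


def delay_jitter_seconds(delay_jitter_ms: float, rng=random) -> float:
    """Random pre-action delay between 0 and delay_jitter_ms, returned in seconds."""
    if delay_jitter_ms <= 0:
        return 0.0
    return rng.uniform(0, delay_jitter_ms) / 1000.0
```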