-
Notifications
You must be signed in to change notification settings - Fork 297
feat(extension): improve recording fidelity and iframe gating #104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves recording fidelity and reduces noise by implementing iframe interaction gating, new-tab intent correlation, and aggressive deduplication. It removes extraction/semantic targeting features, adds a configurable options page for iframe/blocklist settings, and ensures all frames record events with proper frame context.
Key changes:
- Activated tab + new-tab intent filtering to suppress ad/tracker noise
- Iframe meta navigation gating (only allow from user-interacted frames within time window)
- Input/scroll/navigation deduplication to collapse rapid repeats
- Extension options page for iframe recording and domain allow/deny lists
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
extension/wxt.config.ts |
Added storage permission and broader host permissions for iframe injection |
extension/src/lib/workflow-types.ts |
Removed extraction/radio/select/checkbox step types; added frameIdPath to remaining steps |
extension/src/lib/types.ts |
Added frameIdPath/frameUrl to events; removed extraction event type |
extension/src/entrypoints/sidepanel/components/recording-view.tsx |
Removed extraction UI and stats display |
extension/src/entrypoints/options.html |
New options page for iframe settings and domain lists |
extension/src/entrypoints/content.ts |
Enabled all frames; added frame context to all events; removed semantic extraction logic; added input debouncing |
extension/src/entrypoints/background.ts |
Activated tab gating; iframe navigation filtering; step deduplication; removed extraction handling |
.github/copilot-instructions.md |
New contributor/AI agent onboarding guide |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
extension/src/entrypoints/content.ts
Outdated
| const frameIdPath = (() => { | ||
| try { | ||
| let win: any = window; const parts: number[] = []; | ||
| while (win !== win.parent) { const parent = win.parent; let idx=0; for (let i=0;i<parent.frames.length;i++){ if(parent.frames[i]===win){idx=i;break;} } parts.unshift(idx); win=parent; if(parts.length>10) break; } | ||
| return parts.length ? parts.join('.') : '0'; | ||
| } catch { return '0'; } | ||
| })(); |
Copilot
AI
Oct 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The frameIdPath calculation logic is duplicated in multiple handler functions (at lines 127-133, 259-275, 305-311, 361, and 443). Extract this into a shared helper function to reduce code duplication and improve maintainability.
| const times = interactedFrameTimes[rrEvent.tabId] || {}; | ||
| const lastTs = times[fUrl]; | ||
| if (!lastTs) break; | ||
| if (Date.now() - lastTs > settings.iframeWindow) break; |
Copilot
AI
Oct 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The time comparison uses Date.now() but lastTs comes from interactedFrameTimes which stores Date.now() at interaction time (line 646), while rrEvent.timestamp is used elsewhere. This creates an inconsistency where the window calculation mixes server-side event timestamps with client-side current time. Consider using rrEvent.timestamp - lastTs > settings.iframeWindow for consistent time-based filtering.
| if (Date.now() - lastTs > settings.iframeWindow) break; | |
| if (rrEvent.timestamp - lastTs > settings.iframeWindow) break; |
| // Hostname patterns for iframe navigation noise we want to suppress | ||
| const BLOCKED_IFRAME_HOST_PATTERNS: RegExp[] = [ | ||
| /doubleclick\.net$/i, | ||
| /googlesyndication\.com$/i, | ||
| /googleadservices\.com$/i, | ||
| /amazon-adsystem\.com$/i, | ||
| /recaptcha\.google\.com$/i, | ||
| /recaptcha\.net$/i, | ||
| /googletagmanager\.com$/i, | ||
| /indexww\.com$/i, | ||
| /adtrafficquality\.google$/i, | ||
| /2mdn\.net$/i, | ||
| /gstaticadssl\.googleapis\.com$/i, | ||
| ]; | ||
|
|
Copilot
AI
Oct 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constant BLOCKED_IFRAME_HOST_PATTERNS is defined but never used in the code. The filtering logic instead uses settings.blocklist loaded from chrome.storage (lines 534-537). Either remove this unused constant or integrate it as the default blocklist value.
| // Hostname patterns for iframe navigation noise we want to suppress | |
| const BLOCKED_IFRAME_HOST_PATTERNS: RegExp[] = [ | |
| /doubleclick\.net$/i, | |
| /googlesyndication\.com$/i, | |
| /googleadservices\.com$/i, | |
| /amazon-adsystem\.com$/i, | |
| /recaptcha\.google\.com$/i, | |
| /recaptcha\.net$/i, | |
| /googletagmanager\.com$/i, | |
| /indexww\.com$/i, | |
| /adtrafficquality\.google$/i, | |
| /2mdn\.net$/i, | |
| /gstaticadssl\.googleapis\.com$/i, | |
| ]; |
| allowlist: [] as string[], | ||
| }; | ||
| let settings: { enableIframes: boolean; iframeWindow: number; blocklist: string[]; allowlist: string[] } = { ...DEFAULT_SETTINGS }; | ||
| chrome.storage.sync.get(DEFAULT_SETTINGS, (s: any) => { settings = { ...settings, ...s }; }); |
Copilot
AI
Oct 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The chrome.storage.sync.get callback executes asynchronously, but the settings variable is immediately accessed by message handlers. This creates a race condition where early events may use default settings instead of user-configured values. Consider awaiting the storage load or deferring event processing until settings are initialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 issues found across 8 files
Prompt for AI agents (all 3 issues)
Understand the root cause of the following 3 issues and fix them.
<file name="extension/src/entrypoints/options.html">
<violation number="1" location="extension/src/entrypoints/options.html:32">
The blocklist placeholder uses a literal "\n", so the UI shows "example.com\nads.example.org" instead of putting the second domain on a new line. Use an actual newline or an HTML entity to render the placeholder correctly.</violation>
</file>
<file name="extension/src/entrypoints/background.ts">
<violation number="1" location="extension/src/entrypoints/background.ts:364">
Navigation consolidation currently overwrites the previous navigation step for a tab, so later navigations replace earlier history instead of being appended. This drops essential steps for replay.</violation>
<violation number="2" location="extension/src/entrypoints/background.ts:551">
Applying the iframe window using Date.now() causes previously accepted iframe navigations to disappear once enough wall-clock time passes; compare against the rrweb event timestamp instead so stored steps remain stable.</violation>
</file>
React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai to give feedback, ask questions, or re-run the review.
| </div> | ||
| <div class="row"> | ||
| <label for="blocklist">Blocked domains (newline separated)</label> | ||
| <textarea id="blocklist" placeholder="example.com\nads.example.org"></textarea> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The blocklist placeholder uses a literal "\n", so the UI shows "example.com\nads.example.org" instead of putting the second domain on a new line. Use an actual newline or an HTML entity to render the placeholder correctly.
Prompt for AI agents
Address the following comment on extension/src/entrypoints/options.html at line 32:
<comment>The blocklist placeholder uses a literal "\n", so the UI shows "example.com\nads.example.org" instead of putting the second domain on a new line. Use an actual newline or an HTML entity to render the placeholder correctly.</comment>
<file context>
@@ -0,0 +1,80 @@
+ </div>
+ <div class="row">
+ <label for="blocklist">Blocked domains (newline separated)</label>
+ <textarea id="blocklist" placeholder="example.com\nads.example.org"></textarea>
+ </div>
+ <div class="row">
</file context>
| <textarea id="blocklist" placeholder="example.com\nads.example.org"></textarea> | |
| <textarea id="blocklist" placeholder="example.com ads.example.org"></textarea> |
| const times = interactedFrameTimes[rrEvent.tabId] || {}; | ||
| const lastTs = times[fUrl]; | ||
| if (!lastTs) break; | ||
| if (Date.now() - lastTs > settings.iframeWindow) break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applying the iframe window using Date.now() causes previously accepted iframe navigations to disappear once enough wall-clock time passes; compare against the rrweb event timestamp instead so stored steps remain stable.
Prompt for AI agents
Address the following comment on extension/src/entrypoints/background.ts at line 551:
<comment>Applying the iframe window using Date.now() causes previously accepted iframe navigations to disappear once enough wall-clock time passes; compare against the rrweb event timestamp instead so stored steps remain stable.</comment>
<file context>
@@ -479,38 +519,52 @@ export default defineBackground(() => {
+ const times = interactedFrameTimes[rrEvent.tabId] || {};
+ const lastTs = times[fUrl];
+ if (!lastTs) break;
+ if (Date.now() - lastTs > settings.iframeWindow) break;
+ }
+ } catch {
</file context>
| if (Date.now() - lastTs > settings.iframeWindow) break; | |
| if (rrEvent.timestamp - lastTs > settings.iframeWindow) break; |
✅ Addressed in 55c8768
| steps[existingIdx].type === "navigation" | ||
| ) { | ||
| // Update existing navigation (redirect / title change) | ||
| (steps[existingIdx] as NavigationStep).url = navUrl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Navigation consolidation currently overwrites the previous navigation step for a tab, so later navigations replace earlier history instead of being appended. This drops essential steps for replay.
Prompt for AI agents
Address the following comment on extension/src/entrypoints/background.ts at line 364:
<comment>Navigation consolidation currently overwrites the previous navigation step for a tab, so later navigations replace earlier history instead of being appended. This drops essential steps for replay.</comment>
<file context>
@@ -295,119 +319,127 @@ export default defineBackground(() => {
+ steps[existingIdx].type === "navigation"
) {
+ // Update existing navigation (redirect / title change)
+ (steps[existingIdx] as NavigationStep).url = navUrl;
+ steps[existingIdx].timestamp = event.timestamp;
+ } else {
</file context>
|
/copilot review again |
.github/copilot-instructions.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a screen recording and screenshots before and after for this (will help me merge quickly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (!targetElement || !("value" in targetElement)) return; | ||
| const isPassword = targetElement.type === "password"; | ||
|
|
||
| // Ignore programmatic (non user-trusted) input events – these often cause massive duplication |
Copilot
AI
Oct 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a hyphen instead of an en-dash in the comment.
| // Ignore programmatic (non user-trusted) input events – these often cause massive duplication | |
| // Ignore programmatic (non user-trusted) input events - these often cause massive duplication |
| const times = interactedFrameTimes[rrEvent.tabId] || {}; | ||
| const lastTs = times[fUrl]; | ||
| if (!lastTs) break; | ||
| const eventTimestamp = typeof (rrEvent as any).timestamp === "number" ? (rrEvent as any).timestamp : Date.now(); |
Copilot
AI
Oct 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The property (rrEvent as any).timestamp is accessed twice. Store it in a variable to improve readability and avoid redundant type casting.
| const eventTimestamp = typeof (rrEvent as any).timestamp === "number" ? (rrEvent as any).timestamp : Date.now(); | |
| const rrEventTimestamp = (rrEvent as any).timestamp; | |
| const eventTimestamp = typeof rrEventTimestamp === "number" ? rrEventTimestamp : Date.now(); |
| if ( | ||
| type !== "CUSTOM_TAB_ACTIVATED" && | ||
| !activatedTabs.has(tabId) && | ||
| !(payload.openerTabId && recentNewTabIntents[payload.openerTabId] && Date.now() - recentNewTabIntents[payload.openerTabId] < NEW_TAB_INTENT_WINDOW_MS) |
Copilot
AI
Oct 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expression recentNewTabIntents[payload.openerTabId] is accessed twice in the same condition. Store it in a variable to improve readability and avoid redundant lookups.
| if ( | |
| type !== "CUSTOM_TAB_ACTIVATED" && | |
| !activatedTabs.has(tabId) && | |
| !(payload.openerTabId && recentNewTabIntents[payload.openerTabId] && Date.now() - recentNewTabIntents[payload.openerTabId] < NEW_TAB_INTENT_WINDOW_MS) | |
| const openerIntentTime = payload.openerTabId ? recentNewTabIntents[payload.openerTabId] : undefined; | |
| if ( | |
| type !== "CUSTOM_TAB_ACTIVATED" && | |
| !activatedTabs.has(tabId) && | |
| !(payload.openerTabId && openerIntentTime && Date.now() - openerIntentTime < NEW_TAB_INTENT_WINDOW_MS) |
|
@sauravpanda Let me know if this makes it easier |
|
Thanks for the screenshot, does this expects the system to use semantic abstraction? because the previous version used xpath and css selectors to take action when executing the workflow. |
| | NavigationStep | ||
| | ClickStep | ||
| | InputStep | ||
| | RadioStep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why have we removed this steps? Any specific reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets keep it then, I dont want to break the old version, I will dive deeper and do a cleanup soon. So only make changes which you feel necessary, dont remove any old code.
| cssSelector?: string; // Optional in source | ||
| elementTag: string; | ||
| elementText: string; | ||
| targetText?: string; // Semantic targeting text (label, placeholder, aria-label, etc.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need this for the semantic abstraction, dont remove this
| cssSelector?: string; // Optional in source | ||
| elementTag: string; | ||
| value: string; | ||
| targetText?: string; // Semantic targeting text (label, placeholder, aria-label, etc.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same I need this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok let me see the version update i think i rushed this previous pr i thought i just had to split it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (parts.length > 10) break; | ||
| } | ||
| return parts.length ? parts.join('.') : '0'; | ||
| } catch { |
Copilot
AI
Oct 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The function accesses parent frames without checking cross-origin restrictions. This may throw security errors when frames are cross-origin, but the catch block handles it. Consider logging the error for debugging purposes.
| } catch { | |
| } catch (error) { | |
| console.warn("computeFrameIdPath: Failed to access parent frame (possibly cross-origin)", error); |
| blocklist: [ | ||
| 'doubleclick.net','googlesyndication.com','googleadservices.com', | ||
| 'amazon-adsystem.com','2mdn.net','recaptcha.google.com','recaptcha.net', | ||
| 'googletagmanager.com','indexww.com','adtrafficquality.google','gstaticadssl.googleapis.com' |
Copilot
AI
Oct 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default blocklist is hardcoded in two places (background.ts and options.html). Consider extracting this to a shared constant to ensure consistency and easier maintenance.
| (lastStep as ScrollStep).scrollX === scrollData.x && | ||
| (lastStep as ScrollStep).scrollY === scrollData.y && | ||
| Math.abs(rrEvent.timestamp - lastStep.timestamp) < 200 && |
Copilot
AI
Oct 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded value '200' (milliseconds) for deduplication should be extracted as a named constant (e.g., SCROLL_DEDUPE_WINDOW_MS) for better maintainability and clarity.
| case 'scroll': { | ||
| const sameXY = (last as any).scrollX === (step as any).scrollX && (last as any).scrollY === (step as any).scrollY; | ||
| const sameUrl = (last as any).url === (step as any).url; | ||
| const nearTime = Math.abs(step.timestamp - last.timestamp) < 200; |
Copilot
AI
Oct 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded value '200' (milliseconds) appears again here for scroll deduplication. This should use the same named constant suggested in Comment 5 to maintain consistency.
| }; | ||
|
|
||
| // Dedupe rule 1: If value unchanged for this element and within debounce window, skip | ||
| const DEBOUNCE_MS_INPUT = 1500; |
Copilot
AI
Oct 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The input debounce value is defined locally within the function. Consider defining this at the module level alongside other constants for consistency with the scroll DEBOUNCE_MS constant.
| if ( | ||
| inputData.value === "" && | ||
| prev && | ||
| prev.value === "" && | ||
| inputData.timestamp - prev.ts < 5000 | ||
| ) { |
Copilot
AI
Oct 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded value '5000' (milliseconds) for empty input suppression should be extracted as a named constant for better maintainability.
| const prior = lastInputPerKey[key]; | ||
| const nowTs = inputEvent.timestamp; | ||
| const isEmpty = (inputEvent as any).value === ""; | ||
| if (isEmpty && prior && prior.value === "" && nowTs - prior.ts < 5000) { |
Copilot
AI
Oct 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded value '5000' (milliseconds) appears in both content.ts and background.ts for empty input suppression. Extract this as a shared constant to ensure consistency.


Summary Splitting Extension work from PR #87
Records iframe interactions reliably without flooding steps.
Replays deterministic steps inside the correct iframe instead of navigating into the iframe URL.
Filters ad/analytics/recaptcha iframe navigations and about:blank.
Deduplicates steps (including scroll/input/click) to match real user intent.
Extension: Content (src/entrypoints/content.ts)
Enabled all frames and about:blank: allFrames: true, matchAboutBlank: true.
Added frame context on all events: frameUrl, frameIdPath, isTopFrame.
Input noise reduction: isTrusted checks, debounce per element, skip rapid empty repeats.
New-tab intent signal: PREPARE_NEW_TAB emitted on modifier/middle-clicks and target=_blank.
Extension: Background (src/entrypoints/background.ts)
Activated tab gating + new-tab intent correlation; ignore unactivated/background tabs.
Iframe nav filtering:
Tracks interactedFrameUrls per tab; only accepts rrweb Meta navigations from frames the user interacted with.
Drops about:blank; adds time-window gating (defaults to 3s after interaction).
Allow/deny logic (allowlist overrides blocklist), with settings persisted.
rrweb payload handling:
Scroll: merge consecutive updates; suppress chrome://newtab; dedupe identical XY within 200ms regardless of targetId.
Meta: convert to navigation step only if allowed by the above rules.
Step dedupe post-pass: collapse consecutive duplicates for navigation/input/click/scroll/key_press; update timestamp/screenshot instead of appending.
Navigation consolidation: per-tab merge of redirects to reduce churn.
Persisted recording state: default stopped; load/broadcast on startup; save on start/stop.
Extension: Types/Schema
types.ts: Added optional frameIdPath to StoredCustom* and frameUrl to rrweb.
workflow-types.ts: Promoted frameIdPath to Step types (Click/Input/KeyPress/Navigation/Scroll).
@sauravpanda
Summary by cubic
Improved extension recording fidelity with iframe-aware capture and strong noise gating so steps are cleaner and replay reliably in the correct frame. Added settings to control iframe recording and domain filters, and reduced duplicate scroll/input/click/navigation events.
New Features
Refactors