description: "AutoBot accurately identifies and localizes deceptive patterns from website screenshots without relying on HTML code, achieving an F1-score of 0.93."
  author={Nayak, Asmit and Wani, Yash and Zhang, Shirley and Khandelwal, Rishabh and Fawaz, Kassem},
  journal={arXiv preprint arXiv:2411.07441},
  year={2024}
  }
---
<div class="abstract">
<p>We introduce our <strong>AutoBot framework</strong> to address this gap and help web stakeholders navigate and mitigate online deceptive patterns. AutoBot accurately identifies and localizes deceptive patterns from a screenshot of a website without relying on the underlying HTML code. AutoBot employs a two-stage pipeline that leverages the capabilities of specialized vision models to analyze website screenshots, identify interactive elements, and extract textual features. Next, using a large language model, AutoBot understands the context surrounding these elements to determine the presence of deceptive patterns.</p>
<p>We also use AutoBot to create a synthetic dataset to distill knowledge from 'teacher' LLMs to smaller language models. Through extensive evaluation, we demonstrate AutoBot's effectiveness in detecting deceptive patterns on the web, achieving an <strong>F1-score of 0.93</strong> and underscoring its potential as an essential tool for mitigating online deceptive patterns. We implement AutoBot across three downstream applications targeting different web stakeholders:</p>
<ol>
  <li>A local browser extension providing users with real-time feedback</li>
  <li>A Lighthouse audit to inform developers of potential deceptive patterns on their sites</li>
  <li>A measurement tool designed for researchers and regulators</li>
</ol>
</div>
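
As a rough sketch of how the third application, the measurement tool, might drive AutoBot at scale, the snippet below captures full-page screenshots with Playwright and hands them to a placeholder `detect_patterns` function; the URL list and that function are illustrative assumptions, not part of the released tooling.

```python
# Minimal measurement-crawl sketch: capture screenshots, then run AutoBot on the pixels.
# The URL list and detect_patterns() are placeholders, not the released AutoBot code.
from playwright.sync_api import sync_playwright

SITES = ["https://example.com", "https://example.org"]  # illustrative targets

def detect_patterns(screenshot_path: str) -> list[dict]:
    """Stand-in for AutoBot's screenshot-in, findings-out pipeline."""
    return []  # the real pipeline would return per-element pattern labels

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page(viewport={"width": 1280, "height": 800})
    for url in SITES:
        page.goto(url, wait_until="networkidle")
        shot = url.replace("https://", "").replace("/", "_") + ".png"
        page.screenshot(path=shot, full_page=True)   # only pixels are inspected, no HTML
        findings = detect_patterns(shot)
        print(url, "->", len(findings), "candidate deceptive patterns")
    browser.close()
```
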
<figcaption>AutoBot's two-stage pipeline: (1) Vision Module to localize UI elements and extract features, (2) Language Module to detect deceptive patterns.</figcaption>
</div>
AutoBot adopts a modular design, breaking down the task into two distinct modules: a Vision Module for element localization and feature extraction, and a Language Module for deceptive pattern detection. This approach allows AutoBot to work with screenshots alone, without requiring access to the underlying HTML code, which tends to be less stable across different webpage implementations.
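
At a high level, the flow can be summarized in a few lines; the function names below are illustrative stand-ins for the two modules, not the released AutoBot interface.

```python
# High-level sketch of the two-stage design; names are illustrative assumptions.
def vision_module(screenshot_path: str) -> list[dict]:
    """Stage 1: localize UI elements in the screenshot and extract their features."""
    ...

def language_module(element_map: list[dict]) -> list[dict]:
    """Stage 2: reason over the extracted elements and flag deceptive patterns."""
    ...

def autobot(screenshot_path: str) -> list[dict]:
    # Screenshot in, deceptive-pattern findings out; no HTML is consulted.
    return language_module(vision_module(screenshot_path))
```
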
### Vision Module
To address high false positive rates and localization issues, the Vision Module parses a webpage screenshot and maps it to a tabular representation we call *ElementMap*. As illustrated in the figure above, the *ElementMap* contains the text associated with each UI element, along with its features: element type, bounding box coordinates, font size, background color, and font color. For UI element detection, we train a YOLOv10 model on a synthetically generated dataset. The evaluation of our model is presented below.
<figcaption style="margin-top: 10px; font-size: 0.9em; text-align: center;">(b) Evaluation of YOLO (Ours) vs Molmo across different UI element types</figcaption>
</div>
</div>
<figcaption>The Vision Module processes screenshots to extract UI elements and their features into an <em>ElementMap</em> representation.</figcaption>
</div>
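
As a rough illustration of this stage, the sketch below runs an off-the-shelf YOLO detector over a screenshot and assembles ElementMap-style rows; the weights path, class names, and the OCR/color helpers are assumptions, not AutoBot's trained checkpoint or feature extractors.

```python
# Sketch of ElementMap construction (assumed weights path and helper functions;
# AutoBot's actual detector and feature extractors may differ).
from ultralytics import YOLO
from PIL import Image

def ocr_text(crop: Image.Image) -> str:
    """Hypothetical OCR helper (e.g. Tesseract or a text-spotting model)."""
    return ""

def dominant_colors(crop: Image.Image) -> tuple[str, str]:
    """Hypothetical helper returning (font_color, background_color) hex strings."""
    return "#000000", "#ffffff"

def build_element_map(screenshot_path: str, weights: str = "ui_yolov10.pt") -> list[dict]:
    image = Image.open(screenshot_path)
    detector = YOLO(weights)                       # YOLOv10-style UI element detector
    result = detector(screenshot_path)[0]
    element_map = []
    for box in result.boxes:
        x1, y1, x2, y2 = (int(v) for v in box.xyxy[0].tolist())
        crop = image.crop((x1, y1, x2, y2))
        font_color, background_color = dominant_colors(crop)
        element_map.append({
            "type": result.names[int(box.cls)],    # e.g. "button", "checkbox", "text"
            "bbox": (x1, y1, x2, y2),
            "text": ocr_text(crop),
            "font_size": None,                     # font-size estimation omitted in this sketch
            "font_color": font_color,
            "background_color": background_color,
        })
    return element_map
```
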
### Language Module
The Language Module takes the *ElementMap* as input and maps each element to a deceptive pattern from our taxonomy. This module reasons about each element, taking into account its spatial context and visual features. We explore different instantiations of this module, such as distilling smaller models (Qwen, T5) from a larger teacher model (Gemini), to balance cost, training requirements, and accuracy.
<figcaption>The Language Module analyzes <em>ElementMap</em> data to identify and classify deceptive patterns in context.</figcaption>
</div>
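
To make that interface concrete, here is a hedged sketch of the detection step: each ElementMap row is serialized into a prompt and sent to a pluggable `complete` callable, which could wrap Gemini or a distilled Qwen/T5 student. The prompt wording and taxonomy labels are placeholders, not AutoBot's actual prompts or category names.

```python
# Sketch of the detection step over an ElementMap; prompt text and labels are
# illustrative assumptions. `complete` can wrap any LLM backend.
from typing import Callable

TAXONOMY = ["none", "sneaking", "obstruction", "interface interference",
            "forced action"]  # placeholder category names

def classify_elements(element_map: list[dict], complete: Callable[[str], str]) -> list[dict]:
    findings = []
    context = "\n".join(
        f"- {e['type']} at {e['bbox']}: \"{e['text']}\"" for e in element_map
    )
    for element in element_map:
        prompt = (
            "You are auditing a web page for deceptive patterns.\n"
            f"Page elements (type, position, text):\n{context}\n\n"
            f"Focus on the element at {element['bbox']} with text \"{element['text']}\".\n"
            f"Answer with exactly one label from: {', '.join(TAXONOMY)}."
        )
        label = complete(prompt).strip().lower()
        if label in TAXONOMY and label != "none":
            findings.append({"element": element, "pattern": label})
    return findings
```

Swapping the backend behind `complete`, a hosted teacher versus a locally distilled student, is what produces the cost, training, and accuracy trade-offs described above.
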
## E2E Evaluation
We evaluate AutoBot's end-to-end performance by comparing different instantiations of our Language Module on the task of deceptive pattern detection. The interactive visualization below presents the performance metrics of *AutoBot* using a range of LLMs: Gemini, a distilled Qwen-2.5-1.5B, and a distilled T5-base model. These results demonstrate how different model choices affect detection accuracy, precision, and recall across our deceptive pattern taxonomy.
<figcaption style="max-width:90vw;">Interactive comparison of the performance of <em>AutoBot</em> (with three underlying language models: Gemini, Qwen, and T5) at the Category Level.</figcaption>
<figcaption>Interactive comparison of the performance of <em>AutoBot</em> (with three underlying language models: Gemini, Qwen, and T5) and <em>DPGuard</em> at the Subtype Level.</figcaption>
</div>
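
For readers reproducing such comparisons offline, the per-category precision, recall, and F1 numbers can be computed with standard tooling; the label lists below are stand-ins for the ground-truth annotations and a model's predictions, not our evaluation data.

```python
# Sketch of the per-category scoring behind the comparison (illustrative labels only).
from sklearn.metrics import classification_report

y_true = ["sneaking", "none", "forced action", "obstruction", "none"]  # ground truth (placeholder)
y_pred = ["sneaking", "none", "forced action", "none", "none"]         # one model's output (placeholder)

# precision / recall / F1 for every category, plus macro and weighted averages
print(classification_report(y_true, y_pred, zero_division=0))
```
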
<!--## Key Results

### Performance Metrics

Our user study with real participants demonstrates that:
- AutoBot's visual highlighting significantly improves user awareness of deceptive patterns
- The highlighting system does not negatively impact perceived website usability
- Users appreciate the real-time feedback and find it helpful for making informed decisions
## Knowledge Distillation
AutoBot represents a significant step forward in combating deceptive patterns on the web.

Future work includes:
- Expanding the taxonomy to cover emerging deceptive pattern types
- Improving multilingual support for global deployment
- Developing automated remediation suggestions for developers -->