The bias detection tool currently works for tabular numerical and categorical data. The _Hierarchical Bias-Aware Clustering_ (HBAC) algorithm processes input data according to the k-means or k-modes clustering algorithm. The HBAC algorithm was introduced by Misztal-Radecka and Indurkya in a [scientific article](https://www.sciencedirect.com/science/article/abs/pii/S0306457321000285) published in *Information Processing and Management* (2021). Our implementation of the HBAC algorithm can be found on <a href="https://github.com/NGO-Algorithm-Audit/unsupervised-bias-detection/blob/master/README.md" target="_blank">GitHub</a>.
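For intuition, below is a minimal sketch of the bias-aware splitting idea, not our reference implementation: it repeatedly bisects a cluster with k-means and keeps the split with the largest gap in mean error between the two halves. The function and parameter names (`hbac_kmeans`, `errors`, `n_splits`) are illustrative, not the package API.

```python
# Minimal sketch of the HBAC idea (illustrative, not the reference implementation).
import numpy as np
from sklearn.cluster import KMeans

def hbac_kmeans(X, errors, n_splits=5, seed=0):
    """Greedily bisect clusters with k-means, keeping the split that
    maximises the gap in mean error between the two halves."""
    clusters = [np.arange(len(X))]      # start with one cluster holding all rows
    for _ in range(n_splits):
        best = None                     # (gap, position in `clusters`, split labels)
        for pos, idx in enumerate(clusters):
            if len(idx) < 4:            # too small to split meaningfully
                continue
            labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X[idx])
            if labels.min() == labels.max():
                continue                # degenerate split, skip
            gap = abs(errors[idx[labels == 0]].mean() - errors[idx[labels == 1]].mean())
            if best is None or gap > best[0]:
                best = (gap, pos, labels)
        if best is None:
            break
        _, pos, labels = best
        idx = clusters.pop(pos)         # replace the chosen cluster by its two halves
        clusters += [idx[labels == 0], idx[labels == 1]]
    return clusters                     # list of row-index arrays, one per cluster

# Usage sketch: errors can be per-row misclassification of a trained model, e.g.
# clusters = hbac_kmeans(X, (y_pred != y_true).astype(float))
```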
- **Quantitative-qualitative joint method**: Data-driven bias testing combined with the balanced and context-sensitive judgment of human experts;
- **Unsupervised bias detection**: No user data needed on protected attributes;
- **Bias scan tool**: Scalable method based on statistical learning to detect algorithmic bias;
- **Detects complex bias**: Identifies unfairly treated groups characterized by a mixture of features, detects intersectional bias;
- **Model-agnostic**: Works for all AI systems;
- **Open-source and not-for-profit**: Easy to use and available for the entire AI auditing community.
##### By whom can the bias detection tool be used?
##### How is my data processed?
The tool is privacy-preserving: it uses the computing power of your own computer to analyze a dataset. In this architectural setup, data is processed entirely on your device and is not uploaded to any third party, such as a cloud provider. This local-only feature allows organisations to securely use the tool with proprietary data. The underlying software is also available as the <a href="https://pypi.org/project/unsupervised-bias-detection/" target="_blank">pip package</a> `unsupervised-bias-detection`.
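As a hedged illustration, local usage of the pip package follows a scikit-learn-like pattern. The class and module names below reflect the package README at the time of writing and may differ between versions; consult the README for the current API.

```python
# Sketch of running the bias scan locally with the pip package; class and
# parameter names follow the package README and may change between versions.
import numpy as np
from unsupervised_bias_detection.cluster import BiasAwareHierarchicalKMeans

X = np.array([[35.0, 2.0], [40.0, 2.0], [20.0, 1.0], [25.0, 1.0]])  # unscaled features
errors = np.array([0.0, 0.0, 1.0, 1.0])  # per-row metric, e.g. misclassification

hbac = BiasAwareHierarchicalKMeans(n_iter=5, min_cluster_size=2)
hbac.fit(X, errors)

print(hbac.labels_)   # cluster assignment per row
print(hbac.scores_)   # bias score per cluster: higher means larger deviation
```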
##### In sum
Quantitative methods, such as unsupervised bias detection, are helpful to discover potentially unfairly treated groups of similar users in AI systems in a scalable manner. Automated identification of cluster disparities in AI models allows human experts to assess observed disparities in a qualitative manner, subject to political, social and environmental traits. This two-pronged approach bridges the gap between the qualitative requirements of law and ethics and the quantitative nature of AI (see figure). By making normative advice on identified ethical issues publicly available, a [repository](/algoprudence/) of case reviews emerges over time. We call this case-based normative advice for ethical algorithms _algoprudence_. Data scientists and public authorities can learn from our algoprudence and can criticise it, as ultimately normative decisions regarding fair AI should be made within democratic sight.
[Read more](/algoprudence/how-we-work/) about algoprudence and how Algorithm Audit builds it.
- title: Identification of AI-systems and high-risk algorithms
content: >
  By answering a maximum of 8 targeted questions, you can determine whether a data-driven application qualifies as an AI-system or as an impactful algorithm. Complete the dynamic questionnaire to find out.
icon: fas fa-search
id: quick-scan
items:
- title: Identify
icon: fas fa-star
link: classification-quick-scan/#form
# - title: Documentation and classification tool
#   content: >
#     Organisation-wide algorithm management policy calls for pragmatic frameworks, in which a risk-oriented approach is often leading. This is in line with national and international legislation on algorithms, such as the AI Act. Below you will find an example of several dynamic questionnaires that help organisations index, document and classify algorithms and AI-systems.
#   icon: far fa-file
#   id: organisation-wide
#   items:
#     - title: 1. Intake form
#       icon: fa fa-plus
#       link: intake/#form
#     - title: 2. Governance and accountability
#       icon: fas fa-user-tag
#       link: roles-and-responsibilities/#form
#     - title: 3. Privacy
#       icon: fa fa-eye
#       link: privacy/#form
#     - title: 4. Data and model
#       icon: fas fa-share-alt
#       link: data-and-model/#form
# - title: AI Act
#   content: >
#     These dynamic questionnaires help you get a grip on the AI Act.
#   icon: fas fa-gavel
#   id: ai-act
#   items:
#     - title: AI-system classification
#       icon: fas fa-code-branch
#       link: AI-system-classification/#form
#     - title: Risk classification
#       icon: fas fa-mountain
#       link: risk-classification/#form
---
{{< container_open icon="fas fa-wrench" title="Documentation and classification tool" id="landing-container" >}}
The bias detection tool currently works only for numerical data. Following a hierarchical scheme, the _Hierarchical Bias-Aware Clustering_ (HBAC) algorithm clusters input data using the k-means clustering algorithm. In time, the tool will also be able to process categorical data using k-modes clustering. The HBAC algorithm was introduced by Misztal-Radecka and Indurkya in a [scientific article](https://www.sciencedirect.com/science/article/abs/pii/S0306457321000285) in *Information Processing and Management* (2021). Our implementation of the HBAC algorithm is open source and can be found on [GitHub](https://github.com/NGO-Algorithm-Audit/AI_Audit_Challenge).
[Download](https://github.com/NGO-Algorithm-Audit/Bias_scan/blob/master/classifiers/BERT_disinformation_classifier/test_pred_BERT.csv) an example dataset to try out the bias detection tool.
What input data can the bias detection tool process? A csv file of at most 5GB with columns for the features (`features`), the predicted label (`pred_label`) and the true label (`true_label`). Only the order of the columns matters (first `features`, then `pred_label`, then `true_label`). All columns must be numerical and unscaled (not standardised or normalised). In summary:
- `features`: unscaled numerical values, for example `kenmerk_1`, `kenmerk_2`, ..., `kenmerk_n`;
- `pred_label`: the label predicted by the AI system, as a numerical value;
- `true_label`: the ground-truth label, as a numerical value.
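As an illustration, the following hypothetical snippet assembles a csv file in the expected column order; the feature column names and values are placeholders.

```python
# Hypothetical example: build an input csv in the expected column order
# (features first, then pred_label, then true_label). Feature column names
# are placeholders; all values are numerical and unscaled.
import pandas as pd

df = pd.DataFrame({
    "kenmerk_1": [34.0, 51.0, 27.0],  # unscaled numerical feature
    "kenmerk_2": [1200, 860, 430],    # unscaled numerical feature
    "pred_label": [1, 0, 1],          # label predicted by the AI system
    "true_label": [1, 1, 0],          # ground-truth label
})
df.to_csv("bias_detection_input.csv", index=False)
```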