Commit c91eddd

Update UBDT EN
1 parent 52da2a1 commit c91eddd

File tree

  • content/english/technical-tools/BDT.md
  • content/nederlands/technical-tools/BDT.md

2 files changed: +28 −20 lines
content/english/technical-tools/BDT.md

Lines changed: 25 additions & 11 deletions
@@ -1,7 +1,7 @@
 ---
 title: Unsupervised bias detection tool
 subtitle: >
-  A statistical tool that identifies groups where an AI system or algorithm shows deviating performance, potentially indicating unfair treatment or bias. The tool informs the qualitative doctrine of law and ethics which disparities need to be scrutinised manually by domain experts.
+  A statistical tool that identifies groups where an AI system or algorithm shows deviating performance, potentially indicating unfair treatment. The tool informs which disparities need to be examined manually by domain experts.
 image: /images/svg-illustrations/illustration_cases.svg
 quick_navigation:
   title: Content overview
@@ -78,17 +78,34 @@ team:
 type: bias-detection-tool
 ---
 
+<!-- Promobar -->
+
+<div id={{.Get "id" }} class="container-fluid mt-0 p-0">
+  <div class="shadow bg-lightblue">
+    <div class="row promobar-mobile-desktop-layout">
+      <div class="col-12 flex justify-center items-center px-5">
+        <!-- Content -->
+        <span class="mr-3" style="font-size:16px; color:#005aa7;">
+          <b>👋 Do you also want to start using the tool locally? It’s easier than you think! Get in <u><a href="/nl/about/contact/">contact</a></u> to learn more.</b></span>
+      </div>
+    </div>
+  </div>
+</div>
+
 <!-- Introduction -->
 
 {{< container_open title="Introduction – Unsupervised bias detection tool" icon="fas fa-search" id="info" isAccordion="" >}}
 
 <br>
 
 #### What does the tool do?
-The tool helps find groups where an AI system or algorithm performs differently, which could indicate unfair treatment or bias. It does this using a technique called <a href="https://en.wikipedia.org/wiki/Cluster_analysis" target="_blank">clustering</a>, which groups similar data points together (in a cluster). The tool doesn’t need information like gender, nationality, or ethnicity to find these patterns. Instead, it uses a `bias metric` to measure deviations in the performace of the algorithmic system, which you can choose based on your data.
+The tool helps find groups where an AI system or algorithm performs differently, which could indicate unfair treatment. It does this using a technique called <a href="https://en.wikipedia.org/wiki/Cluster_analysis" target="_blank">clustering</a>, which groups similar data points together (in a cluster). The tool doesn’t need information like gender, nationality, or ethnicity to find these patterns. Instead, it uses a `bias score`, which you can choose based on your data, to measure deviations in the performance of the system.
+
+#### What results does it give?
+The tool finds groups (clusters) where the performance of the algorithmic system deviates significantly. It highlights the group with the worst `bias score` and creates a report called a bias analysis report, which you can download as a PDF. You can also download all the identified groups (clusters) in a .json file. Additionally, the tool provides visual summaries of the results, helping experts dive deeper into the identified deviations.
 
 #### What kind of data does it work with?
-The tool works with data in a table format, consisting solely of numbers or categories. You just need to pick one column in the data to use as the `bias metric`. This column should have numbers only, and you’ll specify whether a higher or lower number is better. For example, if you’re looking at error rates, lower numbers are better. For accuracy, higher numbers are better. The tool also comes with a demo dataset you can use by clicking "Try it out."
+The tool works with data in a table format, consisting solely of numbers or categories. You just need to pick one column in the data to use as the `bias score`. This column should have numbers only, and you’ll specify whether a higher or lower number is better. For example, if you’re looking at error rates, lower numbers are better. For accuracy, higher numbers are better. The tool also comes with a demo dataset you can use by clicking "Try it out." A sketch of preparing such a table follows the example below.
 
 <div>
 <p><u>Example of numerical data set</u>:</p>
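As a reading aid for the input requirements above, here is a minimal sketch in Python/pandas of a table the tool could accept. The column names and values are hypothetical (not the tool's demo dataset); `error_rate` plays the role of the numerical bias score column, where lower is better.

```python
# Illustrative only: a minimal tabular dataset of the kind described above.
# Column names and values are hypothetical; 'error_rate' is the bias score.
import pandas as pd

df = pd.DataFrame(
    {
        "age": [23, 45, 31, 52, 38, 27],                      # numerical feature
        "income": [2100, 3800, 2900, 4100, 3300, 2500],       # numerical feature
        "error_rate": [0.12, 0.03, 0.08, 0.02, 0.05, 0.10],   # bias score column
    }
)

# The tool expects uniformly typed feature columns and a numerical bias
# score column; missing values should be removed or replaced beforehand.
assert df["error_rate"].dtype.kind == "f"
df = df.dropna()
```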
@@ -109,9 +126,6 @@ The tool works with data in a table format, consisting solely of numbers or cate
 </div>
 <br>
 
-#### What results does it give?
-The tool finds groups (clusters) where performance of the algorithmic system is significantly different. It highlights the group with the worst performance and creates a report called a bias analysis report, which you can download as a PDF. You can also download all the identified groups in a .json file. Additionally, the tool provides visual summaries of the results, helping experts dive deeper into the identified deviations.
-
 #### Is my data safe?
 Yes! Your data stays on your computer and never leaves your organization’s environment. The tool runs directly in your browser, using your computer’s power to analyze the data. This setup, called 'local-first', ensures no data is sent to cloud providers or third parties. Instructions for hosting the tool securely within your organization are available on <a href="https://github.com/NGO-Algorithm-Audit/local-first-web-tool" target="_blank">Github</a>.
 
@@ -129,14 +143,14 @@ Try the tool below ⬇️
 The unsupervised bias detection tool performs a series of steps:
 
 ##### Prepared by the user:
-<span style="color:#005AA7">1. Dataset:</span> The data must be provided in a tabular format. All columns, except the bias metric column, should have uniform data types, e.g., either all numerical or all categorical. The bias metric column must be numerical. Any missing values should be removed or replaced. The dataset should then be divided into training and testing subset, following a 80-20 ratio.
+<span style="color:#005AA7">1. Dataset:</span> The data must be provided in a tabular format. All columns, except the bias score column, should have uniform data types, e.g., either all numerical or all categorical. The bias score column must be numerical. Any missing values should be removed or replaced. The dataset should then be divided into training and testing subsets, following an 80-20 ratio.
 
-<span style="color:#005AA7">2. Bias metric:</span> The user selects one column from the dataset to serve as the `bias metric`. In step 3, clustering will be performed based on this chosen `bias metric`. The chosen bias metric must be numerical. Examples include metrics such as "being classified as high risk", "error rate" or "selected for an investigation".
+<span style="color:#005AA7">2. Bias score:</span> The user selects one column from the dataset to serve as the `bias score`. In step 3, clustering will be performed based on this chosen `bias score`. The chosen bias score must be numerical. Examples include metrics such as "being classified as high risk", "error rate" or "selected for an investigation".
 
 ##### Performed by the tool:
 <span style="color:#005AA7">3. Hierarchical Bias-Aware Clustering (HBAC):</span> The HBAC algorithm (detailed below) is applied to the training dataset. The centroids of the resulting clusters are saved and later used to assign cluster labels to data points in the test dataset.
 
-<span style="color:#005AA7">4. Testing differences in bias metric:</span> Statistical hypothesis testing is performed to evaluate whether the most deviating cluster contains significantly more bias compared to the rest of the dataset. A two-sample t-test is used to compare the bias metrics between clusters. For multiple hypothesis testing, Bonferonni correction should be applied. Further details can are available in our [scientific paper](/technical-tools/bdt/#scientific-paper).
+<span style="color:#005AA7">4. Testing differences in bias score:</span> Statistical hypothesis testing is performed to evaluate whether the most deviating cluster contains significantly more bias compared to the rest of the dataset. A two-sample t-test is used to compare the bias scores between clusters. For multiple hypothesis testing, Bonferroni correction should be applied. Further details are available in our [scientific paper](/technical-tools/bdt/#scientific-paper). A sketch of steps 1 and 4 follows after this list.
 
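Steps 1 and 4 amount to a standard split-then-test pattern. Below is a hedged sketch using scikit-learn and SciPy, on a synthetic dataset with placeholder cluster labels; in the real pipeline, step 3's HBAC assigns the labels and the bias score column comes from the user's data.

```python
# Sketch of steps 1 and 4: 80-20 split, then a two-sample t-test comparing
# each cluster's bias score against the rest, with Bonferroni correction.
# Dataset and cluster labels here are synthetic stand-ins.
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "age": rng.integers(18, 70, size=500),
        "income": rng.normal(3000, 800, size=500),
        "error_rate": rng.beta(2, 20, size=500),  # bias score, lower is better
    }
)

# Step 1: split into training and testing subsets (80-20).
train_df, test_df = train_test_split(df, test_size=0.2, random_state=0)

# Placeholder for step 3: in the real tool these labels come from HBAC
# centroids fitted on the training subset.
labels = rng.integers(0, 3, size=len(test_df))

# Step 4: two-sample t-test per cluster vs. the rest, Bonferroni-corrected.
scores = test_df["error_rate"].to_numpy()
alpha = 0.05 / len(np.unique(labels))  # Bonferroni correction
for c in np.unique(labels):
    t, p = stats.ttest_ind(scores[labels == c], scores[labels != c], equal_var=False)
    if p < alpha:
        print(f"cluster {c}: significantly deviating bias score (p={p:.4f})")
```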
 A schematic overview of the above steps is depicted below.
 
@@ -145,7 +159,7 @@ A schematic overview of the above steps is depicted below.
 </div>
 
 #### How does the clustering algorithm work?
-The *Hierarchical Bias-Aware Clustering* (HBAC) algorithm identifies clusters in the provided dataset based on a user-defined `bias metric`. The objective is to find clusters with low variation in the bias metric within each cluster and significant variation between clusters. HBAC iteratively finds clusters in the data using k-means (for numerical data) or k-modes clustering (for categorical data). For the initial split, HBAC takes the full dataset and splits it in two clusters. Cluster `C` – with the highest standard deviation of the bias metric – is selected. Then, cluster `C` is divided into two candidate clusters `C'` and `C''`'. If the average bias metric in either candidate cluster exceed the the average bias metric in `C`, the candidate cluster with highest bias metric is selected as a new cluster. This process repeats until the maximum number of iterations (`max_iterations`) is reached or the resulting cluster fails to meet the minimum size requirement (`n_min`). The pseudo-code of the HBAC algorithm is provided below.
+The *Hierarchical Bias-Aware Clustering* (HBAC) algorithm identifies clusters in the provided dataset based on a user-defined `bias score`. The objective is to find clusters with low variation in the bias score within each cluster and significant variation between clusters. HBAC iteratively finds clusters in the data using k-means (for numerical data) or k-modes clustering (for categorical data). For the initial split, HBAC takes the full dataset and splits it into two clusters. The cluster `C` with the highest standard deviation of the bias score is selected. Then, cluster `C` is divided into two candidate clusters `C'` and `C''`. If the average bias score in either candidate cluster exceeds the average bias score in `C`, the candidate cluster with the highest bias score is selected as a new cluster. This process repeats until the maximum number of iterations (`max_iterations`) is reached or the resulting cluster fails to meet the minimum size requirement (`n_min`). The pseudo-code of the HBAC algorithm is provided below.
 
 <div style="display: flex; justify-content: center;">
 <img src="/images/BDT/pseudo_code_HBAC.png" alt="drawing" width="800px"/>
@@ -154,7 +168,7 @@ The *Hierarchical Bias-Aware Clustering* (HBAC) algorithm identifies clusters in
 The HBAC-algorithm was introduced by Misztal-Radecka and Indurkhya in a [scientific article](https://www.sciencedirect.com/science/article/abs/pii/S0306457321000285) published in *Information Processing and Management* in 2021. Our implementation advances theirs by adding methodological checks to distinguish real bias from noise, such as sample splitting, statistical hypothesis testing and measuring cluster stability. Algorithm Audit's implementation of the algorithm can be found in the <a href="https://github.com/NGO-Algorithm-Audit/unsupervised-bias-detection/blob/master/README.md" target="_blank">unsupervised-bias-detection</a> pip package.
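For completeness, a possible usage sketch of that pip package. The import path, class, parameter, and attribute names below (`BiasAwareHierarchicalKMeans`, `n_iter`, `min_cluster_size`, `labels_`, `scores_`) are assumptions based on the package README and should be verified against the linked repository before use.

```python
# Assumed interface of the unsupervised-bias-detection package; verify all
# names against the README linked above before relying on this sketch.
import numpy as np
from unsupervised_bias_detection.clustering import BiasAwareHierarchicalKMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))        # numerical feature matrix
bias_score = rng.normal(size=200)    # per-row bias score

hbac = BiasAwareHierarchicalKMeans(n_iter=10, min_cluster_size=20)
hbac.fit(X, bias_score)
print(hbac.labels_)   # assumed attribute: cluster label per row
print(hbac.scores_)   # assumed attribute: per-cluster bias score
```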
 
 #### How should the results of the tool be interpreted?
-The HBAC algorithm maximizes the difference in the bias metric between clusters. To prevent incorrect conclusions that there is bias in the decision-making process under review when there truly is none, we split the dataset in training and test data, and hypothesis testing prevents us from (wrongly) concluding that there is a difference in the bias metric while there is none. If statistically significant bias is detected, the outcome of the tool serves as a starting point for human experts to assess potential discrimination in the decision-making processes.
+The HBAC algorithm maximizes the difference in the bias score between clusters. To avoid wrongly concluding that there is bias in the decision-making process under review when there truly is none, we split the dataset into training and test data, and statistical hypothesis testing prevents us from concluding that there is a difference in the bias score when there is none. If statistically significant bias is detected, the outcome of the tool serves as a starting point for human experts to assess potential discrimination in the decision-making process.
 
 {{< container_close >}}
 
content/nederlands/technical-tools/BDT.md

Lines changed: 3 additions & 9 deletions
@@ -1,9 +1,7 @@
 ---
 title: Unsupervised bias detection tool
 subtitle: >
-  Local-first tool that uses statistics to identify groups possibly treated
-  unequally by algorithms or AI. The tool informs the qualitative doctrine
-  of law and ethics which deviations need to be examined by domain experts.
+  Tool that identifies groups possibly treated unequally by algorithms or AI, without using special categories of personal data such as gender or origin. The tool informs which deviations need to be examined by domain experts.
 
 image: /images/svg-illustrations/illustration_cases.svg
 type: bias-detection-tool
@@ -77,19 +75,17 @@ quick_navigation:
     url: '#team'
 ---
 
-
-
 <!-- Introduction -->
 
 {{< container_open title="Introduction – Unsupervised bias detection tool" icon="fas fa-search" id="info" >}}
 
 <br>
 
 #### What does the tool do?
-The tool detects groups for which an algorithm or AI system shows deviating performance. This form of monitoring is referred to as *anomaly detection*. To detect deviating patterns, the tool uses <a href="https://en.wikipedia.org/wiki/Cluster_analysis" target="_blank">clustering</a>. Clustering is a form of _unsupervised learning_. This means that no data on protected characteristics of users, such as gender, nationality or ethnicity, is needed to detect suspect distinctions (bias). The metric by which distinction is determined can be chosen manually and is referred to as the `gelijkheidsmetriek`.
+The tool detects groups for which an algorithm or AI system shows deviating performance. This form of monitoring is referred to as *anomaly detection*. To detect deviating patterns, the tool uses <a href="https://en.wikipedia.org/wiki/Cluster_analysis" target="_blank">clustering</a>. Clustering is a form of _unsupervised learning_. This means that no data on protected characteristics of users, such as gender, nationality or ethnicity, is needed to detect suspect distinctions (bias). The metric by which distinction is determined can be chosen manually and is referred to as the `bias score`.
 
 #### What data can be processed?
-The tool processes any data in tabular form. The data type (numerical, categorical, times, etc.) is detected automatically. One column must be selected as the `gelijkheidsmetriek`, which must be a numerical value. The user must indicate whether a high or a low value of the `gelijkheidsmetriek` is better. Example: if the `gelijkheidsmetriek` is an error rate, a low value is better, whereas for accuracy a high value is better.
+The tool processes any data in tabular form. The data type (numerical, categorical, times, etc.) is detected automatically. One column must be selected as the `bias score`, which must be a numerical value. The user must indicate whether a high or a low value of the `bias score` is better. Example: if the `bias score` is an error rate, a low value is better, whereas for accuracy a high value is better.
 
 <div>
 <p><u>Example of numerical dataset</u>:</p>
@@ -126,13 +122,11 @@ Use the tool below ⬇️
 {{< container_close >}}
 
 
-
 <!-- Web app -->
 
 {{< iframe title="Web app – Unsupervised bias detection tool" icon="fas fa-cloud" id="web-app" src="https://local-first-bias-detection.s3.eu-central-1.amazonaws.com/bias-detection.html?lang=nl" height="770px" >}}
 
 
-
 <!-- Promobar -->
 
 {{< promo_bar content="Do you value the work of Algorithm Audit? ⭐️ us on" id="promo" >}}
