#### What is the tool about?
The tool identifies groups of similar users that are potentially treated unfairly by an AI system. It returns clusters of users for which the system underperforms compared to the rest of the data set. The tool makes use of <a href="https://en.wikipedia.org/wiki/Cluster_analysis" target="_blank">clustering</a> – an unsupervised statistical learning method. This means that no data are required on protected attributes of users, e.g., gender, nationality or ethnicity, to detect indirect discrimination, also referred to as higher-dimensional proxy or intersectional discrimination. The metric by which bias is defined can be chosen manually and is referred to as the `performance metric`.
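To make the notion of cluster-level bias concrete, here is a minimal sketch (not the tool's actual code; all names and values are illustrative) of how a bias score for one cluster could be computed as the difference in mean `performance metric` between that cluster and the rest of the data set:

```python
import numpy as np

def bias_score(metric: np.ndarray, in_cluster: np.ndarray) -> float:
    """Difference in mean performance metric between a cluster and the rest.

    A negative score means the cluster performs worse than the rest of the
    data set (when a higher metric value is considered better).
    """
    return metric[in_cluster].mean() - metric[~in_cluster].mean()

# Toy example: six users, with illustrative per-user metric values.
metric = np.array([0.9, 0.8, 0.85, 0.4, 0.35, 0.5])
in_cluster = np.array([False, False, False, True, True, True])
print(bias_score(metric, in_cluster))  # ~ -0.43 -> cluster is underperforming
```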
#### What data can be processed?
Numerical and categorical data can be analysed. The type of data is automatically detected by the tool. The `performance metric` column should always contain numerical values. The user should indicate in the app whether a higher or lower value of the `performance metric` is considered to be better.
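For illustration, a hypothetical input data set could look as follows; all column names and values are made up, and this is a sketch rather than a prescribed schema:

```python
import pandas as pd

# Hypothetical input: feature columns plus a numerical performance metric.
df = pd.DataFrame({
    "age": [23, 54, 37, 41],                        # numerical feature
    "region": ["north", "south", "south", "east"],  # categorical feature
    "n_logins": [12, 3, 7, 25],                     # numerical feature
    "correctly_classified": [1, 0, 1, 1],           # performance metric
})
# In the app, the user would also indicate that a *higher* value of
# `correctly_classified` is considered better.
```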
The tool contains a demo data set and a 'Try it out' button. More information can be found in the app.
#### How is my data processed?

The tool is privacy preserving. It uses the computing power of your own computer to analyse the attached data set. In this architectural setup, data is processed entirely on your device and is not uploaded to any third party, such as a cloud provider. This computing approach is called *local-first* and allows organisations to use tools locally in a secure way. Instructions on how to host the tool locally, including the source code, can be found <a href="https://github.com/NGO-Algorithm-Audit/local-first-web-tool" target="_blank">here</a>.
The software implementing the statistical methods is available in a separate <a href="https://github.com/NGO-Algorithm-Audit/unsupervised-bias-detection" target="_blank">Github repository</a>, and also as the <a href="https://pypi.org/project/unsupervised-bias-detection/" target="_blank">pip package</a> `unsupervised-bias-detection`.
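As a minimal usage sketch of the pip package: the import path, class name, and parameter names below follow our reading of the repository README and may have changed, so please verify them against the current documentation before use.

```python
import numpy as np
from unsupervised_bias_detection.cluster import BiasAwareHierarchicalKMeans

# Hypothetical feature matrix X and per-row performance metric y.
X = np.random.rand(200, 5)
y = np.random.rand(200)

# Parameter names are assumptions based on the README; check current docs.
hbac = BiasAwareHierarchicalKMeans(n_iter=5, min_cluster_size=10)
hbac.fit(X, y)

print(hbac.n_clusters_)  # number of clusters found
print(hbac.scores_)      # bias score per cluster
```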
#### What does the tool return?
The tool returns a pdf report or `.json` file with the identified clusters. It specifically focuses on the identified cluster with the highest bias and describes this cluster by the features that characterize it. These results serve as a starting point for a deliberative assessment by human experts to evaluate potential discrimination and unfairness in the AI system under review. The tool also visualizes the outcomes.
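To illustrate what "describing a cluster by the features that characterize it" can mean in practice, the sketch below compares feature means between the flagged cluster and the rest of the data using a Welch t-test per feature; this is an illustrative reconstruction with assumed names, not the tool's exact reporting logic.

```python
import numpy as np
from scipy import stats

def characterize_cluster(X: np.ndarray, in_cluster: np.ndarray,
                         feature_names: list[str]):
    """Per feature: mean difference between cluster and rest, with the
    p-value of a Welch t-test (no equal-variance assumption)."""
    results = []
    for j, name in enumerate(feature_names):
        a, b = X[in_cluster, j], X[~in_cluster, j]
        t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)
        results.append((name, a.mean() - b.mean(), p_value))
    # Most distinguishing features (lowest p-values) first.
    return sorted(results, key=lambda r: r[2])
```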
Try the tool below ⬇️
<!-- This is quantitatively expressed by the (statistically significant) differences in feature means between the identified cluster and the rest of the data. -->
{{< container_close >}}
The bias detection tool utilizes the _Hierarchical Bias-Aware Clustering_ (HBAC) algorithm. HBAC processes input data using the k-means (for numerical data) or k-modes (for categorical data) clustering algorithm. The HBAC algorithm was introduced by Misztal-Radecka and Indurkhya in a [scientific article](https://www.sciencedirect.com/science/article/abs/pii/S0306457321000285) published in *Information Processing and Management* (2021). Our implementation of the HBAC algorithm can be found on <a href="https://github.com/NGO-Algorithm-Audit/unsupervised-bias-detection/blob/master/README.md" target="_blank">Github</a>. The methodology has been reviewed by a team of machine learning engineers and statisticians, and is continuously undergoing evaluation.
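As a rough sketch of the HBAC idea under stated assumptions: the simplified loop below handles numerical data only, omits the k-modes branch and the stopping criteria of the published algorithm, and repeatedly bisects the cluster with the highest bias score using k-means. All names are hypothetical; see the repository for the actual implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def hbac_sketch(X: np.ndarray, metric: np.ndarray, max_clusters: int = 5,
                min_cluster_size: int = 10) -> np.ndarray:
    """Simplified Hierarchical Bias-Aware Clustering for numerical data.

    Bias of a cluster = mean performance metric of the rest of the data
    minus the cluster's mean (assuming a higher metric is better), so a
    positive bias score marks an underperforming cluster.
    """
    labels = np.zeros(len(X), dtype=int)
    for new_label in range(1, max_clusters):
        def bias(c: int) -> float:
            mask = labels == c
            if mask.all():  # only one cluster so far
                return 0.0
            return metric[~mask].mean() - metric[mask].mean()

        worst = max(np.unique(labels), key=bias)  # most biased cluster
        mask = labels == worst
        if mask.sum() < 2 * min_cluster_size:
            break  # too small to split any further
        # Bisect the most biased cluster with k-means on the feature space.
        sub = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[mask])
        labels[np.flatnonzero(mask)[sub == 1]] = new_label
    return labels

# Hypothetical usage: labels = hbac_sketch(X, metric_per_user), then inspect
# the clusters with the highest bias scores.
```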