Commit 0e339bf

SDG content update EN NL
1 parent 01600f1 commit 0e339bf

2 files changed, +32 -26 lines changed
  • content
    • english/technical-tools
    • nederlands/technical-tools

content/english/technical-tools/SDG.md

Lines changed: 15 additions & 13 deletions
@@ -2,7 +2,7 @@
 type: regular
 title: Synthetic data generation tool
 subtitle: >
-Local-first tool to generate tabular synthetic data. The tool automatically generates an evaluation report to assess the quality of the data. All data are locally processed without using cloud solutions.
+Local-first tool to generate synthetic data. The tool automatically generates an evaluation report to assess the quality of the generated data. All data are locally processed without using cloud solutions.
 image: /images/svg-illustrations/knowledge_base.svg
 team:
 title: Synthetic data generation team
@@ -61,10 +61,10 @@ quick_navigation:
 Synthetic data is artificial data mimicking the original dataset's statistical characteristics without sharing personal data.

 #### What data can be processed?
-The tool processes all data in table format. The type of data (numerical, categorical, time, etc.) and missing values are automatically detected. The user has several option how missing values can be processed. More info is provided in the tool.
+The tool processes all data in table format. The type of data (numerical, categorical, time, etc.) and missing values are automatically detected. The user has several options for how missing values are processed. More information on how missing values can be treated is provided in the tool.

 #### What synthetic data generation methods are supported?
-Users can currently choose two methods for synthetic data generation:
+Users can currently choose two methods for generating synthetic data:
 1. Classification And Regression Trees (CART); and
 2. Gaussian Copula (GC).

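The hunk above lists Gaussian Copula (GC) as one of the two generation methods. The commit does not show how the tool implements it, so the sketch below only illustrates the general technique under stated assumptions: numeric columns only, no missing values, Python standard library only. It rank-transforms each column to standard-normal scores, estimates their correlation, samples correlated normals through a Cholesky factor, and maps them back through each column's empirical quantiles. All function names and the example data are hypothetical; CART and categorical columns are not covered.

```python
import random
from statistics import NormalDist

# Hypothetical sketch of a Gaussian copula synthesizer, not the tool's implementation.
STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def normal_scores(column):
    """Map each value to a standard-normal score via its rank (the copula step)."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cholesky(matrix):
    """Lower-triangular factor L with L times L-transpose equal to a positive-definite matrix."""
    k = len(matrix)
    lower = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(lower[i][m] * lower[j][m] for m in range(j))
            if i == j:
                lower[i][j] = (matrix[i][i] - s) ** 0.5
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def gaussian_copula_sample(data, n_samples, seed=0):
    """Draw synthetic rows from a Gaussian copula fitted to numeric columns.

    data: dict mapping column name -> list of numbers (all columns equally long).
    """
    rng = random.Random(seed)
    names = list(data)
    k = len(names)
    scores = [normal_scores(data[c]) for c in names]
    # Correlation of the normal scores; a tiny diagonal jitter keeps Cholesky stable.
    corr = [[1.0 + 1e-9 if i == j else pearson(scores[i], scores[j])
             for j in range(k)] for i in range(k)]
    lower = cholesky(corr)
    sorted_cols = [sorted(data[c]) for c in names]
    synthetic = {c: [] for c in names}
    for _ in range(n_samples):
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # Correlated standard normals: z = L e
        z = [sum(lower[i][m] * e[m] for m in range(i + 1)) for i in range(k)]
        for i, c in enumerate(names):
            u = STD_NORMAL.cdf(z[i])  # back to a uniform value in [0, 1]
            values = sorted_cols[i]
            # Empirical inverse CDF: take the matching quantile of the original column
            synthetic[c].append(values[min(int(u * len(values)), len(values) - 1)])
    return synthetic


# Usage with made-up data: two strongly related numeric columns
rng = random.Random(42)
xs = [rng.uniform(0, 100) for _ in range(300)]
original = {"income": xs, "spending": [0.6 * x + rng.gauss(0, 5) for x in xs]}
synthetic = gaussian_copula_sample(original, n_samples=500, seed=1)
print(round(pearson(original["income"], original["spending"]), 2),
      round(pearson(synthetic["income"], synthetic["spending"]), 2))
```

Because every synthetic value is drawn from the original column's empirical quantiles, the marginal distributions are preserved closely, but the sketch can only reproduce values that were observed in the original data; a production tool may smooth or model the marginals differently.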
@@ -74,7 +74,7 @@ By default, CART is used. CART generally produces higher quality synthetic data,
 The tool generates synthetic data. An evaluation report of the generated data, including various evaluation metrics, is automatically created and can be downloaded as a pdf. The synthetic data can be downloaded in .csv and .json format.

 #### How is my data processed?
-The tool is privacy-friendly because the data is processed entirely within the browser. The data does not leave your computer or the environment of your organization. The tool utilizes the computing power of your own computer to analyze the data. This type of browser-based software is referred to as [*local-first*](/technical-tools/sdg/#local-first). The tool does not upload data to third parties, such as cloud providers. Instructions on how the tool and the local-first architecture can be hosted locally within your own organization can be found on <a href="https://github.com/NGO-Algorithm-Audit/local-first-web-tool" target="_blank">Github</a>.
+The tool is privacy-friendly because the data are processed entirely within the browser. The data do not leave your computer or the environment of your organization. The tool utilizes the computing power of your own computer to analyze the data. This type of browser-based software is referred to as [*local-first*](/technical-tools/sdg/#local-first). The tool does not upload data to third parties, such as cloud providers. Instructions on how the tool and the local-first architecture can be hosted locally within your own organization can be found on <a href="https://github.com/NGO-Algorithm-Audit/local-first-web-tool" target="_blank">GitHub</a>.

 Try the tool below ⬇️

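The hunk above says the evaluation report contains "various evaluation metrics" but does not name them. One common way to compare a synthetic column with its original is the two-sample Kolmogorov-Smirnov (KS) statistic, the largest gap between the two empirical CDFs. Whether the tool's report uses this metric is an assumption; the function names below are hypothetical and the example data are made up.

```python
# Assumption: KS is shown here as an example fidelity metric, not as the tool's own report.
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of samples a and b (0 = identical, 1 = fully separated)."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # Advance past every value <= x in both samples (handles ties)
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d


def marginal_report(original, synthetic):
    """Per-column KS statistic for numeric columns present in both tables."""
    return {col: ks_statistic(original[col], synthetic[col])
            for col in original if col in synthetic}


report = marginal_report({"age": [23, 35, 47, 51]}, {"age": [25, 35, 44, 60]})
print(report)  # {'age': 0.25}
```

A value near 0 means the synthetic column's distribution closely matches the original. The KS statistic looks at one column at a time, so relationships between columns need separate metrics.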
@@ -125,15 +125,17 @@ Synthetic data generation (SDG) offers a solution. By creating artificial data t

 {{< container_open title="Has SDG been used in the past?" icon="fas fa-history" id="use-cases" >}}

-Widespread adoption of synthetic data generation has long been hindered by privacy concerns related to data sharing. Many commercial APIs depend on cloud-based software, making them unsuitable for public sector organizations, where citizen data cannot easily be shared externally. A [local-first](/technical-tools/sdg/#local-first) approach offers a viable solution, enabling synthetic data to be generated within an organisation using browsed-based software. Besides, recent years have showcased groundbreaking use cases that highlight how SDG can enable secure data sharing while ensuring privacy.
+The use of synthetic data has long been hindered for two reasons:
+1. <span style="color:#005AA7">Privacy risks</span> – Concerns, particularly among legal professionals, existed about the risks of personal data being exposed when sharing synthetic data. Research and practical examples have demonstrated that these risks can be mitigated. See the attached [memo](/technical-tools/sdg/#privacy-legal) below for more background information on the legal aspects of synthetic data generation.
+2. <span style="color:#005AA7">Cloud dependency risks</span> – Many existing (commercial) APIs rely on cloud-based software, making them unsuitable for public organizations, as citizen data cannot simply be uploaded to cloud platforms. [Local-first](/technical-tools/sdg/#local-first) data processing offers a solution to this problem. With this tool, synthetic data can be generated directly in the browser. The data do not leave the user's computer or the organization's environment.

-#### Use cases
+In sum, recent use cases have shown that synthetic data can be safely shared and generated without the involvement of a cloud provider. It is time to scale up so that stakeholders can gain more and better insights into the data managed by government organizations.

-Notably, <a href="https://www.lighthousereports.com/suspicion-machines-methodology/" target="_blank">Lighthouse Reports</a> shared inadvertently acquired data to the public through SDG, shedding light on biases in a massive data set that the Municipality of Rotterdam used for ML-driven risk profiling in the context of social welfare re-examination.
+#### Applications
+<a href="https://www.lighthousereports.com/suspicion-machines-methodology/" target="_blank">Lighthouse Reports</a> was able to publicly share unintentionally obtained data using synthetic data, revealing bias in a dataset from the Municipality of Rotterdam. This dataset was used for machine learning-driven risk profiling in the context of social welfare re-examination.

 #### AI Act
-
-Furthermore, Article 10(5) of the AI Act contains a specific provision on the use of synthetic data for bias detection and mitigation. It requires AI system providers to address biases by utilizing synthetic or anonymized data first, rather than drirectly "processing special categories of personal data".
+Additionally, Article 10(5) of the AI Act includes a specific provision regarding the use of synthetic data for bias detection and mitigation. It requires AI system providers to first investigate bias using synthetic or anonymized data, rather than directly processing "special categories of personal data."

 {{< container_close >}}

@@ -158,7 +160,7 @@ To be translated to English
 <br>

 #### What is local-first computing?
-Local-first computing is the opposite of cloud computing: the data is not uploaded to third-parties, such as a cloud providers, and is processed by your own computer. The data attached to the tool therefore doesn't leave your computer or the environment of your organization. The tool is privacy-friendly because the data can be processed within the mandate of your organisation and doesn't need to be shared with new parties. The unsupervised bias detection tool can also be hosted locally within your organization. Instructions, including the source code or the web app, can be found on <a href="https://github.com/NGO-Algorithm-Audit/local-first-web-tool" target="_blank">Github</a>.
+Local-first computing is the opposite of cloud computing: the data are not uploaded to third parties, such as cloud providers, and are processed on your own computer. The data attached to the tool therefore do not leave your computer or the environment of your organization. The tool is privacy-friendly because the data can be processed within the mandate of your organization and do not need to be shared with new parties. This synthetic data generation tool can also be hosted locally within your organization. Instructions for local hosting, including the source code or the web app, can be found on <a href="https://github.com/NGO-Algorithm-Audit/local-first-web-tool" target="_blank">GitHub</a>.

 #### Overview of local-first architecture

@@ -174,14 +176,14 @@ Local-first computing is the opposite of cloud computing: the data is not upload

 {{< container_open title="Supported by" icon="fas fa-toolbox" id="supported-by">}}

-This tool is developed with support of public and philanthropic organisations.
+This local-first synthetic data generation tool is developed with the support of public and philanthropic organisations.

 {{< accordions_area_open>}}

 {{< accordion_item_open title="Innovation grant Dutch Ministry of the Interior" image="/images/supported_by/BZK.jpg" tag1="2024-25" >}}

 ##### Description
-In partnership with the Dutch Executive Agency for Education and the Dutch Ministry of the Interior, Algorithm Audit has been developing and testing this tool from July 2024 to July 2025, supported by an <a href="https://www.digitaleoverheid.nl/overzicht-van-alle-onderwerpen/innovatie/innovatiebudget/toekenning-innovatiebudget-2024/" target="_blank">Innovation grant</a> from the annual competition hosted by the Dutch Ministry of the Interior. Project progress was shared at a community gathering on 13-02-2025.
+In partnership with the Dutch Executive Agency for Education and the Dutch Ministry of the Interior, Algorithm Audit has been developing and testing this tool from July 2024 to July 2025, supported by an <a href="https://www.digitaleoverheid.nl/overzicht-van-alle-onderwerpen/innovatie/innovatiebudget/toekenning-innovatiebudget-2024/" target="_blank">Innovation grant</a> from the annual competition hosted by the Dutch Ministry of the Interior. Project progress was shared at a community gathering on 13-02-2025. A first version of the tools is launched during a webinar on 10-06-2025.

 ![](/images/events/20250213_Demodag2025.jpg)

@@ -191,7 +193,7 @@ In partnership with the Dutch Executive Agency for Education and the Dutch Minis

 ##### Description

-In 2024, the SIDN Fund <a href="https://www.sidnfonds.nl/projecten/open-source-ai-auditing" target="_blank">supported</a> Algorithm Audit to develop a first demo of the unsupervised bias detection tool.
+In 2024, the SIDN Fund <a href="https://www.sidnfonds.nl/projecten/open-source-ai-auditing" target="_blank">supported</a> Algorithm Audit to develop a first demo of the synthetic data generation tool.

 {{< accordion_item_close >}}
