Skip to content

Commit 4558ed0

Browse files
authored
Update multi-party-data.md
1 parent e02c754 commit 4558ed0

File tree

1 file changed

+22
-21
lines changed

1 file changed

+22
-21
lines changed

articles/confidential-computing/multi-party-data.md

Lines changed: 22 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -24,65 +24,66 @@ ms.author: grbury
2424

2525
# Cleanroom and Multi-party Data Analytics
2626

27-
Azure confidential computing (ACC) provides a foundation for solutions that enable multiple parties to collaborate on data. There are various approaches to solutions, as well as a growing ecosystem of partners to help enable Azure customers, researchers, data scientists and data providers to collaborate on data while preserving privacy. This article will overview some of the approaches and existing solutions that can be leveraged, all running on ACC.
27+
Azure confidential computing (ACC) provides a foundation for solutions that enable multiple parties to collaborate on data. There are various approaches to solutions, and a growing ecosystem of partners to help enable Azure customers, researchers, data scientists and data providers to collaborate on data while preserving privacy. This article overviews some of the approaches and existing solutions that can be used, all running on ACC.
2828

2929
## What are the data and model protections?
3030

31-
Data cleanroom solutions typically offer a means for one or more data providers to combine data for processing on agreed upon code, queries, or models which are created by one of the providers or another participant, such as a researcher or solution provider. In many cases, the data can be considered sensitive and undesired to directly share to other participants – whether another data provider, a researcher, or solution vendor. To help ensure security and privacy on both the data and models used within data cleanrooms, confidential computing can be used to cryptographically verify that participants don't have access to the data or models, including during processing. With leveraging ACC, the solutions can bring protections on the data and model IP from the cloud operator, solution provider, and data collaboration participants.
31+
Data cleanroom solutions typically offer a means for one or more data providers to combine data for processing. There is typically agreed upon code, queries, or models that are created by one of the providers or another participant, such as a researcher or solution provider. In many cases, the data can be considered sensitive and undesired to directly share to other participants – whether another data provider, a researcher, or solution vendor. To help ensure security and privacy on both the data and models used within data cleanrooms, confidential computing can be used to cryptographically verify that participants don't have access to the data or models, including during processing. By using ACC, the solutions can bring protections on the data and model IP from the cloud operator, solution provider, and data collaboration participants.
3232

3333
## What are examples of industry use cases?
3434

3535
With ACC, customers and partners build privacy preserving multi-party data analytics solutions, sometimes referred to as "confidential cleanrooms" – both net new solutions uniquely confidential, and existing cleanroom solutions made confidential with ACC.
3636

37-
1. **Royal Bank of Canada** - [Virtual clean room](https://aka.ms/RBCstory) solution combining merchant data with bank data in order to provide personalized offers, leveraging Azure confidential computing VMs and Azure SQL AE in secure enclaves.
38-
2. **Scotiabank** – Proved the use of AI on cross-bank money flows to identify money laundering to flag human trafficking instances, leveraging Azure confidential computing and a solution partner, Opaque.
39-
3. **Novartis Biome**leveraged a partner solution from [BeeKeeperAI](https://aka.ms/ACC-BeeKeeperAI) running on ACC in order to find candidates for clinical trials for rare diseases.
40-
4. **Leading payment providers** connecting data across banks for fraud and anomaly detection
41-
5. **Data analytic services** and clean room solutions leveraging ACC to increase data protection and meet EU customer compliance needs and privacy regulation.
37+
1. **Royal Bank of Canada** - [Virtual clean room](https://aka.ms/RBCstory) solution combining merchant data with bank data in order to provide personalized offers, using Azure confidential computing VMs and Azure SQL AE in secure enclaves.
38+
2. **Scotiabank** – Proved the use of AI on cross-bank money flows to identify money laundering to flag human trafficking instances, using Azure confidential computing and a solution partner, Opaque.
39+
3. **Novartis Biome**used a partner solution from [BeeKeeperAI](https://aka.ms/ACC-BeeKeeperAI) running on ACC in order to find candidates for clinical trials for rare diseases.
40+
4. **Leading payment providers** connecting data across banks for fraud and anomaly detection.
41+
5. **Data analytic services** and clean room solutions using ACC to increase data protection and meet EU customer compliance needs and privacy regulation.
4242

4343

4444
## Why confidential computing?
4545

46-
Data cleanrooms are not a brand-new concept, however with advances in confidential computing, there are more opportunities for leveraging cloud scale with broader datasets, securing IP of AI models, as well as ability to better meet data privacy regulations. In previous cases, certain data might be inaccessible for reasons such as
46+
Data cleanrooms aren't a brand-new concept, however with advances in confidential computing, there are more opportunities to take advantage of cloud scale with broader datasets, securing IP of AI models, and ability to better meet data privacy regulations. In previous cases, certain data might be inaccessible for reasons such as
4747

4848
- Competitive disadvantages or regulation preventing of sharing data across industry companies.
4949
- Anonymization reducing the quality of insights on data, or being too costly and time consuming.
5050
- Data being bound to certain locations and refrained from processing in the cloud due to security concerns.
5151
- Costly or lengthy legal processes cover liability if data is exposed or abused
5252

53-
These realities could lead to incomplete or ineffective datasets that result in weaker insights, or additional time needed in training and leveraging AI models.
53+
These realities could lead to incomplete or ineffective datasets that result in weaker insights, or more time needed in training and using AI models.
5454

55-
## What are considerations when build a cleanroom solution?
55+
## What are considerations when building a cleanroom solution?
5656

57-
_Batch analytics vs. real-time data pipelines:_ The size of the datasets and speed of insights should be considered when designing or leveraging a cleanroom solution. When data is available "offline", it can be loaded into a verified and secured compute environment for data analytic processing on large portions of data, if not the entire dataset. This allows for large datasets to be evaluated with models and algorithms that are not expected to provide an immediate result, for example running inference across millions of health records to find best candidates for a clinical trial. Other solutions require real-time insights on data, such as when algorithms and models aim to identify fraud on near real-time transactions between multiple entities.
57+
_Batch analytics vs. real-time data pipelines:_ The size of the datasets and speed of insights should be considered when designing or using a cleanroom solution. When data is available "offline", it can be loaded into a verified and secured compute environment for data analytic processing on large portions of data, if not the entire dataset. This batch analytics allow for large datasets to be evaluated with models and algorithms that aren't expected to provide an immediate result. For example, batch analytics work well when doing ML inferencing across millions of health records to find best candidates for a clinical trial. Other solutions require real-time insights on data, such as when algorithms and models aim to identify fraud on near real-time transactions between multiple entities.
5858

59-
_Zero-trust participation:_ A major differentiator in confidential cleanrooms is the ability to have no party involved trusted – from all data providers, code and model developers, solution providers and infrastructure operator admins. Solutions can be provided where both the data and model IP can be protected from all parties. In onboarding or building a solution, participants should consider both what is desired to protect, and from whom to protect each of the code, models, and data.
59+
_Zero-trust participation:_ A major differentiator in confidential cleanrooms is the ability to have no party involved trusted – from all data providers, code and model developers, solution providers and infrastructure operator admins. Solutions can be provided where both the data and model IP can be protected from all parties. When onboarding or building a solution, participants should consider both what is desired to protect, and from whom to protect each of the code, models, and data.
6060

61-
_Federated learning:_ Federated learning involves creating or leveraging a solution whereas models are provided to process in the data owners tenant, and insights are aggregated in a central tenant. In some cases, the models can even be run on data outside of Azure, with model aggregation still occurring in Azure. Many times, federated learning involves iterating on data many times as the parameters of the model improves based on the aggregated insights, so the iteration costs and quality of the model should be factored into the operation and expected outcomes.
61+
_Federated learning:_ Federated learning involves creating or using a solution whereas models process in the data owner's tenant, and insights are aggregated in a central tenant. In some cases, the models can even be run on data outside of Azure, with model aggregation still occurring in Azure. Many times, federated learning iterates on data many times as the parameters of the model improve after insights are aggregated. The iteration costs and quality of the model should be factored into the solution and expected outcomes.
6262

63-
_Data residency and sources:_ Customer have data stored in multiple clouds and on-premises. Collaboration can include data and models from different sources. Cleanroom solutions can facilitate data and models coming to Azure from these other locations. When data can't be moved to Azure from an on-premise data store, some cleanroom solutions can be provided on site where the data resides, with management and policies powered by a common solution provider, where available.
63+
_Data residency and sources:_ Customers have data stored in multiple clouds and on-premises. Collaboration can include data and models from different sources. Cleanroom solutions can facilitate data and models coming to Azure from these other locations. When data cannot move to Azure from an on-premises data store, some cleanroom solutions can run on site where the data resides. Management and policies can be powered by a common solution provider, where available.
6464

65-
_Code integrity and confidential ledgers:_ With distributed ledger technology (DLT) running on Azure confidential computing, solutions can be built that run on a network across organizations, whereas the code logic and analytic rules can be added only when there is concensus across the various participants. All updates to the code are recorded for auditing via tamper-proof logging enabled with Azure confidential computing.
65+
_Code integrity and confidential ledgers:_ With distributed ledger technology (DLT) running on Azure confidential computing, solutions can be built that run on a network across organizations. The code logic and analytic rules can be added only when there's concensus across the various participants. All updates to the code are recorded for auditing via tamper-proof logging enabled with Azure confidential computing.
6666

6767
## What are options to get started?
6868

6969
### ACC platform offerings that help enable confidential cleanrooms
70-
Roll-up your sleeves and build a data clean room solution directly on these confidenital computing service offerings.
70+
Roll up your sleeves and build a data clean room solution directly on these confidential computing service offerings.
7171

7272
[Confidential containers](https://learn.microsoft.com/en-us/azure/confidential-computing/confidential-containers) on Azure Container Instances (ACI) and Intel SGX VMs with application enclaves provide a container solution for building confidential cleanroom solutions.
7373

7474
[Confidential Virtual Machines (VMs)](https://learn.microsoft.com/en-us/azure/confidential-computing/confidential-vm-overview) provide a VM platform for confidential cleanroom solutions.
7575

76-
[Azure SQL AE in secure enclaves](https://learn.microsoft.com/en-us/sql/relational-databases/security/encryption/always-encrypted-enclaves) provides a platform service for encrypting data and queries in SQL that can be used in multi-party data analytics and confidenital cleanrooms.
76+
[Azure SQL AE in secure enclaves](https://learn.microsoft.com/en-us/sql/relational-databases/security/encryption/always-encrypted-enclaves) provides a platform service for encrypting data and queries in SQL that can be used in multi-party data analytics and confidential cleanrooms.
7777

78-
[Confidential Consortium Framework](https://ccf.microsoft.com/) is an open-source framework for building highly available stateful services that leverage centralized compute for ease of use and performance, while providing decentralized trust. It enables multiple parties to execute auditable compute over confidential data without trusting each other or a privileged operator.
78+
[Confidential Consortium Framework](https://ccf.microsoft.com/) is an open-source framework for building highly available stateful services that use centralized compute for ease of use and performance, while providing decentralized trust. It enables multiple parties to execute auditable compute over confidential data without trusting each other or a privileged operator.
7979

8080
### ACC partner solutions that enable confidential cleanrooms
81-
Leverage a partner that has built a multi-party data analytics solution on top of the Azure confidential computing platform.
81+
Use a partner that has built a multi-party data analytics solution on top of the Azure confidential computing platform.
8282

83-
- [**Anjuna**](https://www.anjuna.io/use-case-solutions) provides a confidential computing platform to enable various use caes, including secure clean rooms, for organizations to share data for joint analysis, such as calculating credit risk scores or developing machine learning models, without exposing sensitive information.- - [**BeeKeeperAI**](https://www.beekeeperai.com/) enables healthcare AI through a secure collaboration platform for algorithm owners and data stewards. BeeKeeperAI™ uses privacy-preserving analytics on multi-institutional sources of protected data in a confidential computing environment including end-to-end encryption, secure computing enclaves, and Intel's latest SGX enabled processors to comprehensively protect the data and the algorithm IP.
83+
- [**Anjuna**](https://www.anjuna.io/use-case-solutions) provides a confidential computing platform to enable various use cases, including secure clean rooms, for organizations to share data for joint analysis, such as calculating credit risk scores or developing machine learning models, without exposing sensitive information.
84+
- [**BeeKeeperAI**](https://www.beekeeperai.com/) enables healthcare AI through a secure collaboration platform for algorithm owners and data stewards. BeeKeeperAI™ uses privacy-preserving analytics on multi-institutional sources of protected data in a confidential computing environment. The solution supports end-to-end encryption, secure computing enclaves, and Intel's latest SGX enabled processors to protect the data and the algorithm IP.
8485
- [**Decentriq**](https://www.decentriq.com/) provides Software as a Service (SaaS) data clean rooms to enable companies to collaborate with other organizations on their most sensitive datasets and create value for their clients. The technologies help prevent anyone to see the sensitive data, including Decentriq.
8586
- [**Fortanix**](https://www.fortanix.com/platform/confidential-ai) provides a confidential computing platform that can enable confidential AI, including multiple organizations collaborating together for multi-party analytics.
86-
- [**Mithril Security**](https://www.mithrilsecurity.io/) provides tooling to help SaaS vendors serve AI models inside secure enclaves, and providing an on-premise level of security and control to data owners. Data owners can use their SaaS AI solutions while remaining compliant and in control of their data.
87+
- [**Mithril Security**](https://www.mithrilsecurity.io/) provides tooling to help SaaS vendors serve AI models inside secure enclaves, and providing an on-premises level of security and control to data owners. Data owners can use their SaaS AI solutions while remaining compliant and in control of their data.
8788
- [**Opaque**](https://opaque.co/) provides a confidential computing platform for collaborative analytics and AI, giving the ability to perform collaborative scalable analytics while protecting data end-to-end and enabling organizations to comply with legal and regulatory mandates.
8889
- [**SafeLiShare**](https://safelishare.com/solution/encrypted-data-clean-room/) provides policy-driven encrypted data clean rooms where access to data is auditable, trackable, and visible, while keeping data protected during multi-party data sharing.

0 commit comments

Comments
 (0)