Add documentation of removed EU regulations and standards

realmarcin · claude · realmarcin · commit 4ccbdc4df8bb · 2025-12-17T20:08:42.000-08:00
Document all EU regulatory content removed from D4D schema in commit 4fc1f85 (Dec 2, 2025) based on Harry Caufield's recommendation to "stay US-centric".

This file provides a complete reference of what was removed, with context showing 2 lines before and after each removal for easy understanding.

Content removed:
- GDPR (General Data Protection Regulation) - 5 references
- EU AI Act (Regulation (EU) 2024/1689) - 3 references
- Complete AIActRiskEnum with 4 risk categories (42 lines)
- gdpr_compliant and eu_ai_act_risk_category fields
- CSVW and Frictionless Data prefixes and mappings

Total: 12 distinct removals across 4 schema files (~57 lines)

Impact: Schema now focuses exclusively on US regulations (HIPAA, 45 CFR 46) and reduced aligned standards from 40+ to 25+.

See: notes/eu_regulations_removed_content.txt for full details

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
diff --git a/notes/eu_regulations_removed_content.txt b/notes/eu_regulations_removed_content.txt
@@ -0,0 +1,285 @@
+================================================================================
+EU REGULATIONS AND STANDARDS CONTENT REMOVED FROM D4D SCHEMA
+================================================================================
+
+Commit: 4fc1f852746590ac76803d829d440c46f8782789
+Author: marcin p. joachimiak <4625870+realmarcin@users.noreply.github.com>
+Date:   Tue Dec 2 19:31:12 2025 -0800
+Message: Complete Harry's feedback: Remove GDPR/EU AI Act, finish Frictionless/CSVW cleanup
+
+Rationale: Harry Caufield's recommendation to "stay US-centric" - removed all
+EU regulatory framework references to focus on US regulations (HIPAA, 45 CFR 46).
+
+================================================================================
+FILE: src/data_sheets_schema/schema/D4D_Base_import.yaml
+================================================================================
+
+REMOVAL 1: CSVW Prefix
+--------------------------------------------------------------------------------
+BEFORE (2 lines):
+prefixes:
+  biolink: https://w3id.org/biolink/vocab/
+
+REMOVED:
+  csvw: http://www.w3.org/ns/csvw#
+
+AFTER (2 lines):
+  data_sheets_schema: https://w3id.org/bridge2ai/data-sheets-schema/
+  datasets: https://w3id.org/linkml/report
+--------------------------------------------------------------------------------
+
+REMOVAL 2: Frictionless Prefix
+--------------------------------------------------------------------------------
+BEFORE (2 lines):
+  dcterms: http://purl.org/dc/terms/
+  example: https://example.org/
+
+REMOVED:
+  frictionless: https://specs.frictionlessdata.io/
+
+AFTER (2 lines):
+  linkml: https://w3id.org/linkml/
+  mediatypes: https://www.iana.org/assignments/media-types/
+--------------------------------------------------------------------------------
+
+REMOVAL 3: GDPR Reference in Composition Subset Description
+--------------------------------------------------------------------------------
+BEFORE (2 lines):
+      with the information they need to make informed decisions about using the
+      dataset for their chosen tasks. Some of the questions are designed to
+
+REMOVED:
+      elicit information about compliance with the EU's General Data Protection
+      Regulation (GDPR) or comparable regulations in other jurisdictions.
+
+REPLACED WITH:
+      elicit information about compliance with applicable data protection
+      regulations and privacy requirements.
+
+AFTER (2 lines):
+  Collection:
+    description: >-
+--------------------------------------------------------------------------------
+
+REMOVAL 4: CSVW Dialect Mapping
+--------------------------------------------------------------------------------
+BEFORE (2 lines):
+
+  dialect:
+
+REMOVED:
+    slot_uri: csvw:dialect
+
+AFTER (2 lines):
+
+  bytes:
+--------------------------------------------------------------------------------
+
+================================================================================
+FILE: src/data_sheets_schema/schema/D4D_Data_Governance.yaml
+================================================================================
+
+REMOVAL 5: GDPR Reference in ExportControlRegulatoryRestrictions Description
+--------------------------------------------------------------------------------
+BEFORE (2 lines):
+      Do any export controls or other regulatory restrictions apply to the dataset
+      or to individual instances? Includes compliance tracking for regulations like
+
+REMOVED:
+      GDPR, HIPAA, and EU AI Act. If so, please describe these restrictions and
+
+REPLACED WITH:
+      HIPAA and other US regulations. If so, please describe these restrictions and
+
+AFTER (2 lines):
+      provide a link or copy of any supporting documentation. Maps to DUO terms
+      related to ethics approval, geographic restrictions, and institutional requirements.
+--------------------------------------------------------------------------------
+
+REMOVAL 6: gdpr_compliant Field
+--------------------------------------------------------------------------------
+BEFORE (2 lines):
+          - DUO:0000022  # GS - geographic restriction
+          - DUO:0000028  # IS - institution specific
+
+REMOVED:
+      gdpr_compliant:
+        description: >-
+          Indicates compliance with the EU General Data Protection Regulation (GDPR).
+          GDPR applies to processing of personal data of individuals in the EU.
+        range: ComplianceStatusEnum
+
+AFTER (2 lines):
+      hipaa_compliant:
+        description: >-
+--------------------------------------------------------------------------------
+
+REMOVAL 7: eu_ai_act_risk_category Field
+--------------------------------------------------------------------------------
+BEFORE (2 lines):
+          HIPAA applies to protected health information in the United States.
+        range: ComplianceStatusEnum
+
+REMOVED:
+      eu_ai_act_risk_category:
+        description: >-
+          Risk category under the EU AI Act. The EU AI Act classifies AI systems
+          into risk categories: minimal, limited, high, and unacceptable.
+          High-risk AI systems face strict requirements.
+        range: AIActRiskEnum
+
+AFTER (2 lines):
+      other_compliance:
+        description: >-
+--------------------------------------------------------------------------------
+
+REMOVAL 8: GDPR/EU AI Act in ComplianceStatusEnum Description
+--------------------------------------------------------------------------------
+BEFORE (2 lines):
+    description: >-
+      Compliance status for regulatory frameworks. Indicates the extent to which
+
+REMOVED:
+      a dataset complies with specific regulations (e.g., GDPR, HIPAA, EU AI Act).
+
+REPLACED WITH:
+      a dataset complies with specific regulations (e.g., HIPAA, 45 CFR 46).
+
+AFTER (2 lines):
+      These are workflow status values that may evolve as regulations are assessed
+      or as the dataset is modified.
+--------------------------------------------------------------------------------
+
+REMOVAL 9: Complete AIActRiskEnum (42 lines)
+--------------------------------------------------------------------------------
+BEFORE (2 lines):
+          determination has not been made.
+
+
+REMOVED:
+  AIActRiskEnum:
+    description: >-
+      Risk categories under the EU Artificial Intelligence Act (Regulation (EU) 2024/1689).
+      The AI Act establishes a risk-based regulatory framework with four categories.
+      See https://artificialintelligenceact.eu/ and https://eur-lex.europa.eu/eli/reg/2024/1689/oj
+    permissible_values:
+      minimal_risk:
+        description: >-
+          AI systems with minimal risk (e.g., AI-enabled video games, spam filters).
+          No specific obligations beyond general transparency for certain AI systems
+          (Article 50). Represents the majority of AI systems on the EU market.
+        related_mappings:
+          - EUAIAct:Article50
+      limited_risk:
+        description: >-
+          AI systems with limited risk subject to transparency obligations
+          (e.g., chatbots, emotion recognition systems, biometric categorization,
+          deepfakes). Must comply with specific transparency requirements to enable
+          users to make informed decisions (Article 50).
+        related_mappings:
+          - EUAIAct:Article50
+          - EUAIAct:TitleIV
+      high_risk:
+        description: >-
+          AI systems with high risk to health, safety, or fundamental rights as
+          defined in Annex III (e.g., AI in critical infrastructure, education,
+          employment, law enforcement, migration, justice). Subject to strict
+          requirements including conformity assessment, risk management, data
+          governance, transparency, human oversight, and accuracy (Articles 6-51).
+        related_mappings:
+          - EUAIAct:Article6
+          - EUAIAct:AnnexIII
+          - EUAIAct:TitleIII
+      unacceptable_risk:
+        description: >-
+          AI systems with unacceptable risk that are prohibited under Article 5
+          (e.g., social scoring by public authorities, exploitation of vulnerabilities,
+          real-time remote biometric identification in public spaces for law enforcement
+          with limited exceptions). These AI practices are banned in the EU.
+        related_mappings:
+          - EUAIAct:Article5
+
+AFTER (2 lines):
+  ConfidentialityLevelEnum:
+    description: >-
+--------------------------------------------------------------------------------
+
+================================================================================
+FILE: src/data_sheets_schema/schema/D4D_Human.yaml
+================================================================================
+
+REMOVAL 10: GDPR in Regulatory Compliance Examples
+--------------------------------------------------------------------------------
+BEFORE (2 lines):
+        description: >
+          What regulatory frameworks govern this human subjects research
+
+REMOVED:
+          (e.g., 45 CFR 46, GDPR, HIPAA)?
+
+REPLACED WITH:
+          (e.g., 45 CFR 46, HIPAA)?
+
+AFTER (2 lines):
+        range: string
+        multivalued: true
+--------------------------------------------------------------------------------
+
+================================================================================
+FILE: src/data_sheets_schema/schema/data_sheets_schema.yaml
+================================================================================
+
+REMOVAL 11: CSVW Prefix (Main Schema)
+--------------------------------------------------------------------------------
+BEFORE (2 lines):
+  AIO: https://w3id.org/aio/
+  biolink: https://w3id.org/biolink/vocab/
+
+REMOVED:
+  csvw: http://www.w3.org/ns/csvw#
+
+AFTER (2 lines):
+  data_sheets_schema: https://w3id.org/bridge2ai/data-sheets-schema/
+  datasets: https://w3id.org/linkml/report
+--------------------------------------------------------------------------------
+
+REMOVAL 12: Frictionless Prefix (Main Schema)
+--------------------------------------------------------------------------------
+BEFORE (2 lines):
+  dcterms: http://purl.org/dc/terms/
+  example: https://example.org/
+
+REMOVED:
+  frictionless: https://specs.frictionlessdata.io/
+
+AFTER (2 lines):
+  linkml: https://w3id.org/linkml/
+  mediatypes: https://www.iana.org/assignments/media-types/
+--------------------------------------------------------------------------------
+
+================================================================================
+SUMMARY OF REMOVALS
+================================================================================
+
+EU Regulations Removed:
+- GDPR (General Data Protection Regulation) - 5 references
+- EU AI Act (Regulation (EU) 2024/1689) - 3 references
+- Complete AIActRiskEnum with 4 risk categories (42 lines)
+- gdpr_compliant field from ExportControlRegulatoryRestrictions
+- eu_ai_act_risk_category field from ExportControlRegulatoryRestrictions
+
+Standards/Prefixes Removed:
+- csvw: http://www.w3.org/ns/csvw# (2 occurrences)
+- frictionless: https://specs.frictionlessdata.io/ (2 occurrences)
+- csvw:dialect slot mapping (1 occurrence)
+
+Impact:
+- Schema now focuses exclusively on US regulations (HIPAA, 45 CFR 46)
+- Removed overly granular CSVW mappings
+- Removed uncertain Frictionless mappings
+- Reduced from 40+ to 25+ aligned standards
+
+Total Removals: 12 distinct changes across 4 schema files
+Total Lines Removed: ~57 lines of EU regulatory and standards content
+
+================================================================================
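
A reviewer who wants to confirm that none of the removed terms linger in the schema could scan the four files for them. The sketch below is not part of the commit: the function name find_leftovers and the regular expressions are assumptions made for illustration, built only from the identifiers listed in the removals above (including the EUAIAct: CURIEs that appear in the removed enum).

```python
import re

# Terms drawn from the removals documented above. The matching rules
# (case-sensitive, word-boundary regexes) are an assumption of this sketch.
REMOVED_PATTERNS = {
    "GDPR": re.compile(r"\bGDPR\b"),
    "EU AI Act": re.compile(r"\bEU AI Act\b"),
    "AIActRiskEnum": re.compile(r"\bAIActRiskEnum\b"),
    "EUAIAct CURIE": re.compile(r"\bEUAIAct:"),
    "gdpr_compliant": re.compile(r"\bgdpr_compliant\b"),
    "eu_ai_act_risk_category": re.compile(r"\beu_ai_act_risk_category\b"),
    "csvw prefix or CURIE": re.compile(r"\bcsvw:"),
    "frictionless prefix or CURIE": re.compile(r"\bfrictionless:"),
}


def find_leftovers(text):
    """Return (line_number, label, stripped_line) for every line that
    still mentions one of the removed terms."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in REMOVED_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label, line.strip()))
    return hits


if __name__ == "__main__":
    # Small inline sample mirroring the pre-removal prefix block.
    sample = (
        "prefixes:\n"
        "  biolink: https://w3id.org/biolink/vocab/\n"
        "  csvw: http://www.w3.org/ns/csvw#\n"
    )
    for lineno, label, line in find_leftovers(sample):
        print(f"line {lineno}: {label}: {line}")
```

To check the real schema, the same function can be applied to the text of each file named in the FILE headers above (for example via pathlib.Path(...).read_text()); an empty result for all four files would suggest the cleanup is complete, though it cannot catch references phrased differently from the patterns listed here.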