fix(security): address all 24 security findings across codebase #303

Merged
imran-siddique merged 6 commits into microsoft:main from imran-siddique:fix/security-sweep-2026-03 on Mar 20, 2026

@imran-siddique
Member

Security Sweep: 24 Findings Fixed

Comprehensive security audit and remediation across all packages, workflows, and dependencies.

🔴 Critical (9)

# | CWE | Fix | Files
1 | CWE-502 | Replace pickle.loads with JSON + importlib registry | process_isolation.py
2 | CWE-502 | Replace pickle.loads with JSON serialization | agent_hibernation.py
3 | CWE-78 | Convert shell=True → list-form subprocess | prepare_release.py, prepare_pypi.py
4 | CWE-94 | Replace eval() with safe AST walker | calculator.py
5 | CWE-77 | Sanitize issue title in shell block | ai-spec-drafter.yml
6 | CWE-829 | Pin setup-node action to SHA | ai-agent-runner/action.yml
7 | CWE-494 | Add SHA-256 verification for NuGet download | publish.yml
8-9 | CWE-1395 | Tighten cryptography>=44.0.0, django>=4.2 | 7 pyproject.toml files

🟠 High (6)

# | CWE | Fix | Files
10 | CWE-798 | Replace hardcoded API key with fake placeholder | extension.ts
11 | CWE-502 | yaml.safe_load + json.load replacements | github-reviewer/main.py
12 | CWE-94 | Replace eval() docstring example | langchain/tools.py
13 | CWE-22 | Add path traversal validation | FileTrustStore.cs
14 | CWE-295 | Remove non-hash pip install fallback | ci.yml, publish.yml
15 | GHSA-rf6f-7fwh-wjgh | Fix flatted prototype pollution | 3 package-lock.json files

🟡 Medium (6)

  • CWE-79: Replace innerHTML with safe DOM APIs in Chrome extension
  • CWE-328: Replace MD5 → SHA-256 in examples
  • CWE-330: Replace random.randint → secrets module
  • CWE-327: Add deprecation warnings on HMAC-SHA256 fallback (.NET)
  • CWE-250: Narrow scorecard.yml permissions
  • Audit all 10 pull_request_target workflows (5 fixed, 5 confirmed safe)

🟢 Low (3)

  • Replace weak default passwords in example configs
  • Add security justification comments to safe workflows

Testing

  • ✅ 30/30 process isolation tests pass
  • ✅ 10/10 hibernation tests pass
  • ✅ 0 npm audit vulnerabilities across all 3 JS packages
  • ✅ Zero remaining pickle.loads, eval(, shell=True, or innerHTML patterns in modified files

Impact

39 files changed, 424 insertions, 128 deletions across:

  • 5 core Python packages
  • 15 GitHub Actions workflows
  • 3 npm packages
  • 1 .NET SDK
  • 7 dependency manifests

imran-siddique and others added 5 commits March 18, 2026 14:50
Address 3 critical gaps identified in Ona/Veto agent security research:

1. Tool content hashing (defeats tool aliasing/wrapping attacks):
   - ToolRegistry now computes SHA-256 hash of handler source at registration
   - execute_tool() verifies integrity before execution, blocks on mismatch
   - New ContentHashInterceptor in base.py for intercept-level hash verification
   - Integrity violation audit log with get_integrity_violations()

2. PolicyEngine freeze (prevents runtime self-modification):
   - New freeze() method makes engine immutable after initialization
   - add_constraint, set/update_agent_context, add_conditional_permission
     all raise RuntimeError when frozen
   - Full mutation audit log records all operations (allowed and blocked)
   - is_frozen property for inspection

3. Approval quorum and fatigue detection (defeats approval fatigue):
   - New QuorumConfig dataclass for M-of-N approval requirements
   - EscalationHandler supports quorum-based vote counting
   - Fatigue detection: auto-DENY when agent exceeds escalation rate threshold
   - Per-agent rate tracking with configurable window and threshold
   - EscalationRequest.votes field tracks individual approver votes

All changes are backward-compatible: new parameters are optional with
defaults that preserve existing behavior. 33 new tests, 53 total pass.
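
The M-of-N vote counting in item 3 can be sketched roughly as follows. This is an illustration only: `QuorumSketch` and `tally` are hypothetical names, not the toolkit's `QuorumConfig` or `EscalationHandler` API, and the early-denial rule (deny once the quorum can no longer be reached) is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class QuorumSketch:
    """Hypothetical M-of-N requirement: `required` approvals out of `total` approvers."""

    required: int
    total: int

    def __post_init__(self) -> None:
        # Reject configurations such as M = 0 or M > N up front.
        if not 1 <= self.required <= self.total:
            raise ValueError("need 1 <= required <= total")


def tally(quorum: QuorumSketch, votes: Dict[str, bool]) -> Optional[bool]:
    """Return True (approved), False (denied), or None (still pending)."""
    approvals = sum(1 for vote in votes.values() if vote)
    denials = len(votes) - approvals
    if approvals >= quorum.required:
        return True
    # Assumed rule: deny once the remaining approvers cannot reach the quorum.
    if denials > quorum.total - quorum.required:
        return False
    return None
```

Keying votes by approver ID means a repeated vote from the same approver overwrites the earlier one instead of counting twice.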

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- PolicyEngine.freeze() now converts dicts to MappingProxyType/frozenset
  for true immutability (not just boolean guard) — addresses HIGH finding
- Removed insecure bytecode fallback from _compute_handler_hash; returns
  empty string with warning for unverifiable handlers — addresses CRITICAL
- Added CHANGELOG entries for all new security features
- Added 2 new tests: frozen dicts are immutable proxies, permissions are
  frozensets

55 tests pass (20 existing + 35 new).
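
The difference between a boolean guard and the MappingProxyType/frozenset conversion described above can be sketched as follows. `FreezableStore` is a hypothetical stand-in, not the toolkit's `PolicyEngine`; only `types.MappingProxyType` and `frozenset` are taken from the commit message.

```python
from types import MappingProxyType


class FreezableStore:
    """Hypothetical permission store that becomes read-only after freeze()."""

    def __init__(self):
        self._permissions = {}
        self._frozen = False

    @property
    def is_frozen(self):
        return self._frozen

    def grant(self, agent_id, permission):
        # Boolean guard: blocks mutation through the public method.
        if self._frozen:
            raise RuntimeError("store is frozen")
        self._permissions.setdefault(agent_id, set()).add(permission)

    def freeze(self):
        # Copy values into frozensets and wrap the dict in a read-only proxy,
        # so code holding a reference to the mapping cannot mutate it either.
        snapshot = {k: frozenset(v) for k, v in self._permissions.items()}
        self._permissions = MappingProxyType(snapshot)
        self._frozen = True

    def permissions(self):
        return self._permissions
```

With only the boolean guard, a caller that obtained the underlying dict before `freeze()` could still write to it; after the conversion, item assignment on the proxy raises `TypeError`.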

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Document the 3 sandbox escape defenses with usage examples:
- Tool content hashing with ToolRegistry and ContentHashInterceptor
- PolicyEngine.freeze() with MappingProxyType immutability
- Approval quorum (QuorumConfig) and fatigue detection

Addresses docs-sync-checker feedback on PR microsoft#297.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Implements the PolicyEvaluator protocol from google/adk-python#4897:
- ADKPolicyEvaluator: YAML-configurable policy engine for ADK agents
- GovernanceCallbacks: wires into before/after tool/agent hooks
- DelegationScope: monotonic scope narrowing for sub-agents
- Structured audit events with pluggable handlers
- Sample policy config (examples/policies/adk-governance.yaml)
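
"Monotonic scope narrowing" means a sub-agent can only receive a subset of its parent's scope, never more. A minimal sketch, assuming the scope is just a set of tool names (`ScopeSketch` is hypothetical; the real `DelegationScope` may track more than tools):

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class ScopeSketch:
    """Hypothetical delegation scope: the set of tools an agent may call."""

    allowed_tools: FrozenSet[str]

    def narrow(self, requested: Iterable[str]) -> "ScopeSketch":
        # Set intersection can only remove tools, so the child scope is
        # always a subset of the parent scope, whatever was requested.
        return ScopeSketch(self.allowed_tools & frozenset(requested))
```

For example, a parent allowed `search`, `read`, and `write` that delegates `["read", "write", "shell"]` yields a child scope of `read` and `write` only; the request for `shell` is silently dropped rather than granted.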

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Critical (9 fixed):
- CWE-502: Replace pickle.loads with JSON in process_isolation.py and agent_hibernation.py
- CWE-78: Convert shell=True to list-form subprocess in prepare_release.py, prepare_pypi.py
- CWE-94: Replace eval() with safe AST walker in calculator.py
- CWE-77: Sanitize issue title injection in ai-spec-drafter.yml
- CWE-829: Pin setup-node action to SHA in ai-agent-runner/action.yml
- CWE-494: Add SHA-256 verification for NuGet download in publish.yml
- CWE-1395: Tighten cryptography>=44.0.0, django>=4.2 across 7 pyproject.toml files

High (6 fixed):
- CWE-798: Replace hardcoded API key placeholder in VS Code extension
- CWE-502: yaml.safe_load + json.load in github-reviewer example
- CWE-94: Replace eval() docstring example in langchain tools
- CWE-22: Add path traversal validation in .NET FileTrustStore
- CWE-295: Remove non-hash pip install fallback in ci.yml and publish.yml
- GHSA-rf6f-7fwh-wjgh: Fix flatted prototype pollution in 3 npm packages

Medium (6 fixed):
- CWE-79: Replace innerHTML with safe DOM APIs in Chrome extension
- CWE-328: Replace MD5 with SHA-256 in github-reviewer
- CWE-330: Replace random.randint with secrets module in defi-sentinel
- CWE-327: Add deprecation warnings on HMAC-SHA256 fallback in .NET
- CWE-250: Narrow scorecard.yml permissions
- Audit all 10 pull_request_target workflows for HEAD checkout safety

Low (3 fixed):
- Replace weak default passwords in examples
- Add security justification comments to safe workflows

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@github-actions github-actions bot added the documentation, dependencies, tests, agent-mesh, ci/cd, and size/XL labels Mar 20, 2026
@github-actions

🤖 AI Agent: breaking-change-detector

🔍 API Compatibility Report

Summary

The recent changes in the repository microsoft/agent-governance-toolkit primarily focus on security improvements and do not introduce any breaking changes to the public API. The modifications include enhancements to security practices, dependency updates, and the addition of new features. However, there are no public functions, classes, or methods that have been removed, renamed, or had their signatures altered in a way that would affect existing users.

Findings

Severity Package Change Impact
N/A No breaking changes found N/A

Migration Guide

Since no breaking changes were identified, there are no migration steps required for users of the API. Users can continue to use the existing functionality without any modifications.

@github-actions

🤖 AI Agent: docs-sync-checker

📝 Documentation Sync Report

Issues Found

  • Sign(byte[] data) in AgentGovernance/Trust/AgentIdentity.cs — missing detailed docstring for parameters, return values, and exceptions.
  • Verify(byte[] data, byte[] signature) in AgentGovernance/Trust/AgentIdentity.cs — missing detailed docstring for parameters, return values, and exceptions.
  • VerifySignature(byte[] publicKey, byte[] data, byte[] signature, byte[]? privateKey = null) in AgentGovernance/Trust/AgentIdentity.cs — missing detailed docstring for parameters, return values, and exceptions.
  • ⚠️ packages/agent-governance-dotnet/README.md — no updates to reflect the deprecation of HMAC-SHA256 and migration recommendation to Ed25519.
  • ⚠️ CHANGELOG.md — missing explicit mention of the HMAC-SHA256 deprecation and Ed25519 migration guidance.
  • ⚠️ examples/policies/adk-governance.yaml — newly added example file, but no README or documentation references it.

Suggestions

  • 💡 Add detailed docstrings for:
    • Sign(byte[] data) — include parameter descriptions, return type, exceptions, and security implications.
    • Verify(byte[] data, byte[] signature) — include parameter descriptions, return type, exceptions, and security implications.
    • VerifySignature(byte[] publicKey, byte[] data, byte[] signature, byte[]? privateKey = null) — include parameter descriptions, return type, exceptions, and security implications.
  • 💡 Update packages/agent-governance-dotnet/README.md to:
    • Highlight the deprecation of HMAC-SHA256.
    • Recommend migrating to Ed25519 for signing and verification.
  • 💡 Add a CHANGELOG entry explicitly stating the deprecation of HMAC-SHA256 and the introduction of Ed25519 as the recommended cryptographic method.
  • 💡 Reference the new examples/policies/adk-governance.yaml file in the appropriate README or documentation section to guide users on its purpose and usage.

Additional Notes

  • The type hints in the Python code appear complete, and no new public APIs were introduced without type annotations.
  • Example code in examples/ has been updated with a new YAML configuration file, but its usage is not documented in the README or elsewhere.
  • Security-related comments and justifications in workflows are well-documented and align with best practices.

Final Assessment

Documentation and examples are not fully in sync with the code changes. Address the issues and suggestions above to ensure consistency and clarity.

@github-actions github-actions bot left a comment

🤖 AI Agent: code-reviewer

Review Summary

This pull request addresses 24 security findings across the repository, spanning multiple critical areas such as deserialization, command injection, cryptographic operations, and dependency hardening. The changes demonstrate a strong focus on improving security posture while maintaining backward compatibility where necessary. However, a few areas require further scrutiny or improvement.


🔴 CRITICAL Issues

  1. CWE-502: Unsafe Deserialization (pickle)

    • Files: process_isolation.py, agent_hibernation.py
    • Fix: Replacing pickle.loads with JSON serialization is a significant improvement. However, ensure that the JSON deserialization process includes schema validation to prevent potential attacks via maliciously crafted JSON.
      💡 Suggestion: Use a library like pydantic or jsonschema to validate the structure of deserialized JSON.
  2. CWE-94: Unsafe eval() Usage

    • Files: calculator.py, langchain/tools.py
    • Fix: Replacing eval() with a safe AST walker is a good approach. However, ensure the AST walker explicitly restricts dangerous operations (e.g., exec, eval, or import).
      💡 Suggestion: Add unit tests to verify that the AST walker rejects malicious payloads.
  3. CWE-77: Command Injection in GitHub Actions

    • Files: ai-spec-drafter.yml
    • Fix: Sanitizing ISSUE_TITLE and using printf for untrusted input is a good step. However, the sed command in SAFE_TITLE still processes untrusted input.
      🔴 Action: Use a stricter sanitization approach, such as whitelisting allowed characters ([a-z0-9-]), and explicitly reject invalid inputs.
  4. CWE-22: Path Traversal in FileTrustStore

    • Files: FileTrustStore.cs
    • Fix: Adding path traversal validation is a good step. However, the check for .. in the path is insufficient.
      🔴 Action: Use Path.GetFullPath to normalize the path and compare it to the expected base directory. Reject paths that escape the base directory.
  5. CWE-327: Weak Cryptographic Algorithm (HMAC-SHA256)

    • Files: AgentIdentity.cs
    • Fix: Deprecating HMAC-SHA256 in favor of Ed25519 is the right approach. However, the fallback to HMAC-SHA256 remains a security risk.
      🔴 Action: Provide a migration plan and timeline for deprecating HMAC-SHA256 entirely. Consider logging a warning whenever the fallback is used.
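
The normalize-then-compare check recommended in item 4 can be illustrated in Python. This is an analog of the `Path.GetFullPath` approach, not the C# change in `FileTrustStore.cs`; `resolve_inside` is a hypothetical helper.

```python
import os


def resolve_inside(base_dir: str, user_path: str) -> str:
    """Return the normalized path if it stays under base_dir, else raise ValueError."""
    base = os.path.realpath(base_dir)
    # Normalizing first collapses "..", redundant separators, and symlinks,
    # so the comparison below sees the path that would actually be opened.
    candidate = os.path.realpath(os.path.join(base, user_path))
    # commonpath compares whole components, so a sibling such as
    # "<base>-evil" does not pass as being inside "<base>".
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```

A plain substring test for `..` misses absolute paths and symlinks, and a string-prefix comparison accepts sibling directories that share a prefix with the base; comparing normalized path components avoids both.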

🟡 WARNING: Potential Breaking Changes

  1. Dependency Updates

    • Files: pyproject.toml (cryptography, django)
    • Impact: Tightening dependency versions (cryptography>=44.0.0,<47.0 and django>=4.2,<6.0) may break compatibility with projects relying on older versions.
      🟡 Action: Clearly document these changes in the release notes and consider providing guidance for users on upgrading their dependencies.
  2. HMAC-SHA256 Deprecation

    • Files: AgentIdentity.cs
    • Impact: Marking HMAC-SHA256 methods as [Obsolete] may break builds for users relying on these methods.
      🟡 Action: Provide detailed migration steps in the documentation and a timeline for removal.

💡 Suggestions for Improvement

  1. Thread Safety in Concurrent Agent Execution

    • Files: Multiple
    • Observation: The PR does not explicitly address thread safety in concurrent agent execution.
      💡 Suggestion: Audit shared resources (e.g., FileTrustStore) for thread safety and consider using synchronization primitives or thread-safe collections where necessary.
  2. OWASP Agentic Top 10 Compliance

    • Observation: The PR addresses several OWASP Agentic Top 10 issues (e.g., sandbox escape, approval fatigue). However, areas like "Agent Impersonation" and "Excessive Delegation" are not explicitly covered.
      💡 Suggestion: Add tests or policies to enforce identity verification and delegation limits.
  3. Type Safety and Pydantic Validation

    • Files: adk-governance.yaml
    • Observation: The new governance policy example does not include schema validation.
      💡 Suggestion: Use pydantic models to validate the policy structure and values before applying them.
  4. Backward Compatibility Testing

    • Observation: The PR introduces several changes that could impact backward compatibility.
      💡 Suggestion: Add backward compatibility tests to ensure existing functionality is not broken.
  5. Documentation Updates

    • Files: CHANGELOG.md, README.md
    • Observation: The changelog mentions new features and security improvements but lacks details on how to migrate from deprecated functionality.
      💡 Suggestion: Expand the changelog and README to include migration guides and examples for deprecated features.

✅ Positive Observations

  1. GitHub Actions Security

    • The use of pull_request_target with minimal permissions and explicit comments is excellent. This significantly reduces the risk of malicious PRs exploiting elevated permissions.
  2. Dependency Hardening

    • Pinning dependencies to specific versions with SHA-256 verification is a robust security measure.
  3. Testing Coverage

    • The PR includes comprehensive testing, with all tests passing and no remaining instances of critical vulnerabilities (pickle.loads, eval, etc.).
  4. Cryptographic Improvements

    • The introduction of Ed25519 for signing and verification is a significant improvement over HMAC-SHA256.

Final Assessment

  • Security: The PR addresses critical security issues effectively, but a few areas require additional hardening (e.g., path traversal, JSON validation).
  • Backward Compatibility: Some changes may break existing functionality. Clear migration guidance is needed.
  • Code Quality: The code changes are well-documented and follow good practices, but additional testing and validation are recommended.

Recommended Actions Before Merge

  1. 🔴 Address the critical issues highlighted above (e.g., stricter path traversal validation, JSON schema validation).
  2. 🟡 Document breaking changes and provide migration guidance.
  3. 💡 Add tests for thread safety, backward compatibility, and OWASP Agentic Top 10 compliance.

Once these issues are addressed, the PR will be ready for merge.

@github-actions

🤖 AI Agent: test-generator

🧪 Test Coverage Analysis

packages/agent-mesh/src/agentmesh/integrations/langchain/tools.py

  • Existing coverage: The file appears to have some test coverage, as indicated by the presence of tests for langchain integrations in the tests/agent-mesh directory. However, the specific changes made to replace eval() with a safe AST walker need to be verified for coverage.
  • Missing coverage: The new AST walker implementation replacing eval() is not explicitly tested. There is no evidence of tests that validate the behavior of the AST walker or its ability to handle edge cases.
  • 💡 Suggested test cases:
    1. test_ast_walker_safe_eval_valid_expression — Test the AST walker with valid mathematical expressions to ensure correct evaluation.
    2. test_ast_walker_safe_eval_invalid_expression — Test the AST walker with invalid or malicious expressions (e.g., __import__('os').system('rm -rf /')) to ensure they are safely rejected.
    3. test_ast_walker_safe_eval_boundary_conditions — Test edge cases like empty strings, very large numbers, and deeply nested expressions.

packages/agent-os/src/agent_os/integrations/__init__.py

  • Existing coverage: The __init__.py file typically contains package-level imports or initializations. If it includes logic, it might be indirectly covered by tests for other modules in the agent_os package.
  • Missing coverage: If any new logic or imports were added to this file, they may not be explicitly tested.
  • 💡 Suggested test cases:
    1. test_agent_os_imports — Verify that all modules in agent_os are imported correctly and do not raise exceptions.
    2. test_agent_os_initialization — If the __init__.py file initializes any global state or configurations, ensure that these are tested for correctness.

packages/agent-os/src/agent_os/integrations/base.py

  • Existing coverage: The base.py file likely contains base classes or interfaces for integrations. If these are used by other modules, they may already have indirect test coverage.
  • Missing coverage: Any new methods or changes to existing base classes may not be explicitly tested.
  • 💡 Suggested test cases:
    1. test_base_integration_initialization — Test the initialization of any new or modified base classes to ensure they handle edge cases (e.g., missing or malformed parameters).
    2. test_base_integration_methods — Test any new or modified methods in the base classes, especially for edge cases like empty inputs or unexpected data types.

packages/agent-os/src/agent_os/integrations/escalation.py

  • Existing coverage: The escalation.py file likely handles escalation logic, which may already have some test coverage in the tests/agent-os directory.
  • Missing coverage: The new features, such as escalation fatigue detection and the votes field for per-approver vote tracking, are not explicitly tested.
  • 💡 Suggested test cases:
    1. test_escalation_fatigue_detection — Simulate a scenario where agents exceed the escalation rate threshold and verify that the system auto-denies further escalations.
    2. test_escalation_vote_tracking — Test the votes field to ensure that individual approver votes are correctly tracked and aggregated.
    3. test_escalation_quorum_config — Test the new QuorumConfig feature to ensure that M-of-N approval requirements are enforced correctly.
    4. test_escalation_invalid_quorum_config — Test invalid QuorumConfig values (e.g., M > N or M = 0) to ensure they are rejected or handled gracefully.
    5. test_escalation_partial_failures — Simulate partial failures in the escalation process (e.g., some approvers are unavailable) and verify the system's behavior.
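
To make the fatigue-detection cases above concrete, here is a minimal sketch of a per-agent sliding-window counter that such tests could exercise. `EscalationRateTracker` and its parameters are hypothetical; the toolkit's actual tracking in `escalation.py` may differ.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class EscalationRateTracker:
    """Hypothetical per-agent sliding-window counter for escalation requests."""

    def __init__(self, max_escalations: int, window_seconds: float) -> None:
        self.max_escalations = max_escalations
        self.window_seconds = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record_and_check(self, agent_id: str, now: float) -> bool:
        """Record an escalation at time `now`; return True if it should be auto-denied."""
        events = self._events[agent_id]
        # Drop timestamps that have fallen out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        events.append(now)
        return len(events) > self.max_escalations
```

Passing `now` explicitly keeps the sketch deterministic, which makes threshold and window-expiry tests straightforward to write without patching the clock.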

General Recommendations

  1. Policy Evaluation: Add tests for boundary conditions and conflicting policies in the new adk-governance.yaml example. For instance:
    • test_policy_max_tool_calls_exceeded — Ensure that agents exceeding the max_tool_calls limit are blocked.
    • test_policy_conflicting_rules — Test scenarios where a tool is both blocked and requires approval.
  2. Trust Scoring: Add tests for the FileTrustStore changes in FileTrustStore.cs:
    • test_file_trust_store_path_traversal — Attempt to initialize the FileTrustStore with a path containing .. and verify that it raises an exception.
    • test_file_trust_store_valid_path — Test initialization with a valid path to ensure it succeeds.
  3. Concurrency: If any of the changes involve shared state or multithreading, add tests for race conditions and deadlocks.
  4. Input Validation: Add tests for malformed inputs, injection attempts, and oversized payloads for all modified files.

By addressing the missing coverage and suggested test cases, the repository can ensure robust security and functionality for the changes introduced in this pull request.

@github-actions

🤖 AI Agent: security-scanner

Security Review of PR: fix(security): address all 24 security findings across codebase

This PR addresses 24 security findings across the codebase, including critical vulnerabilities. Below is a detailed analysis of the changes, categorized by the severity of the findings and their potential impact.


🔴 Critical Findings

1. CWE-502: Deserialization of Untrusted Data

  • Files: process_isolation.py, agent_hibernation.py
  • Fix: Replaced pickle.loads with JSON serialization/deserialization.
  • Analysis: pickle.loads can execute arbitrary code while deserializing attacker-controlled bytes, which makes it a direct remote code execution (RCE) vector whenever the input is untrusted.
  • Rating: 🔴 CRITICAL
  • Recommendation: The fix is appropriate. Ensure that the JSON deserialization is implemented securely, avoiding potential issues like large payloads or unexpected data types.
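
One way to combine JSON with an explicit type registry and the payload checks mentioned above is sketched here. `AgentState`, `_REGISTRY`, and `MAX_PAYLOAD_BYTES` are hypothetical; the actual code in `process_isolation.py` is described only as "JSON + importlib registry".

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Dict, Type


@dataclass
class AgentState:
    """Hypothetical payload type used only for this illustration."""

    agent_id: str
    step: int


# Explicit allowlist: only types registered here can ever be reconstructed.
_REGISTRY: Dict[str, Type[Any]] = {"AgentState": AgentState}
MAX_PAYLOAD_BYTES = 1_000_000  # assumed size limit


def dumps(obj: Any) -> bytes:
    name = type(obj).__name__
    if name not in _REGISTRY:
        raise TypeError(f"type not registered: {name}")
    data = {f.name: getattr(obj, f.name) for f in fields(obj)}
    return json.dumps({"type": name, "data": data}).encode("utf-8")


def loads(payload: bytes) -> Any:
    if len(payload) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload too large")
    doc = json.loads(payload.decode("utf-8"))
    if not isinstance(doc, dict) or set(doc) != {"type", "data"}:
        raise ValueError("unexpected envelope")
    cls = _REGISTRY.get(doc["type"]) if isinstance(doc["type"], str) else None
    if cls is None:
        raise ValueError("unregistered type")
    data = doc["data"]
    expected = {f.name: f.type for f in fields(cls)}
    if not isinstance(data, dict) or set(data) != set(expected):
        raise ValueError("unexpected fields")
    for name, typ in expected.items():
        # Note: bool passes an int check; tighten this if that matters.
        if not isinstance(data[name], typ):
            raise ValueError(f"wrong type for field {name!r}")
    return cls(**data)
```

Unlike pickle, the payload never names code to run: the `type` string is only a lookup key into the allowlist, and unknown keys, extra fields, or wrong field types are rejected before any object is built.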

2. CWE-78: OS Command Injection

  • Files: prepare_release.py, prepare_pypi.py
  • Fix: Converted shell=True subprocess calls to list-form arguments.
  • Analysis: Using shell=True can allow attackers to inject arbitrary shell commands if user input is not sanitized.
  • Rating: 🔴 CRITICAL
  • Recommendation: The fix is correct. Ensure that all user inputs passed to subprocess calls are validated and sanitized.
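
The effect of list-form arguments can be shown with a small sketch. `run_echo` is a hypothetical helper, not code from `prepare_release.py`; it spawns the current Python interpreter so the example stays portable.

```python
import subprocess
import sys


def run_echo(untrusted: str) -> str:
    """Pass `untrusted` to a child process as a single argv entry."""
    # List form: each element becomes exactly one argument and no shell parses
    # the string, so ";", "$(...)", and quotes have no special meaning.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

With `shell=True` and string interpolation, an input such as `hello; echo injected` would run a second command; in list form it arrives in the child process as the literal text.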

3. CWE-94: Improper Control of Code Generation (eval)

  • Files: calculator.py
  • Fix: Replaced eval() with a safe AST walker.
  • Analysis: eval() can execute arbitrary code, making it a severe security risk.
  • Rating: 🔴 CRITICAL
  • Recommendation: The fix is appropriate. Ensure the AST walker implementation is robust and does not allow unintended code execution.
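
An allowlist-based AST walker of the kind described might look like this. It is a minimal sketch, not the implementation in `calculator.py`; `safe_eval` and the supported operator set are assumptions.

```python
import ast
import operator

# Only these node types are evaluated; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval(node):
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval(node.operand))
    # Calls, names, attributes, subscripts, strings, etc. all land here.
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def safe_eval(expression: str):
    """Evaluate +, -, *, / on numeric literals; raise ValueError otherwise."""
    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)
```

Because the walker only recognizes what it explicitly lists, `__import__('os').system(...)` fails as an unsupported `Call` node without any code being executed. Exponentiation is left out on purpose, since expressions like `9**9**9` can exhaust CPU and memory; malformed input still raises `SyntaxError` from `ast.parse`, and division by zero raises `ZeroDivisionError`.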

4. CWE-77: Command Injection in GitHub Actions

  • Files: ai-spec-drafter.yml
  • Fix: Sanitized issue titles using printf and avoided shell interpretation of untrusted input.
  • Analysis: Unsanitized user input in shell commands can lead to command injection.
  • Rating: 🔴 CRITICAL
  • Recommendation: The fix is correct. Ensure all user inputs in shell commands are sanitized.
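
The stricter allowlist that the code-reviewer comment asks for (`[a-z0-9-]` only, reject otherwise) can be illustrated in Python. The workflow itself uses shell, so this is an analog rather than the actual fix; `safe_slug` and the 50-character limit are assumptions.

```python
import re


def safe_slug(title: str, max_len: int = 50) -> str:
    """Reduce an untrusted title to [a-z0-9-]; raise ValueError if nothing is left."""
    # Every run of characters outside a-z0-9 collapses to a single hyphen,
    # so quotes, semicolons, backticks, and "$(...)" cannot survive.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    slug = slug[:max_len].strip("-")
    if not slug:
        raise ValueError("title has no usable characters")
    return slug
```

For example, `safe_slug('Add feature"; rm -rf / #')` returns `add-feature-rm-rf`, which is safe to embed in a branch or file name.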

5. CWE-829: Inclusion of Functionality from Untrusted Control Sphere

  • Files: ai-agent-runner/action.yml
  • Fix: Pinned setup-node action to a specific SHA.
  • Analysis: A tag or branch reference is mutable, so anyone who controls or compromises the action's repository can change the code the CI/CD pipeline runs. A full commit SHA is immutable.
  • Rating: 🔴 CRITICAL
  • Recommendation: The fix is appropriate. Ensure all actions are pinned to specific SHAs.

6. CWE-494: Download of Code Without Integrity Check

  • Files: publish.yml
  • Fix: Added SHA-256 verification for NuGet downloads.
  • Analysis: Downloading code without verifying its integrity can lead to supply chain attacks.
  • Rating: 🔴 CRITICAL
  • Recommendation: The fix is correct. Ensure all external downloads are verified with checksums.
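
The checksum step can be sketched in Python as follows. The workflow performs this check in `publish.yml`, so this is only an illustration of the idea; `verify_sha256` is a hypothetical helper, and the expected digest is assumed to come from a trusted, separately pinned source.

```python
import hashlib
import hmac


def verify_sha256(data: bytes, expected_hex: str) -> bool:
    """Return True only if SHA-256(data) matches the pinned hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    # Normalize case and whitespace; compare_digest avoids early-exit comparison.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

The check only helps if the expected digest is committed alongside the workflow (or otherwise obtained out of band); fetching the digest from the same server as the download gives an attacker control of both.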

7. CWE-1395: Dependency on Vulnerable Third-Party Component

  • Files: pyproject.toml (7 files)
  • Fix: Updated cryptography to >=44.0.0 and django to >=4.2.
  • Analysis: Outdated dependencies can have known vulnerabilities that attackers can exploit.
  • Rating: 🔴 CRITICAL
  • Recommendation: The fix is appropriate. Regularly audit and update dependencies to address newly discovered vulnerabilities.

🟠 High Findings

8. CWE-798: Hardcoded Credentials

  • Files: extension.ts
  • Fix: Replaced hardcoded API key with a fake placeholder.
  • Analysis: Hardcoded credentials can be extracted and misused by attackers.
  • Rating: 🟠 HIGH
  • Recommendation: The fix is correct. Use environment variables or secure secrets management solutions for sensitive data.

9. CWE-502: Unsafe YAML Deserialization

  • Files: github-reviewer/main.py
  • Fix: Replaced yaml.load with yaml.safe_load and json.load.
  • Analysis: yaml.load with an unsafe loader can construct arbitrary Python objects, and therefore execute arbitrary code, during deserialization.
  • Rating: 🟠 HIGH
  • Recommendation: The fix is appropriate. Ensure that safe_load is used consistently across the codebase.

10. CWE-94: Unsafe Code Execution in Examples

  • Files: langchain/tools.py
  • Fix: Replaced eval() in docstring examples.
  • Analysis: Even in examples, eval() can encourage unsafe coding practices.
  • Rating: 🟠 HIGH
  • Recommendation: The fix is correct. Ensure that all examples follow secure coding practices.

11. CWE-22: Path Traversal

  • Files: FileTrustStore.cs
  • Fix: Added path validation to prevent directory traversal.
  • Analysis: Path traversal attacks can allow unauthorized access to sensitive files.
  • Rating: 🟠 HIGH
  • Recommendation: The fix is correct. Ensure path validation is consistently applied.

12. CWE-295: Insecure Dependency Installation

  • Files: ci.yml, publish.yml
  • Fix: Removed fallback to unverified pip installs.
  • Analysis: Installing dependencies without hash verification can lead to supply chain attacks.
  • Rating: 🟠 HIGH
  • Recommendation: The fix is appropriate. Ensure all dependency installations are verified.

13. GHSA-rf6f-7fwh-wjgh: Prototype Pollution

  • Files: package-lock.json (3 files)
  • Fix: Updated flatted dependency to a secure version.
  • Analysis: Prototype pollution can lead to arbitrary code execution or data manipulation.
  • Rating: 🟠 HIGH
  • Recommendation: The fix is correct. Regularly audit and update dependencies.

🟡 Medium Findings

  1. CWE-79: Cross-Site Scripting (XSS): Replaced innerHTML with safe DOM APIs in the Chrome extension. 🟡 MEDIUM
  2. CWE-328: Weak Hashing Algorithm: Replaced MD5 with SHA-256 in examples. 🟡 MEDIUM
  3. CWE-330: Predictable Random Values: Replaced random.randint with secrets module. 🟡 MEDIUM
  4. CWE-327: Deprecated Cryptographic Algorithms: Added deprecation warnings for HMAC-SHA256 fallback in .NET. 🟡 MEDIUM
  5. CWE-250: Overly Broad Permissions: Narrowed permissions in scorecard.yml. 🟡 MEDIUM
  6. Audit of pull_request_target Workflows: Fixed 5 workflows and confirmed 5 others as safe. 🟡 MEDIUM
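
For item 3, the swap from `random.randint` to the `secrets` module can be sketched as below. The helper names are hypothetical and do not come from the defi-sentinel example.

```python
import secrets


def secure_randint(low: int, high: int) -> int:
    """Drop-in analog of random.randint(low, high) backed by the OS CSPRNG."""
    # randbelow(n) returns an int in [0, n), so shift it into [low, high].
    return low + secrets.randbelow(high - low + 1)


def new_token() -> str:
    """URL-safe token generated from 32 random bytes."""
    return secrets.token_urlsafe(32)
```

`random` uses a Mersenne Twister whose future output is predictable from observed values, which is why it is unsuitable for tokens, nonces, or anything an attacker benefits from guessing.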

🟢 Low Findings

  1. Replaced weak default passwords in example configs.
  2. Added security justification comments to workflows.
  3. Improved documentation for security practices.

General Observations

  1. Prompt Injection Defense: No bypasses detected in this PR. The changes include improvements to sandboxing and tool aliasing defenses.
  2. Policy Engine Circumvention: No circumvention vulnerabilities detected. The new PolicyEngine.freeze() method enhances immutability.
  3. Trust Chain Weaknesses: Deprecated HMAC-SHA256 fallback in .NET is flagged but mitigated with warnings and migration guidance.
  4. Credential Exposure: Hardcoded API keys were replaced with placeholders.
  5. Sandbox Escape: Hardened against sandbox escape vectors (e.g., tool aliasing, runtime policy self-modification).
  6. Deserialization Attacks: Replaced unsafe deserialization methods (pickle.loads, yaml.load) with secure alternatives.
  7. Race Conditions: No race conditions detected in the changes.
  8. Supply Chain: Dependency updates and hash verification address supply chain risks.

Final Assessment

This PR addresses critical security vulnerabilities effectively and improves the overall security posture of the repository. The fixes are well-implemented and align with best practices.

  • Overall Rating: 🔴 CRITICAL
  • Approval: ✅ Recommend approval after verifying the correctness of the changes through additional testing and code review.

The --require-hashes with inline --hash flags breaks when mixed
with editable installs. Restore the working pattern for test deps
while keeping hash verification for the lint requirements file.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@imran-siddique imran-siddique merged commit ee60c8d into microsoft:main Mar 20, 2026
51 checks passed
@github-actions

🤖 AI Agent: test-generator

🧪 Test Coverage Analysis

packages/agent-mesh/src/agentmesh/integrations/langchain/tools.py

  • Existing coverage:
    • Basic functionality of tools is likely covered by existing tests in tests/agentmesh/integrations/langchain/test_tools.py.
    • Tests for common inputs and outputs for LangChain tools.
  • Missing coverage:
    • Security-related changes, such as replacing eval() with a safe AST walker, are not explicitly tested.
    • Edge cases for malformed or malicious inputs (e.g., injection attempts).
  • 💡 Suggested test cases:
    1. test_safe_ast_walker_injection — Ensure that malicious inputs (e.g., __import__('os').system('rm -rf /')) are safely rejected by the AST walker.
    2. test_safe_ast_walker_boundary_conditions — Test edge cases like deeply nested expressions or very large inputs.
    3. test_tool_execution_with_malformed_inputs — Validate behavior when tools receive malformed or oversized payloads.

packages/agent-os/src/agent_os/integrations/__init__.py

  • Existing coverage:
    • Likely covered indirectly by integration tests for the agent_os package.
  • Missing coverage:
    • No explicit tests for initialization logic or edge cases in the __init__.py file.
  • 💡 Suggested test cases:
    1. test_integration_initialization_with_missing_dependencies — Simulate missing or incompatible dependencies and ensure graceful failure.
    2. test_integration_initialization_with_invalid_config — Test behavior when the integration is initialized with invalid or incomplete configuration.

packages/agent-os/src/agent_os/integrations/base.py

  • Existing coverage:
    • Core functionality of the base integration class is likely covered by tests in tests/agent_os/integrations/test_base.py.
  • Missing coverage:
    • Edge cases for concurrency (e.g., race conditions in shared state).
    • Input validation for methods accepting external data.
  • 💡 Suggested test cases:
    1. test_concurrent_access_to_shared_state — Simulate multiple threads or processes accessing shared state to detect race conditions.
    2. test_input_validation_for_base_methods — Test behavior when methods receive malformed or oversized inputs.
    3. test_base_integration_timeout_handling — Validate behavior when operations exceed expected time limits.

packages/agent-os/src/agent_os/integrations/escalation.py

  • Existing coverage:
    • Basic escalation logic is likely covered by tests in tests/agent_os/integrations/test_escalation.py.
    • Tests for standard escalation scenarios and approval workflows.
  • Missing coverage:
    • New features like escalation fatigue detection and vote tracking are not explicitly tested.
    • Edge cases for quorum configurations and rate limits.
  • 💡 Suggested test cases:
    1. test_escalation_fatigue_detection — Ensure that escalation requests are auto-denied when the rate exceeds the configured threshold.
    2. test_quorum_approval_edge_cases — Test scenarios where quorum configurations are at their boundaries (e.g., M=1, N=1 or M=N).
    3. test_vote_tracking_for_escalation_requests — Validate that individual votes are tracked correctly and that the final decision reflects the vote tally.
    4. test_escalation_with_invalid_quorum_config — Test behavior when quorum configurations are invalid (e.g., M > N or M=0).
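
The quorum cases above can be sketched as follows. QuorumSketch and decide are hypothetical names used only for illustration; the actual quorum API in escalation.py is not shown in this thread and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumSketch:
    """Hypothetical M-of-N quorum: `required` approvals out of `total` approvers."""

    required: int  # M
    total: int     # N

    def __post_init__(self) -> None:
        if self.total < 1:
            raise ValueError("total approvers (N) must be at least 1")
        if not 1 <= self.required <= self.total:
            raise ValueError("required approvals (M) must satisfy 1 <= M <= N")


def decide(config: QuorumSketch, votes: dict[str, bool]) -> bool:
    """Approve only when at least M approvers voted yes (one vote per approver id)."""
    if len(votes) > config.total:
        raise ValueError("more votes than configured approvers")
    approvals = sum(1 for approved in votes.values() if approved)
    return approvals >= config.required
```

Rejecting M=0 and M>N at construction time keeps an invalid configuration from silently approving or permanently blocking every request.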

Summary of Suggested Tests

  • Security: Test safe AST walker and injection prevention in tools.py.
  • Concurrency: Test race conditions and shared state access in base.py.
  • Input Validation: Test malformed and oversized inputs across all modules.
  • Escalation Logic: Test fatigue detection, vote tracking, and quorum edge cases in escalation.py.

Adding these tests would close the main coverage gaps around the recent security fixes and feature additions.

@github-actions

🤖 AI Agent: docs-sync-checker

📝 Documentation Sync Report

Issues Found

  • Sign(byte[] data) and Sign(string message) in AgentGovernance/Trust/AgentIdentity.cs — missing detailed docstrings for parameters and return values.
  • Verify(byte[] data, byte[] signature) and VerifySignature(byte[] publicKey, byte[] data, byte[] signature, byte[]? privateKey = null) in AgentGovernance/Trust/AgentIdentity.cs — missing detailed docstrings for parameters and return values.
  • ⚠️ packages/agent-governance-dotnet/README.md — no mention of the new deprecation warnings for HMAC-SHA256 fallback or the recommendation to migrate to Ed25519.
  • ⚠️ CHANGELOG.md — while the security fixes are mentioned, the deprecation of HMAC-SHA256 and the new path validation in FileTrustStore are not explicitly listed.
  • ⚠️ examples/policies/adk-governance.yaml — new example file added but not referenced in any README or documentation.

Suggestions

  • 💡 Add detailed docstrings for the following methods in AgentGovernance/Trust/AgentIdentity.cs:
    • Sign(byte[] data) — Explain the data parameter and the format of the returned byte array.
    • Sign(string message) — Explain the message parameter and the format of the returned byte array.
    • Verify(byte[] data, byte[] signature) — Explain the data and signature parameters and the boolean return value.
    • VerifySignature(byte[] publicKey, byte[] data, byte[] signature, byte[]? privateKey = null) — Explain all parameters and the boolean return value.
  • 💡 Update packages/agent-governance-dotnet/README.md to:
    • Mention the deprecation of HMAC-SHA256 fallback.
    • Recommend migrating to Ed25519 for signing and verification.
  • 💡 Add explicit entries in CHANGELOG.md for:
    • Deprecation of HMAC-SHA256 fallback in AgentIdentity.
    • Path traversal validation in FileTrustStore.
  • 💡 Reference the new examples/policies/adk-governance.yaml file in the relevant README or documentation section to ensure users are aware of its existence and purpose.

Type Hints

  • ✅ All new or modified public APIs in the diff appear to have appropriate type annotations.

Example Code

  • ⚠️ The new examples/policies/adk-governance.yaml file is a good addition, but it is not linked or explained in the documentation. Ensure it is integrated into the README or other relevant documentation.

README and Documentation

  • ⚠️ The README for agent-governance-dotnet does not reflect the new security recommendations or the deprecation of HMAC-SHA256.

Final Assessment

The PR introduces significant security improvements and deprecations, but the documentation is not fully updated to reflect these changes. Addressing the missing docstrings, README updates, and CHANGELOG entries will ensure the documentation remains in sync.

Action Required

  • Add missing docstrings for the methods in AgentIdentity.cs.
  • Update the README for agent-governance-dotnet to reflect the deprecations and recommendations.
  • Add explicit entries in CHANGELOG.md for the deprecations and path validation.
  • Reference the new example file in the documentation.

Once these issues are resolved, the documentation will be fully in sync.

@github-actions

🤖 AI Agent: breaking-change-detector

🔍 API Compatibility Report

Summary

The pull request primarily addresses security vulnerabilities across the codebase, including changes to workflows, dependencies, and several packages. Some of these modifications could in principle affect API compatibility, but analysis of the diff identified no breaking changes in the public API of the microsoft/agent-governance-toolkit repository.

Findings

| Severity | Package | Change | Impact |
|---|---|---|---|
| 🔵 | agent-governance-dotnet | Deprecated HMAC-SHA256 methods (Sign, Verify) | Adds warnings but maintains backward compatibility |
| 🔵 | agent-marketplace | Updated cryptography dependency to >=44.0.0,<47.0 | Dependency update, no API changes |
| 🔵 | agent-mesh | Updated django dependency to >=4.2,<6.0 | Dependency update, no API changes |
| 🔵 | agent-mesh | Added path traversal validation in FileTrustStore | Improves security, no API changes |
| 🔵 | agent-mesh | New governance policy example (adk-governance.yaml) | Documentation addition, no API changes |
| 🔵 | agent-mesh | Hardened workflows (pull_request_target) | CI/CD security improvement, no API changes |

Migration Guide

No migration steps are required as there are no breaking changes. However, downstream users should note the following:

  1. Deprecated HMAC-SHA256 Methods: Users relying on AgentIdentity.Sign or AgentIdentity.Verify in the .NET package should plan to migrate to Ed25519 for cryptographic operations when upgrading to .NET 9+.
  2. Dependency Updates: Ensure compatibility with updated versions of cryptography and django dependencies.

Additional Notes

  • The changes to workflows and dependency management are security-focused and do not affect the public API.
  • The new features (e.g., ContentHashInterceptor, PolicyEngine.freeze(), QuorumConfig) are additive and should be documented for downstream users.

Conclusion

No breaking changes detected. All changes are either additive or security-related and maintain backward compatibility.

@github-actions

🤖 AI Agent: security-scanner

Security Analysis of Pull Request

This pull request addresses 24 security findings across the microsoft/agent-governance-toolkit repository. Below is a detailed analysis of the changes, categorized by severity, with an assessment of the fixes and any additional recommendations.


🔴 Critical Findings

1. CWE-502: Unsafe Deserialization in process_isolation.py

  • Issue: Use of pickle.loads, which is vulnerable to arbitrary code execution.
  • Fix: Replaced pickle.loads with JSON deserialization and an importlib registry for safe object reconstruction.
  • Assessment: ✅ The fix is appropriate. JSON is safer than pickle for deserialization, and the use of an importlib registry ensures only approved objects are deserialized.
  • Severity: 🔴 CRITICAL
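
A minimal sketch of the JSON-plus-registry idea, assuming a payload of the form {"type": ..., "args": [...]}. REGISTRY, dumps, and loads are hypothetical names; the actual implementation in process_isolation.py is not shown in this thread.

```python
import importlib
import json

# Hypothetical allowlist: type tag -> (module name, attribute name).
# Only entries listed here can ever be imported and constructed.
REGISTRY = {
    "fraction": ("fractions", "Fraction"),
    "path": ("pathlib", "PurePosixPath"),
}


def dumps(tag: str, args: list) -> str:
    """Serialize a registered type tag and its constructor arguments as JSON."""
    if tag not in REGISTRY:
        raise ValueError(f"type {tag!r} is not registered")
    return json.dumps({"type": tag, "args": args})


def loads(payload: str):
    """Rebuild an object, refusing any type tag that is not in the registry."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    tag = data.get("type")
    if not isinstance(tag, str) or tag not in REGISTRY:
        raise ValueError(f"type {tag!r} is not registered")
    args = data.get("args", [])
    if not isinstance(args, list):
        raise ValueError("args must be a JSON array")
    module_name, attribute = REGISTRY[tag]
    constructor = getattr(importlib.import_module(module_name), attribute)
    return constructor(*args)
```

Unlike pickle, the payload never names the module to import; it only selects a key from a table controlled by the receiving code.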

2. CWE-502: Unsafe Deserialization in agent_hibernation.py

  • Issue: Use of pickle.loads, as in finding 1.
  • Fix: Replaced pickle.loads with JSON serialization.
  • Assessment: ✅ The fix is appropriate for the same reasons as above.
  • Severity: 🔴 CRITICAL

3. CWE-78: Command Injection in prepare_release.py and prepare_pypi.py

  • Issue: Use of shell=True in subprocess calls, which allows command injection.
  • Fix: Replaced shell=True with list-form arguments for subprocess.
  • Assessment: ✅ The fix is correct. Using list-form arguments mitigates command injection risks.
  • Severity: 🔴 CRITICAL
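
A short sketch of why list-form arguments help, using a hypothetical helper run_echo rather than the actual release scripts. Each list element reaches the child process as a single argv entry, so shell metacharacters in user_value are never interpreted.

```python
import subprocess
import sys


def run_echo(user_value: str) -> str:
    """Pass an untrusted value to a child process without involving a shell."""
    result = subprocess.run(
        # No shell=True: the child receives user_value verbatim as sys.argv[1].
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")
```

With shell=True and string interpolation, a value such as "x; echo injected" would run a second command; here it is simply echoed back as data.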

4. CWE-94: Code Injection in calculator.py

  • Issue: Use of eval() for evaluating expressions.
  • Fix: Replaced eval() with a safe AST walker.
  • Assessment: ✅ The fix is appropriate. The AST walker ensures only safe expressions are evaluated.
  • Severity: 🔴 CRITICAL
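
A minimal sketch of a safe AST walker for arithmetic, under the assumption that only numeric literals and the operators +, -, *, / need to be supported. safe_eval, _walk, and MAX_LENGTH are hypothetical names; the walker in calculator.py may allow a different set of nodes.

```python
import ast
import operator

# Allowlisted operators; anything else (calls, names, attributes) is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}
MAX_LENGTH = 200  # assumed input cap to bound parsing and recursion depth


def safe_eval(expression: str):
    """Evaluate a plain arithmetic expression without calling eval()."""
    if len(expression) > MAX_LENGTH:
        raise ValueError("expression too long")
    try:
        tree = ast.parse(expression, mode="eval")
    except (SyntaxError, RecursionError, MemoryError) as exc:
        raise ValueError("invalid expression") from exc
    return _walk(tree.body)


def _walk(node):
    # Numeric literals only; bool, str, bytes and None constants are rejected.
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_walk(node.left), _walk(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_walk(node.operand))
    raise ValueError(f"unsupported syntax: {type(node).__name__}")
```

Exponentiation is deliberately left out of this sketch because an input like 9**9**9 can consume unbounded time and memory even though it contains no code injection.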

5. CWE-77: Shell Injection in ai-spec-drafter.yml

  • Issue: Untrusted user input (ISSUE_TITLE) used in shell commands without sanitization.
  • Fix: Sanitized ISSUE_TITLE using printf and safe string manipulation functions.
  • Assessment: ✅ The fix is correct. Using printf and avoiding direct shell interpolation mitigates the risk of injection.
  • Severity: 🔴 CRITICAL

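6. CWE-829: Inclusion of Functionality from Untrusted Control Sphere in ai-agent-runner/action.yml

  • Issue: The setup-node action was not pinned to a commit SHA; a mutable reference (tag or branch) can be repointed upstream to different code, which the workflow would then run.
  • Fix: Pinned setup-node action to a specific SHA.
  • Assessment: ✅ The fix is appropriate. Pinning to a full commit SHA ensures the workflow always runs the exact revision that was reviewed.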
  • Severity: 🔴 CRITICAL

7. CWE-494: Download of Code Without Integrity Check in publish.yml

  • Issue: NuGet CLI was downloaded without verifying its integrity.
  • Fix: Added SHA-256 checksum verification for the downloaded file.
  • Assessment: ✅ The fix is correct. Verifying the checksum ensures the file has not been tampered with.
  • Severity: 🔴 CRITICAL
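
The checksum step can be illustrated in Python as below. This is a sketch of the general technique, not the shell step used in publish.yml; verify_sha256 is a hypothetical helper, and the expected digest is assumed to come from a trusted, version-pinned source.

```python
import hashlib
import hmac
from pathlib import Path


def verify_sha256(path: Path, expected_hex: str) -> None:
    """Raise ValueError unless the file's SHA-256 digest matches expected_hex."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large downloads are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    if not hmac.compare_digest(digest.hexdigest(), expected_hex.strip().lower()):
        raise ValueError(f"SHA-256 mismatch for {path.name}")
```

The check should run before the downloaded file is executed or unpacked, and the pinned digest needs updating whenever the pinned tool version changes.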

8-9. CWE-1395: Dependency on Vulnerable Third-Party Component in pyproject.toml

  • Issue: Version constraints for cryptography and django allowed outdated releases.
  • Fix: Raised the minimum versions to cryptography>=44.0.0 and django>=4.2.
  • Assessment: ✅ The fix is appropriate. Updating to secure versions mitigates known vulnerabilities.
  • Severity: 🔴 CRITICAL

🟠 High Findings

10. CWE-798: Hardcoded API Key in extension.ts

  • Issue: Hardcoded API key could lead to credential exposure.
  • Fix: Replaced the hardcoded key with a placeholder.
  • Assessment: ✅ The fix is correct. However, ensure the placeholder is replaced with a secure method for injecting secrets in production.
  • Severity: 🟠 HIGH

11. CWE-502: Unsafe Deserialization in github-reviewer/main.py

  • Issue: Use of unsafe YAML and JSON deserialization.
  • Fix: Replaced yaml.load with yaml.safe_load and ensured json.load is used safely.
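  • Assessment: ✅ The fix is appropriate. yaml.safe_load constructs only plain data types (mappings, sequences, scalars) rather than arbitrary Python objects, and json.load likewise cannot instantiate arbitrary objects.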
  • Severity: 🟠 HIGH

12. CWE-94: Code Injection in langchain/tools.py

  • Issue: Example in docstring used eval().
  • Fix: Replaced eval() with a safe alternative in the example.
  • Assessment: ✅ The fix is appropriate. Removing unsafe examples reduces the risk of misuse.
  • Severity: 🟠 HIGH

13. CWE-22: Path Traversal in FileTrustStore.cs

  • Issue: File paths were not validated, allowing potential directory traversal.
  • Fix: Added path validation to reject paths containing .. segments.
  • Assessment: ✅ The fix is correct. Validating and resolving paths prevents directory traversal attacks.
  • Severity: 🟠 HIGH
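
The fix in FileTrustStore.cs is C# and is described as rejecting ".." segments. As a Python illustration of a related but different technique, the sketch below resolves the candidate path and checks that it stays inside a base directory; resolve_inside is a hypothetical helper, not part of the toolkit.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Return the resolved path, or raise ValueError if it leaves base_dir."""
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, which the check below catches.
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError("path escapes the base directory")
    return candidate
```

Checking containment after resolution also covers absolute paths and symlinks, which a plain substring check for ".." does not.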

14. CWE-295: Insecure Dependency Installation in ci.yml and publish.yml

  • Issue: Fallback to non-hash-verified pip install was allowed.
  • Fix: Removed fallback to non-hash-verified installations.
  • Assessment: ✅ The fix is appropriate. Enforcing hash verification ensures integrity of installed dependencies.
  • Severity: 🟠 HIGH

15. GHSA-rf6f-7fwh-wjgh: Prototype Pollution in flatted

  • Issue: Vulnerable version of flatted package.
  • Fix: Updated flatted to a secure version.
  • Assessment: ✅ The fix is correct. Updating to a secure version mitigates the vulnerability.
  • Severity: 🟠 HIGH

🟡 Medium Findings

  1. CWE-79: DOM-based XSS in Chrome Extension

    • Fix: Replaced innerHTML with safe DOM APIs.
    • Assessment: ✅ Correct fix.
    • Severity: 🟡 MEDIUM
  2. CWE-328: Weak Hashing Algorithm

    • Fix: Replaced MD5 with SHA-256 in examples.
    • Assessment: ✅ Correct fix.
    • Severity: 🟡 MEDIUM
  3. CWE-330: Predictable Randomness

    • Fix: Replaced random.randint with secrets module.
    • Assessment: ✅ Correct fix.
    • Severity: 🟡 MEDIUM
  4. CWE-327: Deprecated HMAC-SHA256 Fallback

    • Fix: Added deprecation warnings and migration guidance.
    • Assessment: ✅ Correct fix.
    • Severity: 🟡 MEDIUM
  5. CWE-250: Excessive Workflow Permissions

    • Fix: Narrowed permissions in scorecard.yml.
    • Assessment: ✅ Correct fix.
    • Severity: 🟡 MEDIUM
  6. Audit of pull_request_target Workflows

    • Fix: Reviewed and hardened workflows.
    • Assessment: ✅ Correct fix.
    • Severity: 🟡 MEDIUM
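
For item 3 (CWE-330) in the list above, a minimal sketch of swapping random.randint for the secrets module. randint_inclusive and random_code are hypothetical helpers, not functions from the repository.

```python
import secrets


def randint_inclusive(low: int, high: int) -> int:
    """Drop-in shape for random.randint(low, high), backed by the OS CSPRNG."""
    if low > high:
        raise ValueError("low must not exceed high")
    return low + secrets.randbelow(high - low + 1)


def random_code(digits: int = 6) -> str:
    """Zero-padded numeric code, e.g. for a one-time verification value."""
    if digits < 1:
        raise ValueError("digits must be at least 1")
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"
```

random.randint draws from a deterministic generator whose output can be predicted from observed values, which is why it is unsuitable for tokens or codes.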

🟢 Low Findings

  1. Weak Default Passwords

    • Fix: Replaced weak defaults in example configs.
    • Assessment: ✅ Correct fix.
    • Severity: 🟢 LOW
  2. Security Justification Comments

    • Fix: Added comments to workflows.
    • Assessment: ✅ Correct fix.
    • Severity: 🟢 LOW

Recommendations

  • Testing: Ensure comprehensive testing of all changes, especially for critical fixes like deserialization and command injection.
  • Documentation: Update documentation to reflect new security practices and deprecations.
  • Monitoring: Implement runtime monitoring to detect any potential bypasses or regressions.

Overall Assessment

This pull request addresses critical security issues effectively. All fixes are appropriate and align with best practices. The changes significantly improve the security posture of the repository.

Final Rating: ✅ APPROVED

@github-actions github-actions bot left a comment

🤖 AI Agent: code-reviewer

Review Summary

This pull request addresses 24 security findings across the repository, including critical vulnerabilities such as insecure deserialization, command injection, and unsafe cryptographic practices. The fixes span multiple areas, including Python, .NET, JavaScript, and GitHub Actions workflows. The changes are well-documented, and the testing results indicate that the fixes have been validated.

Below is a detailed review of the changes, categorized into Critical Issues, Warnings, and Suggestions.


🔴 CRITICAL ISSUES

  1. CWE-502: Insecure Deserialization (pickle.loads)

    • Files: process_isolation.py, agent_hibernation.py
    • Fix: Replaced pickle.loads with JSON serialization.
    • Review: ✅ The fix is appropriate. pickle is inherently insecure for untrusted input, and replacing it with JSON serialization mitigates the risk of arbitrary code execution. Ensure that the JSON deserialization process includes validation of the input schema to prevent potential issues.
  2. CWE-78: Command Injection via shell=True

    • Files: prepare_release.py, prepare_pypi.py
    • Fix: Replaced shell=True with list-form arguments for subprocess.
    • Review: ✅ This is a robust fix. Using list-form arguments prevents shell injection vulnerabilities. Ensure that all inputs to subprocess are validated to avoid unintended behavior.
  3. CWE-94: Use of eval()

    • Files: calculator.py, langchain/tools.py
    • Fix: Replaced eval() with a safe AST walker.
    • Review: ✅ The use of a safe AST walker is a significant improvement. Ensure that the implementation of the AST walker is thoroughly tested to prevent any bypasses.
  4. CWE-77: Improper Neutralization of Special Elements in Command Execution

    • Files: ai-spec-drafter.yml
    • Fix: Sanitized issue titles using printf and sed to prevent command injection.
    • Review: ✅ The use of printf and sed is a good approach to sanitize user input. Ensure that the sanitization logic is robust and accounts for edge cases.
  5. CWE-494: Download of Code Without Integrity Check

    • Files: publish.yml
    • Fix: Added SHA-256 verification for NuGet downloads.
    • Review: ✅ This is a critical improvement. Ensure that the SHA-256 hash is securely stored and updated when the NuGet version changes.
  6. CWE-22: Path Traversal

    • Files: FileTrustStore.cs
    • Fix: Added validation to prevent directory traversal by rejecting paths containing ".." segments.
    • Review: ✅ The fix is effective. Consider adding unit tests to verify that the validation works as expected for various edge cases.
  7. CWE-327: Use of a Broken or Risky Cryptographic Algorithm

    • Files: AgentIdentity.cs
    • Fix: Deprecated HMAC-SHA256 for signing and verification, added warnings, and recommended migration to Ed25519.
    • Review: ✅ The deprecation and migration recommendation are appropriate. Ensure that users are adequately informed about the deprecation timeline and provided with migration guides.

🟡 WARNINGS

  1. Dependency Updates

    • Files: pyproject.toml (cryptography, django)
    • Impact: Tightening the version constraints for cryptography and django could potentially break compatibility with existing environments.
    • Recommendation: Ensure that these changes are clearly communicated in the release notes and consider providing a migration guide if necessary.
  2. GitHub Actions: pull_request_target

    • Files: Multiple workflows (ai-breaking-change-detector.yml, ai-code-review.yml, etc.)
    • Fix: Updated workflows to use pull_request_target with base branch checkout for security.
    • Review: While this is a good security practice, it changes the behavior of workflows. Ensure that the workflows are tested to confirm they still function as expected. Add a note in the release notes about this change.

💡 SUGGESTIONS

  1. Audit for Remaining Vulnerabilities

    • While this PR addresses 24 findings, consider conducting a follow-up audit to ensure no additional vulnerabilities remain. For example, review any remaining uses of exec, os.system, or other potentially dangerous functions.
  2. Test Coverage for Security Fixes

    • Add unit tests to validate the security fixes, especially for:
      • JSON deserialization schema validation.
      • Path traversal prevention logic in FileTrustStore.cs.
      • AST walker implementation for eval() replacement.
  3. Backward Compatibility for HMAC-SHA256 Deprecation

    • Provide a clear migration guide for users to transition from HMAC-SHA256 to Ed25519. Consider adding a feature flag that lets users opt in to the new cryptographic scheme.
  4. Policy Engine Testing

    • The addition of adk-governance.yaml introduces a new policy configuration. Ensure that the policy engine is thoroughly tested to validate the enforcement of the new rules, especially for blocked tools and delegation controls.
  5. Documentation Updates

    • Update the documentation to reflect the changes made in this PR, including:
      • The new governance policy configuration.
      • The deprecation of HMAC-SHA256.
      • The tightened dependency constraints.

Final Recommendation

This PR addresses critical security vulnerabilities effectively and improves the overall security posture of the repository. However, the changes introduce potential backward compatibility issues and require thorough testing to ensure correctness.

  • Merge Readiness: ✅ Ready to merge after addressing the above suggestions and verifying backward compatibility.
  • Priority: High, due to the critical nature of the vulnerabilities being addressed.

Labels

  • agent-mesh: agent-mesh package
  • ci/cd: CI/CD and workflows
  • dependencies: Pull requests that update a dependency file
  • documentation: Improvements or additions to documentation
  • size/XL: Extra large PR (500+ lines)
  • tests
