Security hardening: logging, cost protection, SSRF, model validation by neuromechanist · Pull Request #248 · OpenScience-Collective/osa

neuromechanist · 2026-03-03T23:24:08Z

Summary

Addresses four security issues:

Add API key log redaction #65 - API key log redaction: Wired up configure_secure_logging() in app startup (src/api/main.py) so SecureFormatter is active before any logging occurs. The formatter and tests already existed but were never called.
Add SSRF protection for documentation URLs #66 - SSRF protection: Verified existing validate_source_url() in DocSource with comprehensive test coverage (localhost, private IPs, AWS metadata, non-HTTP schemes). No gaps found.
Add cost manipulation protection #67 - Cost manipulation protection: Added _check_model_cost() guard in create_community_assistant() that blocks models above $15/1M input tokens on platform/community keys (HTTP 403), warns above $5/1M. BYOK users are unrestricted. Unknown models are allowed.
Add model name format validation #68 - Model name validation: Verified existing _validate_model_id() with pattern+length validation and comprehensive tests. No gaps found.

Closes #65, closes #66, closes #67, closes #68

Test plan

uv run pytest tests/test_api/test_cost_protection.py -v -- 6 passed, 1 skipped (no models between warn/block thresholds)
uv run pytest tests/test_core/test_config/test_community.py -v -- SSRF + model validation tests pass
uv run ruff check . && uv run ruff format --check . -- clean
Full test suite: 1605 passed, 18 skipped, 5 pre-existing failures (CLI standalone, unrelated)
Verify on dev: start server, confirm logs use SecureFormatter
Verify on dev: attempt expensive model on platform key, confirm 403

- Wire up SecureFormatter in app startup (#65): call configure_secure_logging() before any logging occurs - Add cost manipulation protection (#67): block models above $15/1M input tokens on platform/community keys, warn above $5/1M; BYOK users unrestricted - Verified SSRF protection (#66) and model validation (#68) already have comprehensive test coverage Closes #65, closes #66, closes #67, closes #68

- Fix misleading "fallback rate" comment in _check_model_cost - Add logging for unknown models (operator visibility) - Extract _models_by_cost() test helper to reduce duplication - Add boundary test at exact block threshold - Add BYOK + unknown model test - Assert BYOK guidance in error message - Fix module docstring wording

Split the catch-all Exception handler into specific expected errors (ValueError, TypeError, KeyError) that include context for debugging, and unexpected errors that re-raise after printing to stderr. Matches the pattern already used in SecureFormatter.format().

neuromechanist · 2026-03-03T23:32:34Z

PR Review Findings

Ran comprehensive review with 5 agents (code-reviewer, silent-failure-hunter, pr-test-analyzer, comment-analyzer, code-simplifier). All findings addressed in follow-up commits.

Critical (1 found, fixed)

Misleading comments: _check_model_cost had comments saying "unknown models get fallback rate" but the code actually returns early with no cost check. Fixed to accurately describe behavior.

Important (3 found, fixed)

No logging for unknown models: Unknown models bypassed cost protection silently with zero operator visibility. Added logger.info so operators can see when unpriced models are used.
Duplicated test logic: Same model-filtering list comprehension appeared in 5 of 7 tests. Extracted into _models_by_cost() helper.
SecureJSONFormatter broad catch (pre-existing tech debt): Catch-all Exception handler swallowed all errors with a lossy fallback, losing the original message, logger name, and error details. Split into specific expected errors (ValueError, TypeError, KeyError) with debugging context, and unexpected errors that re-raise after printing to stderr. Now matches the pattern already used in SecureFormatter.format().

Suggestions (5 found, 4 addressed)

Added boundary test at exact block threshold
Added BYOK + unknown model test
Added assertion that 403 error includes BYOK guidance URL
Fixed module docstring ("cost manipulation" -> "model cost protection")
Literal type for key_source -- skipped, consistent with existing codebase pattern using str

neuromechanist added 3 commits March 3, 2026 15:23

neuromechanist merged commit a294024 into develop Mar 4, 2026
7 checks passed

This was referenced Mar 8, 2026

Release v0.8.0: security hardening and database mirrors #252

Merged

Add API key log redaction #65

Closed

Add SSRF protection for documentation URLs #66

Closed

Add cost manipulation protection #67

Closed

Add model name format validation #68

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Security hardening: logging, cost protection, SSRF, model validation#248

Security hardening: logging, cost protection, SSRF, model validation#248
neuromechanist merged 3 commits intodevelopfrom
feature/security-hardening-65-66-67-68

neuromechanist commented Mar 3, 2026

Uh oh!

neuromechanist commented Mar 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

neuromechanist commented Mar 3, 2026

Summary

Test plan

Uh oh!

neuromechanist commented Mar 3, 2026

PR Review Findings

Critical (1 found, fixed)

Important (3 found, fixed)

Suggestions (5 found, 4 addressed)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant