Collaborator

@jkppr jkppr commented Aug 15, 2025

This pull request introduces an overhaul of the Digital Forensics Investigative Questions (DFIQ) integration and refactors the Yeti analyzers to use the official yeti-python library.

Description

The primary goals of this change are to modernize our DFIQ implementation, improve its flexibility by integrating YETI as a data source, and enhance the robustness of our threat intelligence analyzers.

Key Changes

1. DFIQv2 Implementation (dfiq.py, scenarios.py)

  • UUID as Primary Identifier: All DFIQ objects (Scenarios, Facets, Questions) now use a uuid as their primary identifier for lookups, relationships, and sorting. The human-readable dfiq_id (e.g., "S1001") is retained for backwards compatibility in graph building and as a secondary identifier.
  • YETI Integration: The system can now fetch DFIQ templates directly from a connected YETI instance. A configuration flag (YETI_DFIQ_ENABLED) controls this behavior.
  • Unified DFIQ Loading: A new function load_dfiq_from_config in scenarios.py handles loading from both the local filesystem (DFIQ_PATH) and YETI. It merges the two sources; when both define the same UUID, the YETI version takes precedence.
  • Graph & Data Model Update: The internal DFIQ graph in dfiq.py is now built using UUIDs. The in-memory data model (ScenarioTemplate, FacetTemplate, etc.) has been updated to reflect parent-child relationships correctly.
  • API Enhancements:
    • POST /api/v1/sketches/<id>/scenarios/ now prioritizes uuid for template lookups, providing a more stable reference than dfiq_id or name.
    • The API now correctly creates the full hierarchy of Scenarios, Facets, and Questions in the database from a chosen template.
  • Legacy DFIQ Removal: The legacy scenarios.yaml, facets.yaml, and questions.yaml files and the old DFIQ YAML files have been removed in favor of the new structured directory format. A README.md has been added to guide users.
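
For illustration, the merge and lookup rules described above could be sketched roughly as follows. This is a minimal sketch, not the code in this PR: the Template dataclass, merge_templates, and find_template are hypothetical names, and the fallback order after uuid (dfiq_id, then name) is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Template:
    """Hypothetical stand-in for a DFIQ template; not the PR's actual class."""

    uuid: str
    dfiq_id: str
    name: str
    source: str  # "local" or "yeti", kept only to make the example readable


def merge_templates(
    local: Iterable[Template], yeti: Iterable[Template]
) -> Dict[str, Template]:
    """Merge two template sources keyed by UUID; YETI wins on conflicts."""
    merged = {t.uuid: t for t in local}
    # Applying YETI entries second overwrites any local entry with the same UUID.
    merged.update({t.uuid: t for t in yeti})
    return merged


def find_template(
    templates: Dict[str, Template],
    uuid: Optional[str] = None,
    dfiq_id: Optional[str] = None,
    name: Optional[str] = None,
) -> Optional[Template]:
    """Look up by uuid first; fall back to dfiq_id, then name (assumed order)."""
    if uuid and uuid in templates:
        return templates[uuid]
    for field, value in (("dfiq_id", dfiq_id), ("name", name)):
        if value:
            for template in templates.values():
                if getattr(template, field) == value:
                    return template
    return None


# Usage: both sources define UUID "u-1"; only the local source defines "u-2".
local = [
    Template("u-1", "S1001", "Local scenario", "local"),
    Template("u-2", "S1002", "Other scenario", "local"),
]
yeti = [Template("u-1", "S1001", "Yeti scenario", "yeti")]
templates = merge_templates(local, yeti)
by_uuid = find_template(templates, uuid="u-1", name="Other scenario")
by_dfiq_id = find_template(templates, dfiq_id="S1002")
```

Keying the merged dictionary on uuid is what makes the precedence rule a one-line update; the dfiq_id and name fields stay available as secondary lookups.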

2. Yeti Analyzer Refactor (yetiindicators.py, yetiindicators_test.py)

  • Adoption of yeti-python Library: All Yeti analyzers now use the official YetiApi client instead of direct requests calls.
  • Simplified Authentication: Manual session authentication and token handling have been removed. The YetiApi library now manages authentication and token refreshes transparently.
  • Robust Initialization & Error Handling: The YetiBaseAnalyzer.__init__ method now handles authentication. If authentication fails, it raises a RuntimeError, causing the analyzer to fail fast with a clear error message instead of proceeding in an invalid state.
  • Code Cleanup: Replaced direct API endpoint calls with high-level library methods (e.g., api.search_graph, api.search_bloom), making the code cleaner and more maintainable.
  • Updated Tests: Unit tests have been refactored to mock the YetiApi class and its methods, aligning with the new implementation.
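
The fail-fast initialization and the mock-based testing approach can be sketched as below. This is a hedged illustration only: FakeYetiClient, its authenticate method, and BaseAnalyzer are hypothetical stand-ins and do not reflect the yeti-python API or the actual YetiBaseAnalyzer. The explicit check for missing configuration is an assumption added for the example.

```python
from unittest import mock


class FakeYetiClient:
    """Hypothetical stand-in for a Yeti API client; not the yeti-python API."""

    def __init__(self, url):
        self.url = url

    def authenticate(self, api_key):
        # Placeholder only; the usage below replaces the client with a mock.
        raise NotImplementedError


class BaseAnalyzer:
    """Sketch of an analyzer that authenticates in __init__ and fails fast."""

    def __init__(self, url, api_key, client_cls=FakeYetiClient):
        # Assumption: reject missing configuration before touching the client.
        if not url or not api_key:
            raise RuntimeError("Yeti URL and API key must both be configured.")
        self.api = client_cls(url)
        try:
            self.api.authenticate(api_key)
        except Exception as exc:
            # Surface one clear error instead of continuing in an invalid state.
            raise RuntimeError(f"Yeti authentication failed: {exc}") from exc


# Usage: mock the client class, as a unit test would.
ok_client = mock.Mock()
analyzer = BaseAnalyzer(
    "https://yeti.example", "key", client_cls=mock.Mock(return_value=ok_client)
)
ok_client.authenticate.assert_called_once_with("key")

bad_client = mock.Mock()
bad_client.authenticate.side_effect = ValueError("bad key")
error_message = None
try:
    BaseAnalyzer(
        "https://yeti.example", "key", client_cls=mock.Mock(return_value=bad_client)
    )
except RuntimeError as exc:
    error_message = str(exc)
```

Injecting the client class through a parameter is one way to keep the test free of network calls; patching the imported class with mock.patch would serve the same purpose.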

@jkppr jkppr requested a review from tomchop August 15, 2025 17:32
@jkppr jkppr self-assigned this Aug 15, 2025
@jkppr jkppr added the Backend and Code Health (Code health improvements) labels Aug 15, 2025
Collaborator Author

jkppr commented Aug 15, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is a substantial and well-executed pull request that significantly refactors the DFIQ and Yeti analyzer integrations. The move to UUIDs as primary identifiers for DFIQ objects is a great step towards a more robust system. The integration with Yeti for DFIQ templates is a powerful new feature, and the unified loading mechanism is well-designed. The refactoring of the Yeti analyzers to use the official yeti-python library greatly simplifies the code and improves maintainability. The API and database interaction changes, such as using a single transaction for creating scenario objects, are excellent improvements. My only major feedback concerns error handling during the initialization of the Yeti analyzers, where missing configuration could lead to unhandled exceptions.

Collaborator Author

jkppr commented Aug 15, 2025

Note: This PR requires https://github.com/yeti-platform/yeti-python to push a new release first to work correctly.

Collaborator

@tomchop tomchop left a comment

Thanks for these changes! First pass on the analyzer, will take a look at the rest soon

Collaborator Author

jkppr commented Aug 18, 2025

The failing linter will be fixed once yeti-python 2.0.9 is published.

@jkppr jkppr requested a review from tomchop August 18, 2025 14:16
@jkppr jkppr requested a review from tomchop August 19, 2025 15:15
@jkppr jkppr requested a review from berggren August 19, 2025 16:04
@jkppr jkppr merged commit fe312ff into google:master Aug 19, 2025
11 checks passed

3 participants