[Feature Request] Add a CLI command to query file usages (show where “in-use” files are referenced) #30477

@hsiong

Self Checks

  • I have read the Contributing Guide and Language Policy.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report, otherwise it will be closed.
  • Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

When running file cleanup/maintenance commands, it’s hard to understand why a file is considered “in use”, and where it is referenced.

Today we have two commands in api/commands.py that define the scope of “in-use” files:

  • flask clear-orphaned-file-records (DB-level)
    • Base tables: upload_files (id, key) and tool_files (id, file_key)
    • A file is treated as referenced if its UUID appears in one of these places (either equality match or UUID found in text/JSON):
      • message_files.upload_file_id (and relation to messages.id)
      • documents.data_source_info
      • document_segments.content
      • messages.answer, messages.inputs (json), messages.message (json)
      • workflow_node_executions.inputs, workflow_node_executions.process_data, workflow_node_executions.outputs (json)
      • conversations.introduction, conversations.system_instruction
      • accounts.avatar
      • apps.icon
      • sites.icon
    • It deletes file records that are not referenced anywhere above.
  • flask remove-orphaned-files-on-storage (storage-level)
    • Base tables: upload_files, tool_files (keys only)
    • Scans storage directories: image_files, tools, upload_files
    • Deletes storage objects that do not exist in DB base tables (does not check business references).
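
To make the two "in-use" definitions concrete, here is a minimal sketch of both checks on in-memory data. It is not the code in api/commands.py: the regex, the function names (find_uuids, orphaned_records, orphaned_storage_keys) and the sample inputs are assumptions for illustration only.

```python
import re

# Assumption: file IDs are canonical 8-4-4-4-12 hexadecimal UUID strings.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def find_uuids(text):
    """Return the set of lowercase UUID strings embedded in a text/JSON blob."""
    if not text:
        return set()
    return {match.lower() for match in UUID_RE.findall(text)}


def orphaned_records(file_ids, referenced_texts):
    """DB-level check: file IDs that none of the referencing values mention."""
    used = set()
    for text in referenced_texts:
        used |= find_uuids(text)
    return {fid for fid in file_ids if fid.lower() not in used}


def orphaned_storage_keys(storage_keys, db_keys):
    """Storage-level check: objects whose key is absent from the base tables."""
    return set(storage_keys) - set(db_keys)
```

The second function deliberately ignores business references, which mirrors the difference between the two commands described above.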

Problem: There is no command to inspect usage, i.e. to show which table/field and which record ID reference a given file.
This makes troubleshooting difficult when files are unexpectedly removed or unexpectedly retained.

Related context: #11835

2. Additional context or comments

Proposed enhancement

Add a new Flask CLI command (similar style to existing commands) to query file usages and print a human-readable list.

Example command name (suggestion):

  • flask query-file-usages
    • or flask show-file-usages
    • or flask file-usage

Expected behavior

  • Reuse the same “in-use” definition as clear-orphaned-file-records (same reference columns, same matching strategy).
  • Output a list of references with consistent columns:
    • src (table.column, e.g. documents.data_source_info)
    • record_id (primary key of the referencing row)
    • file_id (matched file UUID)
    • key (resolved storage key from upload_files.key or tool_files.file_key)
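
The output columns above could be modeled roughly as follows. This is a sketch under stated assumptions, not a proposed final design: FileUsage, collect_usages and keys_by_file_id are hypothetical names, rows are assumed to arrive as (record_id, text) pairs from one referencing column, and the UUID regex is the same assumption as in the matching strategy.

```python
import re
from dataclasses import dataclass

# Assumption: file IDs are canonical 8-4-4-4-12 hexadecimal UUID strings.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


@dataclass(frozen=True)
class FileUsage:
    src: str        # "table.column", e.g. "documents.data_source_info"
    record_id: str  # primary key of the referencing row
    file_id: str    # matched file UUID
    key: str        # storage key resolved from the base tables


def collect_usages(src, rows, keys_by_file_id):
    """Build usage rows for one column.

    rows: iterable of (record_id, text) pairs; text may be None.
    keys_by_file_id: mapping of known file UUID -> storage key.
    UUIDs that are not known files are skipped.
    """
    usages = []
    for record_id, text in rows:
        found = {match.lower() for match in UUID_RE.findall(text or "")}
        for file_id in sorted(found):
            if file_id in keys_by_file_id:
                usages.append(
                    FileUsage(src, str(record_id), file_id, keys_by_file_id[file_id])
                )
    return usages
```

Running collect_usages once per reference column and concatenating the results would give the flat list the command prints.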

Suggested options (nice to have)

  • --file-id <uuid>: show usage for a specific file UUID
  • --key <file_key>: show usage for a specific storage key
  • --src <pattern>: filter by a specific table/field (optional)
  • --limit N / --offset N (optional)
  • Output format:
    • default: table-like console output
    • optional: --json for scripting
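
A rough sketch of how the suggested options might behave once usage rows are available as dictionaries. Everything here is an assumption for illustration: the function names, the glob-style interpretation of the --src pattern (the request does not define pattern syntax), and the plain-text table layout. Wiring into the Flask CLI is omitted.

```python
import json
from fnmatch import fnmatchcase

COLUMNS = ("src", "record_id", "file_id", "key")


def filter_usages(usages, file_id=None, key=None, src=None, limit=None, offset=0):
    """Apply the --file-id, --key, --src, --limit and --offset semantics."""
    selected = [
        u for u in usages
        if (file_id is None or u["file_id"] == file_id)
        and (key is None or u["key"] == key)
        and (src is None or fnmatchcase(u["src"], src))  # assumed glob pattern
    ]
    end = None if limit is None else offset + limit
    return selected[offset:end]


def render(usages, as_json=False):
    """Return JSON for scripting, or an aligned table for the console."""
    if as_json:
        return json.dumps(usages, indent=2)
    # Each column is as wide as its header or its longest value.
    widths = [
        max([len(col)] + [len(str(u[col])) for u in usages]) for col in COLUMNS
    ]
    lines = ["  ".join(col.ljust(w) for col, w in zip(COLUMNS, widths))]
    for u in usages:
        lines.append(
            "  ".join(str(u[col]).ljust(w) for col, w in zip(COLUMNS, widths))
        )
    return "\n".join(lines)
```

Keeping filtering separate from rendering means the --json and table outputs always describe the same set of rows.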

Why this helps

  • Operators can quickly locate unexpected references (especially UUIDs embedded in text/JSON).
  • Safer cleanup workflows: verify a file’s real usage before deletion.
  • Easier debugging when clear-orphaned-file-records finds something “still in use” but the reason is unclear.

3. Can you help us with this feature?

  • I am interested in contributing to this feature.
