install plugin in cluster env, InitPythonEnvironment will delete .venv dir causing race conditions and failures #541

@avtion

Self Checks

To make sure we get to you in time, please check the following :)

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • "Please do not modify this template :) and fill in all the required fields."

Versions

  1. dify-plugin-daemon Version 0.5.1
  2. dify-api Version 1.10.1-fix.1

Describe the bug
When running multiple pods of dify-plugin-daemon in k8s, the InitPythonEnvironment() function can be executed concurrently by different pods for the same plugin. This race condition may cause the deleteVirtualEnvironment() function to be triggered while another pod is still initializing the Python virtual environment, resulting in the virtual environment being deleted during the initialization process. This leads to initialization failures and unpredictable behavior.

Root Cause
The issue is triggered by the handleNewLocalPlugins() function in cluster mode:

  1. Each pod runs startLocalMonitor() which calls handleNewLocalPlugins() every 30 seconds
  2. handleNewLocalPlugins() lists all installed plugins from the shared installedBucket (cross-pod shared storage)
  3. Each pod checks if the plugin is already running locally (c.localPluginRuntimes.Exists())
  4. If not running locally, the pod calls LaunchLocalPlugin() -> InitEnvironment() -> InitPythonEnvironment()
  5. Although LaunchLocalPlugin() has a localPluginInstallationLock, this lock is in-memory and only prevents concurrent launches within the same pod
  6. Multiple pods can simultaneously pass this local lock check and all attempt to initialize the same plugin's Python environment in the shared storage
  7. This causes race conditions where one pod's deleteVirtualEnvironment() destroys the environment being built by another pod
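The gap described in steps 5 and 6 can be illustrated with a minimal Go sketch. It does not reproduce the daemon's code: the `pod` type and `tryLaunch` method are hypothetical, and `installLock` only stands in for localPluginInstallationLock. Each simulated pod owns its own in-memory mutex, so both pods pass their local check for the same plugin, while a second attempt inside one pod is blocked.

```go
package main

import (
	"fmt"
	"sync"
)

// pod is a hypothetical model of one daemon replica.
// installLock stands in for localPluginInstallationLock: it lives in this
// replica's memory and is invisible to every other replica.
type pod struct {
	name        string
	installLock sync.Mutex
}

// tryLaunch reports whether this pod would proceed to initialize the plugin.
// It never releases the lock, modelling an initialization still in progress.
func (p *pod) tryLaunch() bool {
	return p.installLock.TryLock()
}

func main() {
	podA := &pod{name: "pod-a"}
	podB := &pod{name: "pod-b"}

	// Both pods pass their own local lock for the same plugin.
	fmt.Println(podA.name, "may launch:", podA.tryLaunch()) // true
	fmt.Println(podB.name, "may launch:", podB.tryLaunch()) // true

	// Only a second launch inside the same pod is rejected.
	fmt.Println(podA.name, "second launch:", podA.tryLaunch()) // false
}
```

Because nothing is shared between the two mutexes, both pods reach the environment initialization step at the same time, which is where the deletion of the shared virtual environment becomes unsafe.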

To Reproduce
Steps to reproduce the behavior:

  1. Deploy dify-plugin-daemon with multiple replicas (e.g., 3+ pods) in a Kubernetes cluster
  2. Share the same persistent volume or storage for plugin working directories across all pods
  3. Install or update a plugin that triggers Python environment initialization (it takes about 1 minute to initialize)
  4. Observe that multiple pods attempt to initialize the Python environment simultaneously
  5. See error: virtual environment gets deleted while being initialized, causing "virtual environment is invalid" errors

Expected behavior

  • Only one pod should be allowed to initialize the Python environment for a specific plugin at a time
  • Other pods should wait for the initialization to complete or time out gracefully
  • The initialization process should be protected by a distributed lock mechanism
  • Log messages should be emitted when the virtual environment is being deleted and recreated
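One possible shape for such a lock, as a rough sketch rather than a proposed patch: a lock directory created next to the virtual environment on the shared storage, with a bounded wait. All names here (`acquireEnvLock`, `simulateTwoPods`, `errLockTimeout`, the `.venv.lock` path) are hypothetical and not part of dify-plugin-daemon. The sketch assumes that directory creation is atomic on the shared volume, which may not hold on every network filesystem; a lock held in an external store could be a safer choice.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"time"
)

// errLockTimeout is returned when the lock is still held after the timeout.
var errLockTimeout = errors.New("timed out waiting for environment lock")

// acquireEnvLock (hypothetical) retries os.Mkdir until it succeeds or the
// timeout elapses. os.Mkdir fails if the directory already exists, so at most
// one caller holds the lock. ASSUMPTION: mkdir is atomic on the shared volume.
func acquireEnvLock(lockDir string, timeout, poll time.Duration) (func(), error) {
	deadline := time.Now().Add(timeout)
	for {
		err := os.Mkdir(lockDir, 0o755)
		if err == nil {
			return func() { _ = os.Remove(lockDir) }, nil
		}
		if !errors.Is(err, os.ErrExist) {
			return nil, fmt.Errorf("create lock directory: %w", err)
		}
		if time.Now().After(deadline) {
			return nil, errLockTimeout
		}
		time.Sleep(poll)
	}
}

// simulateTwoPods models two replicas contending for one plugin directory.
func simulateTwoPods() (firstAcquired, secondTimedOut bool, err error) {
	workDir, err := os.MkdirTemp("", "plugin-workdir-")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(workDir)
	lockDir := filepath.Join(workDir, ".venv.lock")

	release, err := acquireEnvLock(lockDir, time.Second, 20*time.Millisecond)
	if err != nil {
		return false, false, err
	}
	defer release()
	log.Print("pod-a: lock held, deleting and recreating the virtual environment")

	// The second replica waits briefly, then gives up without touching .venv.
	_, err2 := acquireEnvLock(lockDir, 100*time.Millisecond, 20*time.Millisecond)
	return true, errors.Is(err2, errLockTimeout), nil
}

func main() {
	first, second, err := simulateTwoPods()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("first pod acquired lock:", first)
	fmt.Println("second pod timed out gracefully:", second)
}
```

A lock like this has no expiry, so a pod that crashes while holding it would leave the directory behind. A real implementation would likely need a lease or stale-lock cleanup, and the timeout would have to exceed the roughly one minute that initialization takes.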

Labels: bug
