Open
Labels
bug: Something isn't working
Description
Self Checks
To make sure we get to you in time, please check the following :)
- I have searched for existing issues, including closed ones.
- I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
- "Please do not modify this template :) and fill in all the required fields."
Versions
- dify-plugin-daemon Version 0.5.1
- dify-api Version 1.10.1-fix.1
Describe the bug
When running multiple pods of dify-plugin-daemon in k8s, the InitPythonEnvironment() function can be executed concurrently by different pods for the same plugin. This race condition may cause the deleteVirtualEnvironment() function to be triggered while another pod is still initializing the Python virtual environment, resulting in the virtual environment being deleted during the initialization process. This leads to initialization failures and unpredictable behavior.
Root Cause
The issue is triggered by the handleNewLocalPlugins() function in cluster mode:
- Each pod runs startLocalMonitor(), which calls handleNewLocalPlugins() every 30 seconds
- handleNewLocalPlugins() lists all installed plugins from the shared installedBucket (cross-pod shared storage)
- Each pod checks if the plugin is already running locally (c.localPluginRuntimes.Exists())
- If not running locally, the pod calls LaunchLocalPlugin() -> InitEnvironment() -> InitPythonEnvironment()
- Although LaunchLocalPlugin() has a localPluginInstallationLock, this lock is in-memory and only prevents concurrent launches within the same pod
- Multiple pods can simultaneously pass this local lock check and all attempt to initialize the same plugin's Python environment in the shared storage
- This causes race conditions where one pod's deleteVirtualEnvironment() destroys the environment being built by another pod
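To illustrate why the in-memory lock does not help across pods, here is a minimal sketch. It is not the daemon's actual code: the pod type and tryBeginInit are hypothetical, and a single sync.Mutex per pod stands in for localPluginInstallationLock. Each replica holds its own mutex, so both "pods" get past the check for the same plugin.

```go
package main

import (
	"fmt"
	"sync"
)

// pod models one daemon replica. installLock is a simplified stand-in for
// the in-memory localPluginInstallationLock (hypothetical, for illustration).
type pod struct {
	name        string
	installLock sync.Mutex
}

// tryBeginInit reports whether this pod is allowed to start initializing
// the plugin environment, based only on its own in-memory lock.
func (p *pod) tryBeginInit() bool {
	return p.installLock.TryLock() // requires Go 1.18+
}

func main() {
	a := &pod{name: "pod-a"}
	b := &pod{name: "pod-b"}

	fmt.Println(a.name, a.tryBeginInit()) // pod-a true
	fmt.Println(b.name, b.tryBeginInit()) // pod-b true: separate mutex, no exclusion
	fmt.Println(a.name, a.tryBeginInit()) // pod-a false: blocked only within the same pod
}
```

Both pods report true, so both would go on to touch the same directory in shared storage.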
To Reproduce
Steps to reproduce the behavior:
- Deploy dify-plugin-daemon with multiple replicas (e.g., 3+ pods) in a Kubernetes cluster
- Share the same persistent volume or storage for plugin working directories across all pods
- Install or update a plugin that triggers Python environment initialization (it takes about 1 minute to initialize)
- Observe that multiple pods attempt to initialize the Python environment simultaneously
- See error: virtual environment gets deleted while being initialized, causing "virtual environment is invalid" errors
Expected behavior
- Only one pod should be allowed to initialize the Python environment for a specific plugin at a time
- Other pods should wait for the initialization to complete or time out gracefully
- The initialization process should be protected by a distributed lock mechanism
- Log messages should be emitted when the virtual environment is being deleted and recreated