[BUG] Pod scenarios (not configured for rollback) stuck #1137

@paigerube14

Bug Description

During pod scenarios, interrupting the run while it waits for pod recovery should stop the run and exit cleanly with a warning, completing the run at that point rather than waiting out the full expected recovery time.

To Reproduce

1. Run python run_kraken.py
2. Press Ctrl+C to stop the run while it is waiting for pod recovery
3. The error below is seen

Config File

Config file used when the error was seen (the default is config/config.yaml)

kraken:
    kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
    exit_on_failure: False                                 # Exit when a post action scenario fails
    auto_rollback: True                                    # Enable auto rollback for scenarios.
    rollback_versions_directory: /tmp/kraken-rollback      # Directory to store rollback version files.
    publish_kraken_status: True                            # Can be accessed at http://0.0.0.0:8081
    signal_state: RUN                                      # Will wait for the RUN signal when set to PAUSE before running the scenarios, refer docs/signal.md for more details
    signal_address: 0.0.0.0                                # Signal listening address
    port: 8081                                             # Signal port
    chaos_scenarios:
       # List of policies/chaos scenarios to load
       - pod_disruption_scenarios:
           - scenarios/openshift/etcd.yml

cerberus:
    cerberus_enabled: False                                # Enable it when cerberus is previously installed
    cerberus_url:                                          # When cerberus_enabled is set to True, provide the url where cerberus publishes go/no-go signal
    check_application_routes: False                         # When enabled will look for application unavailability using the routes specified in the cerberus config and fails the run

performance_monitoring:
    prometheus_url: ''                                    # The prometheus url/route is automatically obtained in case of OpenShift, please set it when the distribution is Kubernetes.
    prometheus_bearer_token:                              # The bearer token is automatically obtained in case of OpenShift, please set it when the distribution is Kubernetes. This is needed to authenticate with prometheus.
    uuid:                                                 # uuid for the run is generated by default if not set
    enable_alerts: False                                  # Runs the queries specified in the alert profile and displays the info or exits 1 when severity=error
    enable_metrics: False
    alert_profile: config/alerts.yaml                          # Path or URL to alert profile with the prometheus queries
    metrics_profile: config/metrics-report.yaml
    check_critical_alerts: False                          # When enabled will check prometheus for critical alerts firing post chaos
elastic:
    enable_elastic: False
    verify_certs: False
    elastic_url: ""                                         # To track results in elasticsearch, give url to server here; will post telemetry details when url and index not blank
    elastic_port: 32766
    username: "elastic"
    password: "test"
    metrics_index: "krkn-metrics"
    alerts_index: "krkn-alerts"
    telemetry_index: "krkn-telemetry"

tunings:
    wait_duration: 1                                      # Duration to wait between each chaos scenario
    iterations: 1                                          # Number of times to execute the scenarios
    daemon_mode: False                                     # Iterations are set to infinity which means that the kraken will cause chaos forever
telemetry:
    enabled: False                                           # enable/disables the telemetry collection feature
    api_url: https://ulnmf9xv7j.execute-api.us-west-2.amazonaws.com/production #telemetry service endpoint
    username: username                                      # telemetry service username
    password: password                                    # telemetry service password
    prometheus_backup: True                                 # enables/disables prometheus data collection
    prometheus_namespace: ""                                # namespace where prometheus is deployed (if distribution is kubernetes)
    prometheus_container_name: ""                           # name of the prometheus container name (if distribution is kubernetes)
    prometheus_pod_name: ""                                 # name of the prometheus pod (if distribution is kubernetes)
    full_prometheus_backup: False                           # if is set to False only the /prometheus/wal folder will be downloaded.
    backup_threads: 5                                       # number of telemetry download/upload threads
    archive_path: /tmp                                      # local path where the archive files will be temporarily stored
    max_retries: 0                                          # maximum number of upload retries (if 0 will retry forever)
    run_tag: ''                                             # if set, this will be appended to the run folder in the bucket (useful to group the runs)
    archive_size: 500000                                    # the size of each prometheus data archive chunk in KB. The lower the archive size,
                                                            # the higher the number of archive files produced and uploaded (and processed by backup_threads
                                                            # simultaneously).
                                                            # For unstable/slow connections it is better to keep this value low
                                                            # and increase the number of backup_threads; on upload failure the retry then happens only on the
                                                            # failed chunk without affecting the whole upload.
    telemetry_group: ''                                     # if set will archive the telemetry in the S3 bucket in a folder named after the value, otherwise will use "default"
    logs_backup: True
    logs_filter_patterns:
     - "(\\w{3}\\s\\d{1,2}\\s\\d{2}:\\d{2}:\\d{2}\\.\\d+).+"         # Sep 9 11:20:36.123425532
     - "kinit (\\d+/\\d+/\\d+\\s\\d{2}:\\d{2}:\\d{2})\\s+"          # kinit 2023/09/15 11:20:36 log
     - "(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d+Z).+"      # 2023-09-15T11:20:36.123425532Z log
    oc_cli_path: /usr/bin/oc                                # optional, if not specified it will be searched for in $PATH
    events_backup: True                                     # enables/disables cluster events collection

health_checks:                                              # Utilizing health check endpoints to observe application behavior during chaos injection.
    interval:                                               # Interval in seconds to perform health checks, default value is 2 seconds
    config:                                                 # Provide list of health check configurations for applications
        - url:                                              # Provide application endpoint
          bearer_token:                                     # Bearer token for authentication if any
          auth:                                             # Provide authentication credentials (username , password) in tuple format if any, ex:("admin","secretpassword")
          exit_on_failure:                                  # If value is True exits when health check failed for application, values can be True/False

kubevirt_checks:                                            # Utilizing virt check endpoints to observe ssh ability to VMI's during chaos injection.
    interval: 2                                             # Interval in seconds to perform virt checks, default value is 2 seconds
    namespace:                                              # Namespace where to find VMI's
    name:                                                   # Regex Name style of VMI's to watch, optional, will watch all VMI names in the namespace if left blank
    only_failures: False                                    # Boolean; when False, show all VMI ssh results (failures and successes); when True, show only failures
    disconnected: False                                     # Boolean controlling how to connect to the VMIs; if True, uses the ip_address to ssh from within a node; if False, uses the name and virtctl to connect. Default is False
    ssh_node: ""                                            # If set, will be a backup way to ssh to a node. Will want to set to a node that isn't targeted in chaos
    node_names: ""
    exit_on_failure:                                        # If value is True and VMI's are failing post chaos returns failure, values can be True/False

Scenario File

Scenario file(s) specified in the config file (can be starred (*) for confidential information)

- id: kill-pods
  config:
    namespace_pattern: ^openshift-etcd$
    label_selector: k8s-app=etcd
    krkn_pod_recovery_time: 120
    exclude_label: "" # excludes pods marked with this label from chaos

Expected behavior

On keyboard interrupt, the run should print "^C [WARNING] Signal SIGINT received without complete context, skipping rollback" once, then exit the scenario, output telemetry, and report a failure end status.
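A minimal sketch of the expected behavior, assuming a stop-event approach: the SIGINT handler sets a flag instead of re-raising, and the recovery wait checks that flag so it can abandon the wait with a warning. Names here (wait_for_pod_recovery, stop_event, the recovered callable) are illustrative, not krkn's actual API.

```python
# Hypothetical sketch: exit a pod-recovery wait early on SIGINT instead of
# letting KeyboardInterrupt propagate out of a blocking wait.
import signal
import threading

stop_event = threading.Event()

def _sigint_handler(signum, frame):
    # Mark the run as interrupted; the wait loop below notices and exits.
    stop_event.set()

signal.signal(signal.SIGINT, _sigint_handler)

def wait_for_pod_recovery(recovered, timeout_s, poll_s=0.1):
    """Return True if recovery completed, False if interrupted or timed out.

    `recovered` is a zero-argument callable standing in for the real
    pod-readiness check.
    """
    waited = 0.0
    while waited < timeout_s:
        if stop_event.is_set():
            print("[WARNING] SIGINT received, abandoning recovery wait")
            return False
        if recovered():
            return True
        stop_event.wait(poll_s)  # sleeps, but wakes immediately when set
        waited += poll_s
    return False
```

With this shape the caller sees a plain False on interrupt and can still run telemetry output and set the failure end status before exiting.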

Krkn Output

% python run_kraken.py 
 _              _              
| | ___ __ __ _| | _____ _ __  
| |/ / '__/ _` | |/ / _ \ '_ \ 
|   <| | | (_| |   <  __/ | | |
|_|\_\_|  \__,_|_|\_\___|_| |_|
                               

2026-01-29 14:42:46,357 [INFO] Starting kraken
2026-01-29 14:42:46,369 [INFO] Initializing client to talk to the Kubernetes cluster
2026-01-29 14:42:46,369 [INFO] Generated a uuid for the run: 29fae477-1d2c-4f89-8208-c5540edf32de
2026-01-29 14:42:46,574 [INFO] Detected distribution openshift
2026-01-29 14:42:48,865 [INFO] Publishing kraken status at http://0.0.0.0:8081
2026-01-29 14:42:48,884 [INFO] Starting http server at http://0.0.0.0:8081

2026-01-29 14:42:48,885 [INFO] Fetching cluster info
2026-01-29 14:42:49,225 [INFO] 4.20.4
2026-01-29 14:42:49,225 [INFO] Server URL: https://api.***openshift.com:6443
2026-01-29 14:42:49,226 [INFO] Daemon mode not enabled, will run through 1 iterations

2026-01-29 14:42:50,897 [INFO] 📣 `ScenarioPluginFactory`: types from config.yaml mapped to respective classes for execution:
2026-01-29 14:42:50,897 [INFO]   ✅ type: application_outages_scenarios ➡️ `ApplicationOutageScenarioPlugin` 
2026-01-29 14:42:50,897 [INFO]   ✅ type: container_scenarios ➡️ `ContainerScenarioPlugin` 
2026-01-29 14:42:50,897 [INFO]   ✅ type: hog_scenarios ➡️ `HogsScenarioPlugin` 
2026-01-29 14:42:50,897 [INFO]   ✅ type: kubevirt_vm_outage ➡️ `KubevirtVmOutageScenarioPlugin` 
2026-01-29 14:42:50,897 [INFO]   ✅ type: managedcluster_scenarios ➡️ `ManagedClusterScenarioPlugin` 
2026-01-29 14:42:50,897 [INFO]   ✅ types: [pod_network_scenarios, ingress_node_scenarios] ➡️ `NativeScenarioPlugin` 
2026-01-29 14:42:50,897 [INFO]   ✅ type: network_chaos_scenarios ➡️ `NetworkChaosScenarioPlugin` 
2026-01-29 14:42:50,897 [INFO]   ✅ type: network_chaos_ng_scenarios ➡️ `NetworkChaosNgScenarioPlugin` 
2026-01-29 14:42:50,897 [INFO]   ✅ type: node_scenarios ➡️ `NodeActionsScenarioPlugin` 
2026-01-29 14:42:50,897 [INFO]   ✅ type: pod_disruption_scenarios ➡️ `PodDisruptionScenarioPlugin` 
2026-01-29 14:42:50,897 [INFO]   ✅ type: pvc_scenarios ➡️ `PvcScenarioPlugin` 
2026-01-29 14:42:50,897 [INFO]   ✅ type: service_disruption_scenarios ➡️ `ServiceDisruptionScenarioPlugin` 
2026-01-29 14:42:50,897 [INFO]   ✅ type: service_hijacking_scenarios ➡️ `ServiceHijackingScenarioPlugin` 
2026-01-29 14:42:50,897 [INFO]   ✅ type: cluster_shut_down_scenarios ➡️ `ShutDownScenarioPlugin` 
2026-01-29 14:42:50,897 [INFO]   ✅ type: syn_flood_scenarios ➡️ `SynFloodScenarioPlugin` 
2026-01-29 14:42:50,897 [INFO]   ✅ type: time_scenarios ➡️ `TimeActionsScenarioPlugin` 
2026-01-29 14:42:50,897 [INFO]   ✅ type: zone_outages_scenarios ➡️ `ZoneOutageScenarioPlugin` 
2026-01-29 14:42:50,897 [INFO] 

2026-01-29 14:42:50,898 [INFO] health checks config is not defined, skipping them
2026-01-29 14:42:50,898 [INFO] kube virt checks config is not defined, skipping them
2026-01-29 14:42:50,898 [INFO] Executing scenarios for iteration 0
2026-01-29 14:42:50,898 [INFO] connection set up
127.0.0.1 - - [29/Jan/2026 14:42:50] "GET / HTTP/1.1" 200 -
2026-01-29 14:42:50,899 [INFO] response RUN
2026-01-29 14:42:50,901 [INFO] Signal handlers registered globally
2026-01-29 14:42:50,901 [INFO] Running PodDisruptionScenarioPlugin: ['pod_disruption_scenarios'] -> scenarios/openshift/etcd.yml
2026-01-29 14:42:51,136 [INFO] waiting up to 120 seconds for pod recovery, pod label pattern: k8s-app=etcd namespace pattern: ^openshift-etcd$
2026-01-29 14:42:51,493 [INFO] ('etcd-prubenda1111-ck5s7-master-1.c.chaos-438115.internal', 'openshift-etcd')
2026-01-29 14:42:51,493 [INFO] Deleting pod etcd-prubenda1111-ck5s7-master-1.c.chaos-438115.internal
^C2026-01-29 14:43:33,142 [INFO] Performing rollback for signal SIGINT with run_uuid=29fae477-1d2c-4f89-8208-c5540edf32de, scenario_type=pod_disruption_scenarios
2026-01-29 14:43:33,143 [WARNING] Skip execution for run_uuid=29fae477-1d2c-4f89-8208-c5540edf32de, scenario_type=pod_disruption_scenarios
2026-01-29 14:43:33,143 [INFO] Calling original handler for SIGINT
Traceback (most recent call last):
  File "/Users/prubenda/Github/kraken/run_kraken.py", line 717, in <module>
    retval = main(options, command)
  File "/Users/prubenda/Github/kraken/run_kraken.py", line 367, in main
    scenario_plugin.run_scenarios(
  File "/Users/prubenda/Github/kraken/krkn/scenario_plugins/abstract_scenario_plugin.py", line 104, in run_scenarios
    return_value = self.run(
  File "/Users/prubenda/Github/kraken/krkn/scenario_plugins/pod_disruption/pod_disruption_scenario_plugin.py", line 65, in run
    snapshot = future_snapshot.result()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 440, in result
    self._condition.wait(timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 312, in wait
    waiter.acquire()
  File "/Users/prubenda/Github/kraken/krkn/rollback/signal.py", line 64, in _signal_handler
    original_handler(signum, frame)
KeyboardInterrupt
^C2026-01-29 14:43:33,350 [WARNING] Signal SIGINT received without complete context, skipping rollback.
^C2026-01-29 14:43:33,549 [WARNING] Signal SIGINT received without complete context, skipping rollback.

Additional context

The rollback functionality can't be added for this scenario as the pod should come back on its own
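The traceback shows the interrupt landing inside a blocking future_snapshot.result() call. One possible shape for the fix, sketched here with hypothetical names (result_or_interrupt, interrupted), is to poll the future with a short timeout so an interrupt flag set by the signal handler can cut the wait short; this is an assumption about the fix, not krkn's current implementation.

```python
# Hypothetical sketch: poll a Future instead of blocking indefinitely in
# result(), so a SIGINT-set flag can end the wait between polls.
import concurrent.futures
import threading

interrupted = threading.Event()

def result_or_interrupt(future, poll_s=0.5):
    """Return the future's result, or None if the run was interrupted."""
    while True:
        try:
            return future.result(timeout=poll_s)
        except concurrent.futures.TimeoutError:
            if interrupted.is_set():
                future.cancel()  # best effort; a running task keeps running
                return None
```

On interrupt the scenario plugin would then get None back, log the warning once, and fall through to telemetry output and a failure end status instead of raising KeyboardInterrupt from inside threading.Condition.wait.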

Labels

bug