-
Notifications
You must be signed in to change notification settings - Fork 472
[feat] Init an http eval server and entrypoints for lmms_eval #972
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
830a68d to
b19c739
Compare
- Add CANCELLED status to JobStatus enum - Mark jobs as CANCELLED instead of deleting (fixes race condition) - Worker now skips CANCELLED jobs - Remove compare_results methods (no server endpoint exists) - Remove unused launcher_args parameter Github-Issue: #972
…the function inside the server state
…nto the lifespan and thus we can extend to more state in the future
Fix multiple issues identified during PR review for the HTTP server entrypoints module. Type Safety (client.py): - Add proper type annotations to __exit__ and __aexit__ methods with explicit exc_type, exc_val, exc_tb parameters instead of *args - Add return type annotations to __enter__ and __aenter__ methods - Add ExcInfo type alias for context manager exception info Code Deduplication (client.py): - Extract _process_job_status() helper function to eliminate duplicate job status handling logic between EvalClient.wait_for_job() and AsyncEvalClient.wait_for_job() Memory Management (job_scheduler.py): - Add cleanup_old_jobs() method to prevent unbounded growth of _jobs dictionary which could cause memory leaks in long-running servers - Cleanup runs automatically after each job completes/fails - Add max_completed_jobs constructor parameter (default: 100) to configure retention limit for finished jobs Configuration (job_scheduler.py): - Add temp_dir_prefix constructor parameter to make temporary directory prefix configurable instead of hardcoded "lmms_eval_" - Add class constants DEFAULT_MAX_COMPLETED_JOBS and DEFAULT_TEMP_DIR_PREFIX for better discoverability Error Messages (server_args.py): - Improve port validation error to show both the invalid value and its type for easier debugging (e.g., "got '8080' (type: str)") - Add docstring to __post_init__ method
Fix multiple issues identified during PR review for the HTTP server entrypoints module. Type Safety (client.py): - Add proper type annotations to __exit__ and __aexit__ methods with explicit exc_type, exc_val, exc_tb parameters instead of *args - Add return type annotations to __enter__ and __aenter__ methods - Add ExcInfo type alias for context manager exception info Code Deduplication (client.py): - Extract _process_job_status() helper function to eliminate duplicate job status handling logic between EvalClient.wait_for_job() and AsyncEvalClient.wait_for_job() Memory Management (job_scheduler.py): - Add cleanup_old_jobs() method to prevent unbounded growth of _jobs dictionary which could cause memory leaks in long-running servers - Cleanup runs automatically after each job completes/fails - Add max_completed_jobs constructor parameter (default: 100) to configure retention limit for finished jobs Configuration (job_scheduler.py): - Add temp_dir_prefix constructor parameter to make temporary directory prefix configurable instead of hardcoded "lmms_eval_" - Add class constants DEFAULT_MAX_COMPLETED_JOBS and DEFAULT_TEMP_DIR_PREFIX for better discoverability Error Messages (server_args.py): - Improve port validation error to show both the invalid value and its type for easier debugging (e.g., "got '8080' (type: str)") - Add docstring to __post_init__ method
|
Claude finished @Luodian's task in 2m 17s —— View job Code Review Complete ✅I've completed a thorough review of this PR introducing the HTTP eval server for lmms-eval. Overall, this is a well-structured implementation with good architecture. Below are my findings organized by severity. 🔴 Critical Issues1. Security: Unauthenticated Server Exposed (http_server.py:27-32)The server has no authentication or authorization. While there's a warning comment, this is a significant security risk. Issue: Any user with network access can:
Recommendation: Add at minimum:
2. Missing Type Hints on Public APIs (CLAUDE.md violation)Several functions lack complete type hints, violating the project's code quality standards:
Required: Per CLAUDE.md line 27-28, "Type hints required for all code" and "Public APIs must have docstrings" 🟡 High Priority Issues3. Race Condition in Job Cancellation (job_scheduler.py:214-241)The Current flow: async def cancel_job(self, job_id: str) -> tuple[bool, str]:
async with self._jobs_lock:
job = self._jobs.get(job_id)
if job.status == JobStatus.RUNNING: # Could change after this check
return False, "Cannot cancel a running job"
job.status = JobStatus.CANCELLEDIssue: The Fix: Check for 4. Subprocess Evaluation Blocks Event Loop (job_scheduler.py:371-443)The while True:
line = await proc.stdout.readline() # Blocks event loop per lineIssue: This prevents other async operations from running smoothly during long evaluations. Recommendation: Use 5. No Resource Cleanup on Server Shutdown (http_server.py:57, job_scheduler.py:146-154)When the server shuts down with jobs in the queue:
Fix: In async def stop(self):
# Cancel all queued jobs
async with self._jobs_lock:
for job_id, job in self._jobs.items():
if job.status == JobStatus.QUEUED:
job.status = JobStatus.CANCELLED
# Then stop worker
if self._worker_task:
self._worker_task.cancel()6. Temp Directory Leak (job_scheduler.py:378)Temporary directories created by output_path = config.get("output_dir") or tempfile.mkdtemp(prefix=self._temp_dir_prefix)Fix: Track created temp dirs and clean them up, or use 7. Missing Docstrings on Public APIs (CLAUDE.md violation)Per CLAUDE.md line 28, "Public APIs must have docstrings". Missing:
🟢 Medium Priority Issues8. Inconsistent Error Handling (client.py:106-111)The except Exception as e:
return TestResult(success=False, error=str(e))Recommendation: Define custom exception types for clearer error handling. 9. Line Length Violations (CLAUDE.md:31)Project standard is 88 chars max. Found violations:
Fix: Break long lines per CLAUDE.md guidelines. 10. Hardcoded Constants (job_scheduler.py:117-118)Magic numbers should be configurable: DEFAULT_MAX_COMPLETED_JOBS = 100
DEFAULT_TEMP_DIR_PREFIX = "lmms_eval_"Recommendation: Make these part of 11. No Logging of Job Lifecycle Events (job_scheduler.py)Job state transitions (queued → running → completed) are not logged, making debugging difficult. Fix: Add logger calls in async def _start_job(self, job_id: str) -> Optional[dict]:
async with self._jobs_lock:
job = self._jobs.get(job_id)
if job is None or job.status == JobStatus.CANCELLED:
return None
logger.info(f"Starting job {job_id}") # ADD THIS
self._current_job_id = job_id
job.status = JobStatus.RUNNING
job.started_at = datetime.now().isoformat()
return job.request.model_dump()12. Test Uses Deprecated
|
|
For security issue, we ignore this for now. Currently we assume the server is for internal machine deployment only. Might need extra middle layer to handle these if it needs to be an actual service. |
Before you open a pull-request, please check if a similar issue already exists or has been closed before.
When you open a pull-request, please be sure to include the following
If you meet the lint warnings, you can use following scripts to reformat code.
Ask for review
Once you feel comfortable for your PR, feel free to @ one of the contributors to review
Eval Server: @Luodian @kcz358 @mwxely
Summary
Introduces an HTTP server for running lmms-eval evaluations via REST API. This enables programmatic evaluation submission and job management.
Features
What to do with future plans:
Launch Server
Use Case
Output (Client Side)