Skip to content

Commit 80371c7

Browse files
authored
Chore: refactor autograding classes (#305)
* refactor: Start refactoring the LocalAutogradeExecutor class Split methods into smaller ones, improve logging and error handling * refactor: [WIP] Prepare GenerateFeedbackExecutor for refactoring Reflect some of the changes from the base class, LocalAutogradeExecutor, and improve error handling etc. * fix: Fix a circular import * chore: Make all helper methods in kube grader 'private' (for consistency) * cleanup: Remove redundant recreation of output dirs in graders * refactor: [WIP] Extract git-related logic from LocalAutogradeExecutor and derrived classes Also: - move getting all whitelist patterns to a method in Assignment (seems to fit there better), - remove some duplicated code from the child classes of the AutogradeExecutor, sometimes making their behaviour more consistent with the parent, - add missing type annotations and improve a few clearly outdated docstrings, etc. * tests: Add tests for the LocalAutogradeExecutor * refactor: Remove explicit gradebook file deletion (is done in _cleanup) * fix: Fix writing gradebook in GenerateFeedbackProcessExecutor * fix: Remove unused attributes from GenerateFeedbackExecutor * test: Write tests for local feedback generation executors * fix: Use the full commit hash when creating a submission in tests setup * fix: Fix reading stderr in the process executors process.stderr is a str already, no need to read().decode() it - it fails * fix: Remove unexpected (and unused) param from GenerateFeedbackApp.start() * cleanup: Clean up fixtures for autograding tests * fix: Re-add missing updating of submission logs; log process errors more consistently * tests: Add tests for LocalProcessAutogradeExecutor, improve tests for local feedback * refactor: Simplify autograding/feedback log collection * refactor: Improve running subprocesses, adjust and add tests * refactor: Pass git commands to subprocess.run as lists, not strings We execute the git commands with variables potentially provided by users (filenames, usernames, etc.). Passing them as strings and then splitting, potentially could enable constructing these variables in such a way so as to execute malicious commands. An alternative would be to use `shlex.quote` with any variables, but it seems more cumbersome. * refactor: Small tweaks in the local grader and local feedback, update docstrings * tests: Add tests for the collect_logs context manager * fix: Feedback repo path, do not save feedback logs in submission.logs Until now feedback logs have also not been appended to submission.logs. * chore: Small improvements in tests * fix: Only commit html files when generating feedback Previously, all student-supplied files were included in the feedback, even though feedback is actually only generated for the jupyter notebooks that are included in the gradebook. Now only the generated html files will be included in the feedback files. * fix: Protect against fabricated usernames in git repo paths in autograders * refactor: Move git manager to a separate module * chore: Small cleanup in autograde executors * chore: Make autograding/feedback executor class names more consistent * fix: Do not write `self.grading_logs` in local feedback executors We don't use these logs anywhere, so it's not necessary to collect them. All the logs are still logged with the regular `self.log` logger.
1 parent d845077 commit 80371c7

File tree

16 files changed

+1005
-435
lines changed

16 files changed

+1005
-435
lines changed

docs/source/admin/system_architecture.md

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -114,11 +114,10 @@ and handle the evaluation of user submissions, such as running notebooks for gra
114114
Auto-grading is accomplished using the `grader_service.convert` submodule, which can be executed as a command-line interface (CLI).
115115
Different executors are available to manage this:
116116
- **LocalAutogradeExecutor**: Executes the module directly within the current Python process on the worker by importing the package and invoking the converters.
117-
- **LocalProcessAutogradeExecutor**: Runs the submodule in a separate process.
117+
- **LocalAutogradeProcessExecutor**: Runs the submodule in a separate process.
118118
- **KubeAutogradeExecutor**: Spawns a Kubernetes pod to run the submodule. This is the only approach that allows different images for lectures, as the grading code must be executed in the same environment as the lecture.
119119

120120

121121
# How To Scale
122122

123123
[//]: # (TODO: what is the minimal setup? what is the most sophisticated setup)
124-

grader_service/autograding/celery/tasks.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
from tornado.web import HTTPError
66

77
from grader_service.autograding.celery.app import CeleryApp
8-
from grader_service.autograding.local_feedback import GenerateFeedbackExecutor
8+
from grader_service.autograding.local_feedback import LocalFeedbackExecutor
99
from grader_service.handlers.base_handler import RequestHandlerConfig
1010
from grader_service.orm.submission import FeedbackStatus, Submission
1111
from grader_service.plugins.lti import LTISyncGrades
@@ -78,7 +78,7 @@ def generate_feedback_task(self: GraderTask, lecture_id: int, assignment_id: int
7878
f"invalid submission {submission.id}: {assignment_id=:}, {lecture_id=:} does not match"
7979
)
8080

81-
executor = GenerateFeedbackExecutor(grader_service_dir, submission, config=self.celery.config)
81+
executor = LocalFeedbackExecutor(grader_service_dir, submission, config=self.celery.config)
8282
executor.start()
8383
if submission.feedback_status == FeedbackStatus.GENERATED:
8484
self.log.info("Successfully generated feedback for submission %s!", submission.id)
Lines changed: 162 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,162 @@
1+
import os
2+
import subprocess
3+
from typing import Any, List, Optional
4+
5+
from traitlets import Unicode, validate
6+
from traitlets.config import LoggingConfigurable
7+
8+
from grader_service.autograding.utils import executable_validator
9+
from grader_service.handlers.handler_utils import GitRepoType
10+
from grader_service.orm import Assignment, Lecture, Submission
11+
12+
13+
class GitSubmissionManager(LoggingConfigurable):
14+
"""
15+
Handles git-related operations performed by autograder executors:
16+
pulling from an input repo, and committing and pushing to the output repo.
17+
"""
18+
19+
git_executable = Unicode("git", allow_none=False).tag(config=True)
20+
input_repo_type = GitRepoType.USER
21+
output_repo_type = GitRepoType.AUTOGRADE
22+
23+
def __init__(self, grader_service_dir: str, submission: Submission, **kwargs: Any):
24+
super().__init__(**kwargs)
25+
self.grader_service_dir = grader_service_dir
26+
self.submission = submission
27+
28+
if self.input_repo_type == GitRepoType.USER and self.submission.edited:
29+
# User's submission was edited by the instructor - repo type has to be adjusted
30+
self.input_repo_type = GitRepoType.EDIT
31+
32+
self.input_branch = "main"
33+
self.output_branch = f"submission_{self.submission.commit_hash}"
34+
35+
def _get_repo_path(self, repo_type: GitRepoType) -> str:
36+
"""Determines the Git repository path for the submission."""
37+
assignment: Assignment = self.submission.assignment
38+
lecture: Lecture = assignment.lecture
39+
repo_name = self.submission.user.name
40+
41+
base_repo_path = os.path.join(
42+
self.grader_service_dir, "git", lecture.code, str(assignment.id), repo_type
43+
)
44+
if repo_type in [GitRepoType.AUTOGRADE, GitRepoType.FEEDBACK]:
45+
path = os.path.join(base_repo_path, "user", repo_name)
46+
elif repo_type == GitRepoType.EDIT:
47+
path = os.path.join(base_repo_path, str(self.submission.id))
48+
elif repo_type == GitRepoType.USER:
49+
path = os.path.join(base_repo_path, repo_name)
50+
else:
51+
raise ValueError(f"Cannot determine repo path for repo type {repo_type}")
52+
53+
path = os.path.normpath(path)
54+
55+
if not path.startswith(self.grader_service_dir):
56+
self.log.error(
57+
f"Invalid repo path: {path}. Possibly suspicious values: "
58+
f"lecture code: '{lecture.code}' or user name: '{repo_name}'"
59+
)
60+
raise PermissionError("Invalid repository path.")
61+
62+
return path
63+
64+
def pull_submission(self, input_path: str) -> None:
65+
"""Inits and pulls the submission repository into the input path.
66+
67+
:param input_path: The directory where the input repo will be created.
68+
"""
69+
input_repo_path = self._get_repo_path(self.input_repo_type)
70+
71+
self.log.info(f"Pulling repo {input_repo_path} into input directory")
72+
commands = [
73+
[self.git_executable, "init"],
74+
[self.git_executable, "pull", input_repo_path, self.input_branch],
75+
]
76+
77+
# When autograding a user's submission, check out to the commit of submission
78+
if self.input_repo_type == GitRepoType.USER:
79+
commands.append([self.git_executable, "checkout", self.submission.commit_hash])
80+
81+
for cmd in commands:
82+
self._run_git(cmd, input_path)
83+
84+
self.log.info("Successfully cloned repo.")
85+
86+
def _set_up_output_repo(self, output_path: str) -> None:
87+
"""Initializes the output repo and switches to a separate branch named
88+
after the commit hash of the submission."""
89+
output_repo_path = self._get_repo_path(self.output_repo_type)
90+
91+
if not os.path.exists(output_repo_path):
92+
os.makedirs(output_repo_path)
93+
self._run_git([self.git_executable, "init", "--bare", output_repo_path], output_path)
94+
95+
self.log.info(f"Initialising repo at {output_path}")
96+
self._run_git([self.git_executable, "init"], output_path)
97+
self.log.info(f"Creating the new branch {self.output_branch} and switching to it")
98+
command = [self.git_executable, "switch", "-c", self.output_branch]
99+
self._run_git(command, output_path)
100+
self.log.info(f"Now at branch {self.output_branch}")
101+
102+
def _commit_files(self, filenames: List[str], output_path: str) -> None:
103+
"""
104+
Commits the provided files in the repo at `output_path`.
105+
"""
106+
self.log.info(f"Committing files in {output_path}")
107+
108+
if not filenames:
109+
self.log.info("No files to commit.")
110+
return
111+
112+
self._run_git([self.git_executable, "add", "--", *filenames], output_path)
113+
self._run_git(
114+
[self.git_executable, "commit", "-m", self.submission.commit_hash], output_path
115+
)
116+
117+
def push_results(self, filenames: List[str], output_path: str) -> None:
118+
"""Creates the output repository, commits and pushes the changes."""
119+
self._set_up_output_repo(output_path)
120+
self._commit_files(filenames, output_path)
121+
122+
output_repo_path = self._get_repo_path(self.output_repo_type)
123+
124+
self.log.info(f"Pushing to {output_repo_path} at branch {self.output_branch}")
125+
self._run_git(
126+
[self.git_executable, "push", "-uf", output_repo_path, self.output_branch], output_path
127+
)
128+
self.log.info("Pushing complete")
129+
130+
def _run_git(self, command: list[str], cwd: Optional[str]) -> None:
131+
"""
132+
Execute a git command as a subprocess.
133+
134+
Args:
135+
command: The git command to execute, as a list of strings.
136+
cwd: The working directory the subprocess should run in.
137+
Raises:
138+
`subprocess.CalledProcessError`: if `subprocess.run` fails.
139+
Any other exception thrown while running the subprocess is logged and also re-raised.
140+
"""
141+
assert command[0] == self.git_executable, f"Not a git command: {command}"
142+
self.log.debug('Running "%s"', " ".join(command))
143+
try:
144+
subprocess.run(
145+
command,
146+
stdout=subprocess.PIPE,
147+
stderr=subprocess.PIPE,
148+
cwd=cwd,
149+
text=True, # Decodes output to string
150+
check=True, # Raises a CalledProcessError on non-zero exit code
151+
)
152+
except subprocess.CalledProcessError as e:
153+
self.log.error(e.stderr)
154+
raise
155+
except Exception as e:
156+
self.log.error(e)
157+
raise
158+
159+
# TODO: Can I decorate executable_validator with both "git_executable" and "convert_executable"?
160+
@validate("git_executable")
161+
def _validate_executable(self, proposal):
162+
return executable_validator(proposal)

grader_service/autograding/kube/kube_grader.py

Lines changed: 11 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,6 @@
77
import asyncio
88
import inspect
99
import json
10-
import os
1110
import re
1211
import time
1312
from asyncio import Task, run
@@ -20,7 +19,6 @@
2019

2120
from grader_service.autograding.kube.util import get_current_namespace, make_pod
2221
from grader_service.autograding.local_grader import LocalAutogradeExecutor
23-
from grader_service.autograding.utils import rmtree
2422
from grader_service.orm import Assignment, Lecture, Submission
2523
from grader_service.orm.assignment import json_serial
2624

@@ -282,7 +280,7 @@ def __init__(self, grader_service_dir: str, submission: Submission, **kwargs):
282280
self.log.info(f"Setting Namespace for submission {self.submission.id}")
283281
self.namespace = get_current_namespace()
284282

285-
def get_image(self) -> str:
283+
def _get_image(self) -> str:
286284
"""
287285
Returns the image name based on the lecture and assignment.
288286
If an image config file exists and has
@@ -312,7 +310,7 @@ def get_image(self) -> str:
312310
else:
313311
return self.resolve_image_name(self.lecture, self.assignment)
314312

315-
def get_autograde_pod_name(self) -> str:
313+
def _get_autograde_pod_name(self) -> str:
316314
# sanitize username by converting to lowercase and replacing non-alphanumeric chars
317315
sanitized_username = re.sub(r"[^a-zA-Z0-9]+", "-", self.submission.user.name.lower())
318316

@@ -325,7 +323,7 @@ def get_autograde_pod_name(self) -> str:
325323

326324
return f"autograde-job-{sanitized_username}-{self.submission.id}"
327325

328-
def create_env(self) -> list[V1EnvVar]:
326+
def _create_env(self) -> list[V1EnvVar]:
329327
env = [
330328
V1EnvVar(
331329
name="ASSIGNMENT_SETTINGS",
@@ -334,7 +332,7 @@ def create_env(self) -> list[V1EnvVar]:
334332
]
335333
return env
336334

337-
def start_pod(self) -> GraderPod:
335+
def _start_pod(self) -> GraderPod:
338336
"""
339337
Starts a pod in the namespace
340338
with the commit hash as the name of the pod.
@@ -369,14 +367,14 @@ def start_pod(self) -> GraderPod:
369367
]
370368
volume_mounts = volume_mounts + self.extra_volume_mounts
371369

372-
env = self.create_env()
370+
env = self._create_env()
373371

374372
# create pod spec
375373
pod = make_pod(
376-
name=self.get_autograde_pod_name(),
374+
name=self._get_autograde_pod_name(),
377375
cmd=command,
378376
env=env,
379-
image=self.get_image(),
377+
image=self._get_image(),
380378
image_pull_policy=self.image_pull_policy,
381379
image_pull_secrets=self.image_pull_secrets,
382380
working_dir="/",
@@ -406,13 +404,9 @@ def _run(self):
406404
input and output directory through a persistent volume claim.
407405
:return: Coroutine
408406
"""
409-
if os.path.exists(self.output_path):
410-
rmtree(self.output_path)
411-
os.makedirs(self.output_path, exist_ok=True)
412-
self._write_gradebook(self._put_grades_in_assignment_properties())
413407
grader_pod = None
414408
try:
415-
grader_pod = self.start_pod()
409+
grader_pod = self._start_pod()
416410
self.log.info(f"Started pod {grader_pod.name} in namespace {grader_pod.namespace}")
417411
status = grader_pod.poll()
418412
self.grading_logs = self._get_pod_logs(grader_pod)
@@ -429,8 +423,9 @@ def _run(self):
429423
error_message = json.loads(e.body)
430424
if error_message["reason"] != "AlreadyExists" and grader_pod is not None:
431425
try:
432-
namespace = grader_pod.namespace
433-
self.client.delete_namespaced_pod(name=grader_pod.name, namespace=namespace)
426+
self.client.delete_namespaced_pod(
427+
name=grader_pod.name, namespace=grader_pod.namespace
428+
)
434429
except ApiException:
435430
pass
436431
self.log.error(f"{error_message['reason']}: {error_message['message']}")

0 commit comments

Comments
 (0)