
Commit b67a5d7

lavalleeale, skara904, vmatsibekker, williamschen23, and bmcutler authored
[Feature:InstructorUI] Add bulk upload redactions (#11539)
### Why is this Change Important & Necessary?

Images of test PDFs frequently contain a student's information, so users now have the option to redact specific regions of specific pages of tests.

### What is the New Behavior?

An instructor can upload a redactions JSON file that specifies areas of the bulk-split PDF that should not be visible to graders.

### What steps should a reviewer take to reproduce or test the bug or new feature?

To add redactions to a gradeable, visit the update page on the rubric tab. The JSON can only be uploaded when "And do you expect each specific problem/item/component to appear on a specific page number of the PDF document?" is turned on.

### Other information

See #11538

Documentation in Submitty/submitty.github.io#682

This pull request introduces a new feature to handle redactions for gradeable submissions, along with updates to related database migrations, job processing, and API endpoints. The key changes include adding a `gradeable_redaction` table, implementing a redaction processing workflow, and modifying the PDF image generation process to apply redactions.

### Database Changes:

* Added a new `gradeable_redaction` table to store redaction data, including constraints for valid coordinate ranges (`migration/migrator/data/course_tables.sql`).
* Created a migration script to add the `gradeable_redaction` table and constraints (`migration/migrator/migrations/course/20250312145730_add_redactions.py`).

### Job Processing Updates:

* Updated the `generate_pdf_images` job to accept redactions and output redacted images with a checkered pattern (`sbin/submitty_daemon_jobs/submitty_jobs/generate_pdf_images.py`).
* Added a new `RegenerateBulkImages` job to regenerate images for all submissions in a bulk upload, applying redactions (`sbin/submitty_daemon_jobs/submitty_jobs/regenerate_bulk_images.py`).
* Integrated the `RegenerateBulkImages` job into the job processing pipeline (`sbin/submitty_daemon_jobs/submitty_jobs/jobs.py`).

### API and Controller Enhancements:

* Added endpoints to retrieve and update redactions for a gradeable in `AdminGradeableController`, including validation and triggering the regeneration job (`site/app/controllers/admin/AdminGradeableController.php`).
* Updated the `SubmissionController` to include the `Redaction` model (`site/app/controllers/student/SubmissionController.php`).

### Test and Example Updates:

* Removed assertions for page image generation in bulk PDF split tests, as this functionality is now handled by the redaction workflow (`sbin/submitty_daemon_jobs/tests/test_bulk_pdf_split.py`).
* Added an example `redactions.json` file to demonstrate the redaction data format (`more_autograding_examples/bulk_upload_pdfs/submissions/redactions.json`).

Together, these changes let instructors obscure sensitive information in bulk-uploaded submissions before graders see the page images.

---------

Co-authored-by: Sátvik Karanam <[email protected]>
Co-authored-by: Viane Matsibekker <[email protected]>
Co-authored-by: williamschen23 <[email protected]>
Co-authored-by: Barb Cutler <[email protected]>
Co-authored-by: Barb Cutler <Barb Cutler>
1 parent 166aadd commit b67a5d7


19 files changed: +571 -39 lines changed

migration/migrator/data/course_tables.sql

Lines changed: 62 additions & 0 deletions
@@ -1388,6 +1388,45 @@ CREATE SEQUENCE public.gradeable_data_overall_comment_goc_id_seq
 ALTER SEQUENCE public.gradeable_data_overall_comment_goc_id_seq OWNED BY public.gradeable_data_overall_comment.goc_id;


+--
+-- Name: gradeable_redaction; Type: TABLE; Schema: public; Owner: -
+--
+
+CREATE TABLE public.gradeable_redaction (
+    redaction_id integer NOT NULL,
+    g_id character varying(255) NOT NULL,
+    page integer NOT NULL,
+    x1 double precision NOT NULL,
+    x2 double precision NOT NULL,
+    y1 double precision NOT NULL,
+    y2 double precision NOT NULL,
+    CONSTRAINT x1_positive CHECK (((x1 >= (0)::double precision) AND (x1 <= x2))),
+    CONSTRAINT x2_positive CHECK (((x2 >= (0)::double precision) AND (x2 <= (1)::double precision))),
+    CONSTRAINT y1_positive CHECK (((y1 >= (0)::double precision) AND (y1 <= y2))),
+    CONSTRAINT y2_positive CHECK (((y2 >= (0)::double precision) AND (y2 <= (1)::double precision)))
+);
+
+
+--
+-- Name: gradeable_redaction_redaction_id_seq; Type: SEQUENCE; Schema: public; Owner: -
+--
+
+CREATE SEQUENCE public.gradeable_redaction_redaction_id_seq
+    AS integer
+    START WITH 1
+    INCREMENT BY 1
+    NO MINVALUE
+    NO MAXVALUE
+    CACHE 1;
+
+
+--
+-- Name: gradeable_redaction_redaction_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: -
+--
+
+ALTER SEQUENCE public.gradeable_redaction_redaction_id_seq OWNED BY public.gradeable_redaction.redaction_id;
+
+
 --
 -- Name: gradeable_teams; Type: TABLE; Schema: public; Owner: -
 --
@@ -2166,6 +2205,13 @@ ALTER TABLE ONLY public.gradeable_data ALTER COLUMN gd_id SET DEFAULT nextval('p
 ALTER TABLE ONLY public.gradeable_data_overall_comment ALTER COLUMN goc_id SET DEFAULT nextval('public.gradeable_data_overall_comment_goc_id_seq'::regclass);


+--
+-- Name: gradeable_redaction redaction_id; Type: DEFAULT; Schema: public; Owner: -
+--
+
+ALTER TABLE ONLY public.gradeable_redaction ALTER COLUMN redaction_id SET DEFAULT nextval('public.gradeable_redaction_redaction_id_seq'::regclass);
+
+
 --
 -- Name: lichen id; Type: DEFAULT; Schema: public; Owner: -
 --
@@ -2499,6 +2545,14 @@ ALTER TABLE ONLY public.gradeable
     ADD CONSTRAINT gradeable_pkey PRIMARY KEY (g_id);


+--
+-- Name: gradeable_redaction gradeable_redaction_pkey; Type: CONSTRAINT; Schema: public; Owner: -
+--
+
+ALTER TABLE ONLY public.gradeable_redaction
+    ADD CONSTRAINT gradeable_redaction_pkey PRIMARY KEY (redaction_id);
+
+
 --
 -- Name: grade_inquiries gradeable_team_gc_id; Type: CONSTRAINT; Schema: public; Owner: -
 --
@@ -3398,6 +3452,14 @@ ALTER TABLE ONLY public.gradeable_data_overall_comment
     ADD CONSTRAINT gradeable_data_overall_comment_goc_user_id_fkey FOREIGN KEY (goc_user_id) REFERENCES public.users(user_id) ON DELETE CASCADE;


+--
+-- Name: gradeable_redaction gradeable_redaction_g_id_fkey; Type: FK CONSTRAINT; Schema: public; Owner: -
+--
+
+ALTER TABLE ONLY public.gradeable_redaction
+    ADD CONSTRAINT gradeable_redaction_g_id_fkey FOREIGN KEY (g_id) REFERENCES public.gradeable(g_id) ON DELETE CASCADE;
+
+
 --
 -- Name: gradeable_teams gradeable_teams_g_id_fkey; Type: FK CONSTRAINT; Schema: public; Owner: -
 --
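
Taken together, the four CHECK constraints on `gradeable_redaction` require `0 <= x1 <= x2 <= 1` and `0 <= y1 <= y2 <= 1`, so each coordinate is a fraction of the page size. The sketch below mirrors those checks in Python, for example to pre-validate a rectangle before an insert. `violated_constraints` is a hypothetical helper, not part of this commit; only the constraint names are taken from the table definition.

```python
from typing import List


def violated_constraints(x1: float, y1: float, x2: float, y2: float) -> List[str]:
    """Return the names of the gradeable_redaction CHECK constraints that fail."""
    checks = {
        "x1_positive": 0 <= x1 <= x2,  # x1 >= 0 AND x1 <= x2
        "x2_positive": 0 <= x2 <= 1,   # x2 >= 0 AND x2 <= 1
        "y1_positive": 0 <= y1 <= y2,  # y1 >= 0 AND y1 <= y2
        "y2_positive": 0 <= y2 <= 1,   # y2 >= 0 AND y2 <= 1
    }
    return [name for name, ok in checks.items() if not ok]


print(violated_constraints(0, 0, 0.3, 0.3))    # a valid rectangle
print(violated_constraints(0.5, 0, 0.3, 0.3))  # left edge is right of the right edge
```

A rectangle that passes returns an empty list; otherwise the returned names match what the database would report.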

migration/migrator/migrations/course/20250312145730_add_redactions.py

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+"""Migration for a given Submitty course database."""
+
+
+def up(config, database, semester, course):
+    """
+    Run up migration.
+
+    :param config: Object holding configuration details about Submitty
+    :type config: migrator.config.Config
+    :param database: Object for interacting with given database for environment
+    :type database: migrator.db.Database
+    :param semester: Semester of the course being migrated
+    :type semester: str
+    :param course: Code of course being migrated
+    :type course: str
+    """
+    database.execute("""
+        CREATE TABLE IF NOT EXISTS gradeable_redaction (
+            redaction_id SERIAL PRIMARY KEY,
+            g_id character varying(255) NOT NULL REFERENCES gradeable(g_id) ON DELETE CASCADE,
+            page integer NOT NULL,
+            x1 float NOT NULL CONSTRAINT x1_positive CHECK (x1 >= 0 AND x1 <= x2),
+            x2 float NOT NULL CONSTRAINT x2_positive CHECK (x2 >= 0 AND x2 <= 1),
+            y1 float NOT NULL CONSTRAINT y1_positive CHECK (y1 >= 0 AND y1 <= y2),
+            y2 float NOT NULL CONSTRAINT y2_positive CHECK (y2 >= 0 AND y2 <= 1)
+        )
+    """)
+
+
+def down(config, database, semester, course):
+    """
+    Run down migration (rollback).
+
+    :param config: Object holding configuration details about Submitty
+    :type config: migrator.config.Config
+    :param database: Object for interacting with given database for environment
+    :type database: migrator.db.Database
+    :param semester: Semester of the course being migrated
+    :type semester: str
+    :param course: Code of course being migrated
+    :type course: str
+    """
+    pass
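
The `down` migration above is a no-op, so the `gradeable_redaction` table and any stored redactions survive a rollback. If a destructive rollback were ever wanted, it might look like the sketch below. This is an illustration only, assuming `database.execute` accepts a plain SQL string as it does in `up`; `RecordingDatabase` is a hypothetical stand-in for `migrator.db.Database` so the example runs without a database.

```python
class RecordingDatabase:
    """Hypothetical stand-in for migrator.db.Database that only records SQL."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        # Collapse whitespace so the recorded statement is easy to compare.
        self.statements.append(" ".join(sql.split()))


def down(config, database, semester, course):
    """Illustrative rollback (not in the commit): drop the table created by up()."""
    database.execute("DROP TABLE IF EXISTS gradeable_redaction")


db = RecordingDatabase()
down(None, db, "s25", "sample")  # semester and course values are placeholders
print(db.statements)
```

Keeping `down` empty, as the commit does, is the conservative choice because a rollback then cannot delete instructor-supplied redactions.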

more_autograding_examples/bulk_upload_pdfs/submissions/redactions.json

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+[
+    {
+        "page": 2,
+        "x1": 0,
+        "y1": 0,
+        "x2": 0.3,
+        "y2": 0.3
+    }
+]
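
The example file redacts the top-left 30% by 30% region of page 2 (pages are 1-indexed, per the comment in `generate_pdf_images.py`). The code that turns this file format into the job's redaction arguments lives in the PHP controller, which is not shown here. The Python sketch below is therefore an assumption inferred from the `Redaction(page_number, coordinates)` constructor and from how `coordinates[0..3]` are scaled by image width and height; `to_job_redactions` is a hypothetical name.

```python
import json
from typing import Dict, List

EXAMPLE = """
[
    {"page": 2, "x1": 0, "y1": 0, "x2": 0.3, "y2": 0.3}
]
"""


def to_job_redactions(raw_json: str) -> List[Dict]:
    """Convert redactions.json entries into Redaction(**r)-style keyword dicts."""
    job_redactions = []
    for entry in json.loads(raw_json):
        job_redactions.append({
            "page_number": int(entry["page"]),
            # Assumed order [x1, y1, x2, y2]: indices 0 and 2 are scaled by
            # image width, indices 1 and 3 by image height when drawing.
            "coordinates": [entry["x1"], entry["y1"], entry["x2"], entry["y2"]],
        })
    return job_redactions


print(to_job_redactions(EXAMPLE))
```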

sbin/submitty_daemon_jobs/submitty_jobs/bulk_qr_split.py

Lines changed: 0 additions & 4 deletions
@@ -7,7 +7,6 @@
 import numpy
 from . import write_to_log as logger
 from . import submitty_ocr as scanner
-from . import generate_pdf_images

 # try importing required modules
 try:
@@ -103,14 +102,12 @@ def main(args):
     logger.write_to_json(json_file, output)
     with open(prev_file, 'wb') as out:
         pdf_writer.write(out)
-    generate_pdf_images.main(prev_file, [])

     if id_index == 1:
         # correct first pdf's page count and print file
         output[prev_file]['page_count'] = page_count
         with open(prev_file, 'wb') as out:
             pdf_writer.write(out)
-        generate_pdf_images.main(prev_file, [])

     # start a new pdf and grab the cover
     cover_writer = PdfWriter()
@@ -170,7 +167,6 @@ def main(args):

         with open(prev_file, 'wb') as out:
             pdf_writer.write(out)
-        generate_pdf_images.main(prev_file, [])
         # write the buffer to the log file, so everything is on one line
         logger.write_to_log(log_file_path, buff)
     except Exception:

sbin/submitty_daemon_jobs/submitty_jobs/bulk_upload_split.py

Lines changed: 0 additions & 2 deletions
@@ -7,7 +7,6 @@
 import traceback
 from PyPDF2 import PdfWriter
 from . import write_to_log as logger
-from . import generate_pdf_images

 try:
     from pdf2image import convert_from_bytes
@@ -62,7 +61,6 @@ def main(args):
     i += 1
     with open(output_filename, 'wb') as out:
         pdf_writer.write(out)
-    generate_pdf_images.main(output_filename, [])

     with open(cover_filename, 'wb') as out:
         cover_writer.write(out)

sbin/submitty_daemon_jobs/submitty_jobs/generate_pdf_images.py

Lines changed: 39 additions & 13 deletions
@@ -3,7 +3,7 @@
 from typing import List, Sequence

 from pdf2image import convert_from_bytes
-from PIL import Image, ImageDraw
+from PIL import ImageDraw
 from PyPDF2 import PdfReader


@@ -13,28 +13,54 @@ def __init__(self, page_number: int, coordinates: Sequence[float]):
         self.coordinates = coordinates


-def main(pdf_file_path: str, redactions: List[Redaction]):
+def main(pdf_file_path: str, output_dir: str, redactions: List[Redaction]):
     directory = os.path.dirname(pdf_file_path)
     if directory:
         os.chdir(os.path.dirname(pdf_file_path))
+    # Ensure the output directory exists
+    if not os.path.exists(output_dir):
+        os.makedirs(output_dir)
     try:
         pdfPages = PdfReader(pdf_file_path, strict=False)
-        with open(pdf_file_path, 'rb') as open_file:
+        with open(pdf_file_path, "rb") as open_file:
             imagePages = convert_from_bytes(
                 open_file.read(),
-                )
+            )
+        # Loop through each page in the PDF and save it as an image
         for page_number in range(len(pdfPages.pages)):
-            image_filename = pdf_file_path[:-4] + '_' + str(page_number + 1).zfill(3) + '.jpg'
-            imagePages[page_number].save(image_filename,
-                                         "JPEG", quality=20, optimize=True)
+            image_filename = os.path.join(
+                output_dir,
+                "."
+                + os.path.basename(pdf_file_path[:-4])
+                + "_page_"
+                + str(page_number + 1).zfill(2)
+                + ".jpg",
+            )
+            img = imagePages[page_number]
+            draw = ImageDraw.Draw(img)
             for redaction in redactions:
-                if redaction.page_number != page_number:
+                # Add 1 to page_number because redactions are 1-indexed
+                # and page_number is 0-indexed
+                if redaction.page_number != page_number + 1:
                     continue
-                img = Image.open(image_filename)
-                draw = ImageDraw.Draw(img)
-                draw.rectangle(redaction.coordinates, fill="black")
-                img.save(image_filename,
-                         "JPEG", quality=20, optimize=True)
+                square_size = 25
+
+                # Convert coordinates from relative to absolute pixel values
+                x0 = int(redaction.coordinates[0] * img.size[0])
+                y0 = int(redaction.coordinates[1] * img.size[1])
+                x1 = int(redaction.coordinates[2] * img.size[0])
+                y1 = int(redaction.coordinates[3] * img.size[1])
+
+                # Create a grid of black and grey squares within the redaction area
+                # Loops ensure that the checkered pattern is created
+                for y in range(y0, y1, square_size):
+                    for x in range(x0, x1, square_size):
+                        fill_color = "black" if ((x // square_size + y // square_size) % 2 == 0) else "grey"
+                        draw.rectangle(
+                            [x, y, x + square_size, y + square_size], fill=fill_color
+                        )
+            print(f"Saving image {image_filename}")
+            img.save(image_filename, "JPEG", quality=20, optimize=True)
     except Exception:
         msg = "Failed when splitting pdf " + pdf_file_path
         print(msg)
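
The checkered fill can be studied without Pillow by computing the cells first. The sketch below reproduces the coordinate scaling and the black/grey alternation from the diff as a pure function; `checker_cells` is a hypothetical helper, not part of this commit. One deliberate difference, labeled as an assumption in the code: the diff always draws a full `square_size` cell, so the last row and column can extend past the requested area by almost one square, whereas this sketch clamps each cell to the redaction rectangle.

```python
from typing import Iterator, Sequence, Tuple

Box = Tuple[int, int, int, int]


def checker_cells(
    coordinates: Sequence[float],
    image_size: Tuple[int, int],
    square_size: int = 25,
) -> Iterator[Tuple[Box, str]]:
    """Yield (box, color) pairs tiling a relative [x1, y1, x2, y2] area."""
    width, height = image_size
    # Convert relative fractions to absolute pixel values, as in the diff.
    x0 = int(coordinates[0] * width)
    y0 = int(coordinates[1] * height)
    x1 = int(coordinates[2] * width)
    y1 = int(coordinates[3] * height)
    for y in range(y0, y1, square_size):
        for x in range(x0, x1, square_size):
            even = (x // square_size + y // square_size) % 2 == 0
            # Assumption (differs from the diff): clamp the cell so it never
            # extends beyond the requested redaction area.
            box = (x, y, min(x + square_size, x1), min(y + square_size, y1))
            yield box, "black" if even else "grey"


# A 240x100 image with the left quarter and top half redacted.
cells = list(checker_cells([0, 0, 0.25, 0.5], (240, 100)))
for box, color in cells:
    print(box, color)
```

Each yielded pair could then be passed to `draw.rectangle(box, fill=color)` on an `ImageDraw.Draw` object, as the job does.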

sbin/submitty_daemon_jobs/submitty_jobs/jobs.py

Lines changed: 24 additions & 3 deletions
@@ -16,6 +16,8 @@
 import requests
 from urllib.parse import unquote
 from tempfile import TemporaryDirectory
+
+from . import regenerate_bulk_images
 from . import bulk_qr_split
 from . import bulk_upload_split
 from . import generate_pdf_images
@@ -351,10 +353,15 @@ def run_job(self):

 class GeneratePdfImages(AbstractJob):
     def run_job(self):
-        pdf_file_path = self.job_details['pdf_file_path']
+        pdf_file_path = self.job_details["pdf_file_path"]
+        output_dir = self.job_details["output_dir"]
         # optionally get redactions
-        redactions = self.job_details.get('redactions', [])
-        generate_pdf_images.main(pdf_file_path, [generate_pdf_images.Redaction(**r) for r in redactions])
+        redactions = self.job_details.get("redactions", [])
+        generate_pdf_images.main(
+            pdf_file_path,
+            output_dir,
+            [generate_pdf_images.Redaction(**r) for r in redactions],
+        )

     def cleanup_job(self):
         pass
@@ -435,6 +442,20 @@ def cleanup_job(self):
         pass


+# Used to regenerate images for all submissions in a bulk upload
+class RegenerateBulkImages(AbstractJob):
+    def run_job(self):
+        folder = self.job_details["pdf_file_path"]
+        redactions = [
+            generate_pdf_images.Redaction(**r)
+            for r in self.job_details.get("redactions", [])
+        ]
+        regenerate_bulk_images.main(folder, redactions)
+
+    def cleanup_job(self):
+        pass
+
+
 class DocxToPDF(AbstractJob):
     def run_job(self):
         log_dir = os.path.join(DATA_DIR, "logs", "docx_to_pdf")
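
Both jobs build redactions with `Redaction(**r)`, so every entry in `job_details["redactions"]` must use exactly the constructor's keyword names. The sketch below illustrates this with a stand-in `Redaction` class that copies the constructor signature shown in `generate_pdf_images.py`; the `job_details` payload and its path are hypothetical values for illustration.

```python
from typing import Sequence


class Redaction:
    """Stand-in with the constructor signature shown in generate_pdf_images.py."""

    def __init__(self, page_number: int, coordinates: Sequence[float]):
        self.page_number = page_number
        self.coordinates = coordinates


job_details = {  # hypothetical payload; the path is illustrative only
    "pdf_file_path": "/path/to/submissions/gradeable_id",
    "redactions": [{"page_number": 2, "coordinates": [0, 0, 0.3, 0.3]}],
}

# Same unpacking as RegenerateBulkImages.run_job.
redactions = [Redaction(**r) for r in job_details.get("redactions", [])]
print(redactions[0].page_number, redactions[0].coordinates)

rejected = False
try:
    # Keys in the redactions.json style do not match the constructor.
    Redaction(page=2, x1=0, y1=0, x2=0.3, y2=0.3)
except TypeError as error:
    rejected = True
    print("rejected:", error)
```

Note also that `GeneratePdfImages` reads `output_dir` with `self.job_details["output_dir"]`, so a job file without that key raises `KeyError`, while a missing `redactions` key simply yields an empty list.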

sbin/submitty_daemon_jobs/submitty_jobs/regenerate_bulk_images.py

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+import json
+from pathlib import Path
+
+from . import generate_pdf_images
+
+
+# Regenerate images for all submissions in a bulk upload
+def main(folder, redactions):
+    # Convert folder to Path object
+    folder_path = Path(folder)
+
+    # loop over all submitters in folder and regrade their active version
+    for submitter_dir in [d for d in folder_path.iterdir() if d.is_dir()]:
+        # Read user_assignment_settings.json to get the active version
+        settings_path = submitter_dir / "user_assignment_settings.json"
+
+        with open(settings_path, "r") as f:
+            settings = json.load(f)
+            active_version = settings.get("active_version", None)
+
+        if active_version is None:
+            continue
+
+        active_version_path = submitter_dir / str(active_version)
+        # Check if the active version is a directory
+        if not active_version_path.is_dir():
+            continue
+        # Run the generate_pdf_images job on the active version
+        pdf_path = active_version_path / "upload.pdf"
+        results_path = str(active_version_path).replace("submissions", "submissions_processed")
+        generate_pdf_images.main(
+            str(pdf_path),
+            results_path,
+            redactions,
+        )
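
The output location is derived from the submission path in two steps: the active version directory is mapped from `submissions` to `submissions_processed`, and each page is then written there as a hidden file named `.<pdf stem>_page_NN.jpg`. Because `str.replace` rewrites every occurrence of the substring, a gradeable whose ID contains "submissions" would also be altered. The sketch below shows one possible alternative that swaps whole path components only, together with the image naming from `generate_pdf_images.main`. Both function names and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def processed_dir(active_version_path: str) -> str:
    """Swap only whole 'submissions' path components for 'submissions_processed'."""
    parts = [
        "submissions_processed" if part == "submissions" else part
        for part in PurePosixPath(active_version_path).parts
    ]
    return str(PurePosixPath(*parts))


def page_image_name(pdf_file_path: str, page_index: int) -> str:
    """Hidden per-page image name, following generate_pdf_images.main."""
    # pdf_file_path[:-4] strips the ".pdf" extension, as in the diff.
    stem = PurePosixPath(pdf_file_path[:-4]).name
    return "." + stem + "_page_" + str(page_index + 1).zfill(2) + ".jpg"


# Hypothetical course layout, used only for illustration.
version_dir = "/data/courses/s25/sample/submissions/my_submissions_hw/student1/1"
print(processed_dir(version_dir))
print(page_image_name(version_dir + "/upload.pdf", 0))
```

With the `str.replace` call from the diff, the same input would also turn `my_submissions_hw` into `my_submissions_processed_hw`.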

sbin/submitty_daemon_jobs/tests/test_bulk_pdf_split.py

Lines changed: 0 additions & 10 deletions
@@ -46,11 +46,6 @@ def test_split_pdf(self):
             cover_tgt = Path(file_name + '_' + str(i_idx).zfill(2) + '_cover.pdf')
             self.assertTrue(split_tgt.is_file())

-            #verify each page png is being produced
-            for j_idx in range(1,tgt_num_pages+1):
-                page_tgt = Path(file_name + '_' + str(i_idx).zfill(2) + '_' + str(j_idx).zfill(3) + '.jpg')
-                self.assertTrue(page_tgt.is_file())
-

     #Test handling a bad number of given pages to split a pdf gracefully
     def test_bad_split_number(self):
@@ -115,11 +110,6 @@ def test_split_qr(self):
             cover_tgt = Path(file_name + '_' + str(i_idx).zfill(3) + '_cover.pdf')
             self.assertTrue(split_tgt.is_file())

-            #verify each page png is being produced
-            for j_idx in range(1,tgt_num_pages+1):
-                page_tgt = Path(file_name + '_' + str(i_idx).zfill(3) + '_' + str(j_idx).zfill(3) + '.jpg')
-                self.assertTrue(page_tgt.is_file())
-


     def test_split_qr_url(self):