
Commit f9298f4

SimonDegrafKern, Johannes Hötter, JWittmeyer and lumburovskalina authored
Adds password support for upload task (#137)
* Adds password support for upload task
* Adds usage of password
* Renames password to key
* Renames password to key
* Renames password to key
* Non-working version
* Makes secret key working
* Changes encoding steps
* Updates submodule
* Embedding providers (#136)
  * Forward platform information
  * Adds agreement, GDPR-compliant flags and embedding provider support
  * Adds created by to embeddings
  * Merges embedding logic for different embedding types and adds support for more embedding providers
  * Change order of embedding platforms
  * Added platform as part of the Encoder type
  * Adds modularization for recreation of embeddings
  * Refactors recreation of embeddings to solve dependency problems
  * Embedding recreation for adding new records
  * Fixes embedding deletion
  * Fixes small things
  * Adds notification for organization change
  * Adds better state management for reupload of records
  * Adds optimized state management for embeddings
  * Standardize call to recreate embeddings
  * Resolves PR comments
  * Removes print statement
  * Adds commit for embedding creation
  * Adds handling for missing tokenization
  * Adds logic to infer embedding information out of old projects
  * Adds order of project transfer so that source code can be replaced by new embedding name
  * Changes call logic of agreements
  * Resolves first few PR comments
  * Resolves PR comments
  * Adds new term text
  * Changed terms text slightly
  * Adds link and placeholder
  * Added link to embedding type
  * Update controller/transfer/project_transfer_manager.py
    Co-authored-by: JWittmeyer <[email protected]>
  * Resolves typo
  * Resolves typo
  * Submodules merge
  * Drone
  ---------
  Co-authored-by: Johannes Hötter <[email protected]>
  Co-authored-by: JWittmeyer <[email protected]>
  Co-authored-by: Lina <[email protected]>
* Merges and adds key support for upload
* Fixes key handling
* Standardizes export behavior
* Fixes export of projects
* Removes print
* Removes key of upload task after upload
* Adds clean-up logic for keys in upload tasks
* Adds cleanup of files in tmp
* Cleans file after project export
* Clean up code
* Embedding providers (#136)
* Update start
* Update start
* Adds check for secret key on startup
* Sets warning on default secret key
* Adds handling for bad password in file import
* Rewrites additional apt packages
* Removes print
* Removes unnecessary global usage and formats code
* Resolves PR comments
* Resolves PR comments
* Resolves PR comments
* Submodules merge

---------

Co-authored-by: Johannes Hötter <[email protected]>
Co-authored-by: JWittmeyer <[email protected]>
Co-authored-by: Lina <[email protected]>
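The commit adds "handling for bad password in file import", and api/transfer.py catches a `BadPasswordError` raised somewhere inside `init_file_import`; where it is raised is not part of the visible diff. As a hedged sketch only, this is one way to detect a wrong or missing password with the standard library, assuming the upload is a legacy ZipCrypto archive (which `zipfile` can read but not write). `BadPasswordError` below is a local stand-in, and `is_encrypted` and `read_first_member` are hypothetical helpers.

```python
import zipfile
from typing import Optional


class BadPasswordError(Exception):
    """Local stand-in for the project's exceptions.exceptions.BadPasswordError."""


def is_encrypted(zip_path: str) -> bool:
    """Return True if any member has the 'encrypted' flag (bit 0) set."""
    with zipfile.ZipFile(zip_path) as zf:
        return any(info.flag_bits & 0x1 for info in zf.infolist())


def read_first_member(zip_path: str, key: Optional[str] = None) -> bytes:
    """Hypothetical helper: read the first archive member, decrypting with `key`."""
    pwd = key.encode("utf-8") if key else None
    with zipfile.ZipFile(zip_path) as zf:
        name = zf.namelist()[0]
        try:
            # For unencrypted members the password is simply ignored.
            return zf.read(name, pwd=pwd)
        except RuntimeError as exc:
            # zipfile reports both a wrong password and a missing one
            # ("password required for extraction") as RuntimeError.
            raise BadPasswordError(str(exc)) from exc
```

A wrong ZipCrypto password can occasionally pass the one-byte header check and then fail later with a different exception (for example a CRC mismatch reported as `zipfile.BadZipFile`), so a stricter version might map that case too.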
1 parent a0e47e4 commit f9298f4


19 files changed: 291 additions & 102 deletions


Dockerfile

Lines changed: 3 additions & 0 deletions
@@ -2,6 +2,9 @@ FROM kernai/refinery-parent-images:v1.11.0-common

 WORKDIR /app

+# used for encryption and zipping of files
+RUN apt-get update && apt-get install -y libc6-dev zlib1g gcc --no-install-recommends
+
 COPY requirements.txt .

 RUN pip3 install --no-cache-dir -r requirements.txt
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+"""Adds key for upload
+
+Revision ID: 217dbe11c5d1
+Revises: 1a25c862801f
+Create Date: 2023-06-21 14:55:16.523327
+
+"""
+from alembic import op
+import sqlalchemy as sa
+
+
+# revision identifiers, used by Alembic.
+revision = '217dbe11c5d1'
+down_revision = '1a25c862801f'
+branch_labels = None
+depends_on = None
+
+
+def upgrade():
+    # ### commands auto generated by Alembic - please adjust! ###
+    op.add_column('upload_task', sa.Column('key', sa.LargeBinary(), nullable=True))
+    # ### end Alembic commands ###
+
+
+def downgrade():
+    # ### commands auto generated by Alembic - please adjust! ###
+    op.drop_column('upload_task', 'key')
+    # ### end Alembic commands ###

api/transfer.py

Lines changed: 24 additions & 8 deletions
@@ -1,16 +1,15 @@
 import logging
 import traceback
 import time
-from typing import Any, List
+from typing import Optional

 from controller import organization
-from controller.embedding import util as embedding_util
-from controller.embedding import connector as embedding_connector
 from starlette.endpoints import HTTPEndpoint
 from starlette.responses import PlainTextResponse, JSONResponse
 from controller.embedding.manager import recreate_embeddings

 from controller.transfer.labelstudio import import_preperator
+from exceptions.exceptions import BadPasswordError
 from submodules.s3 import controller as s3
 from submodules.model.business_objects import (
     attribute,
@@ -69,6 +68,14 @@ async def post(self, request) -> PlainTextResponse:
         is_global_update = True if task.file_type == "project" else False
         try:
             init_file_import(task, project_id, is_global_update)
+        except BadPasswordError:
+            file_import_error_handling(
+                task,
+                project_id,
+                is_global_update,
+                enums.NotificationType.BAD_PASSWORD_DURING_IMPORT,
+                print_traceback=False,
+            )
         except Exception:
             file_import_error_handling(task, project_id, is_global_update)
         notification.send_organization_update(
@@ -268,13 +275,19 @@ def init_file_import(task: UploadTask, project_id: str, is_global_update: bool)


 def file_import_error_handling(
-    task: UploadTask, project_id: str, is_global_update: bool
+    task: UploadTask,
+    project_id: str,
+    is_global_update: bool,
+    notification_type: Optional[NotificationType] = None,
+    print_traceback: bool = True,
 ) -> None:
     general.rollback()
     task.state = enums.UploadStates.ERROR.value
     general.commit()
+    if not notification_type:
+        notification_type = NotificationType.IMPORT_FAILED
     create_notification(
-        NotificationType.IMPORT_FAILED,
+        notification_type,
         task.user_id,
         task.project_id,
         task.file_type,
@@ -284,13 +297,17 @@ def file_import_error_handling(
             task,
         ),
     )
-    print(traceback.format_exc(), flush=True)
+    if print_traceback:
+        print(traceback.format_exc(), flush=True)
+
     notification.send_organization_update(
         project_id, f"file_upload:{str(task.id)}:state:{task.state}", is_global_update
     )


-def __recalculate_missing_attributes_and_embeddings(project_id: str, user_id: str) -> None:
+def __recalculate_missing_attributes_and_embeddings(
+    project_id: str, user_id: str
+) -> None:
     __calculate_missing_attributes(project_id, user_id)
     recreate_embeddings(project_id)

@@ -384,4 +401,3 @@ def __calculate_missing_attributes(project_id: str, user_id: str) -> None:
             message="calculate_attribute:finished:all",
         )
     general.remove_and_refresh_session(ctx_token, False)
-
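The two `except` clauses in `post()` depend on Python trying handlers from top to bottom: `BadPasswordError` has to come before the catch-all `except Exception`, or the generic handler would swallow it. The following is a simplified, self-contained sketch of that pattern. `run_import`, the `sent` list (standing in for `create_notification`) and the trimmed `NotificationType` enum are hypothetical, and the task/project arguments of the real functions are dropped.

```python
import traceback
from enum import Enum
from typing import Callable, List, Optional, Tuple


class NotificationType(Enum):
    # Trimmed stand-in for the project's NotificationType enum.
    IMPORT_FAILED = "IMPORT_FAILED"
    BAD_PASSWORD_DURING_IMPORT = "BAD_PASSWORD_DURING_IMPORT"


class BadPasswordError(Exception):
    pass


# Hypothetical stand-in for create_notification: records (type, traceback printed?).
sent: List[Tuple[NotificationType, bool]] = []


def file_import_error_handling(
    notification_type: Optional[NotificationType] = None,
    print_traceback: bool = True,
) -> None:
    # Fall back to the generic failure notification, as in the diff.
    if notification_type is None:
        notification_type = NotificationType.IMPORT_FAILED
    sent.append((notification_type, print_traceback))
    if print_traceback:
        print(traceback.format_exc(), flush=True)


def run_import(importer: Callable[[], None]) -> None:
    try:
        importer()
    except BadPasswordError:
        # Expected user error: specific notification, no traceback in the logs.
        file_import_error_handling(
            NotificationType.BAD_PASSWORD_DURING_IMPORT, print_traceback=False
        )
    except Exception:
        # Anything else is unexpected: generic notification plus traceback.
        file_import_error_handling()
```

The sketch tests `is None` rather than `not notification_type`. For a plain `Enum` both behave the same because its members are always truthy, but `is None` states the intent directly.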

app.py

Lines changed: 4 additions & 0 deletions
@@ -21,6 +21,7 @@
 from graphql_api import schema
 from controller.task_queue.task_queue import init_task_queue
 from controller.project.manager import check_in_deletion_projects
+from util import security, clean_up


 logging.basicConfig(level=logging.DEBUG)
@@ -57,3 +58,6 @@

 init_task_queue()
 check_in_deletion_projects()
+security.check_secret_key()
+clean_up.clean_up_database()
+clean_up.clean_up_disk()
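`security.check_secret_key()` now runs at startup, and the commit message says it warns when the default secret key is still in use. Its body is not shown on this page, so the following is only a sketch of such a check. The environment variable name `SECRET_KEY`, the default value `"default"` and the choice to warn rather than abort when the key is missing are all assumptions.

```python
import logging
import os
from typing import Mapping, Optional

# Assumed names: the real util.security module may use different ones.
SECRET_KEY_ENV = "SECRET_KEY"
DEFAULT_SECRET_KEY = "default"

logger = logging.getLogger(__name__)


def check_secret_key(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Warn at startup if the secret key is missing or still the default.

    Returns True only when a non-default key is configured.
    """
    env = os.environ if environ is None else environ
    key = env.get(SECRET_KEY_ENV)
    if not key:
        logger.warning("%s is not set; upload keys cannot be encrypted safely.", SECRET_KEY_ENV)
        return False
    if key == DEFAULT_SECRET_KEY:
        logger.warning("%s still has its default value; please set a unique secret.", SECRET_KEY_ENV)
        return False
    return True
```

Accepting an optional mapping instead of reading `os.environ` directly keeps the check easy to exercise without touching the process environment.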

controller/notification/notification_data.py

Lines changed: 7 additions & 0 deletions
@@ -121,6 +121,13 @@
         "page": enums.Pages.SETTINGS.value,
         "docs": enums.DOCS.UPLOADING_DATA.value,
     },
+    enums.NotificationType.BAD_PASSWORD_DURING_IMPORT.value: {
+        "message_template": "Bad password for zip file",
+        "title": "Data import",
+        "level": enums.Notification.ERROR.value,
+        "page": enums.Pages.OVERVIEW.value,
+        "docs": enums.DOCS.CREATING_PROJECTS.value,
+    },
     enums.NotificationType.INFORMATION_SOURCE_STARTED.value: {
         "message_template": "Started heuristic @@arg@@.",
         "title": "Heuristic execution",

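Each entry in the notification registry pairs a `message_template` with display metadata, and the existing `INFORMATION_SOURCE_STARTED` template shows that arguments are spliced in at `@@arg@@` markers. The real substitution code is not part of this diff, so `render_message` below is a hypothetical sketch that fills markers left to right. Plain strings replace the `enums.*` values, and only fields visible above are included.

```python
from typing import Dict

# Hypothetical, trimmed registry in the shape of notification_data.py;
# plain strings stand in for the enums.* values.
NOTIFICATION_DATA: Dict[str, Dict[str, str]] = {
    "BAD_PASSWORD_DURING_IMPORT": {
        "message_template": "Bad password for zip file",
        "title": "Data import",
        "level": "ERROR",
        "page": "OVERVIEW",
        "docs": "CREATING_PROJECTS",
    },
    "INFORMATION_SOURCE_STARTED": {
        "message_template": "Started heuristic @@arg@@.",
        "title": "Heuristic execution",
    },
}

PLACEHOLDER = "@@arg@@"


def render_message(notification_type: str, *args: str) -> str:
    """Fill @@arg@@ markers left to right (assumed substitution rule)."""
    message = NOTIFICATION_DATA[notification_type]["message_template"]
    for arg in args:
        # Replace only the first remaining marker for each argument.
        message = message.replace(PLACEHOLDER, arg, 1)
    return message
```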
controller/transfer/manager.py

Lines changed: 28 additions & 31 deletions
@@ -3,7 +3,6 @@
 import json
 import traceback
 from typing import Any, List, Optional, Dict
-import zipfile

 from controller.transfer import export_parser
 from controller.transfer.knowledge_base_transfer_manager import (
@@ -36,14 +35,15 @@
 from submodules.s3 import controller as s3
 import pandas as pd
 from datetime import datetime
-from util import notification
+from util import notification, security, file
 from sqlalchemy.sql import text as sql_text
 from controller.labeling_task import manager as labeling_task_manager
 from controller.labeling_task_label import manager as labeling_task_label_manager
 from submodules.model.business_objects import record_label_association as rla
 from controller.task_queue import manager as task_queue_manager
 from submodules.model.enums import TaskType, RecordTokenizationScope

+
 from util.notification import create_notification

 logging.basicConfig(level=logging.DEBUG)
@@ -57,9 +57,17 @@ def get_upload_credentials_and_id(
     file_type: str,
     file_import_options: str,
     upload_type: str,
+    key: Optional[str] = None,
 ):
+    key = security.encrypt(key)
     task = upload_task_manager.create_upload_task(
-        str(user_id), project_id, file_name, file_type, file_import_options, upload_type
+        str(user_id),
+        project_id,
+        file_name,
+        file_type,
+        file_import_options,
+        upload_type,
+        key,
     )
     org_id = organization.get_id_by_project_id(project_id)
     return s3.get_upload_credentials_and_id(org_id, project_id + "/" + str(task.id))
@@ -174,7 +182,10 @@ def export_records(


 def prepare_record_export(
-    project_id: str, user_id: str, export_options: Optional[Dict[str, Any]] = None
+    project_id: str,
+    user_id: str,
+    export_options: Optional[Dict[str, Any]] = None,
+    key: Optional[str] = None,
 ) -> None:
     records_by_options_query_data = get_records_by_options_query_data(
         project_id, export_options
@@ -187,7 +198,7 @@ def prepare_record_export(
     file_path, file_name = export_parser.parse(
         project_id, final_query, mapping_dict, extraction_appends, export_options
     )
-    zip_path, file_name = __write_file_to_zip(file_path)
+    zip_path, file_name = file.file_to_zip(file_path, key)
     org_id = organization.get_id_by_project_id(project_id)
     prefixed_path = f"{project_id}/download/{user_id}/record_export_"
     file_name_download = prefixed_path + file_name
@@ -233,7 +244,10 @@ def export_knowledge_base(project_id: str, base_id: str) -> str:


 def prepare_project_export(
-    project_id: str, user_id: str, export_options: Dict[str, bool]
+    project_id: str,
+    user_id: str,
+    export_options: Dict[str, bool],
+    key: Optional[str] = None,
 ) -> bool:
     org_id = organization.get_id_by_project_id(project_id)
     objects = s3.get_bucket_objects(org_id, project_id + "/download/project_export_")
@@ -242,36 +256,19 @@ def prepare_project_export(

     data = get_project_export_dump(project_id, user_id, export_options)
     file_name_base = "project_export_" + datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
-    file_name_local = file_name_base + ".zip"
-    file_name_download = project_id + "/download/" + file_name_local
-    __write_json_data_to_zip(data, file_name_base)
-    s3.upload_object(org_id, file_name_download, file_name_local)
+    json_file_path = file.text_to_json_file(data, file_name_base)
+    zip_path, zip_name = file.file_to_zip(json_file_path, key)
+    file_name_download = f"{project_id}/download/{zip_name}"
+    s3.upload_object(org_id, file_name_download, zip_path)
     notification.send_organization_update(project_id, "project_export")

-    if os.path.exists(file_name_local):
-        os.remove(file_name_local)
+    if os.path.exists(json_file_path):
+        os.remove(json_file_path)
+    if os.path.exists(zip_path):
+        os.remove(zip_path)
     return True


-def __write_file_to_zip(file_path: str) -> str:
-    base_name = os.path.basename(file_path)
-    file_name = base_name + ".zip"
-    zip_path = f"{file_path}.zip"
-    zipfile.ZipFile(zip_path, mode="w").write(file_path, base_name)
-    return zip_path, file_name
-
-
-def __write_json_data_to_zip(dumped_json: str, base_file_name: str) -> None:
-    with zipfile.ZipFile(
-        base_file_name + ".zip",
-        mode="w",
-        compression=zipfile.ZIP_DEFLATED,
-        compresslevel=9,
-    ) as zip_file:
-        zip_file.writestr(base_file_name + ".json", data=dumped_json)
-        zip_file.testzip()
-
-
 def last_project_export_credentials(project_id: str) -> str:
     return __get_last_export_credentials(project_id, "/download/project_export_")
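In controller/transfer/manager.py the two private zip helpers are replaced by `file.text_to_json_file` and `file.file_to_zip(path, key)`, whose implementations live in a file not shown on this page. The sketch below approximates them with the standard library, modelled on the removed `__write_file_to_zip`. Because `zipfile` cannot create password-protected archives, this version rejects a non-empty `key` instead of silently ignoring it. The Dockerfile now installs packages described as "used for encryption and zipping of files", but which library the real helper uses is not visible here. `export_with_cleanup` and the `upload` callback are hypothetical.

```python
import os
import tempfile
import zipfile
from typing import Callable, Optional, Tuple


def text_to_json_file(text: str, file_name_base: str, directory: Optional[str] = None) -> str:
    """Write an already-serialized JSON string to <directory>/<file_name_base>.json."""
    path = os.path.join(directory or tempfile.gettempdir(), file_name_base + ".json")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


def file_to_zip(file_path: str, key: Optional[str] = None) -> Tuple[str, str]:
    """Zip a single file next to itself; returns (zip_path, zip_file_name)."""
    if key:
        # zipfile can read ZipCrypto archives but cannot write encrypted ones.
        raise NotImplementedError("password-protected zips need a third-party library")
    base_name = os.path.basename(file_path)
    zip_path = file_path + ".zip"
    with zipfile.ZipFile(
        zip_path, mode="w", compression=zipfile.ZIP_DEFLATED, compresslevel=9
    ) as zip_file:
        zip_file.write(file_path, arcname=base_name)
    return zip_path, base_name + ".zip"


def export_with_cleanup(
    data: str,
    file_name_base: str,
    upload: Callable[[str, str], None],
    key: Optional[str] = None,
) -> str:
    """Hypothetical export flow: JSON file -> zip -> upload, always removing temp files."""
    json_path = text_to_json_file(data, file_name_base)
    zip_path: Optional[str] = None
    try:
        zip_path, zip_name = file_to_zip(json_path, key)
        upload(zip_path, zip_name)
        return zip_name
    finally:
        for path in (json_path, zip_path):
            if path and os.path.exists(path):
                os.remove(path)
```

The diff removes the temporary JSON and zip files only after `s3.upload_object` returns; wrapping the steps in `try`/`finally`, as in this sketch, also removes them when zipping or the upload raises.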
