Commit 9669238

Initial implementation of the attribute calculation execution environment (#1)
* adds README and LICENSE
* adds Dockerfile, build script, drone files, requirements and hello world
* implements running of attribute calculation
* adds logs
* adds beautiful soup to requirements
1 parent 356ed38 commit 9669238

10 files changed: +302 -2 lines changed

.drone.yml

Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
+kind: pipeline
+type: docker
+name: amd64
+
+platform:
+  arch: amd64
+
+steps:
+  - name: build and publish
+    image: plugins/docker
+    settings:
+      registry: registry.dev.onetask.ai
+      username:
+        from_secret: docker_username
+      password:
+        from_secret: docker_password
+      repo: "registry.dev.onetask.ai/${DRONE_REPO}"
+      tags: ["${DRONE_COMMIT_SHA}", "${DRONE_COMMIT_BRANCH}"]
+      cache_from:
+        - "registry.dev.onetask.ai/${DRONE_REPO}:dev"
+        - "registry.dev.onetask.ai/${DRONE_REPO}:${DRONE_COMMIT_BRANCH}"
+
+trigger:
+  event:
+    - push
+
+---
+kind: pipeline
+type: docker
+name: trigger update
+
+platform:
+  arch: amd64
+
+steps:
+  - name: trigger update
+    image: appleboy/drone-ssh
+    settings:
+      host: app.dev.onetask.ai
+      username:
+        from_secret: ssh_user
+      key:
+        from_secret: ssh_key
+      ssh_passphrase:
+        from_secret: ssh_passphrase
+      script:
+        - /bin/sh ./trigger_dev_deployment.sh
+
+depends_on:
+  - amd64
+
+trigger:
+  branch:
+    - dev
+  event:
+    - push
+
+---
+kind: pipeline
+type: docker
+name: arm64
+
+platform:
+  arch: arm64
+
+steps:
+  - name: build and publish
+    image: plugins/docker
+    settings:
+      registry: registry.dev.onetask.ai
+      username:
+        from_secret: docker_username
+      password:
+        from_secret: docker_password
+      repo: "registry.dev.onetask.ai/${DRONE_REPO}"
+      tags: ["${DRONE_COMMIT_SHA}_arm64", "${DRONE_COMMIT_BRANCH}_arm64"]
+      cache_from:
+        - "registry.dev.onetask.ai/${DRONE_REPO}:dev_arm64"
+        - "registry.dev.onetask.ai/${DRONE_REPO}:${DRONE_COMMIT_BRANCH}_arm64"
+
+trigger:
+  branch:
+    - dev
+  event:
+    - push
+
+---
+kind: pipeline
+type: docker
+name: amd64-dockerhub
+
+platform:
+  arch: amd64
+
+steps:
+  - name: build and publish
+    image: plugins/docker
+    settings:
+      username:
+        from_secret: dockerhub_username
+      password:
+        from_secret: dockerhub_password
+      repo: "kernai/${DRONE_REPO_NAME}"
+      tag: "${DRONE_TAG}-drone-amd64"
+
+trigger:
+  event:
+    - tag
+
+---
+kind: pipeline
+type: docker
+name: arm64-dockerhub
+
+platform:
+  arch: arm64
+
+steps:
+  - name: build and publish
+    image: plugins/docker
+    settings:
+      username:
+        from_secret: dockerhub_username
+      password:
+        from_secret: dockerhub_password
+      repo: "kernai/${DRONE_REPO_NAME}"
+      tag: "${DRONE_TAG}-drone-arm64"
+
+trigger:
+  event:
+    - tag
+
+---
+kind: pipeline
+name: manifest-version
+steps:
+  - name: manifest
+    image: plugins/manifest
+    settings:
+      spec: drone-manifest-version.tmpl
+      tag: "${DRONE_TAG}"
+      ignore_missing: true
+      username:
+        from_secret: dockerhub_username
+      password:
+        from_secret: dockerhub_password
+
+depends_on:
+  - amd64-dockerhub
+  - arm64-dockerhub
+
+trigger:
+  event:
+    - tag
+
+---
+kind: pipeline
+name: manifest-latest
+steps:
+  - name: manifest
+    image: plugins/manifest
+    settings:
+      spec: drone-manifest-latest.tmpl
+      tag: "${DRONE_TAG}"
+      ignore_missing: true
+      username:
+        from_secret: dockerhub_username
+      password:
+        from_secret: dockerhub_password
+
+depends_on:
+  - manifest-version
+
+trigger:
+  event:
+    - tag

Dockerfile

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+FROM python:3.9
+
+RUN apt update && apt install -y curl
+
+COPY . .
+
+RUN pip3 install -r requirements.txt
+
+ENTRYPOINT ["/run.sh"]
+

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -186,7 +186,7 @@
       same "printed page" as the copyright notice for easier
       identification within third-party archives.
 
-   Copyright [yyyy] [name of copyright owner]
+   Copyright 2022 onetask.ai GmbH
 
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.

README.md

Lines changed: 5 additions & 1 deletion
@@ -1 +1,5 @@
-# refinery-ac-exec-env
+# refinery-ac-exec-env [![Build Status](https://drone.dev.onetask.ai/api/badges/code-kern-ai/refinery-ac-exec-env/status.svg?ref=refs/heads/dev)](https://drone.dev.onetask.ai/code-kern-ai/refinery-ac-exec-env)
+[![refinery repository](https://uploads-ssl.webflow.com/61e47fafb12bd56b40022a49/62c2f30f935f4d37dc864eeb_Kern%20refinery.png)](https://github.com/code-kern-ai/refinery)
+
+Execution environment for attribute calculation in [refinery](https://github.com/code-kern-ai/refinery). Containerized function as a service to build custom attributes derived from the original data.
+If you like what we're working on, please leave a ⭐ for [refinery](https://github.com/code-kern-ai/refinery)!

build

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+#!/bin/bash
+IS_ARM64=""
+currentArch="$(uname -m)"
+if [ "$currentArch" == "arm64" ];
+then
+    echo "architecture: arm64"
+    IS_ARM64="_arm64"
+else
+    echo "architecture: $currentArch"
+fi
+
+docker build -t registry.dev.onetask.ai/code-kern-ai/refinery-ac-exec-env:dev$IS_ARM64 .
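
One detail worth checking in the `build` script: `uname -m` prints `arm64` on Apple Silicon macOS but `aarch64` on 64-bit ARM Linux, so the comparison above takes the `_arm64` branch only on macOS. Below is a minimal Python sketch of a suffix lookup that accepts both spellings. The function name `image_tag_suffix` is hypothetical and not part of this commit.

```python
import platform


def image_tag_suffix(machine: str) -> str:
    """Return the tag suffix for 64-bit ARM hosts, or an empty string otherwise."""
    # macOS reports "arm64" while Linux reports "aarch64" for the same architecture.
    if machine.lower() in {"arm64", "aarch64"}:
        return "_arm64"
    return ""


if __name__ == "__main__":
    # platform.machine() typically matches the output of `uname -m` on POSIX systems.
    suffix = image_tag_suffix(platform.machine())
    print(f"registry.dev.onetask.ai/code-kern-ai/refinery-ac-exec-env:dev{suffix}")
```

Whether the script is ever run on ARM Linux is not stated in the commit, so this may be a non-issue in practice.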

drone-manifest-latest.tmpl

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+image: kernai/refinery-ac-exec-env:latest
+manifests:
+  -
+    image: kernai/refinery-ac-exec-env:{{#if build.tag}}{{build.tag}}-{{/if}}drone-amd64
+    platform:
+      architecture: amd64
+      os: linux
+  -
+    image: kernai/refinery-ac-exec-env:{{#if build.tag}}{{build.tag}}-{{/if}}drone-arm64
+    platform:
+      architecture: arm64
+      os: linux
+      variant: v8

drone-manifest-version.tmpl

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+image: kernai/refinery-ac-exec-env:{{#if build.tag}}{{build.tag}}{{else}}latest{{/if}}
+{{#if build.tags}}
+tags:
+{{#each build.tags}}
+  - {{this}}
+{{/each}}
+{{/if}}
+manifests:
+  -
+    image: kernai/refinery-ac-exec-env:{{#if build.tag}}{{build.tag}}-{{/if}}drone-amd64
+    platform:
+      architecture: amd64
+      os: linux
+  -
+    image: kernai/refinery-ac-exec-env:{{#if build.tag}}{{build.tag}}-{{/if}}drone-arm64
+    platform:
+      architecture: arm64
+      os: linux
+      variant: v8
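
The conditionals in both manifest templates decide which image references get combined. As a rough illustration of what they render to, here is a Python sketch that mirrors the two expressions. The function names and the example tag `v1.0.0` are hypothetical; the templates are actually rendered by the manifest plugin, not by Python.

```python
from typing import Optional

REPO = "kernai/refinery-ac-exec-env"


def arch_image(tag: Optional[str], arch: str) -> str:
    """Mirror `{{#if build.tag}}{{build.tag}}-{{/if}}drone-<arch>`."""
    # The tag and its trailing hyphen appear only when a tag is set.
    prefix = f"{tag}-" if tag else ""
    return f"{REPO}:{prefix}drone-{arch}"


def version_image(tag: Optional[str]) -> str:
    """Mirror `{{#if build.tag}}{{build.tag}}{{else}}latest{{/if}}`."""
    return f"{REPO}:{tag or 'latest'}"


if __name__ == "__main__":
    for arch in ("amd64", "arm64"):
        print(arch_image("v1.0.0", arch))
    print(version_image("v1.0.0"))
```

With a tag set, the per-architecture names line up with the `"${DRONE_TAG}-drone-amd64"` and `"${DRONE_TAG}-drone-arm64"` tags that the two Docker Hub pipelines in `.drone.yml` push.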

requirements.txt

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+beautifulsoup4<=4.11.1
+requests>=2.28.1
+spacy>=3.4.1

run.sh

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+#!/bin/bash
+
+/usr/bin/curl -s "$1" > docbin_full.json;
+/usr/bin/curl -s "$2" > attribute_calculators.py;
+
+/usr/local/bin/python run_ac.py "$3" "$4";
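
`run.sh` takes four positional arguments whose meaning is only implied by how they are used. A small Python sketch of that contract follows. The class name `RunArgs`, the helper `to_argv`, and the example URLs are hypothetical; the field order follows `run.sh`, and the names of the last two fields come from the `sys.argv` unpacking in `run_ac.py`.

```python
from typing import List, NamedTuple


class RunArgs(NamedTuple):
    docbin_url: str   # $1: downloaded to docbin_full.json
    code_url: str     # $2: downloaded to attribute_calculators.py
    iso2_code: str    # $3: forwarded to run_ac.py as the language code
    payload_url: str  # $4: forwarded to run_ac.py as the upload target


def to_argv(args: RunArgs) -> List[str]:
    """Return the arguments in the order run.sh expects them."""
    return list(args)


if __name__ == "__main__":
    example = RunArgs(
        docbin_url="https://example.com/docbin",
        code_url="https://example.com/code",
        iso2_code="en",
        payload_url="https://example.com/result",
    )
    print(to_argv(example))
```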

run_ac.py

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+import json
+import requests
+import spacy
+import sys
+from spacy.tokens import DocBin
+
+
+def load_data_dict(record):
+    if record["bytes"][:2] == "\\x":
+        record["bytes"] = record["bytes"][2:]
+    else:
+        raise ValueError("Unknown byte format in DocBin. Please contact the support.")
+
+    byte = bytes.fromhex(record["bytes"])
+    doc_bin_loaded = DocBin().from_bytes(byte)
+    docs = list(doc_bin_loaded.get_docs(vocab))
+    data_dict = {}
+    for (col, doc) in zip(record["columns"], docs):
+        data_dict[col] = doc
+
+    for key in record:
+        if key in ["record_id", "bytes", "columns"]:
+            continue
+        data_dict[key] = record[key]
+    return data_dict
+
+
+def parse_data_to_record_dict(record_chunk):
+    result = []
+    for r in record_chunk:
+        result.append({"id": r["record_id"], "data": load_data_dict(r)})
+    return result
+
+
+if __name__ == "__main__":
+    _, iso2_code, payload_url = sys.argv
+
+    print("Preparing data for attribute calculation.")
+
+    # This import statement will always be highlighted as a potential error, as during devtime,
+    # the script `attribute_calculators` does not exist. It will be inserted at runtime
+    from attribute_calculators import ac
+
+    vocab = spacy.blank(iso2_code).vocab
+
+    with open("docbin_full.json", "r") as infile:
+        docbin_data = json.load(infile)
+
+    record_dict_list = parse_data_to_record_dict(docbin_data)
+
+    print("Running attribute calculation.")
+    calculated_attribute_by_record_id = {}
+    for record_dict in record_dict_list:
+        calculated_attribute_by_record_id[record_dict["id"]] = ac(record_dict["data"])
+
+    print("Finished execution.")
+    requests.put(payload_url, json=calculated_attribute_by_record_id)
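
`run_ac.py` imports `ac` from an `attribute_calculators.py` that `run.sh` downloads at runtime, so the function itself is not part of this commit. The sketch below uses only the standard library to illustrate two steps: decoding the `\x`-prefixed hex string, as `load_data_dict` does, and applying `ac` to each record, as the main loop does. The sample records, the key `running_id`, and the body of `ac` are hypothetical; in the real script the column values are spaCy `Doc` objects rather than integers.

```python
def decode_prefixed_hex(hex_string: str) -> bytes:
    """Strip a literal backslash-x prefix and decode the remaining hex digits."""
    if hex_string[:2] != "\\x":
        raise ValueError("Unknown byte format")
    return bytes.fromhex(hex_string[2:])


def ac(record: dict) -> int:
    """Hypothetical attribute calculator: derive one value per record."""
    return record["running_id"] * 2


# Same shape as the output of parse_data_to_record_dict: one dict per record.
records = [
    {"id": "a", "data": {"running_id": 1}},
    {"id": "b", "data": {"running_id": 2}},
]

# Mirrors the loop that fills calculated_attribute_by_record_id.
result = {r["id"]: ac(r["data"]) for r in records}

if __name__ == "__main__":
    print(decode_prefixed_hex("\\x68656c6c6f"))
    print(result)
```

The resulting dictionary, keyed by record ID, has the same shape as the payload the script sends with `requests.put`.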
