Commit d117c1c

Merge pull request #149 from Wikidata/2025-05-refactor-scripts
Listener as a Django command and refactor editgroups-commons deployment
2 parents 11d7fa3 + 5b92da0 commit d117c1c

9 files changed: +115 -50 lines changed

.github/workflows/toolforge-check-lag.yml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ jobs:
     strategy:
       max-parallel: 4
       matrix:
-        tool: ["editgroups"]
+        tool: ["editgroups", "editgroups-commons"]

     steps:
       - name: Configure SSH key

.github/workflows/toolforge-deploy.yml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ jobs:
     strategy:
       max-parallel: 4
       matrix:
-        tool: ["editgroups"]
+        tool: ["editgroups", "editgroups-commons"]

     steps:
       - name: Configure SSH key

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+# Run Celery on kubernetes
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: editgroups-commons.celery.sh
+  namespace: tool-editgroups-commons
+  labels:
+    name: editgroups-commons.celery.sh
+    toolforge: tool
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      name: editgroups-commons.celery.sh
+      toolforge: tool
+  template:
+    metadata:
+      labels:
+        name: editgroups-commons.celery.sh
+        toolforge: tool
+    spec:
+      containers:
+        - name: celery
+          image: docker-registry.tools.wmflabs.org/toolforge-python311-sssd-base:latest
+          command: [ "/data/project/editgroups-commons/www/python/src/tasks-commons.sh" ]
+          workingDir: /data/project/editgroups-commons/www/python/src
+          env:
+            - name: HOME
+              value: /data/project/editgroups-commons
+          imagePullPolicy: Always
+          resources:
+            requests:
+              memory: "512Mi"
+            limits:
+              memory: "1024Mi"

docs/architecture.rst

Lines changed: 2 additions & 2 deletions
@@ -33,9 +33,9 @@ This is a fairly reliable endpoint which also lets us resume the stream from a r
 ingestion stopped for some reason. By default, the listener tries to resume listening from the
 date of the latest edit it has ingested.

-This process can be invoked directly as a script::
+This process can be invoked directly as a Django management command::

-    python listener.py
+    python3 manage.py listener

 It can be run as an `attached daemon to uwsgi <https://uwsgi-docs.readthedocs.io/en/latest/AttachingDaemons.html>`_.

docs/install.rst

Lines changed: 2 additions & 2 deletions
@@ -33,7 +33,7 @@ start a local web server, you can then access your EditGroups instance at ``http
 By default, there will not be much to see as the database will be empty. To get some data in, you need
 to run the listener script, which reads the Wikidata event stream and populates the database::

-    python listener.py
+    python3 manage.py listener

 You will also need to run Celery, which will periodically annotate edits which need inspection,
 as well as providing the undo functionality (if you have set up OAuth, see below)::
@@ -130,7 +130,7 @@ Put the following content in ``~/www/python/uwsgi.ini``::
     static-map = /static=/data/project/editgroups/www/python/src/static

     master = true
-    attach-daemon = /data/project/editgroups/www/python/venv/bin/python3 /data/project/editgroups/www/python/src/listener.py
+    attach-daemon = /data/project/editgroups/www/python/venv/bin/python3 /data/project/editgroups/www/python/src/manage.py listener

 and run ``./manage.py collectstatic`` in the ``~/www/python/src`` directory. The listener will be an attached daemon, restarting with webservice restart.

listener.py

Lines changed: 0 additions & 41 deletions
This file was deleted.

restart_celery.sh

Lines changed: 12 additions & 3 deletions
@@ -1,6 +1,15 @@
 #!/bin/bash
 export GOMAXPROCS=1
-kubectl delete deployment editgroups.celery.sh ;
+TOOLNAME=$(whoami | cut -d "." -f 2)
+echo "Running as ${TOOLNAME}"
+
+if [[ "$TOOLNAME" == "editgroups" ]]; then
+  CELERY_FN=celery
+else
+  CELERY_FN="celery-${TOOLNAME}"
+fi
+
+kubectl delete deployment "${TOOLNAME}.celery.sh";
 echo "Waiting for Celery to stop";
-sleep 45 ;
-kubectl create -f /data/project/editgroups/www/python/src/deployment/celery.yaml && kubectl get pods
+sleep 45;
+kubectl create -f "/data/project/${TOOLNAME}/www/python/src/deployment/${CELERY_FN}.yaml"
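
Note on the name resolution in restart_celery.sh: the script derives the tool name from the current user and then picks the deployment name and manifest file from it. The same mapping can be sketched in Python to make the two cases explicit. This is only an illustration, assuming the script runs as a user named like `tools.<toolname>`; the function name `celery_targets` is hypothetical and not part of the commit.

```python
def celery_targets(user):
    """Mirror the name resolution done in restart_celery.sh.

    `user` is the output of `whoami`, assumed to look like "tools.<toolname>".
    Returns (deployment_name, manifest_path).
    """
    parts = user.split(".")
    # `cut -d "." -f 2` prints the whole line when no delimiter is present.
    tool = parts[1] if len(parts) > 1 else user

    # The original tool keeps the plain "celery" manifest; any other tool
    # gets a manifest suffixed with its own name.
    celery_fn = "celery" if tool == "editgroups" else f"celery-{tool}"

    deployment = f"{tool}.celery.sh"
    manifest = f"/data/project/{tool}/www/python/src/deployment/{celery_fn}.yaml"
    return deployment, manifest


for user in ("tools.editgroups", "tools.editgroups-commons"):
    print(user, "->", celery_targets(user))
```

With this mapping, the existing tool keeps using `deployment/celery.yaml`, while `editgroups-commons` resolves to `deployment/celery-editgroups-commons.yaml`.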
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+#!/usr/bin/env python
+import sys
+from datetime import datetime
+from datetime import timedelta
+
+from django.core.management.base import BaseCommand
+
+from store.stream import WikiEditStream
+from store.utils import grouper
+from store.models import Edit
+
+
+class Command(BaseCommand):
+    """
+    Amount of time to look back when restarting
+    the listener. This helps make sure that we don't
+    lose any edit when the listener is restarted.
+    """
+
+    LOOKBEHIND_OFFSET = timedelta(minutes=5)
+    help = "Listens to edits with EventStream"
+
+    def handle(self, *args, **options):
+        print("Listening to edits...")
+        s = WikiEditStream()
+        try:
+            latest_edit_seen = Edit.objects.order_by("-timestamp")[0].timestamp
+            fetch_from = latest_edit_seen - self.LOOKBEHIND_OFFSET
+        except IndexError:
+            fetch_from = None
+        offset = fetch_from.isoformat() if fetch_from else "now"
+        print("Starting from offset %s" % offset)
+
+        for i, batch in enumerate(grouper(s.stream(fetch_from), 50)):
+            if i % 50 == 0:
+                print("batch %d" % i)
+                print(datetime.fromtimestamp(batch[0].get("timestamp")))
+                sys.stdout.flush()
+            Edit.ingest_edits(batch)
+
+        print("End of stream")
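
Note on the resume logic in the command above: it rewinds five minutes from the latest ingested edit (or starts from "now" on an empty database) and then ingests the stream in batches of 50. The following standalone sketch reproduces just that logic without Django. The helpers `resume_point` and `batches` are hypothetical stand-ins; in particular, `batches` only approximates `store.utils.grouper`, whose exact behaviour is not shown in this commit.

```python
from datetime import datetime, timedelta, timezone
from itertools import islice

LOOKBEHIND_OFFSET = timedelta(minutes=5)


def resume_point(latest_edit_seen):
    """Return the datetime to resume from, or None to start from 'now'."""
    if latest_edit_seen is None:
        return None
    return latest_edit_seen - LOOKBEHIND_OFFSET


def batches(iterable, size):
    """Yield lists of at most `size` items (assumed stand-in for grouper)."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


latest = datetime(2025, 5, 1, 12, 0, tzinfo=timezone.utc)
fetch_from = resume_point(latest)
print("Starting from offset %s" % (fetch_from.isoformat() if fetch_from else "now"))

# 120 fake events are split into batches of 50, 50 and 20.
for i, batch in enumerate(batches(range(120), 50)):
    print("batch %d has %d events" % (i, len(batch)))
```

Rewinding by a fixed offset means some edits may be seen twice after a restart, so this approach relies on ingestion tolerating duplicates.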

tasks-commons.sh

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
#!/usr/bin/env bash
2+
3+
4+
VENV_DIR=/data/project/editgroups-commons/www/python/venv
5+
6+
if [[ -f ${VENV_DIR}/bin/activate ]]; then
7+
source ${VENV_DIR}/bin/activate
8+
else
9+
echo "Creating virtualenv"
10+
rm -rf ${VENV_DIR}
11+
pyvenv ${VENV_DIR}
12+
source ${VENV_DIR}/bin/activate
13+
echo "Installing requirements"
14+
pip install -r requirements.txt
15+
fi;
16+
echo "Starting celery"
17+
export C_FORCE_ROOT=True
18+
/data/project/editgroups-commons/www/python/venv/bin/python3 /data/project/editgroups-commons/www/python/venv/bin/celery --app=editgroups-commons.celery:app worker -l INFO -B --concurrency=3 --max-memory-per-child=50000
19+
echo $?
20+
echo "Celery done"
21+
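
Note on the memory settings: the Celery command line above recycles each worker child after 50000 (Celery documents this value as kilobytes of resident memory) and runs 3 children, while the Kubernetes deployment in this commit requests 512Mi and caps the container at 1024Mi. A quick back-of-the-envelope check, assuming the value is read as KiB and ignoring the parent process and beat scheduler, shows how the two settings relate. The constant names are illustrative only.

```python
# Values taken from tasks-commons.sh and the Celery deployment manifest.
MAX_MEMORY_PER_CHILD_KIB = 50_000   # --max-memory-per-child (assumed KiB)
CONCURRENCY = 3                     # --concurrency
REQUEST_MIB = 512                   # resources.requests.memory
LIMIT_MIB = 1024                    # resources.limits.memory


def kib_to_mib(kib):
    """Convert kibibytes to mebibytes."""
    return kib / 1024


children_total_mib = kib_to_mib(MAX_MEMORY_PER_CHILD_KIB * CONCURRENCY)
print(f"children recycle threshold, combined: {children_total_mib:.1f} MiB")
print(f"headroom under request: {REQUEST_MIB - children_total_mib:.1f} MiB")
print(f"headroom under limit: {LIMIT_MIB - children_total_mib:.1f} MiB")
```

The per-child value is a recycling threshold checked after a task finishes, not a hard cap, so a single large task can still exceed it temporarily.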
