Merged
2 changes: 1 addition & 1 deletion .github/workflows/toolforge-check-lag.yml
@@ -12,7 +12,7 @@ jobs:
strategy:
max-parallel: 4
matrix:
-tool: ["editgroups"]
+tool: ["editgroups", "editgroups-commons"]

steps:
- name: Configure SSH key
2 changes: 1 addition & 1 deletion .github/workflows/toolforge-deploy.yml
@@ -11,7 +11,7 @@ jobs:
strategy:
max-parallel: 4
matrix:
-tool: ["editgroups"]
+tool: ["editgroups", "editgroups-commons"]

steps:
- name: Configure SSH key
35 changes: 35 additions & 0 deletions deployment/celery-editgroups-commons.yaml
@@ -0,0 +1,35 @@
# Run Celery on kubernetes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: editgroups-commons.celery.sh
  namespace: tool-editgroups-commons
  labels:
    name: editgroups-commons.celery.sh
    toolforge: tool
spec:
  replicas: 1
  selector:
    matchLabels:
      name: editgroups-commons.celery.sh
      toolforge: tool
  template:
    metadata:
      labels:
        name: editgroups-commons.celery.sh
        toolforge: tool
    spec:
      containers:
        - name: celery
          image: docker-registry.tools.wmflabs.org/toolforge-python311-sssd-base:latest
          command: [ "/data/project/editgroups-commons/www/python/src/tasks-commons.sh" ]
          workingDir: /data/project/editgroups-commons/www/python/src
          env:
            - name: HOME
              value: /data/project/editgroups-commons
          imagePullPolicy: Always
          resources:
            requests:
              memory: "512Mi"
            limits:
              memory: "1024Mi"
4 changes: 2 additions & 2 deletions docs/architecture.rst
@@ -33,9 +33,9 @@ This is a fairly reliable endpoint which also lets us resume the stream from a r
ingestion stopped for some reason. By default, the listener tries to resume listening from the
date of the latest edit it has ingested.

-This process can be invoked directly as a script::
+This process can be invoked directly as a Django management command::

-    python listener.py
+    python3 manage.py listener

It can be run as an `attached daemon to uwsgi <https://uwsgi-docs.readthedocs.io/en/latest/AttachingDaemons.html>`_.

4 changes: 2 additions & 2 deletions docs/install.rst
@@ -33,7 +33,7 @@ start a local web server, you can then access your EditGroups instance at ``http
By default, there will not be much to see as the database will be empty. To get some data in, you need
to run the listener script, which reads the Wikidata event stream and populates the database::

-    python listener.py
+    python3 manage.py listener

You will also need to run Celery, which will periodically annotate edits which need inspection,
as well as providing the undo functionality (if you have set up OAuth, see below)::
@@ -130,7 +130,7 @@ Put the following content in ``~/www/python/uwsgi.ini``::
static-map = /static=/data/project/editgroups/www/python/src/static

master = true
-attach-daemon = /data/project/editgroups/www/python/venv/bin/python3 /data/project/editgroups/www/python/src/listener.py
+attach-daemon = /data/project/editgroups/www/python/venv/bin/python3 /data/project/editgroups/www/python/src/manage.py listener

and run ``./manage.py collectstatic`` in the ``~/www/python/src`` directory. The listener will be an attached daemon, restarting whenever the webservice restarts.

41 changes: 0 additions & 41 deletions listener.py

This file was deleted.

15 changes: 12 additions & 3 deletions restart_celery.sh
@@ -1,6 +1,15 @@
 #!/bin/bash
 export GOMAXPROCS=1
-kubectl delete deployment editgroups.celery.sh ;
+TOOLNAME=$(whoami | cut -d "." -f 2)
+echo "Running as ${TOOLNAME}"
+
+if [[ "$TOOLNAME" == "editgroups" ]]; then
+    CELERY_FN=celery
+else
+    CELERY_FN="celery-${TOOLNAME}"
+fi
+
+kubectl delete deployment "${TOOLNAME}.celery.sh";
 echo "Waiting for Celery to stop";
-sleep 45 ;
-kubectl create -f /data/project/editgroups/www/python/src/deployment/celery.yaml && kubectl get pods
+sleep 45;
+kubectl create -f "/data/project/${TOOLNAME}/www/python/src/deployment/${CELERY_FN}.yaml"
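
The name mapping in `restart_celery.sh` can be sketched in Python as follows. This is an illustration only and not part of the change: the function names `tool_name` and `celery_manifest` are hypothetical, and the sketch assumes the account name returned by `whoami` has the form `tools.<toolname>`, which is what `cut -d "." -f 2` appears to rely on.

```python
def tool_name(account: str) -> str:
    """Mimic `cut -d "." -f 2`: return the second dot-separated field.

    Hypothetical helper; assumes accounts look like "tools.<toolname>".
    """
    fields = account.split(".")
    # Like cut without -s, fall back to the whole string when there is no dot.
    return fields[1] if len(fields) > 1 else account


def celery_manifest(tool: str) -> str:
    """Return the deployment manifest path the script would pass to kubectl."""
    # The original tool keeps the plain "celery.yaml" name;
    # any other tool gets a "celery-<toolname>.yaml" manifest.
    celery_fn = "celery" if tool == "editgroups" else f"celery-{tool}"
    return f"/data/project/{tool}/www/python/src/deployment/{celery_fn}.yaml"


if __name__ == "__main__":
    for account in ("tools.editgroups", "tools.editgroups-commons"):
        tool = tool_name(account)
        print(tool, "->", celery_manifest(tool))
```

Under that assumption, a further tool would only need its own `deployment/celery-<toolname>.yaml` manifest for the script to pick it up.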
41 changes: 41 additions & 0 deletions store/management/commands/listener.py
@@ -0,0 +1,41 @@
#!/usr/bin/env python
import sys
from datetime import datetime
from datetime import timedelta

from django.core.management.base import BaseCommand

from store.stream import WikiEditStream
from store.utils import grouper
from store.models import Edit


class Command(BaseCommand):
    """
    Amount of time to look back when restarting
    the listener. This helps make sure that we don't
    lose any edit when the listener is restarted.
    """

    LOOKBEHIND_OFFSET = timedelta(minutes=5)
    help = "Listens to edits with EventStream"

    def handle(self, *args, **options):
        print("Listening to edits...")
        s = WikiEditStream()
        try:
            latest_edit_seen = Edit.objects.order_by("-timestamp")[0].timestamp
            fetch_from = latest_edit_seen - self.LOOKBEHIND_OFFSET
        except IndexError:
            fetch_from = None
        offset = fetch_from.isoformat() if fetch_from else "now"
        print("Starting from offset %s" % offset)

        for i, batch in enumerate(grouper(s.stream(fetch_from), 50)):
            if i % 50 == 0:
                print("batch %d" % i)
                print(datetime.fromtimestamp(batch[0].get("timestamp")))
                sys.stdout.flush()
            Edit.ingest_edits(batch)

        print("End of stream")
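
The resume logic in `handle` can be tried outside Django with a minimal sketch like the one below. It is not the project's code: `resume_point`, `offset_label`, and `batches` are hypothetical names, and `batches` is only a stand-in for `store.utils.grouper`, whose implementation is not shown in this diff and may behave differently (for example around the final, shorter batch).

```python
from datetime import datetime, timedelta
from itertools import islice

# Same look-behind window as Command.LOOKBEHIND_OFFSET in the diff.
LOOKBEHIND_OFFSET = timedelta(minutes=5)


def resume_point(latest_edit_seen):
    """Return where to resume the stream, or None if nothing was ingested yet."""
    if latest_edit_seen is None:
        return None
    # Step back a little so edits around the restart are replayed, not lost.
    return latest_edit_seen - LOOKBEHIND_OFFSET


def offset_label(fetch_from):
    """Format the offset the same way the command prints it."""
    return fetch_from.isoformat() if fetch_from else "now"


def batches(iterable, size):
    """Yield lists of up to `size` items (assumed stand-in for grouper)."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    start = resume_point(datetime(2024, 1, 1, 12, 0))
    print("Starting from offset %s" % offset_label(start))
    for i, batch in enumerate(batches(range(5), 2)):
        print("batch %d: %r" % (i, batch))
```

Because the stream is resumed slightly before the latest stored edit, some edits inside the look-behind window are presumably delivered twice, so the ingestion step would need to tolerate duplicates.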
21 changes: 21 additions & 0 deletions tasks-commons.sh
@@ -0,0 +1,21 @@
#!/usr/bin/env bash


VENV_DIR=/data/project/editgroups-commons/www/python/venv

if [[ -f ${VENV_DIR}/bin/activate ]]; then
    source ${VENV_DIR}/bin/activate
else
    echo "Creating virtualenv"
    rm -rf ${VENV_DIR}
    pyvenv ${VENV_DIR}
    source ${VENV_DIR}/bin/activate
    echo "Installing requirements"
    pip install -r requirements.txt
fi;
echo "Starting celery"
export C_FORCE_ROOT=True
/data/project/editgroups-commons/www/python/venv/bin/python3 /data/project/editgroups-commons/www/python/venv/bin/celery --app=editgroups-commons.celery:app worker -l INFO -B --concurrency=3 --max-memory-per-child=50000
echo $?
echo "Celery done"