Commit 7034f2c

Author: James (ODSC)
Merge pull request #12 from OpenDataServices/2024-10-04: docs content
2 parents bf90717 + f083ca2

16 files changed: +336 / −21 lines

docs/conf.py

Lines changed: 22 additions & 0 deletions
import os
import pathlib
import sys

import django

import libcoveweb2.settings

project = "LibCoveWeb2"

master_doc = "index"

html_theme = "odsc_default_sphinx_theme"

extensions = [
    "sphinx.ext.autodoc",
]

###### Make Sphinx able to document our python code

# We need to be able to see our code, so add this directory to path
sys.path.insert(0, str(pathlib.Path("..").resolve()))

# We need to set up Django enough that it won't complain
libcoveweb2.settings.INSTALLED_APPS += ("libcoveweb2",)
os.environ["DJANGO_SETTINGS_MODULE"] = "libcoveweb2.settings"
django.setup()

docs/django-settings.rst

Lines changed: 48 additions & 0 deletions
Django Settings
===============

To use this app you'll need to define several settings.


Process Tasks
-------------

You need to define a `PROCESS_TASKS` setting. This lists all the tasks that will be run, in order, for each piece of uploaded data.

It should be a list of tuples, where each tuple is `('Python module', 'Python class name')`.
Each class should extend `libcoveweb2.process.base.ProcessDataTask`.

Example:

.. code-block:: python

    PROCESS_TASKS = [
        # Get data if not already on disk
        ("libcoveweb2.process.common_tasks.download_data_task", "DownloadDataTask"),
        ...
    ]

Celery Message Queue
--------------------

Any Celery settings needed must be set up.

At a minimum this will include `CELERY_BROKER_URL`.


Settings to copy from library which have sensible defaults
------------------------------------------------------------

This application also needs a number of configuration values that already have defaults set. In most cases you can simply reuse these variables.

:doc:`For a list of these settings see here. <python-api/settings>`

To do so, you can do something like this in your Django project's main settings.py file:

.. code-block:: python

    from libcoveweb2 import settings

    ALLOWED_JSON_CONTENT_TYPES = settings.ALLOWED_JSON_CONTENT_TYPES
    ALLOWED_JSON_EXTENSIONS = settings.ALLOWED_JSON_EXTENSIONS
    ...

docs/hosting/requirements.rst

Lines changed: 30 additions & 0 deletions
Hosting Requirements
====================

Python server for the Django app
----------------------------------

Normal options.

Database
----------

This is tested with PostgreSQL.

Message queue compatible with Celery
--------------------------------------

Normal options.

File Storage
------------

TODO

Cron tasks
----------

Some Django management commands should be run on a cron task.

* `expire_files` should be run daily
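As a sketch of the daily task above, a crontab entry might look like this. The project path, virtualenv location, and run time are assumptions, not part of this library:

```shell
# Hypothetical crontab entry: run the expire_files management command
# daily at 03:00. /srv/app and the .venv path are assumed locations.
0 3 * * * cd /srv/app && /srv/app/.venv/bin/python manage.py expire_files
```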

docs/index.rst

Lines changed: 28 additions & 0 deletions
LibCoveWeb2
===========

LibCoveWeb2 is a Django application to use as a library in your own Django apps.

It lets you create CoVE apps. CoVE exists to help people:

* Convert data between common formats (e.g. CSV to JSON)
* Validate data against rules
* Explore data that machines find easy to read, but humans find harder

The application consists of:

* Database models to save details of user-submitted data
* File storage space to save the user-submitted data and cache the results of processing
* A message queue
* Workers to process the data according to tasks you provide (there is a library of common tasks in this application)
* A view to show users output information from the cache of results


.. toctree::
   :maxdepth: 2

   processing-pipeline.rst
   django-settings.rst
   python-api/index.rst
   migration-from-lib-cove-web.rst
   hosting/requirements.rst
   used-by.rst
docs/migration-from-lib-cove-web.rst

Lines changed: 13 additions & 0 deletions

Migration from lib-cove-web
===========================

This library is an updated version of the previous library ( https://github.com/OpenDataServices/lib-cove-web ).

However, the structure of the library and how it works have changed substantially.
For this reason the upgrade isn't a simple version bump, but requires rewriting work in the software that uses this library.

This is why this library is a separate git repository and a separate PyPI package,
so that development of the previous library can also continue and users are not forced to upgrade before they are ready.

docs/processing-pipeline.rst

Lines changed: 84 additions & 0 deletions
Processing Pipeline
===================

Why?
----

The application lets you define a series of tasks that will be checked, in order, for each piece of uploaded data.

Tasks need to be defined by each app, but there is a library of common tasks to make this easier.

This allows for maximum flexibility: each app can define the tasks it needs, including non-standard tasks that are not used by other CoVEs.
(For example, BODS CoVE has a sample mode. When the user uploads big data, they can choose to run sample mode and only check some of it.
This is accomplished by a special task towards the start of the pipeline that generates a smaller file from the uploaded file.)

What happens when the user uploads data?
----------------------------------------

The background worker will start processing the data and the user will be redirected to the results page.

What happens when the user looks at a results page?
---------------------------------------------------

Every time a user views a results page, the system will check the state of that data.

If it's currently being processed, the user will see a progress page with a wait message.

If it's not currently being processed, the system will call the `is_processing_applicable` and `is_processing_needed` functions on each task to see if any work is needed.

If there is work to do, it will start the work and the user will see a progress page with a wait message.
This means that even after a task first finishes, a task can change its mind and request to do more work.
(The most common use case for this is if the software is upgraded and how the processing is done is changed.)

If there is no work to do, the system will show a results page to the user.
`get_context` will be called on every task, so the task can load results from its cache and present them to the user.

Other pages that may be shown to the user include:

* An error page if a Python error occurred
* An expired page, if the data is so old that it has been expired and removed from the system

How is the data actually processed?
-----------------------------------

To process the task, the background worker will call `process`.
This can take as long as it needs, and the results should be cached for speedy loading later.

Early tasks can also return data that will be passed to later tasks.
This means any information or work that is needed in multiple tasks does not need to be done multiple times, but can be done once and then reused.


How should I define my tasks?
-----------------------------

Each task should be defined by extending a class. :doc:`For more information on the base class, see here. <python-api/process/base>`

Your tasks should then be defined in settings. :doc:`For more information on settings, see here. <django-settings>`

An example task pipeline
------------------------

.. code-block:: python

    PROCESS_TASKS = [
        # Get data if not already on disk - if the user provided a URL
        ("libcoveweb2.process.common_tasks.download_data_task", "DownloadDataTask"),
        # BODS has a special sample mode.
        # If that's activated, we'll make the sample data now for later tasks to use.
        ("cove_bods.process", "Sample"),
        # Make sure uploads are in the primary format - for BODS that is JSON,
        # so any spreadsheets uploaded should be converted
        ("cove_bods.process", "WasJSONUploaded"),
        ("cove_bods.process", "ConvertSpreadsheetIntoJSON"),
        # Some information is reused in multiple tasks to come,
        # so we'll process it once now and later tasks can reuse it.
        ("cove_bods.process", "GetDataReaderAndConfigAndSchema"),
        # Convert from the primary JSON format into other output formats
        ("cove_bods.process", "ConvertJSONIntoSpreadsheets"),
        # Check and generate statistics from the JSON data
        ("cove_bods.process", "AdditionalFieldsChecksTask"),
        ("cove_bods.process", "PythonValidateTask"),
        ("cove_bods.process", "JsonSchemaValidateTask"),
    ]
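The hooks described on this page can be sketched as follows. This is a minimal, self-contained illustration: the `ProcessDataTask` below is a stand-in base class written from the hook names mentioned above (the real signatures in `libcoveweb2.process.base` may differ), and `WordCountTask` and its in-memory cache are hypothetical.

```python
# Minimal sketch of the task hooks described on this page.
# ProcessDataTask here is a STAND-IN so the snippet is self-contained;
# a real app would extend libcoveweb2.process.base.ProcessDataTask,
# whose actual signatures may differ.

class ProcessDataTask:
    """Stand-in base class with the hooks named on this page."""

    def is_processing_applicable(self):
        # Does this task apply to this uploaded data at all?
        return True

    def is_processing_needed(self):
        # Is there (still) work to do? Can become True again, e.g. after an upgrade.
        return False

    def process(self, process_data):
        # Do the work; cache results. Returned data is passed to later tasks.
        return process_data

    def get_context(self):
        # Load cached results to show on the results page.
        return {}


class WordCountTask(ProcessDataTask):
    """Hypothetical task: count words in uploaded text and cache the result."""

    def __init__(self):
        self.cache = {}  # stand-in for an on-disk results cache

    def is_processing_needed(self):
        # Work is needed until a result has been cached.
        return "word_count" not in self.cache

    def process(self, process_data):
        self.cache["word_count"] = len(process_data.get("text", "").split())
        # Pass the result on so later tasks can reuse it without recomputing.
        process_data["word_count"] = self.cache["word_count"]
        return process_data

    def get_context(self):
        return {"word_count": self.cache.get("word_count")}


task = WordCountTask()
data = task.process({"text": "one two three"})
print(task.is_processing_needed())  # False: result is now cached
print(task.get_context())  # {'word_count': 3}
```

After `process` runs, `is_processing_needed` returns False, which mirrors the behaviour described above: the system only restarts work when a task reports that work is needed.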

docs/python-api/index.rst

Lines changed: 15 additions & 0 deletions
Python API
==========

This section documents the Python API that software using this library should consider.

It does not document Python code that is not intended for reuse by others (you can read the source code for that).


.. toctree::
   :maxdepth: 2

   settings.rst
   process/base.rst
   process/common_tasks/download_data_task.rst
   process/common_tasks/task_with_state.rst

docs/python-api/process/base.rst

Lines changed: 11 additions & 0 deletions
Process Task Base
=================

:doc:`For more information on how to use this class, see here. <../../processing-pipeline>`

You probably want to process some data, get a result and cache it.
There is another class, `TaskWithState`, that helps you do this, and so may be more useful for you.
:doc:`For more information on this helper class, see here. <common_tasks/task_with_state>`

.. autoclass:: libcoveweb2.process.base.ProcessDataTask
   :members:
docs/python-api/process/common_tasks/download_data_task.rst

Lines changed: 5 additions & 0 deletions

Common Process Task: Download Data Task
=======================================

.. autoclass:: libcoveweb2.process.common_tasks.download_data_task.DownloadDataTask
docs/python-api/process/common_tasks/task_with_state.rst

Lines changed: 7 additions & 0 deletions

Common Process Task: Task With State
====================================

.. autoclass:: libcoveweb2.process.common_tasks.task_with_state.TaskWithState
   :members: state_filename, process_get_state
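Given the two documented members, `state_filename` and `process_get_state`, a subclass might look roughly like this. The `TaskWithState` below is a stand-in sketched from those names so the example runs on its own; the real class's behaviour and signatures may differ, and `CountRecordsTask` is hypothetical.

```python
import json
import pathlib
import tempfile


class TaskWithState:
    """STAND-IN for libcoveweb2...task_with_state.TaskWithState, sketched
    from the documented members; the real API may differ. It caches
    whatever process_get_state() returns in the file state_filename."""

    state_filename = "state.json"

    def __init__(self, directory):
        self._path = pathlib.Path(directory) / self.state_filename

    def process(self, process_data):
        # Only do the work if no cached state exists yet.
        if not self._path.exists():
            state, process_data = self.process_get_state(process_data)
            self._path.write_text(json.dumps(state))
        return process_data

    def get_state(self):
        return json.loads(self._path.read_text())


class CountRecordsTask(TaskWithState):
    """Hypothetical task: compute a record count once and cache it."""

    state_filename = "count_records.json"

    def process_get_state(self, process_data):
        state = {"record_count": len(process_data.get("records", []))}
        return state, process_data


with tempfile.TemporaryDirectory() as d:
    task = CountRecordsTask(d)
    task.process({"records": [1, 2, 3]})
    state = task.get_state()
    print(state)  # {'record_count': 3}
```

The point of the pattern is that the subclass only supplies the state computation; persisting and reloading the cached result is handled by the base class.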
