
Commit 570ff03

Docs and demo housekeeping (#87)
* Make the demo more pretty
* Delete 2 docs file and fix demo
* Remove more we do not need
* Fix usage link
* Review comment
* Implement latest round of review comments
1 parent b210097 commit 570ff03

File tree

8 files changed (+132, -364 lines)


README.md

Lines changed: 67 additions & 32 deletions
@@ -1,36 +1,41 @@
 <!-- License Badge -->
 [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://github.com/ansible/dispatcher/blob/main/LICENSE)
 
-This is intended to be a working space for prototyping a code split of:
-
-<https://github.com/ansible/awx/tree/devel/awx/main/dispatch>
+## Dispatcher
 
-As a part of doing the split, we also want to resolve a number of
-long-standing design and sustainability issues, thus, asyncio.
-
-The philosophy of the dispatcher is to have a limited scope
+The dispatcher is a service to run python tasks in subprocesses,
+designed specifically to work well with pg_notify,
+but intended to be extensible to other message delivery means.
+Its philosophy is to have a limited scope
 as a "local" runner of background tasks, but to be composable
 so that it can be "wrapped" easily to enable clustering and
 distributed task management by apps using it.
 
+> [!WARNING]
+> This project is in initial development. Expect many changes, including name, paths, and CLIs.
+
 Licensed under [Apache Software License 2.0](LICENSE)
 
 ### Usage
 
 You have a postgres server configured and a python project.
 You will use dispatcher to trigger a background task over pg_notify.
-
 Both your *background dispatcher service* and your *task publisher* process must have
-python configured so that your task is importable.
+python configured so that your task is importable. Instructions are broken into 3 steps:
+
+1. **Library** - Configure dispatcher, mark the python methods you will run with it
+2. **Dispatcher service** - Start your background task service, it will start listening
+3. **Publisher** - From some other script, submit tasks to be run
 
-For more options, see `docs/usage.md`.
+In the "Manual Demo" section, a runnable example of this is given.
 
 #### Library
 
 The dispatcher `@task()` decorator is used to register tasks.
 
-See the `tools/test_methods.py` module.
-This defines a dispatcher task and the pg_notify channel it will be sent over.
+The [tests/data/methods.py](tests/data/methods.py) module defines some
+dispatcher tasks and the pg_notify channels they will be sent over.
+For more `@task` options, see [docs/task_options.md](docs/task_options.md).
 
 ```python
 from dispatcher.publish import task
@@ -49,10 +54,12 @@ from dispatcher.config import setup
 config = {
     "producers": {
         "brokers": {
-            "pg_notify": {"conninfo": "dbname=postgres user=postgres"},
-            "channels": [
-                "test_channel",
-            ],
+            "pg_notify": {
+                "conninfo": "dbname=postgres user=postgres",
+                "channels": [
+                    "test_channel",
+                ],
+            },
         },
     },
     "pool": {"max_workers": 4},
@@ -62,6 +69,8 @@ setup(config)
 
 For more on how to set up and the allowed options in the config,
 see the section [config](docs/config.md) docs.
+The `queue` passed to `@task` needs to match a pg_notify channel in the `config`.
+It is often useful to have different workers listen to different sets of channels.
 
 #### Dispatcher service
 
@@ -83,8 +92,8 @@ run_service()
 ```
 
 Configuration tells how to connect to postgres, and what channel(s) to listen to.
-The demo has this in `dispatcher.yml`, which includes listening to `test_channel`.
-That matches the `@task` in the library.
+
+
 
 #### Publisher
 
@@ -116,38 +125,64 @@ print_hello.apply_async(args=[], kwargs={})
 The difference is that `apply_async` takes both args and kwargs as kwargs themselves,
 and allows for additional configuration parameters to come after those.
 
-As of writing, this only works if you have a Django connection configured.
-You can manually pass configuration info (as in the demo) for non-Django use.
-
 ### Manual Demo
 
+For this demo, the [tests/data/methods.py](tests/data/methods.py) module will be used
+in place of a real app. Making those importable is why `PYTHONPATH` must be
+modified in some steps. The config for this demo can be found in the
+[dispatcher.yml](dispatcher.yml) file, which is a default location
+the `dispatcher-standalone` entrypoint looks for.
+
+Initial setup:
+
+```
+pip install -e .[pg_notify]
+make postgres
+```
+
 You need to have 2 terminal tabs open to run this.
 
 ```
 # tab 1
-make postgres
-PYTHONPATH=$PYTHONPATH:tools/ dispatcher-standalone
+PYTHONPATH=$PYTHONPATH:. dispatcher-standalone
 # tab 2
-python tools/write_messages.py
+./run_demo.py
 ```
 
-This will run the dispatcher with schedules, and process a burst of messages
-that give instructions to run tasks.
+This will run the dispatcher with schedules, and process bursts of messages
+that give instructions to run tasks. Tab 2 will contain some responses
+from the dispatcher service. Tab 1 will show a large volume of logs
+related to processing tasks.
 
 ### Running Tests
 
-A structure has been set up for integration tests.
-The word "integration" only means that postgres must be running.
+Most tests (except for tests/unit/) require postgres to be running.
 
 ```
 pip install -r requirements_dev.txt
 make postgres
-py.test tests/
+pytest tests/
 ```
 
-This accomplishes the most basic of starting and shutting down.
-With no tasks submitted, it should record running 0 tasks,
-and with a task submitted, it records running 1 task.
+### Background
+
+This is intended to be a working space for prototyping a code split of:
+
+<https://github.com/ansible/awx/tree/devel/awx/main/dispatch>
+
+As a part of doing the split, we also want to resolve a number of
+long-standing design and sustainability issues, thus, asyncio.
+For a little more background see [docs/design_notes.md](docs/design_notes.md).
+
+There is documentation of the message formats used by the dispatcher
+in [docs/message_formats.md](docs/message_formats.md). Some of these are internal,
+but some messages are what goes over the user-defined brokers (pg_notify).
+You can trigger tasks using your own "publisher" code as an alternative
+to attached methods like `.apply_async`. Doing this requires connecting
+to postgres and submitting a pg_notify message with JSON data
+that conforms to the expected format.
+The `./run_demo.py` script shows examples of this, but borrows some
+connection and config utilities to help.
 
 ## Contributing

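The README's point about hand-rolled publishers boils down to building a JSON payload and issuing one SQL call. A sketch, with an illustrative payload only (the real schema is in docs/message_formats.md, and actually sending it needs a live postgres connection, e.g. via psycopg):

```python
import json

# Illustrative payload only; consult docs/message_formats.md for the
# actual field names the dispatcher expects.
message = {"task": "methods.print_hello", "args": [], "kwargs": {}}
payload = json.dumps(message)

# With a live connection, sending is a single standard-postgres statement,
# for example with psycopg:
#   cur.execute("SELECT pg_notify(%s, %s);", ("test_channel", payload))
print(payload)
```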
docs/design_notes.md

Lines changed: 36 additions & 57 deletions
@@ -1,4 +1,30 @@
-## Reference Designs
+## Design Notes
+
+Many of the specific choices made in the dispatcher design are to enable pg_notify use.
+The advantage of using pg_notify is that you get an extremely simple topology.
+Imagine a web and a task runner service, invoked separately, using postgres.
+
+```mermaid
+flowchart TD
+
+    A(web)
+    B(task)
+    C(postgres)
+
+    A-->C
+    B-->C
+```
+
+However, pg_notify is not a true queue, and this drives the design of the dispatcher,
+having its main process listening for messages and farming the work out to worker subprocesses.
+
+This helps with pg_notify use, but still doesn't solve all the problems,
+because those problems ultimately need persistent storage.
+Current plan is to offer a "stock" solution in the form of a django-ansible-base app.
+
+https://github.com/ansible/django-ansible-base
+
+End-goal features and design proposals have been moved to the issue queue.
 
 ### AWX dispatcher

@@ -12,14 +38,15 @@ https://github.com/ansible/awx/pull/2266
 
 > ...much like the callback receiver implementation in 3.3.0 (on which this code is based), this entry point is a kombu.ConsumerMixin.
 
-### Kombu
+### Kombu (Celery)
 
-Kombu is a sub-package of celery.
+Kombu was used by AWX before its transition to a custom solution. Kombu is a sub-package of celery.
 
 https://github.com/celery/kombu
 
 In messaging module, this has a `Producer` and `Consumer` classes.
-In mixins it has a `ConsumerMixin`, but no methods seem to have made it into AWX dispatch.
+In mixins it has a `ConsumerMixin`. AWX dispatcher has consumer classes,
+but no methods seem to have made it from kombu into AWX dispatch.
 
 This doesn't deal with worker pool management. It does have examples with `Worker` classes.
 These follow a similar contract with `process_task` here.
@@ -55,60 +82,12 @@ In AWX dispatcher, a full queue may put messages into individual worker IPCs.
 This caused bad results, like delaying tasks due to long-running jobs,
 while the pool had many other workers free up in the mean time.
 
-## Alternative Archectures
-
-This are blue-sky ideas, which may not happen anytime soon,
-but they are described to help structure the app today so it can expand
-into these potential future roles.
-
-### Singleton task queue
-
-A major pivot from the AWX dispatcher is that we do not use 1 result queue per worker,
-but a single result queue for all workers, and each meassage includes a worker id.
-
-If you continue this pattern, then we would no longer have a call queue for each worker,
-and workers would just grab messages from the queue as they are available.
-
-The problem you encounter is that you will not know what worker started what task.
-If you do any "management" this is a problem. For instance, if you want a task
-to have a timeout, you need to know which worker to kill if it goes over its limit.
-
-There is a way to still consolidate the call queue while no losing these other features.
-When a worker receives a task, it can submit an ACK to the finished queue telling
-the main process that it has started a task, and which task it started.
-
-This isn't ultimately robust, if there is an error between getting the message and ACK,
-but this probably isn't a reasonable concern. As of now, this looks viable.
-
-### Persistent work manager
-
-Years ago, when AWX was having trouble with output processing bottlenecks,
-we stopped using the main dispatcher process to dispatch job events to workers.
-
-Essentially, any performance-sensitive data volumes should not go through the
-pool worker management system where data is passed through IPC queues.
-Doing this causes the main process to be a bottleneck.
-
-The solution was to have workers connect to a socket on their own.
-
-Nothing is wrong with this, it's just weird.
-None of the written facilities for pool management in dispatcher code is useful.
-Because of that, event processing diverged far from the rest of the dispatcher.
-
-Long-term vision here is that:
-- a `@task` decorator may mark a task as persistent
-- additional messages types will need to be send into the finished queue for
-  - analytics tracking, like how many messages were processed
-  - whether a particular resource being monitored has been closed
+### Notable python tasking systems
 
-The idea is that this would integrate what was prototyped in:
+https://taskiq-python.github.io/guide/architecture-overview.html#context
 
-https://github.com/AlanCoding/receptor-reporter/tree/devel
+https://python-rq.org/docs/workers/
 
-That idea involved the main process more than the existing callback receiver.
-Because each job has its own socket that has to be read from, so these will come and go.
-And a worker may manage more than 1 job at the same time, asynchronously.
+https://dramatiq.io/
 
-This also requires forking from what is now `dispatcher.main`.
-We could keep the pool (and add more feature) but this requires
-an entirely different main loop.
+https://docs.celeryq.dev/
