
Commit bbf89f7

docs: fix README typos (#304)
- Fix typos and example commands in README
- While we're at it, add emojis to sections of README, because... why not
1 parent 2eac3f3 commit bbf89f7

2 files changed: +35 −34 lines


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@
 - Add more detailed statuses [#298](https://github.com/datagouv/hydra/pull/298)
 - Handle cases of too long columns labels for postgres [#299](https://github.com/datagouv/hydra/pull/299)
 - Fix rare issue in `/status/crawler/` endpoint [#301](https://github.com/datagouv/hydra/pull/301) [#302](https://github.com/datagouv/hydra/pull/302)
+- Fix typos, deprecated examples and add emojis in README [#304](https://github.com/datagouv/hydra/pull/304)
 
 ## 2.3.0 (2025-07-15)

README.md

Lines changed: 34 additions & 34 deletions
@@ -1,6 +1,6 @@
 ![udata-hydra](banner.png)
 
-# udata-hydra 🦀
+# udata-hydra
 
 [![CircleCI](https://circleci.com/gh/datagouv/hydra.svg?style=svg)](https://circleci.com/gh/datagouv/hydra)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
@@ -15,7 +15,7 @@ Since it's called _hydra_, it also has mythical powers embedded:
 - if the remote resource is a geojson, convert it to PMTiles to offer another distribution of the data
 - send crawl and analysis info to a udata instance
 
-## Architecture schema
+## 🏗️ Architecture schema
 
 The architecture for the full workflow is the following:
 
@@ -26,15 +26,15 @@ The hydra crawler is one of the components of the architecture. It will check if
 
 ![Crawler architecture](docs/hydra.drawio.png)
 
-## Dependencies
+## 📦 Dependencies
 
 This project uses `libmagic`, which needs to be installed on your system, eg:
 
 `brew install libmagic` on MacOS, or `sudo apt-get install libmagic-dev` on linux.
 
 This project uses Python >=3.11 and [Poetry](https://python-poetry.org) >= 2.0.0 to manage dependencies.
 
-## CLI
+## 🖥️ CLI
 
 ### Create database structure
 
@@ -47,7 +47,7 @@ Install udata-hydra dependencies and cli.
 
 `poetry run udata-hydra load-catalog`
 
-## Crawler
+## 🕷️ Crawler
 
 `poetry run udata-hydra-crawl`
 
@@ -57,11 +57,11 @@ It will crawl (forever) the catalog according to the config set in `config.toml`
 
 The crawler will start with URLs never checked and then proceed with URLs crawled before `CHECK_DELAYS` interval. It will then wait until something changes (catalog or time).
 
-There's a by-domain backoff mecanism. The crawler will wait when, for a given domain in a given batch, `BACKOFF_NB_REQ` is exceeded in a period of `BACKOFF_PERIOD` seconds. It will retry until the backoff is lifted.
+There's a by-domain backoff mechanism. The crawler will wait when, for a given domain in a given batch, `BACKOFF_NB_REQ` is exceeded in a period of `BACKOFF_PERIOD` seconds. It will retry until the backoff is lifted.
 
 If an URL matches one of the `EXCLUDED_PATTERNS`, it will never be checked.
 
-## Worker
+## ⚙️ Worker
 
 A job queuing system is used to process long-running tasks. Launch the worker with the following command:
 
@@ -75,31 +75,31 @@ To empty all the queues:
 
 `poetry run rq empty -c udata_hydra.worker low default high`
 
-## CSV conversion to database
+## 📊 CSV conversion to database
 
-Converted CSV tables will be stored in the database specified via `config.DATABASE_URL_CSV`. For tests it's same database as for the catalog. Locally, `docker compose` will launch two distinct database containers.
+Converted CSV tables will be stored in the database specified via `config.DATABASE_URL_CSV`. For tests it's the same database as for the catalog. Locally, `docker compose` will launch two distinct database containers.
 
-## Tests
+## 🧪 Tests
 
 To run the tests, you need to launch the database, the test database, and the Redis broker with `docker compose -f docker-compose.yml -f docker-compose.test.yml -f docker-compose.broker.yml up -d`.
 
-Make sure the dev dependecies are installed with `poetry install --extras dev`.
+Make sure the dev dependencies are installed with `poetry install --extras dev`.
 
 Then you can run the tests with `poetry run pytest`.
 
-To run a specific test file, you can pass the path to the file to pytest, like this: `poetry run pytest tests/test_app.py`.
+To run a specific test file, you can pass the path to the file to pytest, like this: `poetry run pytest tests/test_file.py`.
 
-To run a specific test function, you can pass the path to the file and the name of the function to pytest, like this: `poetry run pytest tests/test_app.py::test_get_latest_check`.
+To run a specific test function, you can pass the path to the file and the name of the function to pytest, like this: `poetry run pytest tests/test_api/test_api_checks.py::test_get_latest_check`.
 
 If you would like to see print statements as they are executed, you can pass the -s flag to pytest (`poetry run pytest -s`). However, note that this can sometimes be difficult to parse.
 
-### Tests coverage
+### 📈 Tests coverage
 
 Pytest automatically uses the `coverage` package to generate a coverage report, which is displayed at the end of the test run in the terminal.
-The coverage is configured in the `pypoject.toml` file, in the `[tool.pytest.ini_options]` section.
+The coverage is configured in the `pyproject.toml` file, in the `[tool.pytest.ini_options]` section.
 You can also override the coverage report configuration when running the tests by passing some flags like `--cov-report` to pytest. See [the pytest-cov documentation](https://pytest-cov.readthedocs.io/en/latest/config.html) for more information.
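
For instance, one way to override the report format mentioned above (the `html` reporter is standard pytest-cov behavior rather than something this README pins down; by default it writes a browsable report to `htmlcov/`):

```bash
# Write an HTML coverage report instead of only the terminal summary
poetry run pytest --cov-report=html
```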
 
-## API
+## 🔌 API
 
 The API will need a Bearer token for each request on protected endpoints (any endpoint that isn't a `GET`).
 The token is configured in the `config.toml` file as `API_KEY`, and has a default value set in the `udata_hydra/config_default.toml` file.
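
As a sketch of what the Bearer token requirement looks like in practice (placeholders in braces are illustrative; `DELETE` on `/api/resources/` is among the endpoints listed further down):

```bash
# Hypothetical authenticated call to a protected (non-GET) endpoint;
# {api_key} must match the API_KEY value from config.toml
curl -X DELETE "http://localhost:8000/api/resources/{resource_id}" \
  -H "Authorization: Bearer {api_key}"
```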
@@ -108,15 +108,15 @@ If you're using hydra as an external service to receive resource events from [ud
 API key in its `udata.cfg` file:
 
 ```python
-# Wether udata should publish the resource events
+# Whether udata should publish the resource events
 PUBLISH_ON_RESOURCE_EVENTS = True
 # Where to publish the events
 RESOURCES_ANALYSER_URI = "http://localhost:8000"
 # The API key that hydra needs
 RESOURCES_ANALYSER_API_KEY = "api_key_to_change"
 ```
 
-### Run
+### 🚀 Run
 
 ```bash
 poetry install
@@ -125,14 +125,14 @@ poetry run adev runserver udata_hydra/app.py
 By default, the app will listen on `localhost:8000`.
 You can check the status of the app with `curl http://localhost:8000/api/health`.
 
-### Routes/endpoints
+### 🛣️ Routes/endpoints
 
 The API serves the following endpoints:
 
 *Related to checks:*
 - `GET` on `/api/checks/latest?url={url}&resource_id={resource_id}` to get the latest check for a given URL and/or `resource_id`
 - `GET` on `/api/checks/all?url={url}&resource_id={resource_id}` to get all checks for a given URL and/or `resource_id`
-- `GET` on `/api/checks/aggregate?group_by={column}&created_at={date}` to get checks occurences grouped by a `column` for a specific `date`
+- `GET` on `/api/checks/aggregate?group_by={column}&created_at={date}` to get checks occurrences grouped by a `column` for a specific `date`
 
 *Related to resources:*
 - `GET` on `/api/resources/{resource_id}` to get a resource in the DB "catalog" table from its `resource_id`
@@ -146,7 +146,7 @@ The API serves the following endpoints:
 > - `POST` on `/api/resource/deleted` -> use `DELETE` on `/api/resources/` instead
 
 *Related to resources exceptions:*
-- `GET` on `/api/resources-exceptions` to get the list all resources exceptions
+- `GET` on `/api/resources-exceptions` to get the list of all resources exceptions
 - `POST` on `/api/resources-exceptions` to create a new resource exception in the DB
 - `PUT` on `/api/resources-exceptions/{resource_id}` to update a resource exception in the DB
 - `DELETE` on `/api/resources-exceptions/{resource_id}` to delete a resource exception from the DB
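
A hedged sketch of creating one of these exceptions (the JSON body is an assumption, not a documented schema; check the endpoint's actual contract before relying on it):

```bash
# Hypothetical request to create a resource exception
curl -X POST "http://localhost:8000/api/resources-exceptions" \
  -H "Authorization: Bearer {api_key}" \
  -H "Content-Type: application/json" \
  -d '{"resource_id": "{resource_id}"}'
```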
@@ -157,8 +157,8 @@ The API serves the following endpoints:
 - `GET` on `/api/stats` to get the crawling stats
 - `GET` on `/api/health` to get the API version number and environment
 
-You may want to you a helper such as [Bruno](https://www.usebruno.com/) to handle API calls, in which case all the endpoints are ready to use [here](https://github.com/datagouv/api-calls).
-More details about some enpoints are provided below with examples, but not for all of them:
+You may want to use a helper such as [Bruno](https://www.usebruno.com/) to handle API calls, in which case all the endpoints are ready to use [here](https://github.com/datagouv/api-calls).
+More details about some endpoints are provided below with examples, but not for all of them:
 
 #### Get latest check
 
@@ -237,7 +237,7 @@ $ curl -s "http://localhost:8000/api/checks/all?url=http://www.drees.sante.gouv.
 ]
 ```
 
-#### Get checks occurences grouped by a column for a specific date
+#### Get checks occurrences grouped by a column for a specific date
 
 Works with `?group_by={column}` and `?created_at={date}`.
 `date` should be a date in format `YYYY-MM-DD` or the default keyword `today`.
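
For example, a call of this shape (assuming `domain` is a valid `group_by` column, which this README does not spell out):

```bash
# Group today's checks by domain; any column of the checks table should work
curl -s "http://localhost:8000/api/checks/aggregate?group_by=domain&created_at=today" | json_pp
```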
@@ -411,7 +411,7 @@ $ curl -s "http://localhost:8000/api/stats" | json_pp
 }
 ```
 
-## Using Webhook integration
+## 🔗 Using Webhook integration
 
 ** Set the config values**
 
@@ -425,7 +425,7 @@ SENTRY_DSN = "https://{my-sentry-dsn}"
 
 The webhook integration sends HTTP messages to `udata` when resources are analysed or checked to fill resources extras.
 
-Regarding analysis, there is a phase called "change detection". It will try to guess if a resource has been modified based on different criterions:
+Regarding analysis, there is a phase called "change detection". It will try to guess if a resource has been modified based on different criteria:
 - harvest modified date in catalog
 - content-length and last-modified headers
 - checksum comparison over time
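
The second criterion in the list above can be observed directly with a HEAD request (illustrative only; `{url}` is a placeholder for a resource URL):

```bash
# Show the two headers used by change detection for a given resource URL
curl -sI "{url}" | grep -iE '^(content-length|last-modified):'
```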
@@ -442,9 +442,9 @@ The payload should look something like:
 }
 ```
 
-## Development
+## 🛠️ Development
 
-### docker compose
+### 🐳 docker compose
 
 Multiple docker-compose files are provided:
 - a minimal `docker-compose.yml` with two PostgreSQL containers (one for catalog and metadata, the other for converted CSV to database)
@@ -453,17 +453,17 @@ Multiple docker-compose files are provided:
 
 NB: you can launch compose from multiple files like this: `docker compose -f docker-compose.yml -f docker-compose.test.yml up`
 
-### Logging & Debugging
+### 📝 Logging & Debugging
 
 The log level can be adjusted using the environment variable LOG_LEVEL.
 For example, to set the log level to `DEBUG` when initializing the database, use `LOG_LEVEL="DEBUG" udata-hydra init_db `.
 
-### Writing a migration
+### 📋 Writing a migration
 
 1. Add a file named `migrations/{YYYYMMDD}_{description}.sql` and write the SQL you need to perform migration.
-2. `udata-hydra migrate` will migrate the database as needeed.
+2. `udata-hydra migrate` will migrate the database as needed.
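
A sketch of the two steps above in one go (the file name and SQL are hypothetical; `catalog` is the table this README mentions elsewhere, but the column is invented for illustration):

```bash
# 1. Hypothetical migration file following the naming convention above
cat > migrations/20250801_add_example_column.sql <<'SQL'
ALTER TABLE catalog ADD COLUMN example_note TEXT;
SQL
# 2. Apply pending migrations
udata-hydra migrate
```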
 
-## Deployment
+## 🚀 Deployment
 
 3 services need to be deployed for the full stack to run:
 - worker
@@ -474,7 +474,7 @@ Refer to each section to learn how to launch them. The only differences from dev
 - use `HYDRA_SETTINGS` env var to point to your custom `config.toml`
 - use `HYDRA_APP_SOCKET_PATH` to configure where aiohttp should listen to a [reverse proxy connection (eg nginx)](https://docs.aiohttp.org/en/stable/deployment.html#nginx-configuration) and use `udata-hydra-app` to launch the app server
 
-## Contributing
+## 🤝 Contributing
 
 Before contributing to the repository and making any PR, it is necessary to initialize the pre-commit hooks:
 ```bash
@@ -487,6 +487,6 @@ If you cannot use pre-commit, it is necessary to format, lint, and sort imports
 poetry run ruff check --fix . && poetry run ruff format .
 ```
 
-### Releases
+### 🏷️ Releases
 
 The release process uses [bump'X](https://github.com/datagouv/bumpx).
