Commit 0a0c4cf

Merge pull request #427 from aiondemand/docs/connectors
Docs/connectors
2 parents efc21d7 + e83c39c commit 0a0c4cf

File tree

21 files changed

+619
-431
lines changed

alembic/README.md

Lines changed: 0 additions & 48 deletions
This file was deleted.

authentication/README.md

Lines changed: 0 additions & 78 deletions
This file was deleted.

docs/README.md

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@
 
 This repository contains code and configurations for the AI-on-Demand Metadata Catalogue.
 The metadata catalogue provides a unified view of AI assets and resources stored across the AI landscape.
-It collects metadata from platforms such as [_Zenodo_](https://zenodo.org) and [_OpenML_](https://openml.org),
+It collects metadata from platforms such as [_Zenodo_](https://zenodo.org), [_Hugging Face_](https://huggingface.co) and [_OpenML_](https://openml.org),
 and is connected to European projects like [Bonsapps](https://bonsapps.eu) and [AIDA](https://www.i-aida.org).
 Metadata of datasets, models, papers, news, and more from all of these sources is available through a REST API at [api.aiod.eu](https://api.aiod.eu/).
 

@@ -17,7 +17,7 @@ For documentation on how to use the REST API directly, visit the ["Using the API
 To use the metadata catalogue from your service, use the [Python SDK](https://github.com/aiondemand/aiondemand)
 or use the REST API directly as detailed in the ["Using the API"](Using.md) documentation.
 
-**🌍 Hosting:** For information on how to host the metadata catalogue, see the ["Hosting" documentation](Hosting.md).
+**🌍 Hosting:** For information on how to host the metadata catalogue, see the ["Hosting" documentation](hosting/index.md).
 
 **🧑‍🔧 API Development:** The ["Developer Guide"](developer/index.md) has information about the code in this repository and how to make contributions.
 

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ Without it, it may be very hard for contributors to solve the issue (or may not
 ## Setting up a development environment
 
 ### Cloning
-First, make sure you can get the local metadata catalogue up and running by following the ["Hosting" instructions](Hosting.md).
+First, make sure you can get the local metadata catalogue up and running by following the ["Hosting" instructions](hosting/index.md).
 During the installation step, use `git` to clone the repository.
 If you have write access to this repository, you can follow the instructions as-is.
 If you do not have write access to this repository, you must [fork it](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo).

docs/developer/auth.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

docs/developer/authentication.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+# Authentication
+
+For authentication, we use a [keycloak](https://www.keycloak.org) service.
+For development, make sure to set the `USE_LOCAL_DEV=true` environment variable so that the local
+keycloak server is configured with default users:
+
+| User | Password | Role(s)                                                                     | Comment |
+|------|----------|-----------------------------------------------------------------------------|---------|
+| user | password | edit_aiod_resources, default-roles-aiod, offline_access, uma_authorization | |
+
+For a description of the roles, see ["AIoD Keycloak Roles"](../hosting/authentication.md#roles).
+With the local development configuration, you will only be able to authenticate with keycloak users (OAuth2, password), not by other means.
+You can test authentication by, e.g.:
+
+1. Navigate to the Swagger documentation (https://localhost:8000/docs).
+2. Click `Authorize`.
+3. Navigate to "OpenIdConnect (OAuth2, password)" and provide the username and password.
+4. Click `Authorize`.
+5. You should now be logged in. You can verify this by accessing an endpoint that requires authentication, such as `/authorization_test`.
+
+## Connecting to Keycloak Console
+To connect to the Keycloak console, visit http://localhost/aiod-auth.
+In the development instance, the administrator username is 'admin' and the password is 'password'.
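The password-grant login that the added documentation walks through (steps 1-5) can be sketched in Python. This is a minimal sketch only: the realm name `aiod` and client id `aiod-api` are illustrative assumptions, not values taken from this commit; the endpoint layout is the standard Keycloak OpenID Connect one.

```python
from urllib.parse import urlencode

# Assumed local development values; the realm and client id below are
# hypothetical placeholders, not taken from this repository.
KEYCLOAK_BASE = "http://localhost/aiod-auth"
REALM = "aiod"


def token_endpoint(base: str, realm: str) -> str:
    # Standard Keycloak OpenID Connect token endpoint layout.
    return f"{base}/realms/{realm}/protocol/openid-connect/token"


def password_grant_body(username: str, password: str, client_id: str) -> str:
    # Form-encoded body for the OAuth2 "password" grant, the same flow the
    # Swagger "OpenIdConnect (OAuth2, password)" dialog uses.
    return urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "username": username,
        "password": password,
    })


url = token_endpoint(KEYCLOAK_BASE, REALM)
body = password_grant_body("user", "password", "aiod-api")
```

POSTing `body` to `url` (with `Content-Type: application/x-www-form-urlencoded`) would return a JSON document containing an `access_token`, which is then sent as a `Bearer` token to authenticated endpoints such as `/authorization_test`.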

docs/developer/code.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

docs/developer/elastic_search.md

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
+# Elastic Search
+
+Elastic Search indexes the information in the database for quick retrieval, facilitating endpoints
+that can search through assets with loosely matching queries and give relevancy-ranked suggestions.
+
+## Indexing the Database using Logstash
+Elastic Search keeps independent indices for the various assets in the database, and achieves this by:
+
+* Creating an initial index
+* Populating it based on the information already in the database
+* Updating the index with new information periodically, removing old entries if necessary
+
+Because the logic for these steps is very similar for the different assets, we generate various
+scripts to create and maintain the Elastic Search indices. We use [Logstash](https://www.elastic.co/logstash)
+to process the data from our database and export it to Elastic Search.
+
+This happens through two container services:
+
+* `es_logstash_setup`: Generates the common scripts for use by Logstash, and creates the Elastic Search indices if necessary.
+This is a short-running service that only runs on startup, exiting when it is done.
+* `logstash`: Continually monitors the database and updates the Elastic Search indices.
+
+### Logstash Setup
+
+The `es_logstash_setup` service performs two important roles: generating Logstash files and creating Elastic Search indices.
+
+The `src/logstash_setup/generate_logstash_config_files.py` file generates Logstash files based on the
+templates provided in the `src/logstash_setup/templates` directory. The generated files are placed
+into subdirectories of the `logstash/config` directory, along with predefined files.
+
+For syncing the Elastic Search index, Logstash requires SQL files that extract the necessary data from the database.
+These are generated based on the `src/logstash_setup/templates/sql_{init|sync|rm}.py` files:
+
+* The `sql_init.py` file defines the query template that finds the data to include in the index when it is populated from scratch.
+* The `sql_sync.py` file defines the query template that finds the data that has been updated since the last creation or synchronization, so that the ES index can be updated efficiently.
+* The `sql_rm.py` file defines the query template that finds the data that should be removed from the index.
+
+It also generates the configuration files needed for Logstash to run the sync scripts:
+
+* `config.py`: used to generate `logstash.yml`, the general configuration.
+* `init_table.py`: contains the configuration needed to run the queries from `sql_init.py`, and defines them for each asset that needs to be indexed.
+* `sync_table.py`: contains the configuration needed to run the queries from the `sql_sync.py` and `sql_rm.py` scripts, and defines them for each asset that needs to be synced.
+
+All generated files contain the preamble defined in `file_header.py`.
+Additionally, the `logstash/config/config` directory contains further files used for the configuration of Logstash, such as the JVM options.
+
+### Creating a New Index
+To create a new index for an asset supported in the metadata catalogue REST API, you simply need to create the respective "search router"; more on that below.
+
+## Elastic Search in the Metadata Catalogue
+The metadata catalogue provides REST API endpoints that allow querying Elastic Search in a uniform manner.
+While Elastic Search can be exposed directly in production, this unified endpoint allows us to provide more structure and better automated documentation.
+It also avoids requiring the user to learn the Elastic Search query format.
+
+### Creating a New Search
+To extend Elastic Search to a new asset type, create a search router, similar to those in `src/routers/search_routers/`.
+Simply inherit from the base `SearchRouter` class defined in `src/routers/search_router.py` and define a few properties:
+
+```python
+@property
+def es_index(self) -> str:
+    return "case_study"
+```
+The `es_index` property defines the name of the index. It is the name by which the index is known to Elastic Search, and it should match the name of the table in the database.
+
+```python
+@property
+def resource_name_plural(self) -> str:
+    return "case_studies"
+```
+
+The `resource_name_plural` property is used to define the path of the REST API endpoint, e.g.: `api.aiod.eu/search/case_studies`.
+
+```python
+@property
+def resource_class(self):
+    return CaseStudy
+```
+
+The `resource_class` property contains a direct reference to the class it indexes, which is used when returning expanded responses from the ES query ("get all").
+
+```python
+@property
+def extra_indexed_fields(self) -> set[str]:
+    return {"headline", "alternative_headline"}
+```
+
+The `extra_indexed_fields` property contains the fields of the entity that should be included in the index, in addition to the `global_indexed_fields` found in the `SearchRouter` class.
+
+```python
+@property
+def linked_fields(self) -> set[str]:
+    return {
+        "alternate_name",
+        "application_area",
+        "industrial_sector",
+        "research_area",
+        "scientific_domain",
+    }
+```
+The `linked_fields` property contains the fields of the entity which refer to external tables and should be included in the index.
+
+By creating a new `SearchRouter` (and adding it to the router list), the script which generates the Logstash files will automatically include it.
+
+## Configuration
+Besides the aforementioned configuration files, the Elastic Search configuration is located at `es/elasticsearch.yml`, but it should rarely need changes.
+Some aspects of both Logstash and Elastic Search are configured through environment variables in the `override.env` file (defaults in `.env`).
+The most notable of these are the Elastic Search password and the JVM resource options.
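Putting the properties from the added documentation together, a new search router might look like the sketch below. The `SearchRouter` base class and the `CaseStudy` model here are minimal stand-ins for the real ones in `src/routers/search_router.py` and the database model package; only the property names and return values come from the documentation above.

```python
from dataclasses import dataclass


# Stand-in for the real database model class; illustrative only.
@dataclass
class CaseStudy:
    identifier: int
    headline: str


class SearchRouter:
    # Stand-in base class; in the repository this lives in
    # src/routers/search_router.py. The field names below are an
    # illustrative subset, not the actual global_indexed_fields.
    global_indexed_fields: set[str] = {"name", "description"}


class SearchRouterCaseStudies(SearchRouter):
    @property
    def es_index(self) -> str:
        # Index name; should match the database table name.
        return "case_study"

    @property
    def resource_name_plural(self) -> str:
        # Determines the endpoint path, e.g. /search/case_studies.
        return "case_studies"

    @property
    def resource_class(self):
        return CaseStudy

    @property
    def extra_indexed_fields(self) -> set[str]:
        return {"headline", "alternative_headline"}

    @property
    def linked_fields(self) -> set[str]:
        return {"alternate_name", "application_area", "industrial_sector",
                "research_area", "scientific_domain"}


router = SearchRouterCaseStudies()
# All fields indexed for this asset: the global fields plus the extras.
indexed = SearchRouter.global_indexed_fields | router.extra_indexed_fields
```

Adding an instance like `router` to the router list is what lets the Logstash file generator pick the new asset up automatically.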

docs/developer/index.md

Lines changed: 3 additions & 10 deletions
@@ -202,7 +202,7 @@ Checkin is strict - as it should be. On our development keycloak, any redirection
 accepted, so that it works on localhost or wherever you deploy. This should never be the case
 for a production instance.
 
-See [authentication README](developer/auth.md) for more information.
+See [authentication README](authentication.md) for more information.
 
 ### Creating the Database
 
@@ -239,15 +239,8 @@ start-up work (e.g., populating the database).
 The Python classes that define the database tables are found in [src/database/model/](../src/database/model/).
 The structure is based on the
 [metadata schema](https://github.com/aiondemand/metadata-schema).
-
-
-## Adding resources
-
-See [src/README.md](developer/code.md).
+Updating the database schema is done using [Alembic](schema/migration.md).
 
 ## Backups and Restoration
 
-We provide several scripts to facilitate the scheduling of backups and the manual restoration of files. For details on these scripts and others, please see [scripts/README.md](scripts/README.md).
-
-## Releases
-
+We provide several scripts to facilitate the scheduling of backups and the manual restoration of files. For details on these scripts and others, please see [scripts/README.md](scripts.md).

docs/developer/migration.md

Lines changed: 0 additions & 3 deletions
This file was deleted.
