Merged
Changes from 12 commits
48 changes: 0 additions & 48 deletions alembic/README.md

This file was deleted.

78 changes: 0 additions & 78 deletions authentication/README.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/README.md
Original file line number Diff line number Diff line change
@@ -17,7 +17,7 @@ For documentation on how to use the REST API directly, visit the ["Using the API
To use the metadata catalogue from your service, use the [Python SDK](https://github.com/aiondemand/aiondemand)
or use the REST API directly as detailed in the ["Using the API"](Using.md) documentation.

**🌍 Hosting:** For information on how to host the metadata catalogue, see the ["Hosting" documentation](Hosting.md).
**🌍 Hosting:** For information on how to host the metadata catalogue, see the ["Hosting" documentation](hosting/index.md).

**🧑‍🔧 API Development:** The ["Developer Guide"](developer/index.md) has information about the code in this repository and how to make contributions.

2 changes: 1 addition & 1 deletion docs/Contributing.md → docs/contributing.md
@@ -57,7 +57,7 @@ Without it, it may be very hard for contributors to solve the issue (or may not
## Setting up a development environment

### Cloning
First, make sure you can get the local metadata catalogue up and running by following the ["Hosting" instructions](Hosting.md).
First, make sure you can get the local metadata catalogue up and running by following the ["Hosting" instructions](hosting/index.md).
During the installation step, use `git` to clone the repository.
If you have write access to this repository, you can follow the instruction as-is.
If you do not have write access to this repository, you must [fork it](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo).
3 changes: 0 additions & 3 deletions docs/developer/auth.md

This file was deleted.

23 changes: 23 additions & 0 deletions docs/developer/authentication.md
@@ -0,0 +1,23 @@
# Authentication

For authentication, we use a [keycloak](https://www.keycloak.org) service.
For development, make sure to use the `USE_LOCAL_DEV=true` environment variable so that the local
keycloak server is configured with default users:

| User | Password | Role(s) | Comment |
|------|----------|----------------------------------------------------------------------------|---------|
| user | password | edit_aiod_resources, default-roles-aiod, offline_access, uma_authorization | |

For a description of the roles, see ["AIoD Keycloak Roles"](../hosting/authentication.md#roles).
With the local development configuration, you will only be able to authenticate with keycloak users (OAuth2, password), not by other means.
You can test authentication, e.g., by:

1. Navigate to the Swagger documentation (https://localhost:8000/docs)
2. Click `Authorize`
3. Navigate to "OpenIdConnect (OAuth2, password)" and provide the username and password.
4. Click `Authorize`
5. You should now be logged in. You can verify this by accessing an endpoint that requires authentication, such as `/authorization_test`.
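
Behind the scenes, the Swagger flow above is keycloak's standard OAuth2 password grant. The sketch below shows the request pieces involved; the `client_id` is an assumption, not taken from this repository, so check the local keycloak configuration for the real value:

```python
# Sketch of the OAuth2 password grant used against the local keycloak.
# The client_id is a placeholder (assumption), not the real one.
def password_grant_payload(username: str, password: str,
                           client_id: str = "aiod-api-client") -> dict:
    """Form body POSTed to keycloak's .../protocol/openid-connect/token endpoint."""
    return {
        "grant_type": "password",
        "client_id": client_id,
        "username": username,
        "password": password,
    }


def bearer_header(access_token: str) -> dict:
    """Header for authenticated endpoints such as `/authorization_test`."""
    return {"Authorization": f"Bearer {access_token}"}
```

The returned `access_token` from keycloak's JSON response is what Swagger attaches to subsequent requests for you.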

## Connecting to Keycloak Console
To connect to the Keycloak console, visit http://localhost/aiod-auth.
In the development instance, the administrator username is 'admin' and the password is 'password'.
3 changes: 0 additions & 3 deletions docs/developer/code.md

This file was deleted.

100 changes: 100 additions & 0 deletions docs/developer/elastic_search.md
@@ -0,0 +1,100 @@
# Elastic Search

Elastic Search indexes the information in the database for quick retrieval, facilitating endpoints
that can search through assets with loosely matching queries and give relevancy-ranked suggestions.

## Indexing the Database using Logstash
Elastic Search keeps independent indices for the various assets in the database, which are maintained by:

* Creating an initial index
* Populating it based on the information already in the database
* Updating the index with new information periodically, removing old entries if necessary

Because the logic for these steps is very similar for the different assets, we generate various
scripts to create and maintain the elastic search indices. We use [Logstash](https://www.elastic.co/logstash)
to process the data from our database and export it to Elastic Search.

This happens through two container services:

* `es_logstash_setup`: Generates the common scripts for use by logstash, and creates the Elastic Search indices if necessary.
This is a short-running service that only runs on startup, exiting when it's done.
* `logstash`: Continually monitors the database and updates the Elastic Search indices.

### Logstash Setup

The `es_logstash_setup` service performs two roles: generating logstash files and creating the Elastic Search indices.

The `src/logstash_setup/generate_logstash_config_files.py` file generates logstash files based on the
templates provided in the `src/logstash_setup/templates` directory. The generated files are placed
into subdirectories of the `logstash/config` directory, along with predefined files.

For syncing the Elastic Search index, logstash requires SQL files that extract the necessary data from the database.
These are generated based on the `src/logstash_setup/templates/sql_{init|sync|rm}.py` files:

* The `sql_init.py` file defines the query template that finds the data that should be included in the index if it is populated from scratch.
* The `sql_sync.py` file defines the query template that finds the data that has been updated since the last creation or synchronization, so that the ES index can be updated efficiently.
* The `sql_rm.py` file defines the query template that finds the data that should be removed from the index.

It also generates the configuration files needed for Logstash to run the sync scripts:

* `config.py`: used to generate `logstash.yml`, the general configuration.
* `init_table.py`: contains the configuration that is needed to run the queries from `sql_init.py`, and defines them for each asset that needs to be indexed.
* `sync_table.py`: contains the configuration that is needed to run the queries from `sql_sync.py` and `sql_rm.py` scripts, and defines them for each asset that needs to be synced.

All generated files contain the preamble defined in `file_header.py`.
Additionally, the `logstash/config/config` directory contains further files used to configure logstash, such as the JVM options.
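
As a rough illustration of what the generated sync queries do, here is a hypothetical simplification; the column and placeholder names are assumptions, and the real templates in `src/logstash_setup/templates/` differ in detail:

```python
# Hypothetical sketch in the spirit of sql_sync.py: select rows modified
# since the last run, so logstash can update the index incrementally.
# Table, column, and placeholder names are illustrative assumptions.
SQL_SYNC_TEMPLATE = (
    "SELECT {table}.* FROM {table} "
    "WHERE {table}.date_modified > :sql_last_value"
)


def render_sync_query(table: str) -> str:
    """Fill the template for one asset table, e.g. 'case_study'."""
    return SQL_SYNC_TEMPLATE.format(table=table)
```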

### Creating a New Index
To create a new index for an asset supported in the metadata catalogue REST API, you only need to create the respective "search router" (more on that below).

## Elastic Search in the Metadata Catalogue
The metadata catalogue provides REST API endpoints that allow querying Elastic Search in a uniform manner.
While Elastic Search could be exposed directly in production, this unified endpoint allows us to provide more structure and better automated documentation.
It also avoids requiring users to learn the Elastic Search query format.

### Creating a New Search
To extend Elastic Search to a new asset type, create a search router, similar to those in `src/routers/search_routers/`.
Simply inherit from the base `SearchRouter` class defined in `src/routers/search_router.py` and define a few properties:

```python
@property
def es_index(self) -> str:
    return "case_study"
```
The `es_index` property defines the name of the index as it is known by Elastic Search; it should match the name of the table in the database.

```python
@property
def resource_name_plural(self) -> str:
    return "case_studies"
```

The `resource_name_plural` is used to define the path of the REST API endpoint, e.g.: `api.aiod.eu/search/case_studies`.

```python
@property
def resource_class(self):
    return CaseStudy
```

The `resource_class` property contains a direct reference to the object it indexes, which is used when returning expanded responses from the ES query ("get all").

```python
@property
def linked_fields(self) -> set[str]:
    return {
        "alternate_name",
        "application_area",
        "industrial_sector",
        "research_area",
        "scientific_domain",
    }
```
The `linked_fields` property contains the fields of the entity which refer to external tables and should be included in the index.

Once you create a new `SearchRouter` (and add it to the router list), the script that generates the logstash files will automatically include it.
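
Putting the snippets above together, a complete router could look like the following sketch. The base class and model are stubbed here so the example is self-contained; in the repository they come from `src/routers/search_router.py` and the model package:

```python
class SearchRouter:  # stand-in for the base class in src/routers/search_router.py
    ...


class CaseStudy:  # stand-in for the CaseStudy model class
    ...


class SearchRouterCaseStudy(SearchRouter):
    """Search router exposing the `case_study` index at /search/case_studies."""

    @property
    def es_index(self) -> str:
        return "case_study"

    @property
    def resource_name_plural(self) -> str:
        return "case_studies"

    @property
    def resource_class(self):
        return CaseStudy

    @property
    def linked_fields(self) -> set[str]:
        return {
            "alternate_name",
            "application_area",
            "industrial_sector",
            "research_area",
            "scientific_domain",
        }
```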

## Configuration
Besides the aforementioned configuration files, the Elastic Search configuration is located at `es/elasticsearch.yml`, but it shouldn't need much tweaking.
Some aspects of both Logstash and Elastic Search are configured through environment variables in the `override.env` file (defaults in `.env`).
The most notable of these are the password for Elastic Search and the JVM resource options.
13 changes: 3 additions & 10 deletions docs/developer/index.md
@@ -202,7 +202,7 @@ Checkin is strict - as it should be. On our development keycloak, any redirection
accepted, so that it works on local host or wherever you deploy. This should never be the case
for a production instance.

See [authentication README](developer/auth.md) for more information.
See [authentication README](authentication.md) for more information.

### Creating the Database

@@ -239,15 +239,8 @@ start-up work (e.g., populating the database).
The Python classes that define the database tables are found in [src/database/model/](../src/database/model/).
The structure is based on the
[metadata schema](https://github.com/aiondemand/metadata-schema).


## Adding resources

See [src/README.md](developer/code.md).
Updating the database schema is done using [Alembic](schema/migration.md).

## Backups and Restoration

We provide several scripts to facilitate the scheduling of backups and the manual restoration of files. For details on these scripts and others, please see [scripts/README.md](scripts/README.md).

## Releases

We provide several scripts to facilitate the scheduling of backups and the manual restoration of files. For details on these scripts and others, please see [scripts/README.md](scripts.md).
3 changes: 0 additions & 3 deletions docs/developer/migration.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/developer/releases.md
@@ -59,5 +59,5 @@ To create a new release,
- Check which services currently work (before the update). It's a sanity check for if a service _doesn't_ work later.
- Update the code on the server by checking out the release
- Merge configurations as necessary
- Make sure the latest database migrations are applied: see ["Schema Migrations"](developer/migration.md#update-the-database)
- Make sure the latest database migrations are applied: see ["Schema Migrations"](schema/migration.md#update-the-database)
9. Notify everyone (e.g., in the API channel in Slack).
2 changes: 1 addition & 1 deletion docs/developer/schema/index.md
@@ -35,7 +35,7 @@ On a high level, changes to the metadata schema implementation consist of three

* updating the schema implementation in [`src/database/model`](https://github.com/aiondemand/AIOD-rest-api/tree/develop/src/database/model),
* updating or adding tests which test those changes, and
* adding a [database migration script]() which updates the database accordingly.
* adding a [database migration script](migration.md) which updates the database accordingly.

This last step isn't needed during development, where you may recreate a database anytime to model changes.
However, to deploy the changed schema in production we need to be able to change the database,