Merged
Changes from 12 commits
48 changes: 0 additions & 48 deletions alembic/README.md

This file was deleted.

78 changes: 0 additions & 78 deletions authentication/README.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/README.md
Original file line number Diff line number Diff line change
@@ -17,7 +17,7 @@ For documentation on how to use the REST API directly, visit the ["Using the API
To use the metadata catalogue from your service, use the [Python SDK](https://github.com/aiondemand/aiondemand)
or use the REST API directly as detailed in the ["Using the API"](Using.md) documentation.

**🌍 Hosting:** For information on how to host the metadata catalogue, see the ["Hosting" documentation](Hosting.md).
**🌍 Hosting:** For information on how to host the metadata catalogue, see the ["Hosting" documentation](hosting/index.md).

**🧑‍🔧 API Development:** The ["Developer Guide"](developer/index.md) has information about the code in this repository and how to make contributions.

2 changes: 1 addition & 1 deletion docs/Contributing.md → docs/contributing.md
@@ -57,7 +57,7 @@ Without it, it may be very hard for contributors to solve the issue (or may not
## Setting up a development environment

### Cloning
First, make sure you can get the local metadata catalogue up and running by following the ["Hosting" instructions](Hosting.md).
First, make sure you can get the local metadata catalogue up and running by following the ["Hosting" instructions](hosting/index.md).
During the installation step, use `git` to clone the repository.
If you have write access to this repository, you can follow the instruction as-is.
If you do not have write access to this repository, you must [fork it](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo).
3 changes: 0 additions & 3 deletions docs/developer/auth.md

This file was deleted.

23 changes: 23 additions & 0 deletions docs/developer/authentication.md
@@ -0,0 +1,23 @@
# Authentication

For authentication, we use a [keycloak](https://www.keycloak.org) service.
For development, make sure to use the `USE_LOCAL_DEV=true` environment variable so that the local
keycloak server is configured with default users:

| User | Password | Role(s) | Comment |
|------|----------|----------------------------------------------------------------------------|---------|
| user | password | edit_aiod_resources, default-roles-aiod, offline_access, uma_authorization | |

For a description of the roles, see ["AIoD Keycloak Roles"](../hosting/authentication.md#roles).
With the local development configuration, you will only be able to authenticate with keycloak users (OAuth2, password), not by other means.
You can test authentication, e.g., by:

1. Navigate to the Swagger documentation (https://localhost:8000/docs)
2. Click `Authorize`
3. Navigate to "OpenIdConnect (OAuth2, password)" and provide the username and password.
4. Click `Authorize`
5. You should now be logged in. You can verify this by accessing an endpoint that requires authentication, such as `/authorization_test`.
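
Behind the scenes, the Swagger flow above is keycloak's standard OAuth2 password grant. The sketch below shows the request pieces involved; the `client_id` is an assumption, not taken from this repository, so check the local keycloak configuration for the real value:

```python
# Sketch of the OAuth2 password grant used against the local keycloak.
# The client_id is a placeholder (assumption), not the real one.
def password_grant_payload(username: str, password: str,
                           client_id: str = "aiod-api-client") -> dict:
    """Form body POSTed to keycloak's .../protocol/openid-connect/token endpoint."""
    return {
        "grant_type": "password",
        "client_id": client_id,
        "username": username,
        "password": password,
    }


def bearer_header(access_token: str) -> dict:
    """Header for authenticated endpoints such as `/authorization_test`."""
    return {"Authorization": f"Bearer {access_token}"}
```

The returned `access_token` from keycloak's JSON response is what Swagger attaches to subsequent requests for you.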

## Connecting to Keycloak Console
To connect to the Keycloak console, visit http://localhost/aiod-auth.
In the development instance, the administrator username is 'admin' and the password is 'password'.
3 changes: 0 additions & 3 deletions docs/developer/code.md

This file was deleted.

100 changes: 100 additions & 0 deletions docs/developer/elastic_search.md
@@ -0,0 +1,100 @@
# Elastic Search

Elastic Search indexes the information in the database for quick retrieval, facilitating endpoints
that can search through assets with loosely matching queries and give relevancy-ranked suggestions.

## Indexing the Database using Logstash
Elastic Search keeps independent indices for the various assets in the database, which are maintained by:

* Creating an initial index
* Populating it based on the information already in the database
* Updating the index with new information periodically, removing old entries if necessary

Because the logic for these steps is very similar for the different assets, we generate various
scripts to create and maintain the elastic search indices. We use [Logstash](https://www.elastic.co/logstash)
to process the data from our database and export it to Elastic Search.

This happens through two container services:

* `es_logstash_setup`: Generates the common scripts for use by logstash, and creates the Elastic Search indices if necessary.
This is a short-running service that only runs on startup, exiting when it's done.
* `logstash`: Continually monitors the database and updates the Elastic Search indices.

### Logstash Setup

The `es_logstash_setup` service performs two roles: generating logstash files and creating the Elastic Search indices.

The `src/logstash_setup/generate_logstash_config_files.py` file generates logstash files based on the
templates provided in the `src/logstash_setup/templates` directory. The generated files are placed
into subdirectories of the `logstash/config` directory, along with predefined files.

For syncing the Elastic Search index, logstash requires SQL files that extract the necessary data from the database.
These are generated based on the `src/logstash_setup/templates/sql_{init|sync|rm}.py` files:

* The `sql_init.py` file defines the query template that finds the data that should be included in the index if it is populated from scratch.
* The `sql_sync.py` file defines the query template that finds the data that has been updated since the last creation or synchronization, so that the ES index can be updated efficiently.
* The `sql_rm.py` file defines the query template that finds the data that should be removed from the index.

It also generates the configuration files needed for Logstash to run the sync scripts:

* `config.py`: used to generate `logstash.yml`, the general configuration.
* `init_table.py`: contains the configuration that is needed to run the queries from `sql_init.py`, and defines them for each asset that needs to be indexed.
* `sync_table.py`: contains the configuration that is needed to run the queries from `sql_sync.py` and `sql_rm.py` scripts, and defines them for each asset that needs to be synced.

All generated files contain the preamble defined in `file_header.py`.
Additionally, the `logstash/config/config` directory contains further files used to configure logstash, such as the JVM options.
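
As a rough illustration of what the generated sync queries do, here is a hypothetical simplification; the column and placeholder names are assumptions, and the real templates in `src/logstash_setup/templates/` differ in detail:

```python
# Hypothetical sketch in the spirit of sql_sync.py: select rows modified
# since the last run, so logstash can update the index incrementally.
# Table, column, and placeholder names are illustrative assumptions.
SQL_SYNC_TEMPLATE = (
    "SELECT {table}.* FROM {table} "
    "WHERE {table}.date_modified > :sql_last_value"
)


def render_sync_query(table: str) -> str:
    """Fill the template for one asset table, e.g. 'case_study'."""
    return SQL_SYNC_TEMPLATE.format(table=table)
```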

### Creating a New Index
To create a new index for an asset supported in the metadata catalogue REST API, you only need to create the respective "search router" (more on that below).

## Elastic Search in the Metadata Catalogue
The metadata catalogue provides REST API endpoints that allow querying Elastic Search in a uniform manner.
While Elastic Search could be exposed directly in production, this unified endpoint allows us to provide more structure and better automated documentation.
It also avoids requiring users to learn the Elastic Search query format.

### Creating a New Search
To extend Elastic Search to a new asset type, create a search router, similar to those in `src/routers/search_routers/`.
Simply inherit from the base `SearchRouter` class defined in `src/routers/search_router.py` and define a few properties:

```python
@property
def es_index(self) -> str:
    return "case_study"
```
The `es_index` property defines the name of the index as it is known by Elastic Search; it should match the name of the table in the database.

```python
@property
def resource_name_plural(self) -> str:
    return "case_studies"
```

The `resource_name_plural` is used to define the path of the REST API endpoint, e.g.: `api.aiod.eu/search/case_studies`.

```python
@property
def resource_class(self):
    return CaseStudy
```

The `resource_class` property contains a direct reference to the object it indexes, which is used when returning expanded responses from the ES query ("get all").

```python
@property
def linked_fields(self) -> set[str]:
    return {
        "alternate_name",
        "application_area",
        "industrial_sector",
        "research_area",
        "scientific_domain",
    }
```
The `linked_fields` property contains the fields of the entity which refer to external tables and should be included in the index.

Once you create a new `SearchRouter` (and add it to the router list), the script that generates the logstash files will automatically include it.
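
Putting the snippets above together, a complete router could look like the following sketch. The base class and model are stubbed here so the example is self-contained; in the repository they come from `src/routers/search_router.py` and the model package:

```python
class SearchRouter:  # stand-in for the base class in src/routers/search_router.py
    ...


class CaseStudy:  # stand-in for the CaseStudy model class
    ...


class SearchRouterCaseStudy(SearchRouter):
    """Search router exposing the `case_study` index at /search/case_studies."""

    @property
    def es_index(self) -> str:
        return "case_study"

    @property
    def resource_name_plural(self) -> str:
        return "case_studies"

    @property
    def resource_class(self):
        return CaseStudy

    @property
    def linked_fields(self) -> set[str]:
        return {
            "alternate_name",
            "application_area",
            "industrial_sector",
            "research_area",
            "scientific_domain",
        }
```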

## Configuration
Besides the aforementioned configuration files, the Elastic Search configuration is located at `es/elasticsearch.yml`, but it shouldn't need much tweaking.
Some aspects of both Logstash and Elastic Search are configured through environment variables in the `override.env` file (defaults in `.env`).
The most notable of these are the password for Elastic Search and the JVM resource options.
13 changes: 3 additions & 10 deletions docs/developer/index.md
@@ -202,7 +202,7 @@ Checkin is strict - as it should be. On our development keycloak, any redirection
accepted, so that it works on local host or wherever you deploy. This should never be the case
for a production instance.

See [authentication README](developer/auth.md) for more information.
See [authentication README](authentication.md) for more information.

### Creating the Database

@@ -239,15 +239,8 @@ start-up work (e.g., populating the database).
The Python classes that define the database tables are found in [src/database/model/](../src/database/model/).
The structure is based on the
[metadata schema](https://github.com/aiondemand/metadata-schema).


## Adding resources

See [src/README.md](developer/code.md).
Updating the database schema is done using [Alembic](schema/migration.md).

## Backups and Restoration

We provide several scripts to facilitate the scheduling of backups and the manual restoration of files. For details on these scripts and others, please see [scripts/README.md](scripts/README.md).

## Releases

We provide several scripts to facilitate the scheduling of backups and the manual restoration of files. For details on these scripts and others, please see [scripts/README.md](scripts.md).
3 changes: 0 additions & 3 deletions docs/developer/migration.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/developer/releases.md
@@ -59,5 +59,5 @@ To create a new release,
- Check which services currently work (before the update). It's a sanity check for if a service _doesn't_ work later.
- Update the code on the server by checking out the release
- Merge configurations as necessary
- Make sure the latest database migrations are applied: see ["Schema Migrations"](developer/migration.md#update-the-database)
- Make sure the latest database migrations are applied: see ["Schema Migrations"](schema/migration.md#update-the-database)
9. Notify everyone (e.g., in the API channel in Slack).
2 changes: 1 addition & 1 deletion docs/developer/schema/index.md
@@ -35,7 +35,7 @@ On a high level, changes to the metadata schema implementation consist of three

* updating the schema implementation in [`src/database/model`](https://github.com/aiondemand/AIOD-rest-api/tree/develop/src/database/model),
* updating or adding tests which test those changes, and
* adding a [database migration script]() which updates the database accordingly.
* adding a [database migration script](migration.md) which updates the database accordingly.

This last step isn't needed during development, where you may recreate a database anytime to model changes.
However, to deploy the changed schema in production we need to be able to change the database,