Docs/connectors #427
Merged
7bbfc28
Extract own connectors page for hosting
PGijsbers 7d6664d
Add information on the different connectors
PGijsbers fc15737
Small note on synchronization
PGijsbers 7b2739a
Move readme form src to docs
PGijsbers 756700d
Document authentication
PGijsbers a85a737
Add warning Postman instructions are not up to date
PGijsbers 71bc720
Update database schema migration docs
PGijsbers cf1a422
Update documentation index
PGijsbers 71eb638
Clarify difference between local host/EIP
PGijsbers 52fc3e9
Fix indentation for tabbed examples
PGijsbers 4a03302
Update broken links
PGijsbers 58799a7
Add Logstash/ES documentation
PGijsbers b9f112b
Update docs/developer/elastic_search.md
PGijsbers 3f2df54
Update docs/developer/elastic_search.md
PGijsbers 50966f8
Add search plugin, mention Hugging Face
PGijsbers dafeb2a
Update index.md
mrorro a1824e3
Update mkdocs.yaml
mrorro e83c39c
Feedback Marco Rorro
PGijsbers
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,23 @@ | ||
| # Authentication | ||
|
|
||
| For authentication, we use a [Keycloak](https://www.keycloak.org) service. | ||
| For development, make sure to set the `USE_LOCAL_DEV=true` environment variable so that the local | ||
| Keycloak server is configured with default users: | ||
|
|
||
| | User | Password | Role(s) | Comment | | ||
| |------|----------|----------------------------------------------------------------------------|---------| | ||
| | user | password | edit_aiod_resources, default-roles-aiod, offline_access, uma_authorization | | | ||
|
|
||
| For a description of the roles, see ["AIoD Keycloak Roles"](../hosting/authentication.md#roles). | ||
| With the local development configuration, you will only be able to authenticate with Keycloak users (OAuth2, password), not by other means. | ||
| You can test authentication as follows, for example: | ||
|
|
||
| 1. Navigate to the Swagger documentation (https://localhost:8000/docs) | ||
| 2. Click `Authorize` | ||
| 3. Navigate to "OpenIdConnect (OAuth2, password)" and provide the username and password. | ||
| 4. Click `Authorize` | ||
| 5. You should now be logged in. You can verify this by accessing an endpoint that requires authentication, such as `/authorization_test`. | ||
|
|
||
| ## Connecting to Keycloak Console | ||
| To connect to the Keycloak console, visit http://localhost/aiod-auth. | ||
| In the development instance the administrator username is 'admin' and its password 'password'. |
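
The Swagger steps above can also be approximated from a script. The following is a minimal sketch, not the project's own tooling: it builds (without sending) an OAuth2 password-grant request against Keycloak's standard OpenID Connect token endpoint, plus a request that carries the resulting bearer token. The realm name `aiod` is an assumption inferred from the role `default-roles-aiod`, and `CLIENT_ID` is a hypothetical placeholder; check the Keycloak console for the actual values.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Base URL taken from the "Connecting to Keycloak Console" section.
KEYCLOAK_BASE_URL = "http://localhost/aiod-auth"
# ASSUMPTION: realm name inferred from the role "default-roles-aiod".
REALM = "aiod"
# ASSUMPTION: hypothetical placeholder, replace with a client configured in Keycloak.
CLIENT_ID = "my-client"


def build_token_request(username: str, password: str) -> Request:
    """Build (but do not send) an OAuth2 password-grant token request."""
    token_url = f"{KEYCLOAK_BASE_URL}/realms/{REALM}/protocol/openid-connect/token"
    form = urlencode(
        {
            "grant_type": "password",
            "client_id": CLIENT_ID,
            "username": username,
            "password": password,
        }
    )
    return Request(
        token_url,
        data=form.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_authorized_request(url: str, access_token: str) -> Request:
    """Build a GET request that presents the access token as a bearer token."""
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})
```

To actually obtain a token, pass the first request to `urllib.request.urlopen` and read the `access_token` field of the JSON response. That token can then be used with `build_authorized_request` against an endpoint that requires authentication, such as `/authorization_test`.
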
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,100 @@ | ||
| # Elastic Search | ||
|
|
||
| Elastic Search indexes the information in the database for quick retrieval, facilitating endpoints | ||
| that can search through assets with loosely matching queries and give relevancy-ranked suggestions. | ||
|
|
||
| ## Indexing the Database using Logstash | ||
| Elastic Search keeps independent indices for various assets in the database, and achieves this by: | ||
|
|
||
| * Creating an initial index | ||
| * Populating it based on the information already in the database | ||
| * Updating the index with new information periodically, removing old entries if necessary | ||
|
|
||
| Because the logic for these steps is very similar for the different assets, we generate various | ||
| scripts to create and maintain the elastic search indices. We use [Logstash](https://www.elastic.co/logstash) | ||
| to process the data from our database and export it to Elastic Search. | ||
|
|
||
| This happens through two container services: | ||
|
|
||
| * `es_logstash_setup`: Generates the common scripts for use by logstash, and creates the Elastic Search indices if necessary. | ||
| This is a short-running service that only runs on startup, exiting when it's done. | ||
|
||
| * `logstash`: Continually monitors the database and updates the Elastic Search indices. | ||
|
|
||
| ### Logstash Setup | ||
|
|
||
| The `es_logstash_setup` service has two main tasks: generating logstash files and creating Elastic Search indices. | ||
|
|
||
| The `src/logstash_setup/generate_logstash_config_files.py` file generates logstash files based on the | ||
| templates provided in the `src/logstash_setup/templates` directory. The generated files are placed | ||
| into subdirectories of the `logstash/config` directory, along with predefined files. | ||
|
|
||
| For syncing the Elastic Search index, logstash requires SQL files that extract the necessary data from the database. | ||
| These are generated based on the `src/logstash_setup/templates/sql_{init|sync|rm}.py` files: | ||
|
|
||
| * The `sql_init.py` file defines the query template that finds the data that should be included in the index if it is populated from scratch. | ||
| * The `sql_sync.py` file defines the query template that finds the data that has been updated since the last creation or synchronization, so that the ES index can be updated efficiently. | ||
| * The `sql_rm.py` file defines the query template that finds the data that should be removed from the index. | ||
|
|
||
| It also generates the configuration files needed for Logstash to run the sync scripts: | ||
|
|
||
| * `config.py`: used to generate `logstash.yml`, the general configuration. | ||
| * `init_table.py`: contains the configuration that is needed to run the queries from `sql_init.py`, and defines them for each asset that needs to be indexed. | ||
| * `sync_table.py`: contains the configuration that is needed to run the queries from `sql_sync.py` and `sql_rm.py`, and defines them for each asset that needs to be synced. | ||
|
|
||
| All generated files contain the preamble defined in `file_header.py`. | ||
| Additionally, the `logstash/config/config` directory contains additional files used for the configuration of logstash, such as the JVM options. | ||
|
|
||
| ### Creating a New Index | ||
| To create a new index for an asset supported in the metadata catalogue REST API, you only need to create the respective "search router", as described under "Creating a New Search" below. | ||
|
|
||
| ## Elastic Search in the Metadata Catalogue | ||
| The metadata catalogue provides REST API endpoints to allow querying elastic search in a uniform manner. | ||
| While Elastic Search can be exposed directly in production, this unified endpoint allows us to provide more structure and better automated documentation. | ||
| It also avoids requiring the user to learn the Elastic Search query format. | ||
|
|
||
| ### Creating a New Search | ||
| To extend Elastic Search to a new asset type, create a search router, similar to those in `src/routers/search_routers/`. | ||
| Simply inherit from the base `SearchRouter` class defined in `src/routers/search_router.py` and define a few properties: | ||
|
|
||
| ```python | ||
| @property | ||
| def es_index(self) -> str: | ||
|     return "case_study" | ||
| ``` | ||
| The `es_index` property defines the name of the index as known to Elastic Search; it should match the name of the table in the database. | ||
|
|
||
| ```python | ||
| @property | ||
| def resource_name_plural(self) -> str: | ||
|     return "case_studies" | ||
| ``` | ||
|
|
||
| The `resource_name_plural` is used to define the path of the REST API endpoint, e.g.: `api.aiod.eu/search/case_studies`. | ||
|
|
||
| ```python | ||
| @property | ||
| def resource_class(self): | ||
|     return CaseStudy | ||
| ``` | ||
|
|
||
| The `resource_class` property contains a direct reference to the object it indexes, which is used when returning expanded responses from the ES query ("get all"). | ||
|
|
||
|
||
| ```python | ||
| @property | ||
| def linked_fields(self) -> set[str]: | ||
|     return { | ||
|         "alternate_name", | ||
|         "application_area", | ||
|         "industrial_sector", | ||
|         "research_area", | ||
|         "scientific_domain", | ||
|     } | ||
| ``` | ||
| The `linked_fields` property contains the fields of the entity which refer to external tables and should be included in the index. | ||
|
|
||
| By creating a new `SearchRouter` (and adding it to the router list), the script which generates the logstash files will automatically include it. | ||
|
|
||
| ## Configuration | ||
| Besides the aforementioned configuration files, the Elastic Search configuration is located at `es/elasticsearch.yml`, but it shouldn't need much configuration. | ||
| Some aspects of both Logstash and Elastic Search are configured through environment variables set in the `override.env` file (defaults in `.env`). | ||
| The most notable of these are the password for Elastic Search and the JVM resource options.
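
To show how the four properties from "Creating a New Search" fit together in one class, here is a self-contained sketch. The `SearchRouter` base class and `CaseStudy` below are simplified stand-ins written for this example, not the real classes from `src/routers/search_router.py`, and the `search_path` helper is hypothetical (it only mirrors the `/search/case_studies` path mentioned above).

```python
from abc import ABC, abstractmethod


class CaseStudy:
    """Stand-in for the real CaseStudy resource class (assumption)."""


class SearchRouter(ABC):
    """Simplified stand-in for the base class in src/routers/search_router.py."""

    @property
    @abstractmethod
    def es_index(self) -> str:
        """Name of the Elastic Search index (matches the database table name)."""

    @property
    @abstractmethod
    def resource_name_plural(self) -> str:
        """Plural resource name used in the REST API path."""

    @property
    @abstractmethod
    def resource_class(self) -> type:
        """Class of the indexed resource."""

    @property
    @abstractmethod
    def linked_fields(self) -> set[str]:
        """Fields that refer to external tables and are included in the index."""

    def search_path(self) -> str:
        # Hypothetical helper: illustrates how the plural name shapes the endpoint path.
        return f"/search/{self.resource_name_plural}"


class SearchRouterCaseStudies(SearchRouter):
    @property
    def es_index(self) -> str:
        return "case_study"

    @property
    def resource_name_plural(self) -> str:
        return "case_studies"

    @property
    def resource_class(self) -> type:
        return CaseStudy

    @property
    def linked_fields(self) -> set[str]:
        return {
            "alternate_name",
            "application_area",
            "industrial_sector",
            "research_area",
            "scientific_domain",
        }


router = SearchRouterCaseStudies()
print(router.es_index, router.search_path())
```

Declaring the properties as abstract in the stand-in base class means a subclass that forgets one of them cannot be instantiated, which is one way to catch an incomplete router early; whether the real base class enforces this is not confirmed here.
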