diff --git a/CHANGELOG.md b/CHANGELOG.md
index 231faa7..f94f586 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,27 @@
 # Changelog
 ## Unreleased
+:boom: **Breaking**
+- Change mount point of `update-handler.store` to `/data` instead of `/config`
+- Change mount point of files to `/share` instead of `/data`
+
+To upgrade execute the following steps in your project folder
+```bash
+mkdir -p ./data/search
+mv ./config/search/update-handler.store ./data/search
+```
+
+Next, update the mount points of the `search` service
+```yaml
+services:
+  search:
+    image: semtech/mu-search
+    volumes:
+      - ./config/search:/config
+      - ./data/search:/data
+      - ./data/files:/share # in case your index files
+      - ./data/tika/cache:/cache # in case your index files
+```
+
 **Features**
 - added ignored groups: groups that should not be taken into account when searching
 
diff --git a/README.md b/README.md
index 74635fb..71211e3 100644
--- a/README.md
+++ b/README.md
@@ -20,13 +20,13 @@ services:
       - db:database
     volumes:
       - ./config/search:/config
+      - ./data/search:/data
   elasticsearch:
     image: semtech/mu-search-elastic-backend:1.1.0
     volumes:
       - ./data/elasticsearch/:/usr/share/elasticsearch/data
     environment:
       - discovery.type=single-node
-
 ```
 
 The indices will be persisted in `./data/elasticsearch`. The `search` service needs to be linked to an instance of the [mu-authorization](https://github.com/mu-semtech/mu-authorization) service.
@@ -216,7 +216,7 @@ services:
 ```
 
 Next, add the following mounted volumes to the mu-search service in `docker-compose.yml`:
-- `/data`: folder containing the files to be indexed
+- `/share`: folder containing the files to be indexed
 - `/cache`: folder to persist Tika's search cache
 
 ```yml
@@ -225,8 +225,9 @@ services:
     image: semtech/mu-search:0.10.0
     volumes:
       - ./config/search:/config
-      - ./data/files:/data
-      - ./data/search/cache:/cache
+      - ./data/search:/data
+      - ./data/files:/share
+      - ./data/tika/cache:/cache
 ```
 
 Next, add a property `files` in the `project` type index configuration. The property `files` will hold the content and metadata of the files.
@@ -466,7 +467,7 @@ These objects are structured in the same way as the `attachment` objects resulti
 }
 ```
 
-Currently, only indexing of local files is supported. The files' logical path as well as other metadata is expected to be in the format specified by the [file-service](https://github.com/mu-semtech/file-service#data-model). Files must be present in the Docker volume `/data` inside the container.
+Currently, only indexing of local files is supported. The files' logical path as well as other metadata is expected to be in the format specified by the [file-service](https://github.com/mu-semtech/file-service#data-model). Files must be present in the Docker volume `/share` inside the container.
 
 Attachments processed by Tika are cached in the directory `/cache` (by SHA256 of the file contents). This must be defined as a shared volume for the cache to be persistent.
 
@@ -1045,7 +1046,7 @@ This section gives an overview of all configurable options in the search configu
 - (*) **update_wait_interval_minutes** : number of minutes to wait before applying an update. Allows to prevent duplicate updates of the same documents. Defaults to 1.
 - (*) **common_terms_cutoff_frequency** : default cutoff frequency for a [Common terms query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-common-terms-query.html). Defaults to 0.0001. See [supported search methods](#supported-search-methods).
 - (*) **enable_raw_dsl_endpoint** : flag to enable the [raw Elasticsearch DSL endpoint](#api). This endpoint is disabled by default for security reasons.
-- (*) **attachments_path_base** : path inside the Docker container where files for the attachment pipeline are mounted. Defaults to `/data`.
+- (*) **attachments_path_base** : path inside the Docker container where files for the attachment pipeline are mounted. Defaults to `/share`.
 
 All options prefixed with (*) can also be configured using an UPPERCASED variant as Docker environment variables on the mu-search container. E.g. the `batch_size` option can be set via the environment variable `BATCH_SIZE`. Environment variables take precedence over settings configured in `config.json`.
 
diff --git a/lib/mu_search/config_parser.rb b/lib/mu_search/config_parser.rb
index 5490b15..1a6759b 100644
--- a/lib/mu_search/config_parser.rb
+++ b/lib/mu_search/config_parser.rb
@@ -12,7 +12,7 @@ def self.parse(path)
       default_configuration = {
         batch_size: 100,
         common_terms_cutoff_frequency: 0.001,
-        attachment_path_base: "/data",
+        attachment_path_base: "/share",
         eager_indexing_groups: [],
         update_wait_interval_minutes: 1,
         number_of_threads: 1,
diff --git a/lib/mu_search/update_handler.rb b/lib/mu_search/update_handler.rb
index ddaa26c..85ec9dc 100644
--- a/lib/mu_search/update_handler.rb
+++ b/lib/mu_search/update_handler.rb
@@ -121,7 +121,7 @@ def setup_runners
 
     # Initializes the update queue and ensures the queue is persisted on disk at regular intervals
     def restore_queue_and_setup_persistence
-      @store = YAML::Store.new("/config/update-handler.store", true)
+      @store = YAML::Store.new("/data/update-handler.store", true)
       @store.transaction do
         @queue = @store.fetch("queue", [])
         @subject_map = @subject_map.merge(@store.fetch("index", {}))
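
The manual upgrade step from the changelog (moving `update-handler.store` out of `/config` and into `/data`) could also be scripted. Below is a minimal Ruby sketch of that move, followed by a read of the queue with the same `YAML::Store` calls the diff uses. `migrate_store` and `read_queue` are hypothetical helpers that are not part of mu-search, and the temporary directory only stands in for the project folder.

```ruby
require "yaml/store"
require "fileutils"
require "tmpdir"

# Hypothetical helper, not part of mu-search: moves a legacy store file to
# its new location. Returns true if a file was moved, false otherwise.
def migrate_store(old_path, new_path)
  # Nothing to do without a legacy file, and never overwrite a store
  # that already exists at the new location.
  return false if !File.exist?(old_path) || File.exist?(new_path)

  FileUtils.mkdir_p(File.dirname(new_path))
  FileUtils.mv(old_path, new_path)
  true
end

# Hypothetical helper: reads the persisted queue in a read-only transaction,
# falling back to an empty list like the fetch call in the diff.
def read_queue(store_path)
  store = YAML::Store.new(store_path, true)
  store.transaction(true) { store.fetch("queue", []) }
end

# Demo: a temporary directory plays the role of the project folder.
Dir.mktmpdir do |dir|
  old_path = File.join(dir, "config", "search", "update-handler.store")
  new_path = File.join(dir, "data", "search", "update-handler.store")

  # Create a legacy store with one queued entry (example value only).
  FileUtils.mkdir_p(File.dirname(old_path))
  legacy = YAML::Store.new(old_path, true)
  legacy.transaction { legacy["queue"] = ["http://example.org/doc/1"] }

  puts "moved: #{migrate_store(old_path, new_path)}"
  puts "queue: #{read_queue(new_path).inspect}"
end
```

The helper refuses to overwrite an existing store at the new path, so running it a second time is a no-op rather than a source of data loss.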