22 changes: 22 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,27 @@
# Changelog
## Unreleased
:boom: **Breaking**
- Change mount point of `update-handler.store` to `/data` instead of `/config`
- Change mount point of files to `/share` instead of `/data`

To upgrade, execute the following steps in your project folder:
```bash
mkdir -p ./data/search
mv ./config/search/update-handler.store ./data/search
```
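
For projects that script their upgrades, the same two steps could be expressed in Ruby roughly as follows. This is only a sketch: `migrate_update_handler_store` is a hypothetical helper that is not part of mu-search, and it assumes the default folder layout used in the commands above.

```ruby
require "fileutils"

# Hypothetical helper (not part of mu-search): moves update-handler.store
# from ./config/search to ./data/search inside the given project folder.
# Returns true when the file was moved, false when there was nothing to move.
def migrate_update_handler_store(project_dir)
  old_path = File.join(project_dir, "config", "search", "update-handler.store")
  new_dir = File.join(project_dir, "data", "search")

  # Equivalent of `mkdir -p ./data/search`
  FileUtils.mkdir_p(new_dir)
  return false unless File.exist?(old_path)

  # Equivalent of `mv ./config/search/update-handler.store ./data/search`
  FileUtils.mv(old_path, new_dir)
  true
end
```

Calling `migrate_update_handler_store(".")` from the project folder mirrors the shell commands. Unlike a plain `mv`, the sketch does nothing when the store file is absent, so it can be run more than once.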

Next, update the mount points of the `search` service:
```yaml
services:
  search:
    image: semtech/mu-search
    volumes:
      - ./config/search:/config
      - ./data/search:/data
      - ./data/files:/share # in case your index files
      # Review comment (Member): small typo here. Suggested change:
      #   - ./data/files:/share # in case you index files
      - ./data/tika/cache:/cache # in case your index files
      # Review comment (Member), suggested change:
      #   - ./data/tika/cache:/cache # in case you index files
```
**Features**
- added ignored groups: groups that should not be taken into account when searching
13 changes: 7 additions & 6 deletions README.md
@@ -20,13 +20,13 @@ services:
- db:database
volumes:
- ./config/search:/config
- ./data/search:/data
elasticsearch:
image: semtech/mu-search-elastic-backend:1.1.0
volumes:
- ./data/elasticsearch/:/usr/share/elasticsearch/data
environment:
- discovery.type=single-node

```

The indices will be persisted in `./data/elasticsearch`. The `search` service needs to be linked to an instance of the [mu-authorization](https://github.com/mu-semtech/mu-authorization) service.
@@ -216,7 +216,7 @@ services:
```

Next, add the following mounted volumes to the mu-search service in `docker-compose.yml`:
- `/data`: folder containing the files to be indexed
- `/share`: folder containing the files to be indexed
- `/cache`: folder to persist Tika's search cache

```yml
@@ -225,8 +225,9 @@ services:
image: semtech/mu-search:0.10.0
volumes:
- ./config/search:/config
- ./data/files:/data
- ./data/search/cache:/cache
- ./data/search:/data
- ./data/files:/share
- ./data/tika/cache:/cache
```

Next, add a property `files` in the `project` type index configuration. The property `files` will hold the content and metadata of the files.
@@ -466,7 +467,7 @@ These objects are structured in the same way as the `attachment` objects resulti
}
```

Currently, only indexing of local files is supported. The files' logical path as well as other metadata is expected to be in the format specified by the [file-service](https://github.com/mu-semtech/file-service#data-model). Files must be present in the Docker volume `/data` inside the container.
Currently, only indexing of local files is supported. The files' logical path as well as other metadata is expected to be in the format specified by the [file-service](https://github.com/mu-semtech/file-service#data-model). Files must be present in the Docker volume `/share` inside the container.

Attachments processed by Tika are cached in the directory `/cache` (by SHA256 of the file contents). This must be defined as a shared volume for the cache to be persistent.
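
As an illustration of content-addressed caching, the sketch below derives a cache location from the SHA256 of a file's contents. `tika_cache_path` is a hypothetical helper, and the layout of one entry per digest directly under `/cache` is an assumption made for the example. The actual file naming used by mu-search may differ.

```ruby
require "digest"

# Hypothetical helper: builds a cache path from the SHA256 digest of the
# file's contents, so identical files map to the same cache entry.
# The flat "<cache_dir>/<digest>" layout is an assumption of this sketch.
def tika_cache_path(file_path, cache_dir = "/cache")
  sha = Digest::SHA256.file(file_path).hexdigest
  File.join(cache_dir, sha)
end
```

Because the key depends only on the contents, renaming or moving a file under `/share` would still hit the same cache entry, while any change to the contents produces a new one.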

@@ -1045,7 +1046,7 @@ This section gives an overview of all configurable options in the search configu
- (*) **update_wait_interval_minutes** : number of minutes to wait before applying an update. This helps prevent duplicate updates of the same documents. Defaults to 1.
- (*) **common_terms_cutoff_frequency** : default cutoff frequency for a [Common terms query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-common-terms-query.html). Defaults to 0.0001. See [supported search methods](#supported-search-methods).
- (*) **enable_raw_dsl_endpoint** : flag to enable the [raw Elasticsearch DSL endpoint](#api). This endpoint is disabled by default for security reasons.
- (*) **attachments_path_base** : path inside the Docker container where files for the attachment pipeline are mounted. Defaults to `/data`.
- (*) **attachments_path_base** : path inside the Docker container where files for the attachment pipeline are mounted. Defaults to `/share`.

All options prefixed with (*) can also be configured using an UPPERCASED variant as Docker environment variables on the mu-search container. E.g. the `batch_size` option can be set via the environment variable `BATCH_SIZE`. Environment variables take precedence over settings configured in `config.json`.
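
The precedence rule described above (environment variable, then `config.json`, then the default) can be sketched in Ruby as follows. This is not the actual `config_parser.rb` logic: `effective_config` and the reduced `DEFAULTS` hash are hypothetical, and the integer coercion of environment values is an assumption made for this example.

```ruby
require "json"

# Reduced set of defaults for this sketch only; the full list lives in
# lib/mu_search/config_parser.rb.
DEFAULTS = {
  "batch_size" => 100,
  "update_wait_interval_minutes" => 1
}.freeze

# Hypothetical helper: resolves each option from the UPPERCASED environment
# variable first, then from the parsed config.json, then from the default.
def effective_config(json_text, env = ENV)
  from_file = JSON.parse(json_text)
  DEFAULTS.to_h do |key, default|
    env_value = env[key.upcase]
    value =
      if env_value.nil?
        from_file.fetch(key, default)
      elsif default.is_a?(Integer)
        Integer(env_value) # environment variables are always strings
      else
        env_value
      end
    [key, value]
  end
end
```

For example, with `"batch_size": 50` in `config.json` and `BATCH_SIZE=200` set on the container, the sketch resolves `batch_size` to 200 and leaves `update_wait_interval_minutes` at its default.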

2 changes: 1 addition & 1 deletion lib/mu_search/config_parser.rb
@@ -12,7 +12,7 @@ def self.parse(path)
default_configuration = {
batch_size: 100,
common_terms_cutoff_frequency: 0.001,
attachment_path_base: "/data",
attachment_path_base: "/share",
eager_indexing_groups: [],
update_wait_interval_minutes: 1,
number_of_threads: 1,
2 changes: 1 addition & 1 deletion lib/mu_search/update_handler.rb
@@ -121,7 +121,7 @@ def setup_runners

# Initializes the update queue and ensures the queue is persisted on disk at regular intervals
def restore_queue_and_setup_persistence
@store = YAML::Store.new("/config/update-handler.store", true)
@store = YAML::Store.new("/data/update-handler.store", true)
@store.transaction do
@queue = @store.fetch("queue", [])
@subject_map = @subject_map.merge(@store.fetch("index", {}))
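
For readers unfamiliar with `YAML::Store`, the restore-and-persist pattern touched by this hunk can be sketched in isolation as below. `restore_queue` and `persist_queue` are hypothetical helpers that are not taken from `update_handler.rb`, and the queue and index contents in the usage note are made up for illustration.

```ruby
require "yaml/store"

# Hypothetical helper: opens (or creates) the store and reads the persisted
# queue and index, falling back to empty values on first start.
def restore_queue(store_path)
  store = YAML::Store.new(store_path, true) # true = thread-safe store
  queue = nil
  index = nil
  store.transaction do
    queue = store.fetch("queue", [])
    index = store.fetch("index", {})
  end
  [store, queue, index]
end

# Hypothetical helper: writes the current queue and index back to disk.
# The write is committed when the transaction block returns.
def persist_queue(store, queue, index)
  store.transaction do
    store["queue"] = queue
    store["index"] = index
  end
end
```

With the new mount point, a call such as `restore_queue("/data/update-handler.store")` would read the file from the `./data/search` folder on the host, so the queue survives container restarts without writing into the `/config` mount.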