Elasticsearch

A detailed look at how Elasticsearch is used in eegportal.

## Integration with Spring

- Integration is done through https://github.com/spring-projects/spring-data-elasticsearch.
- We currently run on Elasticsearch v0.90.9. Hopefully an Elasticsearch 1.0 final release will be out before my work is done, so that both the database and the driver can run on that version. Right now the Maven dependency on spring-data-elasticsearch is 1.0-SNAPSHOT, which is updated daily, and updates can break things. If no stable Maven release is published before I finish, the jar will be placed in a local repo so we can freeze the driver version.
- Connection strings are stored in WEB-INF/project.properties.
- Beans related to Elasticsearch are defined in WEB-INF/persistence.xml:

<elasticsearch:transport-client id="client" cluster-name="${elasticsearch.clusterName}" cluster-nodes="${elasticsearch.url}" />
<bean name="elasticsearchTemplate" class="org.springframework.data.elasticsearch.core.ElasticsearchTemplate">
  <constructor-arg name="client" ref="client"/>
</bean>	
<elasticsearch:repositories base-package="cz.zcu.kiv.eegdatabase.data.nosql.repositories" />

## How Elasticsearch and Hibernate entities are connected

todo (interceptor, Experiment+ExperimentElastic classes...)

## Database setup

The development database currently runs on eeg2.kiv.zcu.cz with the head plugin installed (https://github.com/mobz/elasticsearch-head).
A single non-replicated node runs on this server with the default five shards. The index is created with custom settings, including a custom analyzer:

POST http://eeg2.kiv.zcu.cz:9200/eegportal

{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 0
    },
    "analysis": {
      "analyzer": {
        "experiment_param_value_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "stop",
            "asciifolding",
            "edge_ngram"
          ]
        }
      },
      "filter": {
        "edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      }
    }
  }
}

The mapping for the experiment type is then set with:

PUT http://eeg2.kiv.zcu.cz:9200/eegportal/experiment/_mapping

{
  "experiment": {
    "properties": {
      "experimentId": {
        "type": "string"
      },
      "groupId": {
        "type": "integer"
      },
      "params": {
        "type": "nested",
        "properties": {
          "attributes": {
            "type": "nested",
            "properties": {
              "name": {
                "type": "string"
              },
              "value": {
                "type": "string",
                "index_analyzer": "experiment_param_value_analyzer"
              }
            }
          },
          "name": {
            "type": "string"
          },
          "valueInteger": {
            "type": "integer"
          },
          "valueString": {
            "type": "string",
            "index_analyzer": "experiment_param_value_analyzer"
          }
        }
      }
    }
  }
}
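
To make the effect of `experiment_param_value_analyzer` more concrete, here is a minimal sketch in plain Java that approximates its filter chain (whitespace tokenizer, lowercase, stop, asciifolding, edge_ngram with `min_gram` 2 and `max_gram` 20). It does not call Elasticsearch or Lucene. The class name `AnalyzerSketch`, the method `analyze`, the short stop-word list, and the use of `java.text.Normalizer` as a stand-in for ASCII folding are all assumptions made for illustration, so the real analyzer may differ in details.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerSketch {

    // Assumption: a small subset of English stop words, for illustration only.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "of", "to", "in", "is");

    private static final int MIN_GRAM = 2;  // matches "min_gram" in the index settings
    private static final int MAX_GRAM = 20; // matches "max_gram" in the index settings

    // Approximates the tokens the custom analyzer would emit at index time.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // "whitespace" tokenizer: split on runs of whitespace only.
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            // "lowercase" filter.
            String token = raw.toLowerCase(Locale.ROOT);
            // "stop" filter: drop stop words.
            if (STOP_WORDS.contains(token)) {
                continue;
            }
            // Rough stand-in for "asciifolding": decompose and strip combining marks.
            token = Normalizer.normalize(token, Normalizer.Form.NFD)
                    .replaceAll("\\p{M}", "");
            // "edge_ngram" filter: emit every prefix of length MIN_GRAM..MAX_GRAM.
            int longest = Math.min(MAX_GRAM, token.length());
            for (int len = MIN_GRAM; len <= longest; len++) {
                tokens.add(token.substring(0, len));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [el, ele, elek, elekt, elektr, elektro, elektrod, elektroda, fp, fp1]
        System.out.println(analyze("The Elektroda Fp1"));
    }
}
```

Because the mapping sets the analyzer only as `index_analyzer`, the prefixes are generated at index time while query text should go through the default search analyzer. A query for a prefix such as `elek` can then match a stored value like `Elektroda`.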

These mappings and custom analyzers MUST BE SET BEFORE anything is inserted into the index. Changing index mappings or adding custom analyzers on a running cluster is a problem: usually all data has to be reindexed (that is, deleted, after which the application must insert it again). Elasticsearch does not handle this internally. It is one of the most annoying things about it, but that is how it is, so just get over it :)

## Elasticsearch migration (synchronization)

I've heard that some teams are working on some kind of offline sync/backup tooling. These might come in handy:
- https://github.com/jprante/elasticsearch-knapsack
- https://github.com/crate/elasticsearch-inout-plugin
- https://github.com/mallocator/Elasticsearch-Exporter
- http://www.elasticsearch.org/blog/introducing-snapshot-restore/

I have personally tested Elasticsearch-Exporter, a Node.js module that dumps a whole index into one file that can be re-imported on another machine. It is useful when changing the mapping and reindexing all data.
