
Elasticsearch

bydga edited this page Feb 2, 2014 · 39 revisions
  • A detailed look at how the Elasticsearch database is used in eegportal.

Integration with Spring

  • Accomplished by https://github.com/spring-projects/spring-data-elasticsearch.

  • Currently running on Elasticsearch v0.90.9. Ideally, Elasticsearch 1.0 final will be released before this work is finished, so that both the database and the driver can run on that version. For now, the Maven dependency on spring-data-elasticsearch is 1.0-SNAPSHOT, which is updated daily, so any update can break things. If no stable Maven release is published before the work ends, the jar will be placed in a local repository so that the driver version can be frozen.

  • Connection strings are stored in WEB-INF/project.properties.

  • Beans related to Elasticsearch are defined in WEB-INF/persistence.xml:

<elasticsearch:transport-client id="client" cluster-name="${elasticsearch.clusterName}" cluster-nodes="${elasticsearch.url}" />
<bean name="elasticsearchTemplate" class="org.springframework.data.elasticsearch.core.ElasticsearchTemplate">
  <constructor-arg name="client" ref="client"/>
</bean>
<elasticsearch:repositories base-package="cz.zcu.kiv.eegdatabase.data.nosql.repositories" />

How Elasticsearch and Hibernate entities are connected

todo (interceptor, Experiment+ExperimentElastic classes...)

## Database setup

  • The development database currently runs on eeg2.kiv.zcu.cz with the elasticsearch-head plugin installed (https://github.com/mobz/elasticsearch-head).
  • The server runs a single non-replicated node with the standard five shards. The index is created with custom settings that define a custom analyzer:
POST http://eeg2.kiv.zcu.cz:9200/eegportal
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 0
    },
    "analysis": {
      "analyzer": {
        "experiment_param_value_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "stop",
            "asciifolding",
            "edge_ngram"
          ]
        }
      },
      "filter": {
        "edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      }
    }
  }
}

The mapping for the experiment type is then defined:

PUT http://eeg2.kiv.zcu.cz:9200/eegportal/experiment/_mapping
{
  "experiment": {
    "properties": {
      "experimentId": {
        "type": "string"
      },
      "groupId": {
        "type": "integer"
      },
      "params": {
        "type": "nested",
        "properties": {
          "attributes": {
            "type": "nested",
            "properties": {
              "name": {
                "type": "string"
              },
              "value": {
                "type": "string",
                "index_analyzer": "experiment_param_value_analyzer"
              }
            }
          },
          "name": {
            "type": "string"
          },
          "valueInteger": {
            "type": "integer"
          },
          "valueString": {
            "type": "string",
            "index_analyzer": "experiment_param_value_analyzer"
          }
        }
      }
    }
  }
}
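
To get a feel for what experiment_param_value_analyzer produces at index time, the following Java sketch imitates its filter chain using only JDK classes. It is an approximation and not the Lucene implementation: the stop-word set is a small assumed subset of the English defaults, accent folding is approximated with Unicode normalization, and the class name `ParamValueAnalyzerSketch` is made up for this page.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Rough stand-in for experiment_param_value_analyzer; NOT the Lucene implementation. */
final class ParamValueAnalyzerSketch {

    // Assumption: a small subset of the default English stop-word list.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "of", "in", "on", "to", "is", "with"));

    private static final int MIN_GRAM = 2;   // "min_gram" from the index settings
    private static final int MAX_GRAM = 20;  // "max_gram" from the index settings

    /** Returns the terms that would roughly end up in the index for the given value. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return terms;
        }
        for (String token : trimmed.split("\\s+")) {                 // whitespace tokenizer
            String lower = token.toLowerCase(Locale.ROOT);           // lowercase filter
            if (STOP_WORDS.contains(lower)) {                        // stop filter
                continue;
            }
            // Approximates asciifolding: decompose, then drop combining marks.
            String folded = Normalizer.normalize(lower, Normalizer.Form.NFD)
                    .replaceAll("\\p{M}", "");
            // edge_ngram filter: prefixes of length MIN_GRAM..MAX_GRAM;
            // tokens shorter than MIN_GRAM produce no terms at all.
            int longest = Math.min(MAX_GRAM, folded.length());
            for (int len = MIN_GRAM; len <= longest; len++) {
                terms.add(folded.substring(0, len));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [el, ele, elec, elect, electr, electro, electrod, electrode, fz]
        System.out.println(analyze("The \u00c9lectrode Fz"));
    }
}
```

Because the mapping sets only index_analyzer, these prefixes are generated when a document is indexed, while the query text is analyzed separately. A search for a prefix such as elec should therefore be able to match a stored value like electrode.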

The mappings and custom analyzers above MUST BE SET BEFORE anything is inserted into the index. Changing the mapping of an index or adding custom analyzers on a running cluster is a problem: usually all data has to be reindexed, which means the index is DELETED and the application has to insert everything again. Elasticsearch doesn't handle this internally. It is one of the most annoying things about it, but that is how it is, so just get over it :)
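
The rebuild described above can be sketched as an ordered list of requests. The Java class below is a dry run only: it sends nothing and simply prints the calls in the order they would have to be issued, using the URLs from this page. The names `IndexRebuildPlan` and `Step` and the body labels are hypothetical, and the final step (the application re-inserting its data) is application-specific, so it is not shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Dry-run sketch: lists the requests needed to rebuild an index from scratch. */
final class IndexRebuildPlan {

    /** One HTTP call: method, URL, and a label for the JSON body it would carry. */
    static final class Step {
        final String method;
        final String url;
        final String body;   // hypothetical label, not a real file name

        Step(String method, String url, String body) {
            this.method = method;
            this.url = url;
            this.body = body;
        }

        @Override
        public String toString() {
            return method + " " + url + (body.isEmpty() ? "" : "  <- " + body);
        }
    }

    /** Settings and mapping must be in place before any document is inserted again. */
    static List<Step> rebuild(String baseUrl, String index, String type) {
        String indexUrl = baseUrl + "/" + index;
        List<Step> steps = new ArrayList<>();
        steps.add(new Step("DELETE", indexUrl, ""));                       // drops the index and its data
        steps.add(new Step("POST", indexUrl, "settings with analyzers"));  // recreate with custom analysis
        steps.add(new Step("PUT", indexUrl + "/" + type + "/_mapping", "type mapping"));
        return steps;
    }

    public static void main(String[] args) {
        for (Step step : rebuild("http://eeg2.kiv.zcu.cz:9200", "eegportal", "experiment")) {
            System.out.println(step);
        }
    }
}
```

Only after these three calls succeed should the application start inserting documents again.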

Elasticsearch migration (synchronization)
