# Elasticsearch

Detailed insight into how the Elasticsearch database is used in eegportal.
Implemented using Spring Data Elasticsearch: https://github.com/spring-projects/spring-data-elasticsearch.

We are currently running Elasticsearch v0.90.9. Hopefully there will be an Elasticsearch 1.0-Final release before my work here is done, so that both the database and the driver can run on that version. Right now we depend on a 1.0-SNAPSHOT Maven artifact of spring-data-elasticsearch, which is updated daily, and those updates can break things. If a stable Maven release is not published before the end of my work, the jar will be placed into a local repository so we can freeze the driver version.
-
- Connection strings are stored in `WEB-INF/project.properties`.
- Beans related to Elasticsearch are defined in `WEB-INF/persistence.xml`:

```xml
<elasticsearch:transport-client id="client" cluster-name="${elasticsearch.clusterName}" cluster-nodes="${elasticsearch.url}" />

<bean name="elasticsearchTemplate" class="org.springframework.data.elasticsearch.core.ElasticsearchTemplate">
    <constructor-arg name="client" ref="client"/>
</bean>

<elasticsearch:repositories base-package="cz.zcu.kiv.eegdatabase.data.nosql.repositories" />
```

TODO: interceptor, `Experiment` + `ExperimentElastic` classes...
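With `<elasticsearch:repositories>` in place, Spring Data scans the given package and generates repository implementations from interfaces at runtime. A sketch of what such an interface could look like; the interface and method names here are illustrative, not the actual eegportal code:

```java
import java.util.List;

import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

// Illustrative only: Spring Data derives the query from the method name,
// so no implementation class has to be written by hand.
public interface ExperimentRepository
        extends ElasticsearchRepository<Experiment, String> {

    // e.g. "find all experiments belonging to a research group"
    List<Experiment> findByGroupId(int groupId);
}
```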
## Database setup
- The development database now runs on eeg2.kiv.zcu.cz with the HEAD plugin installed (https://github.com/mobz/elasticsearch-head).
- A single non-replicated node runs on this server with the standard five shards. A custom mapping is defined as follows:
```
POST http://eeg2.kiv.zcu.cz:9200/eegportal
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 0
    },
    "analysis": {
      "analyzer": {
        "experiment_param_value_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "stop",
            "asciifolding",
            "edge_ngram"
          ]
        }
      },
      "filter": {
        "edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      }
    }
  }
}
```
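To illustrate what `experiment_param_value_analyzer` does to a parameter value, here is a minimal plain-Java simulation of the filter chain. This is a hypothetical helper for illustration only, not part of the portal, and the `stop` filter is omitted for brevity:

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

/**
 * Approximates the experiment_param_value_analyzer chain above:
 * whitespace tokenizer, then lowercase, asciifolding, and
 * edge n-grams of length 2..20 (the "stop" filter is left out).
 */
public class EdgeNgramDemo {

    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            // lowercase + asciifolding: strip diacritics via Unicode decomposition
            String folded = Normalizer.normalize(token.toLowerCase(), Normalizer.Form.NFD)
                                      .replaceAll("\\p{M}", "");
            // edge_ngram with min_gram=2, max_gram=20: emit token prefixes
            for (int len = 2; len <= Math.min(20, folded.length()); len++) {
                terms.add(folded.substring(0, len));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // A value like "Kávová EEG" is indexed as lowercased, accent-stripped
        // prefix terms, which enables partial ("search-as-you-type") matching.
        System.out.println(analyze("Kávová EEG"));
    }
}
```

Because the index then contains all prefixes of each word, a query only has to match one of the generated terms for the document to be found.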
The mapping for the `experiment` type is then:

```
PUT http://eeg2.kiv.zcu.cz:9200/eegportal/experiment/_mapping
{
  "experiment": {
    "properties": {
      "experimentId": {
        "type": "string"
      },
      "groupId": {
        "type": "integer"
      },
      "params": {
        "type": "nested",
        "properties": {
          "attributes": {
            "type": "nested",
            "properties": {
              "name": {
                "type": "string"
              },
              "value": {
                "type": "string",
                "index_analyzer": "experiment_param_value_analyzer"
              }
            }
          },
          "name": {
            "type": "string"
          },
          "valueInteger": {
            "type": "integer"
          },
          "valueString": {
            "type": "string",
            "index_analyzer": "experiment_param_value_analyzer"
          }
        }
      }
    }
  }
}
```

These mappings and custom analyzers MUST BE SET BEFORE anything is inserted into ES (into the specific index). Changing mappings or adding custom analyzers on a running cluster is problematic and usually means all data has to be reindexed (i.e. DELETED, after which the application must insert it again somehow); Elasticsearch doesn't handle this internally. It is one of the most annoying things, but it is as it is, just get over it :)
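One consequence of mapping `params` (and `params.attributes`) as `nested` is that those fields cannot be matched with a plain query; a `nested` query with the proper `path` is required. A sketch of such a search, where the term "epilepsy" is just an example value:

```
POST http://eeg2.kiv.zcu.cz:9200/eegportal/experiment/_search
{
  "query": {
    "nested": {
      "path": "params",
      "query": {
        "match": { "params.valueString": "epilepsy" }
      }
    }
  }
}
```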
I've heard some teams are working on offline sync/backup tooling. These might come in handy:

- Snapshot & restore: http://www.elasticsearch.org/blog/introducing-snapshot-restore/
- I have personally tested Elasticsearch-Exporter, a Node.js module that dumps a whole index into one file that can be re-imported on another machine. It is useful during mapping changes, when the whole dataset has to be reindexed.
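For reference, the snapshot API from the blog post above works roughly as follows. The repository name and filesystem path here are placeholders; the `fs` repository location must be accessible from every node, and the API only exists from Elasticsearch 1.0 on:

```
PUT http://eeg2.kiv.zcu.cz:9200/_snapshot/eegportal_backup
{
  "type": "fs",
  "settings": { "location": "/mnt/es_backups" }
}

PUT http://eeg2.kiv.zcu.cz:9200/_snapshot/eegportal_backup/snapshot_1?wait_for_completion=true
```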