@@ -16,9 +16,23 @@ Elasticsearch 5.6.x (configured in `application.conf`)
1616
1717### Build
1818
19- Get the code, change into the project directory, and run the tests :
19+ Get the code and change into the project directory:
2020
21- ` git clone https://github.com/hbz/lobid-gnd.git ; cd lobid-gnd ; sbt test `
21+ ` git clone https://github.com/hbz/lobid-gnd.git ; cd lobid-gnd `
22+
23+ If needed, start Elasticsearch:
24+
25+ ``` bash
26+ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.3.tar.gz
27+ tar -xzf elasticsearch-5.6.3.tar.gz
28+ ./elasticsearch-5.6.3/bin/elasticsearch &
29+ ```
30+
31+ (If Elasticsearch is already running and you want to start a new instance, find its process ID with `ps -ef | grep elas` and stop it with `kill [PID]`.)
32+
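Before starting a second instance, it can help to check whether something already answers on Elasticsearch's default HTTP port. The following is a minimal sketch, assuming the default address `http://localhost:9200` and that `curl` is installed; the function name `es_status` is hypothetical and not part of this project:

```shell
# Hypothetical helper: report whether an HTTP endpoint (by default the
# Elasticsearch default port 9200 on localhost) answers successfully.
es_status() {
  url="${1:-http://localhost:9200}"
  if curl --silent --fail --max-time 2 "$url" > /dev/null 2>&1; then
    echo "running"
  else
    echo "not running"
  fi
}

es_status
```

If this prints `running`, the tests can use the existing instance, and the `ps`/`kill` step is only needed when you want to replace it.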
33+ Run tests:
34+
35+ ` sbt test `
2236
2337### Data
2438
@@ -48,7 +62,7 @@ Index the data, passing the index name:
4862
4963` sbt -Dindex.entityfacts.index=entityfacts_20210120 "runMain apps.Index entityfacts" `
5064
51- For configuration details and defaults, see ' conf/application.conf' .
65+ For configuration details and defaults, see ` conf/application.conf ` .
5266
5367#### GND Baseline
5468
@@ -58,13 +72,13 @@ Set up a location for the input data:
5872
5973` mkdir input_data; cd input_data `
6074
61- Set ' data.rdfxml' in ' conf/application.conf' to the ' input_data' location.
75+ Set ` data.rdfxml ` in ` conf/application.conf ` to the ` input_data ` location.
6276
6377Get the GND RDF/XML source data from < https://data.dnb.de/opendata/ > :
6478
65- ` wget https://data.dnb.de/opendata/authorities-{geografikum,koerperschaft,kongress,person,sachbegriff,werk}_lds.rdf.gz `
79+ ` wget https://data.dnb.de/opendata/authorities-gnd-{geografikum,koerperschaft,kongress,person,sachbegriff,werk}_lds.rdf.gz `
6680
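The download command relies on brace expansion, which is a feature of shells such as bash and is not available in every POSIX `sh`. As a rough sketch of what the expansion produces, the loop below prints the six URLs explicitly; the function name `gnd_urls` is made up for illustration:

```shell
# Hypothetical helper: print one download URL per GND entity type,
# equivalent to what the brace expansion in the wget command yields.
gnd_urls() {
  for entity in geografikum koerperschaft kongress person sachbegriff werk; do
    echo "https://data.dnb.de/opendata/authorities-gnd-${entity}_lds.rdf.gz"
  done
}

gnd_urls
```

In a shell without brace expansion, the printed URLs could be passed to `wget` one by one, for example with `gnd_urls | xargs -n 1 wget`.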
67- This should give you 6 local files ending with ' .rdf.gz' . Go back to the project root directory:
81+ This should give you 6 local files ending with ` .rdf.gz ` . Go back to the project root directory:
6882
6983` cd .. `
7084
@@ -74,11 +88,11 @@ Set up a location for the index data:
7488
7589` mkdir index_data `
7690
77- Set ' data.jsonlines' in ' conf/application.conf' to the ' index_data' location.
91+ Set ` data.jsonlines ` in ` conf/application.conf ` to the ` index_data ` location.
7892
79- Set ' index.boot' in ' conf/application.conf' to an existing index. This index will be used to get labels during the conversion process.
93+ Set ` index.boot ` in ` conf/application.conf ` to an existing index. This index will be used to get labels during the conversion process.
8094
81- Set ' index.prod' in ' conf/application.conf' to a non-existing index. This index name will be used in the indexing data created during conversion.
95+ Set ` index.prod ` in ` conf/application.conf ` to a non-existing index. This index name will be used in the indexing data created during conversion.
8296
8397Convert the data to JSON-LD lines, the index data format:
8498
@@ -88,11 +102,11 @@ To be able to log out from the server while the conversion is running, we actual
88102
89103` setsid nohup sbt "runMain apps.ConvertBaseline" & `
90104
91- This should create 6 ' \* .jsonl' files in ' index_data' .
105+ This should create 6 `*.jsonl` files in `index_data`.
92106
93107##### Index the JSON data
94108
95- If the ' index.prod' configured in ' application.conf' does not exists, a new index will be created.
109+ If the `index.prod` configured in `application.conf` does not exist, a new index will be created.
96110
97111To start the indexing, run:
98112
@@ -104,11 +118,11 @@ To start the indexing, run:
104118
105119Updates are pulled via [ the DNB OAI-PMH interface] ( https://www.dnb.de/DE/Professionell/Metadatendienste/Datenbezug/OAI/oai_node.html ) .
106120
107- Pass one or two arguments: get updates since (and optionally until) a given date:
121+ Pass one or two arguments to get updates since (and optionally until) a given datetime in the form `[YYYY-MM-DD]T[HH:MM:SS]Z` (a date alone is not enough):
108122
109- ` sbt "runMain apps.ConvertUpdates 2022-06-22 2022-06-23 " `
123+ ` sbt "runMain apps.ConvertUpdates 2022-06-22T11:08:23Z 2022-06-23T18:08:23Z" `
110124
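Since a plain date is not accepted, it may be useful to generate and check the datetime arguments before starting the conversion. This is a minimal sketch, not part of the project: the helper `is_utc_datetime` and the variable names are hypothetical, and the script only prints the `sbt` command instead of running it:

```shell
# Hypothetical helper: succeed only for strings shaped like 2022-06-22T11:08:23Z.
is_utc_datetime() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$'
}

# Example "since" value taken from the command above; "until" is the current UTC time.
since_ts="2022-06-22T11:08:23Z"
until_ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"

if is_utc_datetime "$since_ts" && is_utc_datetime "$until_ts"; then
  # Print the command for review rather than executing it.
  echo "sbt \"runMain apps.ConvertUpdates $since_ts $until_ts\""
else
  echo "Expected datetimes like 2022-06-22T11:08:23Z" >&2
fi
```

The check only validates the shape of the string, not whether the date itself exists in the calendar.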
111- The date of the most recent update is stored in ' GND-lastSuccessfulUpdate.txt' (can be changed in the config).
125+ The date of the most recent update is stored in ` GND-lastSuccessfulUpdate.txt ` (can be changed in the config).
112126
113127The original downloaded data and the converted data are stored in separate files. To convert the data again without downloading it, use the steps described above under 'Convert RDF/XML to JSON' with the update RDF data.
114128
@@ -118,11 +132,11 @@ To index the updates run:
118132
119133` sbt "runMain apps.Index updates" `
120134
121- See ' application.conf' for details on the configured file names etc.
135+ See ` application.conf ` for details on the configured file names etc.
122136
123137### Web
124138
125- In ' lobid-gnd' , run the web application:
139+ In ` lobid-gnd ` , run the web application:
126140
127141` sbt run `
128142