
Commit b65cf07

Merge remote-tracking branch 'origin/improveReadme' into 455-updateJavaVersionTo11
2 parents dd62950 + ddf57a0 commit b65cf07


2 files changed: +31 -16 lines changed


.gitignore

Lines changed: 1 addition & 0 deletions
@@ -13,3 +13,4 @@ bin
 *log*.gz
 *.tmp
 GND-updates*
+/elasticsearch-*

README.md

Lines changed: 30 additions & 16 deletions
@@ -16,9 +16,23 @@ Elasticsearch 5.6.x (configured in `application.conf`)
 
 ### Build
 
-Get the code, change into the project directory, and run the tests:
+Get the code, change into the project directory:
 
-`git clone https://github.com/hbz/lobid-gnd.git ; cd lobid-gnd ; sbt test`
+`git clone https://github.com/hbz/lobid-gnd.git ; cd lobid-gnd`
+
+If needed start elasticsearch:
+
+```bash
+wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.3.tar.gz
+tar -xzf elasticsearch-5.6.3.tar.gz
+./elasticsearch-5.6.3/bin/elasticsearch &
+```
+
+(If elastic search is already running and you want to start a new es: `ps -ef | grep elas` and `kill [PID]`.)
+
+Run tests:
+
+`sbt test`
 
 ### Data
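
Note on the hunk above: the added README text suggests `ps -ef | grep elas` to find an Elasticsearch process that is already running. A minimal sketch of the same lookup, narrowed so that it prints only PIDs and does not match the search command itself, might look like this. The helper name `es_pids` is hypothetical and not part of the repository.

```bash
# Hypothetical helper: read `ps -ef` style lines on stdin and print the PID
# (second column) of every line that mentions "elasticsearch".
# The [e] bracket keeps awk's own command line from matching the pattern.
es_pids() {
  awk '/[e]lasticsearch/ { print $2 }'
}

# Usage (inspect the PIDs before passing any of them to `kill`):
#   ps -ef | es_pids
```

This only lists candidates; stopping a process is still the manual `kill [PID]` step the README describes.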

@@ -48,7 +62,7 @@ Index the data, passing the index name:
 
 `sbt -Dindex.entityfacts.index=entityfacts_20210120 "runMain apps.Index entityfacts"`
 
-For configuration details and defaults, see 'conf/application.conf'.
+For configuration details and defaults, see `conf/application.conf`.
 
 #### GND Baseline

@@ -58,13 +72,13 @@ Set up a location for the input data:
 
 `mkdir input_data; cd input_data`
 
-Set 'data.rdfxml' in 'conf/application.conf' to the 'input_data' location.
+Set `data.rdfxml` in `conf/application.conf` to the `input_data` location.
 
 Get the GND RDF/XML source data from <https://data.dnb.de/opendata/>:
 
-`wget https://data.dnb.de/opendata/authorities-{geografikum,koerperschaft,kongress,person,sachbegriff,werk}_lds.rdf.gz`
+`wget https://data.dnb.de/opendata/authorities-gnd-{geografikum,koerperschaft,kongress,person,sachbegriff,werk}_lds.rdf.gz`
 
-This should give you 6 local files ending with '.rdf.gz'. Go back to the project root directory:
+This should give you 6 local files ending with `.rdf.gz`. Go back to the project root directory:
 
 `cd ..`
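
Note on the hunk above: the corrected `wget` line relies on shell brace expansion to fetch six files with the new `authorities-gnd-` prefix. As a rough sketch, the expansion can be spelled out as a loop, which also works in shells without brace expansion. The function name `gnd_urls` is an assumption made for illustration; the URLs are the ones the README line expands to.

```bash
# Hypothetical helper: print the six download URLs that the brace
# expression in the README expands to, one per line.
gnd_urls() {
  for entity in geografikum koerperschaft kongress person sachbegriff werk; do
    printf 'https://data.dnb.de/opendata/authorities-gnd-%s_lds.rdf.gz\n' "$entity"
  done
}

# Print the list; to download instead, something like
#   gnd_urls | xargs -n 1 wget
# could be used from inside input_data.
gnd_urls
```

Printing the list first makes it easy to confirm that exactly six files are expected before starting the download.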

@@ -74,11 +88,11 @@ Set up a location for the index data:
 
 `mkdir index_data`
 
-Set 'data.jsonlines' in 'conf/application.conf' to the 'index_data' location.
+Set `data.jsonlines` in `conf/application.conf` to the `index_data` location.
 
-Set 'index.boot' in 'conf/application.conf' to an existing index. This index will be used to get labels during the conversion process.
+Set `index.boot` in `conf/application.conf` to an existing index. This index will be used to get labels during the conversion process.
 
-Set 'index.prod' in 'conf/application.conf' to a non-existing index. This index name will be used in the indexing data created during conversion.
+Set `index.prod` in `conf/application.conf` to a non-existing index. This index name will be used in the indexing data created during conversion.
 
 Convert the data to JSON-LD lines, the index data format:

@@ -88,11 +102,11 @@ To be able to log out from the server while the conversion is running, we actual
 
 `setsid nohup sbt "runMain apps.ConvertBaseline" &`
 
-This should create 6 '\*.jsonl' files in 'index_data'.
+This should create 6 `\*.jsonl` files in `index_data`.
 
 ##### Index the JSON data
 
-If the 'index.prod' configured in 'application.conf' does not exists, a new index will be created.
+If the `index.prod` configured in `application.conf` does not exists, a new index will be created.
 
 To start the indexing, run:

@@ -104,11 +118,11 @@ To start the indexing, run:
 
 Updates are pulled via [the DNB OAI-PMH interface](https://www.dnb.de/DE/Professionell/Metadatendienste/Datenbezug/OAI/oai_node.html).
 
-Pass one or two arguments: get updates since (and optionally until) a given date:
+Pass one or two arguments: get updates since (and optionally until) a given datetime `[YYYY-MM-DD]T[HH:MM:SS]Z` (not just date):
 
-`sbt "runMain apps.ConvertUpdates 2022-06-22 2022-06-23"`
+`sbt "runMain apps.ConvertUpdates 2022-06-22T11:08:23Z 2022-06-23T18:08:23Z"`
 
-The date of the most recent update is stored in 'GND-lastSuccessfulUpdate.txt' (can be changed in the config).
+The date of the most recent update is stored in `GND-lastSuccessfulUpdate.txt` (can be changed in the config).
 
 The original downloaded data and the converted data are stored in separate files. To convert the data again without downloading it, use the steps described above under 'Convert RDF/XML to JSON' with the update RDF data.
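
Note on the hunk above: the README now asks for a full datetime of the form `YYYY-MM-DDTHH:MM:SSZ` rather than a bare date. A small sketch of how an argument could be checked before it is passed to `apps.ConvertUpdates` follows. The helper `is_oai_datetime` is hypothetical, and it checks only the shape of the string, not whether the month, day, or time values are in range.

```bash
# Hypothetical helper: succeed only if the argument has the shape
# YYYY-MM-DDTHH:MM:SSZ that the updated README asks for.
is_oai_datetime() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$'
}

# Current time in UTC in the same shape, e.g. usable as an "until" argument.
now_utc=$(date -u +%Y-%m-%dT%H:%M:%SZ)
is_oai_datetime "$now_utc" && echo "ok: $now_utc"
```

With this check, `2022-06-22T11:08:23Z` is accepted while the old date-only form `2022-06-22` is rejected.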

@@ -118,11 +132,11 @@ To index the updates run:
 
 `sbt "runMain apps.Index updates"`
 
-See 'application.conf' for details on the configured file names etc.
+See `application.conf` for details on the configured file names etc.
 
 ### Web
 
-In 'lobid-gnd', run the web application:
+In `lobid-gnd`, run the web application:
 
 `sbt run`
