This cookbook provides practical command-line examples for running Datashare in common configurations.
Assumptions (unless stated otherwise):
- User: dev
- Commands run from the Datashare repository root
- Example projects: banana-papers, citrus-confidential, local-datashare
- OAuth server available at http://oauth:3001
The run.sh script is a development wrapper that launches Datashare with Java debug options enabled (JDWP on port 8090 by default). It automatically detects the project version and locates the distribution JAR. For production deployments, use the datashare binary directly.
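To make the debug-port behaviour concrete, here is a minimal sketch of how a wrapper of this kind could build its JDWP option. It is not the actual content of run.sh: the function name jdwp_opts is hypothetical and the exact agent flags are an assumption. Only the JDWP_TRANSPORT_PORT variable and the 8090 default come from this cookbook.

```shell
# Hypothetical helper (not taken from run.sh): prints the JVM debug option
# that a development wrapper might pass to java.
# JDWP_TRANSPORT_PORT overrides the default debug port (8090).
jdwp_opts() {
  printf '%s%s\n' \
    '-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=' \
    "${JDWP_TRANSPORT_PORT:-8090}"
}

# Show the option that would be used in the current environment.
jdwp_opts
```

Under this assumption, two wrapper processes started on the same host would need different values of JDWP_TRANSPORT_PORT, since both cannot listen on the same debug port.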
Show all available options and their default values:
./run.sh --help
CLI mode is used for long-running operations such as scanning, indexing, and NLP processing.
Run a full scan and index on a project. Typically used for initial ingestion or reindexing after configuration changes.
./run.sh \
--mode CLI \
--dataDir /home/dev/Datashare/Data/ \
--defaultProject banana-papers \
--stages "SCAN,INDEX"
Retrieve a report for a given queue. Useful for inspecting or debugging queue processing.
./run.sh \
--mode CLI \
--dataDir /home/dev/Datashare/Data/ \
--defaultProject cantina \
--stages "SCANIDX" \
--reportName <QueueName>
Install a Datashare plugin into the plugins directory:
./run.sh \
--mode CLI \
--pluginInstall datashare-plugin-tour \
--pluginsDir /home/dev/Datashare/Plugins
Run NLP processing using CoreNLP. Adjust parallelism settings based on available CPU and memory.
./run.sh \
--mode CLI \
--stages NLP \
--nlpp CORENLP \
--nlpParallelism 2 \
--parallelism 2 \
--parserParallelism 2 \
--dataDir /vault/citrus-confidential \
--defaultProject citrus-confidential
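One possible way to pick the parallelism values from the machine's core count is sketched below. The heuristic (half the cores, at least 1) and the helper name suggest_parallelism are assumptions made for illustration, not a Datashare recommendation. Memory is not taken into account, so the result may still need to be lowered on hosts with little RAM.

```shell
# Hypothetical helper: suggests a parallelism value from a core count.
# Heuristic (an assumption): half the cores, never less than 1.
suggest_parallelism() {
  half=$(( $1 / 2 ))
  if [ "$half" -lt 1 ]; then
    half=1
  fi
  printf '%s\n' "$half"
}

# nproc is part of GNU coreutils; fall back to 2 cores if it is unavailable.
CORES="$(nproc 2>/dev/null || echo 2)"
P="$(suggest_parallelism "$CORES")"

# Print the flags so they can be reviewed before being added to the command.
echo "--nlpParallelism $P --parallelism $P --parserParallelism $P"
```

The printed flags can be pasted into the NLP command above in place of the fixed value 2.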
Start Datashare in LOCAL mode (the default) with minimal configuration. This creates a single-user instance.
./run.sh \
--dataDir /home/dev/Datashare/Data/
Server mode is designed for multi-user deployments with authentication.
Standard server mode using OAuth (e.g., Keycloak). This is the recommended setup for multi-user deployments.
Server mode with plugins:
./run.sh \
--mode SERVER \
--dataDir /home/dev/Datashare/Data/ \
--oauthClientId datashareuidforseed \
--oauthClientSecret datasharesecretforseed \
--pluginsDir /home/dev/Datashare/Plugins
Server mode with OAuth and PostgreSQL:
./run.sh \
--mode SERVER \
--dataDir /home/dev/Datashare/Data/ \
--dataSourceUrl "jdbc:postgresql://postgres/datashare?user=dstest&password=test" \
--oauthClientId datashareuidforseed \
--oauthClientSecret datasharesecretforseed
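The JDBC URL above embeds the database credentials in the command line. A minimal sketch of assembling it from environment variables instead is shown here. The helper name jdbc_postgres_url and the variable names PGHOST, PGDATABASE, PGUSER and PGPASSWORD are assumptions chosen for illustration; the defaults mirror the example values used in this cookbook.

```shell
# Hypothetical helper: assembles a PostgreSQL JDBC URL from its parts.
# Arguments: $1 host, $2 database, $3 user, $4 password.
# Values are inserted verbatim; special characters in the password
# would need URL-encoding first.
jdbc_postgres_url() {
  printf 'jdbc:postgresql://%s/%s?user=%s&password=%s\n' "$1" "$2" "$3" "$4"
}

# Assumed variable names; defaults match the example values in this cookbook.
DATA_SOURCE_URL="$(jdbc_postgres_url \
  "${PGHOST:-postgres}" \
  "${PGDATABASE:-datashare}" \
  "${PGUSER:-dstest}" \
  "${PGPASSWORD:-test}")"

echo "$DATA_SOURCE_URL"
```

The resulting value could then be passed as --dataSourceUrl "$DATA_SOURCE_URL", which keeps the password out of shell history when the variables are set elsewhere.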
Full OAuth configuration with local OAuth server and SQLite:
./run.sh \
--mode SERVER \
--dataDir /home/dev/Datashare/Data/ \
--defaultProject local-datashare \
--dataSourceUrl "jdbc:sqlite:file:$HOME/datashare.db" \
--cors "*" \
--oauthClientId datashareuidforseed \
--oauthClientSecret datasharesecretforseed \
--oauthAuthorizeUrl http://oauth:3001/oauth/authorize \
--oauthTokenUrl http://oauth:3001/oauth/token \
--oauthApiUrl http://oauth:3001/api/v1/me.json \
--oauthCallbackPath /auth/callback \
--busType MEMORY \
--queueType MEMORY \
--sessionStoreType MEMORY
Simplified OAuth example with explicit endpoints:
./run.sh \
--mode SERVER \
--dataDir /home/dev/Datashare/Data/ \
--defaultProject local-datashare \
--oauthClientId datashareuidforseed \
--oauthClientSecret datasharesecretforseed \
--oauthAuthorizeUrl http://oauth:3001/oauth/authorize \
--oauthTokenUrl http://oauth:3001/oauth/token \
--oauthApiUrl http://oauth:3001/api/v1/me.json
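Since the three endpoint URLs in these examples share the same base, they can be derived from a single variable. The sketch below assumes the paths used in this cookbook (/oauth/authorize, /oauth/token, /api/v1/me.json); other OAuth providers will likely expose different paths. The helper name oauth_flags and the OAUTH_BASE variable are hypothetical.

```shell
# Hypothetical helper: prints the three OAuth endpoint flags for a base URL.
# The paths are the ones used in this cookbook's examples; a trailing slash
# on the base URL is removed before the paths are appended.
oauth_flags() {
  base="${1%/}"
  printf '%s\n' \
    "--oauthAuthorizeUrl $base/oauth/authorize" \
    "--oauthTokenUrl $base/oauth/token" \
    "--oauthApiUrl $base/api/v1/me.json"
}

# OAUTH_BASE is an assumed variable name; the default is the cookbook's OAuth server.
OAUTH_BASE="${OAUTH_BASE:-http://oauth:3001}"
oauth_flags "$OAUTH_BASE"
```

Because the URLs contain no spaces, the output can be appended to a command with an unquoted $(oauth_flags "$OAUTH_BASE"), which relies on word splitting to separate the flags.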
Basic authentication is intended for testing or constrained environments.
Basic Auth with Redis:
See: Basic with Redis documentation
./run.sh \
--mode SERVER \
--authFilter org.icij.datashare.session.BasicAuthAdaptorFilter \
--redisAddress redis://redis:6379
Basic Auth with PostgreSQL:
See: Basic with Database documentation
./run.sh \
--mode SERVER \
--authFilter org.icij.datashare.session.BasicAuthAdaptorFilter \
--authUsersProvider org.icij.datashare.session.UsersInDb \
--dataSourceUrl "jdbc:postgresql://postgres/datashare?user=dstest&password=test"
Embedded mode targets low-resource environments (e.g., Raspberry Pi). All services run on the same host.
Warning: This configuration has not been tested since 2020.
./run.sh \
--mode EMBEDDED \
--dataDir /home/pi/data \
--dataSourceUrl jdbc:sqlite:/home/pi/dist/datashare.sqlite \
--elasticsearchAddress http://localhost:9200 \
--elasticsearchDataPath /home/pi/es \
--redisAddress redis://localhost:6379 \
--messageBusAddress localhost \
--tcpListenPort 80
Batch Search runs as a dedicated daemon consuming extraction queues. It is typically started alongside a standard server instance.
Server process:
./run.sh \
--mode SERVER \
--dataDir /home/dev/Datashare/Data/ \
--dataSourceUrl "jdbc:postgresql://postgres/datashare?user=dstest&password=test" \
--oauthClientId datashareuidforseed \
--oauthClientSecret datasharesecretforseed
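Batch search daemon (run in a separate terminal; JDWP_TRANSPORT_PORT is overridden so the debug port does not clash with the server process):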
JDWP_TRANSPORT_PORT=8001 ./run.sh \
--mode BATCH_SEARCH \
--dataDir /home/dev/Datashare/Data/ \
--dataSourceUrl "jdbc:postgresql://postgres/datashare?user=dstest&password=test" \
--batchQueueType org.icij.datashare.extract.RedisBlockingQueue
Caution: Maintenance commands are destructive and should be used with care.
Delete the Elasticsearch index for a project:
curl -X DELETE http://localhost:9200/citrus-confidential
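Because the deletion cannot be undone, a small guard around the command may be worthwhile. The sketch below is one possible approach: the helper name index_delete_url is hypothetical, and it only prints the curl command (a dry run) rather than executing it. It rejects empty names and multi-index targets such as wildcards, comma-separated lists and _all.

```shell
# Hypothetical guard: builds the DELETE URL for exactly one named index.
# Arguments: $1 Elasticsearch base URL, $2 index name.
# Refuses empty names, _all, wildcards, comma-separated lists and slashes.
index_delete_url() {
  case "$2" in
    ''|_all|*'*'*|*,*|*/*)
      echo "refusing to delete index '$2'" >&2
      return 1
      ;;
  esac
  printf '%s/%s\n' "${1%/}" "$2"
}

# Dry run: print the command for review instead of executing it.
if url="$(index_delete_url http://localhost:9200 citrus-confidential)"; then
  echo "curl -X DELETE $url"
fi
```

Once the printed command has been reviewed, it can be run by hand.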