Description
After starting a clean install with the official docker-compose.yml from the documentation, I can use the GUI to install the OPENNLP Pipeline and Neo4J extensions, along with the Neo4J Graph Viewer plugin. The GUI then tells me to restart Datashare, so I run docker compose down. When I bring the stack back up, Datashare fails to start with the following error:
datashare-1 |
datashare-1 | 2026-01-26 20:26:33,050 [main] ERROR DatashareCli - Failed to parse arguments.
datashare-1 | java.lang.NullPointerException: null
datashare-1 | at java.base/java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
datashare-1 | at java.base/java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
datashare-1 | at java.base/java.util.Properties.put(Properties.java:1301)
datashare-1 | at java.base/java.util.Properties.setProperty(Properties.java:229)
datashare-1 | at org.icij.datashare.cli.DatashareCli.parseArguments(DatashareCli.java:76)
datashare-1 | at org.icij.datashare.Main.main(Main.java:16)
datashare-1 | Usage:
datashare-1 | Option Description
datashare-1 | ------ -----------
datashare-1 | -?, -h, --help
datashare-1 | --apiKey <String> existing api key for user
datashare-1 | --artifactDir <String> Artifact directory for embedded
datashare-1 | caching. If not provided datashare
datashare-1 | will use memory.
datashare-1 | --authFilter <String> Server mode auth filter class
datashare-1 | --authUsersProvider <String> Server mode auth users provider class
datashare-1 | --batchDownloadDir <String> Directory where Batch Download
datashare-1 | archives are downloaded. (default:
datashare-1 | /home/datashare/.
datashare-1 | local/share/datashare/tmp)
datashare-1 | --batchDownloadEncrypt <Boolean> Whether Batch download zip files are
datashare-1 | encrypted or not. SmtpUrl should be
datashare-1 | set to send the password. (default
datashare-1 | false)
datashare-1 | --batchDownloadMaxNbFiles <Integer> Maximum file number that can be
datashare-1 | archived in a zip (Default 10,000)
datashare-1 | (default: 10000)
datashare-1 | --batchDownloadMaxSize <[0-9]+[KMG]?> Maximum total files size that can be
datashare-1 | zipped. Human readable suffix K/M/G
datashare-1 | for KB/MB/GB (Default 100M)
datashare-1 | (default: 100M)
datashare-1 | --batchDownloadScroll <String> Scroll duration used for elasticsearch
datashare-1 | scrolls (Batch Download) (default:
datashare-1 | 60000ms)
datashare-1 | --batchDownloadScrollSize <Integer> Scroll size used for elasticsearch
datashare-1 | scrolls (Batch Download) (default:
datashare-1 | 1000)
datashare-1 | --batchDownloadTimeToLive <Integer> Time to live in hour for batch
datashare-1 | download zip files (Default 24)
datashare-1 | (default: 24)
datashare-1 | --batchQueueType <QueueType> (default: MEMORY)
datashare-1 | --batchSearchMaxTimeSeconds <Integer> Max time for batch search in seconds
datashare-1 | --batchSearchScroll <String> Scroll duration used for elasticsearch
datashare-1 | scrolls (Batch Search) (default:
datashare-1 | 60000ms)
datashare-1 | --batchSearchScrollSize <Integer> Scroll size used for elasticsearch
datashare-1 | scrolls (Batch Search) (default:
datashare-1 | 1000)
datashare-1 | --batchSize <Integer> Batch size of NLP extraction task in
datashare-1 | number of documents. (default: 1024)
datashare-1 | --batchThrottleMilliseconds <Integer> Throttle for batch in milliseconds
datashare-1 | --browserOpenLink <Boolean> try to open link in the default
datashare-1 | browser (default: false)
datashare-1 | --busType <QueueType> Backend data bus type. (default:
datashare-1 | MEMORY)
datashare-1 | --charset <String> Datashare default charset. Example:
datashare-1 | [UTF-8, ISO-8859-1] (default: UTF-8)
datashare-1 | --clusterName <String> Cluster name (default: datashare)
datashare-1 | --cors <String> CORS headers (needs the web option)
datashare-1 | (default: no-cors)
datashare-1 | --createIndex <String> creates an index with the given name
datashare-1 | -d, --dataDir <File> Document source files directory
datashare-1 | (default: /home/datashare/Datashare)
datashare-1 | --dataSourceUrl <String> Datasource URL. For using memory you
datashare-1 | can use 'jdbc:sqlite:file:memorydb.
datashare-1 | db?mode=memory&cache=shared'
datashare-1 | (default: jdbc:sqlite:file:
datashare-1 | /home/datashare/.
datashare-1 | local/share/datashare/dist/datashare.
datashare-1 | db)
datashare-1 | --deleteApiKey <String> Delete api key for user
datashare-1 | --digestAlgorithm <SHA-[1|256|384|512] (default: SHA-384)
datashare-1 | or MD5>
datashare-1 | --digestProjectName <String> Includes the project name in the hash
datashare-1 | of documents when indexing. It is
datashare-1 | set by default to the defaultProject
datashare-1 | value. See noDigestProject option to
datashare-1 | disable it.
datashare-1 | --elasticsearchAddress <String> Elasticsearch host address (default:
datashare-1 | http://elasticsearch:9200)
datashare-1 | --elasticsearchDataPath <String> Data path used for embedded
datashare-1 | Elasticsearch (default:
datashare-1 | /home/datashare/.
datashare-1 | local/share/datashare/es)
datashare-1 | --embeddedDocumentDownloadMaxSize <[0- Maximum download size of embedded
datashare-1 | 9]+[KMG]?> documents. Human readable suffix
datashare-1 | K/M/G for KB/MB/GB (Default 1G)
datashare-1 | (default: 1G)
datashare-1 | --ext <String> Run CLI extension
datashare-1 | --extensionDelete <String> Delete extension with its id or base
datashare-1 | directory (needs extensionsDir
datashare-1 | option)
datashare-1 | --extensionInstall <String> Install extension with either id or
datashare-1 | URL or file path (needs
datashare-1 | extensionsDir option)
datashare-1 | --extensionList [String] Extensions list matching provided
datashare-1 | string
datashare-1 | --extensionsDir <String> Extensions directory (backend)
datashare-1 | (default: /home/datashare/.
datashare-1 | local/share/datashare/extensions)
datashare-1 | --followSymlinks <Boolean> Follow symlinks while scanning
datashare-1 | documents (default: true)
datashare-1 | --full-import Performs a full import, importing all
datashare-1 | available documents and named
datashare-1 | entities from Datashare to neo4j
datashare-1 | --grantAdmin <String> Grant admin policy to user if there is
datashare-1 | none
datashare-1 | --indexTimeout <positive integer> Time to wait in minutes before
datashare-1 | consumer termination during document
datashare-1 | indexing (Default 30m) (default: 30)
datashare-1 | -k, --createApiKey <String> Generate and store api key for user
datashare-1 | defaultUser (see opt)
datashare-1 | -l, --language <String> Explicitly specify language of indexed
datashare-1 | documents (instead of detecting
datashare-1 | automatically)
datashare-1 | --logLevel <String> Sets the log level of Datashare
datashare-1 | ([ERROR, WARN, INFO, DEBUG, TRACE])
datashare-1 | (default: INFO)
datashare-1 | -m, --mode <Mode> Datashare run mode [LOCAL, SERVER,
datashare-1 | CLI, NER, TASK_WORKER, EMBEDDED]
datashare-1 | (default: LOCAL)
datashare-1 | --maxContentLength <[0-9]+[KMG]?> Maximum length (in bytes) of extracted
datashare-1 | text that could be indexed (-1 means
datashare-1 | no limit and value should be less or
datashare-1 | equal than 2G). Human readable
datashare-1 | suffix K/M/G for KB/MB/GB (Default
datashare-1 | 20M) (default: 20000000)
datashare-1 | --messageBusAddress <String> Message bus address (default: redis:
datashare-1 | //redis:6379)
datashare-1 | --neo4jAppLogInJson <Boolean> Should the Python process log in JSON
datashare-1 | format (default: false)
datashare-1 | --neo4jAppMaxDumpedDocuments <Long> Maximum number for document nodes
datashare-1 | allowed during export on SERVER mode
datashare-1 | (default: 10000)
datashare-1 | --neo4jAppPort <Integer> Python neo4j service port (default:
datashare-1 | 8008)
datashare-1 | --neo4jAppStartTimeoutS <Integer> Python neo4j service start timeout.
datashare-1 | (default: 30)
datashare-1 | --neo4jCliTaskPollIntervalS <Integer> Interval in second used to poll task
datashare-1 | statuses when in CLI mode (default:
datashare-1 | 2)
datashare-1 | --neo4jHost <String> Hostname of the neo4j DB. (default:
datashare-1 | 127.0.0.1)
datashare-1 | --neo4jPassword <String> Password used to connect to the neo4j
datashare-1 | DB (default: please-change-this-
datashare-1 | password)
datashare-1 | --neo4jPort <Integer> Port of the neo4j DB. (default: 7687)
datashare-1 | --neo4jProcessInheritOutputs <Boolean> Should the Python process outputs be
datashare-1 | redirected to the Java process
datashare-1 | outputs ? (default: true)
datashare-1 | --neo4jSingleProject <String> Name of the single project which will
datashare-1 | be able to user the extension when
datashare-1 | using neo4j Community Edition
datashare-1 | (default: local-datashare)
datashare-1 | --neo4jUriScheme <String> URI scheme used to connect to the
datashare-1 | neo4j DB (can be: bolt, neo4j,
datashare-1 | bolt+s, neo4j+s, ....) (default:
datashare-1 | bolt)
datashare-1 | --neo4jUser <String> User name used to connect to the neo4j
datashare-1 | DB (default: neo4j)
datashare-1 | --nlpParallelism, --np <Integer> Number of NLP extraction threads per
datashare-1 | pipeline. (default: 1)
datashare-1 | --nlpPipeline, --nlpp <String> NLP pipeline to be run. (default:
datashare-1 | CORENLP)
datashare-1 | --noDigestProject <Boolean> Disable the project name in document
datashare-1 | hash processing (only using binary
datashare-1 | contents). (default: false)
datashare-1 | -o, --ocr <Boolean> Run optical character recognition at
datashare-1 | file parsing time. (Tesseract must
datashare-1 | be installed beforehand). (default:
datashare-1 | true)
datashare-1 | --oauthApiUrl <String> OAuth2 api url
datashare-1 | --oauthAuthorizeUrl <String> OAuth2 authorize url
datashare-1 | --oauthCallbackPath <String> OAuth2 callback path (in datashare)
datashare-1 | --oauthClaimIdAttribute <String> Json field name sent by the Identity
datashare-1 | Provider that contains user
datashare-1 | identifier value.
datashare-1 | --oauthClientId <String> OAuth2 client id
datashare-1 | --oauthClientSecret <String> OAuth2 client secret key
datashare-1 | --oauthDefaultProject <String> Default project to use for Oauth2 users
datashare-1 | --oauthScope <String> Set scope in oauth2 callback url,
datashare-1 | needed for OIDC providers
datashare-1 | --oauthTokenUrl <String> OAuth2 token url
datashare-1 | --oauthUserProjectsAttribute <String> Json field name sent by the Identity
datashare-1 | Provider that contains user
datashare-1 | projects. (default:
datashare-1 | groups_by_applications.datashare)
datashare-1 | --ocrLanguage <String> Explicitly specify OCR languages for
datashare-1 | tesseract. 3-character ISO 639-2
datashare-1 | language codes and + sign for
datashare-1 | multiple languages
datashare-1 | --ocrType <String> OCR implementation: TESSERACT or
datashare-1 | TESS4J (default: TESSERACT)
datashare-1 | -p, --project <String> Name of the datashare project
datashare-1 | --parallelism <Integer> Number of threads allocated for task
datashare-1 | management. (default: 16)
datashare-1 | --parserParallelism, --pp <Integer> Number of file parser threads.
datashare-1 | (default: 1)
datashare-1 | --pluginDelete <String> Delete plugin with its id or base
datashare-1 | directory (needs pluginsDir option)
datashare-1 | --pluginInstall <String> Install plugin with either id or URL
datashare-1 | or file path (needs pluginsDir
datashare-1 | option)
datashare-1 | --pluginList [String] Plugins list matching provided string
datashare-1 | --pluginsDir <String> Plugins directory (default:
datashare-1 | /home/datashare/.
datashare-1 | local/share/datashare/plugins)
datashare-1 | --pollingInterval <String> Queue polling interval. (default: 60)
datashare-1 | --port, --tcpListenPort <Integer> Port used by the HTTP server (default:
datashare-1 | 8080)
datashare-1 | --protectedUriPrefix <String> Protected URI prefix (default: /api/)
datashare-1 | --queueCapacity <positive integer> Queue capacity is the size of the
datashare-1 | internal file path buffer used by
datashare-1 | the queue. (default: 1000000)
datashare-1 | --queueName <String> Extract queue name (default: extract:
datashare-1 | queue)
datashare-1 | --queueType <QueueType> Backend queues and sets type.
datashare-1 | (default: MEMORY)
datashare-1 | -r, --resume Resume pending operations
datashare-1 | --redisAddress <String> Redis queue address (default: redis:
datashare-1 | //redis:6379)
datashare-1 | --redisPoolSize <Integer> Pool size for main Redis client
datashare-1 | (default: 5)
datashare-1 | --reportName <String> name of the map for the report map
datashare-1 | (where index results are stored). No
datashare-1 | report records are saved if not
datashare-1 | provided
datashare-1 | --rootHost <String> Datashare host for urls
datashare-1 | -s, --settings <String> Property settings file
datashare-1 exited with code 1
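For context on the stack trace above: java.util.Properties does not accept null keys or values, so a call to setProperty with a null value throws a NullPointerException with no message, which matches the trace shape. The sketch below only demonstrates that JDK behavior in isolation. The class name and the option name "hypotheticalOption" are made up for illustration, and I do not know which option Datashare is actually resolving to null at DatashareCli.java:76.

```java
import java.util.Properties;

public class NullPropertyRepro {
    // Returns true when Properties.setProperty rejects a null value with an NPE.
    static boolean rejectsNullValue() {
        Properties props = new Properties();
        // Assumption: stands in for an option whose value resolved to null.
        String missing = null;
        try {
            // "hypotheticalOption" is an illustrative key, not a real Datashare option.
            props.setProperty("hypotheticalOption", missing);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("null value rejected: " + rejectsNullValue());
    }
}
```

If this is what is happening, some option presumably ends up without a value after the extensions and plugin are installed, but that is a guess from the trace rather than something I have confirmed in the Datashare source.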
Both /home/datashare/.local/share/datashare/extensions and /home/datashare/.local/share/datashare/plugins are mapped to named volumes in my docker compose file, so I would expect the Datashare container to come back up cleanly.
Here are the relevant sections from my docker-compose.yml:
services:
  datashare:
    image: ${DATASHARE_IMAGE}
    hostname: datashare
    ports:
      - 8080:8080
    environment:
      - DS_DOCKER_MOUNTED_DATA_DIR=/home/datashare/data
    volumes:
      - ${DATASHARE_DATA_DIR}:/home/datashare/data
      - datashare-models:/home/datashare/dist
      - datashare-extensions:/home/datashare/.local/share/datashare/extensions
      - datashare-plugins:/home/datashare/.local/share/datashare/plugins
    command: >-
      --dataSourceUrl jdbc:postgresql://postgresql/datashare?user=datashare\&password=password
      --mode LOCAL
      --tcpListenPort 8080
    depends_on:
      - postgresql
      - redis
      - elasticsearch
...
volumes:
  datashare-models:
  datashare-extensions:
  datashare-plugins:
  elasticsearch-data:
  postgresql-data:
  neo4j_data:
  neo4j_conf:
System specs:
- Host: Ubuntu 24
- Datashare version: 20.8.2
Expected behavior
The Datashare container should restart cleanly after extensions and plugins have been installed through the GUI.