Instructions to install and run PDI Docker
This is the Dockerized version of the insights portion of the Polar Deep Insights system. Two parts of the project can be installed and run in Docker containers using the instructions below: the insight-generator, a Python library used to extract information, and the insight-visualizer, a JavaScript application used for data visualization.
- Install Docker if it isn't already installed, and make sure that it is running.
- If you normally log into a Docker registry on your machine, do so now.
- At a terminal window, type
    git clone https://github.com/USCDataScience/polar-deep-insights.git
- Then type
    cd polar-deep-insights/Docker
- Install npm if it isn't installed already; otherwise skip this step.
- Install the Elasticsearch tools. Depending on your permissions, type either
    npm install -g elasticsearch-tools
  or
    sudo npm install -g elasticsearch-tools
  and enter your password at the prompt.
- Make the script setup.sh executable and run it by typing
    chmod +x setup.sh
    ./setup.sh
  - It should be noted that ./setup.sh creates a data folder and populates it with a variety of other required files and empty folders.
  - If you are planning on analyzing your own files, please put them in the data/files folder. Any format is acceptable, though the parsers may not extract all possible data if the format is very unusual.
- Export the Elasticsearch index mappings.
  - If using polar.usc.edu's Elasticsearch data, type
      es-export-mappings --url http://polar.usc.edu/elasticsearch --file data/polar/polar-data-mappings.json
  - If using your own database, replace http://polar.usc.edu/elasticsearch in the above command with your remote Elasticsearch URL or your local Elasticsearch index URL, and run the command.
- Export the Elasticsearch index data.
  - If using polar.usc.edu's Elasticsearch data, type
      es-export-bulk --url http://polar.usc.edu/elasticsearch --file data/polar/polar-data.json
  - If using your own database, replace http://polar.usc.edu/elasticsearch in the above command with your remote Elasticsearch URL or your local Elasticsearch URL, and run the command.
  - NOTE: This last step may take a while depending on the size of your Elasticsearch database. The Polar data set contains 100k documents and takes quite a long time (go get coffee).
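If you export more than once, or from a different server, the two export commands above can be wrapped in a small script. The sketch below uses hypothetical helpers (export_es_index and run are illustrative names, not part of the repository) that run the same es-export-mappings and es-export-bulk commands with the same --url and --file options, defaulting to polar.usc.edu and the data/polar folder. Setting DRY_RUN=1 only prints the commands, which lets you check the URL before starting the long bulk export.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository: runs the two
# elasticsearch-tools export commands from this README against one URL.

# Print the command instead of executing it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

export_es_index() {
  # Default to the public Polar index; pass your own URL otherwise.
  local url=${1:-http://polar.usc.edu/elasticsearch}
  local out_dir=${2:-data/polar}

  run mkdir -p "$out_dir"
  # Mappings first: they are small, so a wrong URL fails quickly and
  # the slow bulk export is skipped.
  run es-export-mappings --url "$url" --file "$out_dir/polar-data-mappings.json" || return 1
  # Bulk data: this is the slow step for large indexes.
  run es-export-bulk --url "$url" --file "$out_dir/polar-data.json"
}
```

For example, running DRY_RUN=1 export_es_index http://localhost:9200 prints the commands for a local Elasticsearch (9200 is Elasticsearch's default port, assumed here), while export_es_index with no arguments exports from polar.usc.edu.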
- Install the necessary scripts and Python files that generate insights.
  - On Unix-like systems (Ubuntu, macOS, etc.):
      chmod +x pre_installation.sh
      ./pre_installation.sh
    This step fetches the necessary files from the web using the wget command. If you encounter the error "wget not found", either:
    - install wget, OR
    - open the ~/polar-deep-insights/Docker/pre_installation.sh file and replace wget with curl -O in each command, OR
    - manually download the files from their source web pages as listed in the pre_installation.sh script.
  - On Windows:
    - If you have wget for Windows installed, replace wget in the pre_installation.sh file with wget for Windows.
    - A more hassle-free solution is to manually download the files from their source web pages as listed in the pre_installation.sh script.
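The wget-to-curl replacement described above can be done with sed instead of by hand. This is a minimal sketch that assumes each download in pre_installation.sh is written as wget followed by a URL; the helper name wget_to_curl is hypothetical, and any wget-specific options in the script would still need translating manually. It uses curl -L -O: -O saves the file under its remote name like wget, and -L follows redirects, which wget does by default.

```shell
#!/usr/bin/env bash
# Sketch: switch a script from wget to curl when wget is missing.
# Assumes every download is written as "wget <url>"; wget-specific
# options on those lines must still be translated by hand.

wget_to_curl() {
  local script=${1:?usage: wget_to_curl FILE}
  # curl -O keeps the remote file name (as wget does) and -L follows
  # redirects (wget's default). -i.bak edits in place with both GNU sed
  # (Linux) and BSD sed (macOS) and keeps a backup copy of the original.
  sed -i.bak 's/wget /curl -L -O /g' "$script"
}
```

Run it only when wget is actually missing, for example command -v wget >/dev/null || wget_to_curl pre_installation.sh. The original script is kept as pre_installation.sh.bak.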
- Add files to the following folders according to these instructions:
  - data/files: Add your data files of any file type to generate insights from.
  - data/polar: Contains the mappings and data exported from the Elasticsearch URL.
  - data/ingest: Output from the PDI insight generator will be saved here under the file name ingest_data.json.
  - data/sparkler/raw: Add Sparkler-crawled data from the Solr index into the sparkler_rawdata.json file in this folder.
  - data/sparkler/parsed: Sparkler data (in data/sparkler/raw/sparkler_rawdata.json) is parsed using parse.py and saved in sparkler_data.json.
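Before building the containers, it can help to confirm that the folder layout above is in place. The sketch below (check_data_layout is a hypothetical helper, not part of the repository) creates any missing data/ sub-folders, reports whether the two Elasticsearch export files in data/polar exist, and counts your own input files in data/files. It returns a non-zero status if either export file is missing; the Sparkler files are not checked because they are only needed if you use Sparkler data.

```shell
#!/usr/bin/env bash
# Sketch: make sure the data/ layout from this README exists and report
# which Elasticsearch export files are still missing. setup.sh normally
# creates these folders; this only fills in any that are absent.

check_data_layout() {
  local root=${1:-data} missing=0 dir file
  for dir in files polar ingest sparkler/raw sparkler/parsed; do
    mkdir -p "$root/$dir"
  done
  # The two files written by es-export-mappings and es-export-bulk.
  for file in polar/polar-data-mappings.json polar/polar-data.json; do
    if [ ! -f "$root/$file" ]; then
      echo "missing: $root/$file"
      missing=$((missing + 1))
    fi
  done
  # Number of your own files to generate insights from.
  echo "data files: $(find "$root/files" -type f | wc -l | tr -d ' ')"
  [ "$missing" -eq 0 ]
}
```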
Build Insight Generator
- git clone https://github.com/USCDataScience/polar-deep-insights.git && cd polar-deep-insights/Docker/insight-generator
- Build from local:
    docker build -t uscdatascience/pdi-generator -f InsightGenDockerfile .
  OR pull from Docker Hub:
    docker pull uscdatascience/pdi-generator
- PDI_JSON_PATH=/data/polar docker-compose up -d
This container exposes the following ports:
8765 - Geo Topic Parser
9998 - Apache Tika Server
8060 - Grobid Quantities REST API
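Once docker-compose up -d returns, the services may still be starting. A small bash sketch for waiting on the ports listed above is shown here; wait_for_port is a hypothetical helper that uses bash's built-in /dev/tcp redirection, so it only checks that something accepts TCP connections on the port, not that the service behind it is healthy.

```shell
#!/usr/bin/env bash
# Sketch: wait until host:port accepts TCP connections, or give up.
# Relies on bash's /dev/tcp pseudo-device, so run it with bash, not sh.

wait_for_port() {
  local host=$1 port=$2 timeout=${3:-60} waited=0
  # Each attempt opens (and on subshell exit closes) one connection.
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $host:$port" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "$host:$port is accepting connections"
}
```

For example: for port in 8765 9998 8060; do wait_for_port localhost "$port" 300; done. The 300-second limit is an arbitrary choice; raise it if a service is slow to start on your machine.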
Build Insight Visualizer
- git clone https://github.com/USCDataScience/polar-deep-insights.git && cd polar-deep-insights/Docker/insight-visualizer
- Build from local:
    docker build -t uscdatascience/polar-deep-insights -f PolarDeepInsightsDockerfile .
  OR pull from Docker Hub:
    docker pull uscdatascience/polar-deep-insights
- PDI_JSON_PATH=data/polar docker-compose up -d
- Access the application at http://localhost/pdi/
- Access Elasticsearch at http://localhost/elasticsearch/
This container exposes the following ports:
80 - Apache2/HTTPD server
9000 - Grunt server serving up the PDI application
9200 - Elasticsearch 2.4.6 server
35729 - Auto refresh port for AngularJS apps
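To confirm that the visualizer's Elasticsearch is running and holds the imported data, you can query it directly. This sketch (es_status is a hypothetical helper) calls two standard Elasticsearch REST endpoints: / returns basic cluster and version information, and /_cat/indices?v lists each index with its document count. It assumes port 9200 is reachable from the host as listed above; the http://localhost/elasticsearch proxy URL should also work, assuming the proxy forwards these paths.

```shell
#!/usr/bin/env bash
# Sketch: check that Elasticsearch answers and list its indexes using the
# standard "/" and "/_cat/indices?v" REST endpoints.

es_status() {
  local base=${1:-http://localhost:9200}
  base=${base%/}   # drop one trailing slash so paths join cleanly
  # -f fails on HTTP errors; -sS hides the progress bar but shows errors.
  curl -fsS "$base/" || return 1
  curl -fsS "$base/_cat/indices?v"
}
```

For example, es_status uses port 9200, and es_status http://localhost/elasticsearch goes through the Apache proxy.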
PS: You need to add a CORS extension to your browser and enable it in order to download the concept ontology and additional precomputed information from http://polar.usc.edu/elasticsearch/ and elsewhere.
To follow a container's logs or open a shell inside it, replace container_id with your Docker container's ID (shown by docker ps):
docker logs -f container_id
docker exec -it container_id bash
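Rather than copying the ID from docker ps each time, you can look it up by image name. This is a sketch with a hypothetical helper, container_for, built on docker ps -q with the ancestor filter; it assumes the containers were started from the image tags used above (uscdatascience/pdi-generator and uscdatascience/polar-deep-insights). If your docker-compose file uses a different image name, docker-compose ps in the same folder lists the containers instead.

```shell
#!/usr/bin/env bash
# Sketch: print the ID of the first running container created from IMAGE.

container_for() {
  local image=${1:?usage: container_for IMAGE} id
  # The "ancestor" filter matches containers created from that image.
  id=$(docker ps -q --filter "ancestor=$image" | head -n 1)
  if [ -z "$id" ]; then
    echo "no running container for $image" >&2
    return 1
  fi
  printf '%s\n' "$id"
}
```

For example: docker logs -f "$(container_for uscdatascience/polar-deep-insights)" or docker exec -it "$(container_for uscdatascience/pdi-generator)" bash.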
Information Retrieval and Data Science (IRDS) research group, University of Southern California.