Instructions to install and run PDI Docker
This is the Dockerized version of the insights portion of the Polar Deep Insights (PDI) system. Two parts of the project can be installed and run in Docker containers using the instructions below: the insight-generator, a Python library used to extract information, and the insight-visualizer, a JavaScript application used for data visualization.
- Install Docker if it isn't already installed.
- If you normally log into a Docker registry on your machine, do so now.
- At a terminal window, type
  git clone https://github.com/USCDataScience/polar-deep-insights.git
- Then type
  cd polar-deep-insights/Docker
- Install npm if it isn't already installed; otherwise skip this step.
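Before continuing, it can help to confirm that the command-line tools these instructions rely on are on your PATH. The following is a minimal sketch, not part of the repository; the function name missing_tools is hypothetical.

```shell
#!/bin/sh
# Print each named tool that is NOT found on the PATH.
# missing_tools is a hypothetical helper, not part of polar-deep-insights.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools referenced by these instructions.
missing=$(missing_tools docker docker-compose git npm)
if [ -n "$missing" ]; then
  echo "Install these before continuing:" $missing
else
  echo "docker, docker-compose, git and npm are all available."
fi
```

The check only looks for the executables; it does not verify versions or that the Docker daemon is running.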
- Install the Elasticsearch tools. Depending on your permissions, type either
  npm install -g elasticsearch-tools
  or
  sudo npm install -g elasticsearch-tools
  and enter your password at the prompt.
- Make the script setup.sh executable by typing
  chmod +x setup.sh
  - Note that running ./setup.sh creates a data folder and populates it with a variety of other required files and empty folders.
  - If you plan to analyze your own files, put them in the data/files folder. Any format is acceptable, though the parsers may not extract all possible data if the format is very unusual.
- Export the Elasticsearch index mappings.
  - If using polar.usc.edu's Elasticsearch data, type
    es-export-mappings --url http://polar.usc.edu/elasticsearch --file data/polar/polar-data-mappings.json
  - If using your own database, replace http://polar.usc.edu/elasticsearch in the above command with the URL of your remote or localhost Elasticsearch index, then run it.
- Export the Elasticsearch index data.
  - If using polar.usc.edu's Elasticsearch data, type
    es-export-bulk --url http://polar.usc.edu/elasticsearch --file data/polar/polar-data.json
  - If using your own database, replace http://polar.usc.edu/elasticsearch in the above command with the URL of your remote or localhost Elasticsearch index, then run it.
  PS: This step may take a while depending on the size of your Elasticsearch database. The Polar data set contains 100k documents and takes quite a long time (go get coffee).
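The two export steps above differ only in the tool name and the output file, so the Elasticsearch URL can be supplied once. This is a sketch under stated assumptions: the names run, export_polar_index, ES_URL and DRY_RUN are hypothetical, and only the --url and --file options shown above are used.

```shell
#!/bin/sh
# With DRY_RUN=1, print a command instead of executing it.
# run, export_polar_index, ES_URL and DRY_RUN are illustrative names only;
# they are not part of elasticsearch-tools or of this repository.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Export mappings first, then data, for one Elasticsearch URL.
export_polar_index() {
  url=$1
  out_dir=${2:-data/polar}
  run es-export-mappings --url "$url" --file "$out_dir/polar-data-mappings.json" &&
  run es-export-bulk --url "$url" --file "$out_dir/polar-data.json"
}

# Preview the commands without contacting Elasticsearch.
DRY_RUN=1
export_polar_index "${ES_URL:-http://polar.usc.edu/elasticsearch}"
```

Set DRY_RUN=0 once the printed commands look right, and set ES_URL to your own index if you are not using the Polar data.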
- Install some necessary files - description can be found here.
  - For Linux-based or Unix-like OS (Ubuntu, macOS, etc.):
    chmod +x pre_installation.sh
    ./pre_installation.sh
    This step downloads the necessary sh files from the web using the wget command. If you encounter the error "wget not found":
    - Install wget (e.g. for macOS: brew install wget) OR
    - Open pre_installation.sh and replace wget with curl -o filename, where filename is the name of the file on each command OR
    - Refer to point 1.ii.b
  - For Windows OS:
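The wget-or-curl choice described above can also be made automatically. The sketch below only prints the command it would use; fetch_cmd is a hypothetical helper, the example.com URL is a placeholder, and nothing here reflects what pre_installation.sh actually contains.

```shell
#!/bin/sh
# Print the download command that would be used for a URL: wget if it is on
# the PATH, otherwise curl writing to the URL's base name.
# fetch_cmd is a hypothetical helper, not something pre_installation.sh provides.
fetch_cmd() {
  url=$1
  if command -v wget >/dev/null 2>&1; then
    echo "wget $url"
  elif command -v curl >/dev/null 2>&1; then
    # -L follows redirects; -o names the output file.
    echo "curl -L -o ${url##*/} $url"
  else
    echo "neither wget nor curl is installed" >&2
    return 1
  fi
}

# Placeholder URL, for illustration only.
fetch_cmd "https://example.com/scripts/example.sh" || true
```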
- Add files to the following folders according to these instructions:
  - data/files: Add your data files, of any file type, to generate insights from.
  - data/polar: Contains mappings and data from the Elasticsearch URL.
  - data/ingest: Output from the PDI insight generator will be saved here under the filename ingest_data.json.
  - data/sparkler/raw: Add Sparkler crawled data from the Solr index into the sparkler_rawdata.json file in this folder.
  - data/sparkler/parsed: Sparkler data (in data/sparkler/raw/sparkler_rawdata.json) is parsed using parse.py and saved in sparkler_data.json.
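The folder layout listed above can be checked before building the containers. This is a minimal sketch, assuming it is run from the directory that contains the data folder; the function name missing_data_dirs is hypothetical.

```shell
#!/bin/sh
# List the expected data sub-folders that do not exist under a base directory.
# missing_data_dirs is a hypothetical helper; the folder names are taken from
# the list above.
missing_data_dirs() {
  base=${1:-data}
  for sub in files polar ingest sparkler/raw sparkler/parsed; do
    [ -d "$base/$sub" ] || printf '%s\n' "$base/$sub"
  done
}

# Prints nothing when every expected folder is present.
missing_data_dirs data
```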
- Build Insight Generator
  - git clone https://github.com/USCDataScience/polar-deep-insights.git && cd polar-deep-insights/Docker/insight-generator
  - Build from local
    docker build -t uscdatascience/pdi-generator -f InsightGenDockerfile .
    OR pull from Docker Hub
    docker pull uscdatascience/pdi-generator
  - PDI_JSON_PATH=/data/polar docker-compose up -d
This container exposes the following ports:
8765 - Geo Topic Parser
9998 - Apache Tika Server
8060 - Grobid Quantities REST API
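The PDI_JSON_PATH=... prefix on the docker-compose command above sets the variable for that single command only, where it is presumably read by the compose configuration. The sketch below illustrates this shell behavior with a plain child shell standing in for docker-compose.

```shell
#!/bin/sh
# A VAR=value prefix applies to a single command: the child process sees it,
# but the calling shell's environment is left unchanged.
unset PDI_JSON_PATH
seen_by_child=$(PDI_JSON_PATH=/data/polar sh -c 'printf %s "$PDI_JSON_PATH"')
echo "child process saw: $seen_by_child"
echo "current shell has: ${PDI_JSON_PATH:-<unset>}"

# To keep the value for several commands in one session, export it instead:
#   export PDI_JSON_PATH=/data/polar
#   docker-compose up -d
```

This is why the prefix has to be repeated each time docker-compose is invoked unless the variable is exported first.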
- Build Insight Visualizer
  - git clone https://github.com/USCDataScience/polar-deep-insights.git && cd polar-deep-insights/Docker/insight-visualizer
  - docker build -t uscdatascience/polar-deep-insights -f PolarDeepInsightsDockerfile .
    OR
    docker pull uscdatascience/polar-deep-insights
  - PDI_JSON_PATH=data/polar docker-compose up -d
  - Access the application at http://localhost/pdi/
  - Access Elasticsearch at http://localhost/elasticsearch/
This container exposes the following ports:
80 - Apache2/HTTPD server
9000 - Grunt server serving up the PDI application
9200 - Elasticsearch 2.4.6 server
35729 - Auto refresh port for AngularJS apps
PS: You need to add a CORS extension to your browser and enable it in order to download the concept ontology and additional precomputed information from http://polar.usc.edu/elasticsearch/ and elsewhere.
To follow a container's logs (use your Docker container's ID in place of container_id):
docker logs -f container_id
To open a shell inside a running container:
docker exec -it container_id bash
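If you do not know the container ID, it can be looked up from the image name. This is a sketch, assuming the container was started from one of the images named in the build steps; container_ids_for is a hypothetical helper name.

```shell
#!/bin/sh
# Print the IDs of running containers started from a given image.
# container_ids_for is a hypothetical helper, not part of this repository.
container_ids_for() {
  docker ps --filter "ancestor=$1" --format '{{.ID}}'
}

# Example (requires a running Docker daemon), shown commented out:
#   id=$(container_ids_for uscdatascience/pdi-generator | head -n 1)
#   docker logs -f "$id"
```

Running docker ps with no arguments also lists every running container along with its ID and image.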
Information Retrieval and Data Science (IRDS) research group, University of Southern California.