Instructions to install and run PDI Docker
This is the Dockerized version of Polar Deep Insights. The project has two parts: the Insight Generator, which extracts insights from a data set, and the Insight Visualizer, which lets you explore those insights.
- Install Docker
  - Verify that it's running by typing `docker ps` into a command prompt. If you get a response, it's running.
  - If you normally log into a Docker registry on your machine, do so now.
- Install Elasticsearch tools
  - Install npm if it isn't installed already; otherwise skip this step.
  - `sudo npm install -g elasticsearch-tools`
    - Installs the Elasticsearch tools.
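Before moving on, it can help to confirm that the tools used in this guide are on your `PATH`. The following is a minimal sketch, not part of the project: the helper names `have` and `missing_tools` are made up for illustration, and the tool list is simply collected from the commands in this README.

```shell
#!/usr/bin/env bash
# Report which of the command-line tools used in this guide are not on PATH.
# have and missing_tools are hypothetical helpers, not part of PDI.

# have NAME: succeeds if NAME resolves to a command.
have() {
  command -v "$1" >/dev/null 2>&1
}

# missing_tools NAME...: print the names that are not installed, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    have "$tool" || echo "$tool"
  done
}

# Tool list assumed from the commands that appear in this README.
missing_tools docker docker-compose npm git wget es-export-mappings es-export-bulk
```

An empty result means every listed tool was found; any name printed still needs to be installed.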
- Download and unpack Polar Deep Insights
  - `git clone https://github.com/USCDataScience/polar-deep-insights.git`
    - Creates a polar-deep-insights folder and downloads the project files.
  - `cd polar-deep-insights/Docker`
  - `./setup.sh`
    - This will create a data folder and populate it with required files and folders.
    - If you get a `Permission denied` error, make the file executable: `chmod +x setup.sh`
  - `./pre_installation.sh`
    - This will download some additional utilities.
    - If you get a `Permission denied` error, make the file executable: `chmod +x pre_installation.sh`
    - If you get a `wget not found` error, install wget or manually download the files using the URLs listed in pre_installation.sh.
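The `Permission denied` workaround above can be folded into a small wrapper. This is only a sketch: `ensure_executable` and `run_script` are hypothetical helper names, and it assumes you are already inside `polar-deep-insights/Docker`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that apply the chmod fix only when it is needed.

# ensure_executable FILE: add the execute bit only when it is missing.
ensure_executable() {
  [ -x "$1" ] || chmod +x "$1"
}

# run_script FILE: make FILE executable, then run it from the current directory.
run_script() {
  ensure_executable "$1" && "./$1"
}

# Intended usage from polar-deep-insights/Docker:
#   run_script setup.sh && run_script pre_installation.sh
```

Chaining with `&&` stops before `pre_installation.sh` if `setup.sh` fails, which keeps a broken data folder from being populated further.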
- Download polar.usc.edu index mappings
  - `es-export-mappings --url http://polar.usc.edu/elasticsearch --file data/polar/polar-data-mappings.json`
    - These enable the Insight Generator to understand the data provided.
- Download Elasticsearch index data
  - `es-export-bulk --url http://polar.usc.edu/elasticsearch --file data/polar/polar-data.json`
    - This will take a while: the polar data set contains 100k documents (go get coffee).
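Because the export runs for a long time, a quick check that both files were actually written can save a failed build later. The sketch below only tests that each file exists and is non-empty; it does not validate the contents, and `check_export` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# check_export FILE: succeed if FILE exists and is non-empty.
# Hypothetical helper; it does not inspect the exported JSON itself.
check_export() {
  if [ -s "$1" ]; then
    echo "ok: $1"
  else
    echo "missing or empty: $1" >&2
    return 1
  fi
}

# Intended usage from polar-deep-insights/Docker:
#   check_export data/polar/polar-data-mappings.json &&
#     check_export data/polar/polar-data.json
```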
- `cd insight-generator`
- `docker build -t uscdatascience/pdi-generator -f InsightGenDockerfile .`
  - You can also pull from Docker Hub with `docker pull uscdatascience/pdi-generator`
- `PDI_JSON_PATH=/data/polar docker-compose up -d`
- `cd ../insight-visualizer`
- `docker build -t uscdatascience/polar-deep-insights -f PolarDeepInsightsDockerfile .`
  - You can also pull from Docker Hub with `docker pull uscdatascience/polar-deep-insights`
- `PDI_JSON_PATH=data/polar docker-compose up -d`
  - Access the application at http://localhost/pdi/
  - Access Elasticsearch at http://localhost/elasticsearch/
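The containers may need some time after `docker-compose up -d` before the two URLs above respond. One way to wait for them is to poll with `curl`, as in this sketch. `wait_for_url` is a hypothetical helper, and the default of 30 attempts with a 2 second delay is an arbitrary assumption, not a project setting.

```shell
#!/usr/bin/env bash
# wait_for_url URL [ATTEMPTS] [DELAY]: poll URL until curl gets a successful
# response, or give up after ATTEMPTS tries spaced DELAY seconds apart.
wait_for_url() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl return non-zero on HTTP errors such as 404 or 502.
    if curl --silent --fail --output /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not reachable: $url" >&2
  return 1
}

# Intended usage:
#   wait_for_url http://localhost/pdi/ && wait_for_url http://localhost/elasticsearch/
```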
- If you are planning on analyzing your own files, copy them into the `data/files` folder. The system will recognize and extract data from over 1400 different file types.
- If you are using your own database, replace `http://polar.usc.edu/elasticsearch` in the `es-export-mappings` and `es-export-bulk` commands above with the URL of your remote or local Elasticsearch index, then run those commands.
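To avoid editing the URL in two places, the export commands can be parameterized. The sketch below only prints the commands rather than running them; `ES_URL` and `export_commands` are names introduced here for illustration, and the default is the public index named in this README.

```shell
#!/usr/bin/env bash
# ES_URL is a hypothetical variable: it defaults to the public index from this
# README and can be overridden to point at your own Elasticsearch index.
ES_URL="${ES_URL:-http://polar.usc.edu/elasticsearch}"

# export_commands: print (without running) the two export commands for $ES_URL.
export_commands() {
  echo "es-export-mappings --url $ES_URL --file data/polar/polar-data-mappings.json"
  echo "es-export-bulk --url $ES_URL --file data/polar/polar-data.json"
}

export_commands
```

For example, running the script with `ES_URL=http://localhost:9200` set in the environment prints both commands pointing at a local index, ready to copy into your shell.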
Add files to the following folders according to these instructions:
- `data/files`: Add your data files, of any file type, to generate insights from.
- `data/polar`: Contains mappings and data from the Elasticsearch URL.
- `data/ingest`: Output from the PDI Insight Generator is saved here under the filename `ingest_data.json`.
- `data/sparkler/raw`: Add Sparkler-crawled data from the Solr index to the `sparkler_rawdata.json` file in this folder.
- `data/sparkler/parsed`: Sparkler data (in `data/sparkler/raw/sparkler_rawdata.json`) is parsed using parse.py and saved in `sparkler_data.json`.
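The folder layout above can be checked with a short loop. This is a sketch under the assumption that `setup.sh` creates exactly the folders listed in this section; `missing_dirs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# missing_dirs ROOT: print each expected sub-folder of ROOT that does not exist.
# The folder list is copied from this README, not read from setup.sh.
missing_dirs() {
  local dir
  for dir in files polar ingest sparkler/raw sparkler/parsed; do
    [ -d "$1/$dir" ] || echo "$1/$dir"
  done
}

# Intended usage from polar-deep-insights/Docker:
#   missing_dirs data
```

No output means every folder is in place; otherwise the missing paths are listed one per line.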
The Insight Generator Docker container exposes the following ports:
- 8765: Geo Topic Parser
- 9998: Apache Tika Server
- 8060: Grobid Quantities REST API

The Insight Visualizer Docker container exposes the following ports:
- 80: Apache2/HTTPD server
- 9000: Grunt server serving up the PDI application
- 9200: Elasticsearch 2.4.6 server
- 35729: Auto refresh port for AngularJS apps
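To see which of these ports respond on your machine, a probe along the following lines may be useful. It is a sketch only: it relies on bash's `/dev/tcp` redirection, assumes the ports are published to the host you probe (which depends on the compose configuration), and all three function names are hypothetical.

```shell
#!/usr/bin/env bash
# service_for_port PORT: name of the service this README documents on PORT.
service_for_port() {
  case "$1" in
    8765)  echo "Geo Topic Parser" ;;
    9998)  echo "Apache Tika Server" ;;
    8060)  echo "Grobid Quantities REST API" ;;
    80)    echo "Apache2/HTTPD server" ;;
    9000)  echo "Grunt server (PDI application)" ;;
    9200)  echo "Elasticsearch" ;;
    35729) echo "AngularJS auto refresh" ;;
    *)     echo "unknown" ;;
  esac
}

# port_open HOST PORT: succeed if a TCP connection can be opened.
# Uses bash's /dev/tcp pseudo-device inside a subshell so the socket is
# closed again as soon as the check finishes.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# report_ports HOST: print one "PORT STATE SERVICE" line per documented port.
report_ports() {
  local port state
  for port in 8765 9998 8060 80 9000 9200 35729; do
    if port_open "$1" "$port"; then state=open; else state=closed; fi
    echo "$port $state $(service_for_port "$port")"
  done
}

# Intended usage:
#   report_ports localhost
```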
- View a container's logs: `docker logs -f container_id` (use your Docker container's ID)
- Open a shell inside a container: `docker exec -it container_id bash` (use your Docker container's ID)
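If you do not have the container ID at hand, `docker ps` can look it up by image. The sketch below assumes the containers were started from the image names used in this README; `container_id` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# container_id IMAGE: print the ID of the first running container created
# from IMAGE, using docker ps with the ancestor filter.
container_id() {
  docker ps --filter "ancestor=$1" --format '{{.ID}}' | head -n 1
}

# Intended usage:
#   docker logs -f "$(container_id uscdatascience/polar-deep-insights)"
#   docker exec -it "$(container_id uscdatascience/pdi-generator)" bash
```

If nothing is printed, no running container matches that image; plain `docker ps` lists everything that is running.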
PS: You need to install a CORS extension in your browser and enable it in order to download the concept ontology and additional precomputed information from http://polar.usc.edu/elasticsearch/ and elsewhere.
Information Retrieval and Data Science (IRDS) research group, University of Southern California.