Instructions to install and run PDI Docker
This is the Dockerized version of Polar Deep Insights. The project is composed of two parts: the Insight Generator, which extracts insights from a data set, and the Insight Visualizer, which allows for exploration of those insights.
- Install Docker.
- Verify that it is running by typing `docker ps` into a command prompt. If you get a response, Docker is running.
- If you normally log in to a Docker registry on your machine, do so now.
 
- Install elasticsearch-tools:
  - Install npm if it is not already installed; otherwise skip this step.
  - `$ sudo npm install -g elasticsearch-tools` - installs elasticsearch-tools.
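
To confirm the tools are on your PATH before moving on, here is a quick check (a sketch; the two executables named are the ones used in the steps below):

```bash
# Verify the elasticsearch-tools executables were installed globally by npm.
command -v es-export-mappings
command -v es-export-bulk
```
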
 
 
- Download and unpack Polar Deep Insights:
  - `$ git clone https://github.com/USCDataScience/polar-deep-insights.git` - creates a polar-deep-insights folder and downloads the project files.
  - `cd polar-deep-insights/Docker`
- Download the polar.usc.edu index mappings:
  - `es-export-mappings --url http://polar.usc.edu/elasticsearch --file data/polar/polar-data-mappings.json` - these mappings enable the Insight Generator to understand the data provided.
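
As an optional sanity check (a sketch, assuming Python 3 is available on your machine), you can confirm the exported mappings file is valid JSON:

```bash
# Parse the mappings file; prints "mappings OK" only if it is well-formed JSON.
python3 -m json.tool data/polar/polar-data-mappings.json > /dev/null && echo "mappings OK"
```
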
 
 
- Download the Elasticsearch index data:
  - `es-export-bulk --url http://polar.usc.edu/elasticsearch --file data/polar/polar-data.json` - this will take a while; the polar data set contains 100k documents (go get coffee).
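
Since the bulk export is long-running, you can watch the output file grow in a second terminal (a sketch; `watch` may need to be installed separately on some systems):

```bash
# Re-list the export file every 10 seconds to monitor progress.
watch -n 10 ls -lh data/polar/polar-data.json
```
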
 
 
- `docker pull uscdatascience/pdi-generator`
  - You can also build locally by using these commands: `cd insight-generator`, then `docker build -t uscdatascience/pdi-generator -f InsightGenDockerfile .`
- `PDI_JSON_PATH=/data/polar docker-compose up -d`
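
A minimal sketch of the same step with the variable exported once, plus a check that the stack came up (this assumes docker-compose picks up `PDI_JSON_PATH` from the environment, as in the command above):

```bash
# Export the data path once instead of prefixing every command.
export PDI_JSON_PATH=/data/polar

# Start the Insight Generator stack in the background and list its containers.
docker-compose up -d
docker-compose ps
```
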
- `cd ../insight-visualizer`
- `docker build -t uscdatascience/polar-deep-insights -f PolarDeepInsightsDockerfile .`
  - You can also pull from Docker Hub with `docker pull uscdatascience/polar-deep-insights`
- `PDI_JSON_PATH=data/polar docker-compose up -d`
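
At this point you can double-check that both PDI images are present locally (image names taken from the steps above):

```bash
# List the PDI images pulled or built so far.
docker images | grep -E "uscdatascience/(pdi-generator|polar-deep-insights)"
```
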
- Access the application at http://localhost/pdi/
- Access Elasticsearch at http://localhost/elasticsearch/
- If you are planning on analyzing your own files, copy them into the `data/files` folder. The system will recognize and extract data from over 1400 different file types.
- If using your own database, replace http://polar.usc.edu/elasticsearch in the above commands with your remote Elasticsearch URL (or your localhost Elasticsearch index's URL) and run the commands again; see the example below.
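
For example, a sketch of the same export commands pointed at your own index (the URL http://localhost:9200 is a placeholder; substitute the URL of your Elasticsearch instance):

```bash
# Hypothetical example: export mappings and data from your own Elasticsearch index
# instead of polar.usc.edu. Replace http://localhost:9200 with your actual URL.
es-export-mappings --url http://localhost:9200 --file data/polar/polar-data-mappings.json
es-export-bulk     --url http://localhost:9200 --file data/polar/polar-data.json
```
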
- Add files to the following folders according to these instructions (a folder-setup sketch follows this list):
  - `data/files`: Add your data files of any file type to generate insights from.
  - `data/polar`: Contains mappings and data from the Elasticsearch URL.
  - `data/ingest`: Output from the PDI Insight Generator will be saved here under the filename `ingest_data.json`.
  - `data/sparkler/raw`: Add Sparkler-crawled data from the SOLR index into the `sparkler_rawdata.json` file in this folder.
  - `data/sparkler/parsed`: Sparkler data (in `data/sparkler/raw/sparkler_rawdata.json`) is parsed using parse.py and saved in `sparkler_data.json`.
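
A minimal sketch of setting up that layout from the `Docker/` directory, assuming some of the folders are not already present in the checkout (the `/path/to/your/documents` path is a placeholder):

```bash
# Create the expected data folders (names taken from the list above) if missing.
mkdir -p data/files data/polar data/ingest data/sparkler/raw data/sparkler/parsed

# Copy your own documents into data/files so the Insight Generator can process them.
cp /path/to/your/documents/* data/files/
```
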
 
- The Insight Generator Docker container exposes the following ports:
  - 8765 - Geo Topic Parser
  - 9998 - Apache Tika Server
  - 8060 - Grobid Quantities REST API
- The Insight Visualizer Docker container exposes the following ports:
  - 80 - Apache2/HTTPD server
  - 9000 - Grunt server serving up the PDI application
  - 9200 - Elasticsearch 2.4.6 server
  - 35729 - Auto-refresh port for AngularJS apps
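
If you want to confirm the services are reachable, here is a quick check (a sketch; it assumes docker-compose maps these ports to localhost, which may not match your setup):

```bash
# Elasticsearch node info from the Visualizer container.
curl -s http://localhost:9200/

# Apache Tika Server version from the Generator container.
curl -s http://localhost:9998/version
```
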
- `docker logs -f container_id` - use your Docker container's ID
- `docker exec -it container_id bash` - use your Docker container's ID
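
One way to find the container ID first (a sketch; the image names come from the pull/build steps above, and `<container_id>` is a placeholder):

```bash
# List running PDI containers with their IDs and names.
docker ps --filter "ancestor=uscdatascience/polar-deep-insights" --format "{{.ID}}  {{.Names}}"
docker ps --filter "ancestor=uscdatascience/pdi-generator" --format "{{.ID}}  {{.Names}}"

# Then follow the logs or open a shell, substituting the ID you found.
docker logs -f <container_id>
docker exec -it <container_id> bash
```
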
PS: You need to add a CORS extension to your browser and enable it in order to download the concept ontology and additional precomputed information from http://polar.usc.edu/elasticsearch/ and elsewhere.
Information Retrieval and Data Science (IRDS) research group, University of Southern California.