Commit 45140e6

Include instructions for running all dependencies in Panel docs
1 parent d133b97 commit 45140e6

1 file changed: +92 -10 lines changed

Documents/panel.md

Lines changed: 92 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -1,29 +1,106 @@
11
# Panel CSV Loader
22

3-
## Overview
4-
The panel loader is designed to read a CSV file and load the data onto the HMDA-Platform. The CSV file should use the `|` (pipe) delimiter, and should include a header row as the first line.
3+
The panel loader is designed to read a CSV file of institution data and load it onto the HMDA-Platform. It can be used to load data either into a local Cassandra instance or a remote one (e.g. in a cluster).
54

6-
## Environment Variables
5+
## The Panel File
76

8-
For testing on an API running in SBT, no changes need to be made. The default for this variable will point to the correct local admin API.
7+
The CSV file should use the `|` (pipe) delimiter, and should include a header row as the first line.
8+
9+
A small example file (~200 institutions) is located at `panel/src/main/resources/inst_data_2017_dummy.csv`
10+
11+
The real panel file (~160,000 institutions) is located at `panel/src/main/resources/inst_data_2017.csv`
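
Before loading a file, it can be useful to confirm that it matches the format described above. The following is a minimal sketch, assuming a POSIX shell with `awk` available; `check_panel_csv` is a hypothetical helper name and is not part of the HMDA Platform. It only checks that every row has the same number of pipe-delimited fields as the header row.

```shell
# check_panel_csv is a hypothetical helper for illustration only.
# It compares the number of pipe-delimited fields on every row
# against the header row (line 1) and reports any mismatch.
check_panel_csv() {
  awk -F'|' '
    NR == 1 { expected = NF; next }
    NF != expected {
      printf "line %d: expected %d fields, found %d\n", NR, expected, NF
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

For example, `check_panel_csv panel/src/main/resources/inst_data_2017_dummy.csv` should print nothing and return 0 if the file is consistent.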
12+
13+
14+
## Loading Institutions Remotely
15+
16+
For loading panel data into a remote system or into a local Docker container, you don't need to have any services running on your local environment as dependencies. You will need to set the `HMDA_HTTP_ADMIN_URL` environment variable.
917

10-
For loading panel data into a remote system or into a local Docker container, you'll need to set the following environment variable:
1118
```shell
12-
> export HMDA_HTTP_ADMIN_URL={base URL}
19+
> export HMDA_HTTP_ADMIN_URL={admin URL}
1320
```
1421

1522
**IMPORTANT NOTE:** The base URL should *include* `http://` or `https://`, but *exclude* any trailing slash `/`. For example:
1623

24+
To load panel data into the cluster, simply find the URL of the admin API (for the release branch: `https://hmda-ops-api.demo.cfpb.gov/admin`).
25+
26+
To load panel data into a Docker container running locally, the URL will depend on your Docker Machine's IP. If it uses the default IP, this will be the admin API URL:
1727
```shell
1828
> export HMDA_HTTP_ADMIN_URL=http://192.168.99.100:8081
1929
```
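
The URL format described in the note above can be checked before running the loader. This is a rough sketch, assuming a POSIX shell; `check_admin_url` is a hypothetical helper and not something the panel loader provides.

```shell
# check_admin_url is a hypothetical helper for illustration only.
# It accepts a URL that starts with http:// or https:// and
# rejects any URL that ends with a trailing slash.
check_admin_url() {
  case "$1" in
    */) echo "URL must not end with /" >&2; return 1 ;;
    http://?*|https://?*) return 0 ;;
    *) echo "URL must start with http:// or https://" >&2; return 1 ;;
  esac
}
```

It could be used as `check_admin_url "$HMDA_HTTP_ADMIN_URL"` right after exporting the variable.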
2030

21-
## Running the parser
22-
A small example file (~200 institutions) is located at `panel/src/main/resources/inst_data_2017_dummy.csv`
31+
Once that variable is set, use the instructions in [Running the Loader](#running-the-loader) to load the data.
2332

24-
The real panel file (~160,000 institutions) is located at `panel/src/main/resources/inst_data_2017.csv`
2533

26-
In order for the panel data to be loaded locally, the API project must be up and running, along with Docker containers running Cassandra and ZooKeeper, or run the full `docker-compose` setup. To load panel data into the cluster, simply find the URL of the admin API (for the release branch: `https://hmda-ops-api.demo.cfpb.gov/admin`). No other running services are necessary.
34+
## Loading Institutions Locally
35+
36+
In order for the panel data to be loaded locally, the API project must be up and running, along with Docker containers running Cassandra, PostgreSQL, and ZooKeeper. Once the dependencies are running, use the instructions in [Running the Loader](#running-the-loader) to load the data.
37+
38+
### Running the Dependencies
39+
40+
#### Cassandra
41+
42+
The easiest way to run a Cassandra server to support this application for testing is to do it through Docker:
43+
44+
```shell
45+
docker run --name cassandra -p 9042:9042 -p 7000:7000 -p 7199:7199 cassandra:3.10
46+
```
47+
48+
If you want to connect to this server, the following `docker` command will give you access to the Cassandra instance started in the previous step:
49+
50+
```shell
51+
docker run -it --link cassandra:cassandra --rm cassandra cqlsh cassandra
52+
```
53+
54+
#### Apache ZooKeeper
55+
56+
The `HMDA Platform` is a distributed system that is meant to be run as a clustered application in production.
57+
As such, it needs a mechanism for storing configuration information for additional nodes joining the cluster.
58+
`Apache ZooKeeper` is used to store this information. To run the project, ZooKeeper must be running and available on the local network.
59+
An easy way to satisfy this requirement is to launch a docker container with `ZooKeeper`, as follows:
60+
61+
```shell
62+
$ docker run --rm -p 2181:2181 -p 2888:2888 -p 3888:3888 jplock/zookeeper
63+
```
64+
65+
#### PostgreSQL
66+
67+
To run Postgres from a Docker container with the correct ports to connect to the HMDA Platform, use the following command:
68+
69+
```shell
70+
docker run -e POSTGRES_PASSWORD=postgres -e POSTGRES_USER=postgres -e POSTGRES_DB=hmda -p 54321:5432 postgres:9.6.1
71+
```
72+
73+
#### HMDA API
74+
75+
* Set the environment variables for ZooKeeper. `ZOOKEEPER_HOST` uses your Docker Machine's IP address. In this example, we use the default Docker Machine IP:
76+
77+
```shell
78+
export ZOOKEEPER_HOST=192.168.99.100
79+
export ZOOKEEPER_PORT=2181
80+
```
81+
82+
* Set the environment variables for the local Cassandra instance. `CASSANDRA_CLUSTER_HOSTS` also uses the Docker Machine IP:
83+
84+
```shell
85+
export CASSANDRA_CLUSTER_HOSTS=192.168.99.100
86+
export CASSANDRA_CLUSTER_PORT=9042
87+
```
88+
89+
* Tell the platform to use Cassandra as its database instead of LevelDB:
90+
91+
```shell
92+
export HMDA_IS_DEMO=false
93+
```
94+
95+
* Start sbt using the command `sbt`, then use these commands at the sbt prompt:
96+
97+
```shell
98+
project api
99+
clean
100+
re-start
101+
```
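
Since the API depends on several environment variables being set, a pre-flight check before starting sbt may save some debugging. The following is a minimal sketch, assuming a POSIX shell; `require_env` is a hypothetical helper, not part of the HMDA Platform.

```shell
# require_env is a hypothetical helper for illustration only.
# It prints the name of each unset or empty variable to stderr
# and returns non-zero if any of them are missing.
require_env() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `require_env ZOOKEEPER_HOST ZOOKEEPER_PORT CASSANDRA_CLUSTER_HOSTS CASSANDRA_CLUSTER_PORT HMDA_IS_DEMO && sbt`.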
102+
103+
### Running the Loader
27104

28105
In a terminal, execute the following commands:
29106

@@ -40,6 +117,7 @@ sbt> assembly
40117
```
41118
Then the panel loader can be run with `java -jar panel/target/scala-2.12/panel.jar path/to/institution_file.csv`
42119

120+
43121
## Error codes
44122
There are four ways the panel loader can fail. The exit code and error message should tell you what happened.
45123

@@ -48,7 +126,11 @@ There are four ways the panel loader can fail. The exit code and error message
48126
3. The call to `institutions/create` didn't return the correct response. This can indicate that you don't have the correct environment variables set, or that something is wrong with the hmda-platform.
49127
4. The loader didn't finish processing all the institutions. This happens when running the real panel file; the cause is not yet known.
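
Because the exit code is the main signal of which failure occurred, it can help to surface it explicitly when running the loader from a script. This is a small sketch, assuming a POSIX shell; `run_and_report` is a hypothetical wrapper, and the specific exit code values used by the loader are not listed here.

```shell
# run_and_report is a hypothetical wrapper for illustration only.
# It runs the given command, prints the exit code to stderr if the
# command failed, and returns that same exit code to the caller.
run_and_report() {
  status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "loader exited with code $status" >&2
  fi
  return "$status"
}
```

For example: `run_and_report java -jar panel/target/scala-2.12/panel.jar path/to/institution_file.csv`.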
50128

129+
51130
## Testing
131+
132+
Once you have run the Panel Loader with an institution file, you can check the HMDA API to see that the data loaded correctly.
133+
52134
Make sure your authorization header is updated with a few real `id_rssd` fields from the given file. This can be found in the API log output (first field argument in the `InstitutionQuery` object), or in the CSV file (seventh field).
53135

54136
Try out the endpoint `localhost:8080/institutions`, and you should see a response with real panel data.
