The panel loader is designed to read a CSV file of institution data and load it onto the HMDA-Platform. It can be used to load data either into a local Cassandra instance or a remote one (e.g. in a cluster).
## The Panel File
The CSV file should use the `|` (pipe) delimiter, and should include a header row as the first line.
A small example file (~200 institutions) is located at `panel/src/main/resources/inst_data_2017_dummy.csv`
The real panel file (~160,000 institutions) is located at `panel/src/main/resources/inst_data_2017.csv`
## Loading Institutions Remotely
For loading panel data into a remote system or into a local Docker container, you don't need any services running on your local environment as dependencies. You will, however, need to set the `HMDA_HTTP_ADMIN_URL` environment variable:
```shell
export HMDA_HTTP_ADMIN_URL={admin URL}
```
**IMPORTANT NOTE:** The admin URL should *include* `http://` or `https://`, but *exclude* any trailing slash `/`. For example:
To load panel data into the cluster, simply find the URL of the admin API (for the release branch: `https://hmda-ops-api.demo.cfpb.gov/admin`).
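For instance, pointing the loader at that cluster endpoint would look like:

```shell
export HMDA_HTTP_ADMIN_URL=https://hmda-ops-api.demo.cfpb.gov/admin
```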
To load panel data into a Docker container running locally, the URL will depend on your Docker Machine's IP address (the examples in this document assume the default, `192.168.99.100`).
Once that variable is set, use the instructions in [Running the Loader](#running-the-loader) to load the data.
## Loading Institutions Locally
In order for the panel data to be loaded locally, the API project must be up and running, along with Docker containers running Cassandra, PostgreSQL, and ZooKeeper. Once the dependencies are running, use the instructions in [Running the Loader](#running-the-loader) to load the data.
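As an alternative to starting each dependency by hand (described below), the docker-compose dev setup from the main README should bring them all up together, assuming it is run from the repository root:

```shell
docker-compose -f docker-dev.yml up
```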
### Running the Dependencies
#### Cassandra
The easiest way to run a Cassandra server to support this application for testing is to do it through Docker:
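One minimal sketch of that, assuming the official `cassandra` image from Docker Hub and the default CQL port used by the environment variables below:

```shell
# Run a single-node Cassandra in the background, exposing the CQL native port
docker run --name cassandra -p 9042:9042 -d cassandra
```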
* Set the environment variables for ZooKeeper. `ZOOKEEPER_HOST` uses your Docker Machine's IP address. In this example, we use the default Docker Machine IP:
```shell
export ZOOKEEPER_HOST=192.168.99.100
export ZOOKEEPER_PORT=2181
```
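If you don't already have ZooKeeper running, one way to start it, assuming the official `zookeeper` image and the default client port from the variables above:

```shell
# Run ZooKeeper in the background, exposing the default client port
docker run --name zookeeper -p 2181:2181 -d zookeeper
```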
* Set the environment variables for the local Cassandra instance. `CASSANDRA_CLUSTER_HOSTS` also uses the Docker Machine IP:
```shell
export CASSANDRA_CLUSTER_HOSTS=192.168.99.100
export CASSANDRA_CLUSTER_PORT=9042
```
* Tell the platform to use Cassandra as its database instead of LevelDB:
```shell
export HMDA_IS_DEMO=false
```
* Start sbt using the command `sbt`, then use these commands at the sbt prompt:
```shell
project api
clean
re-start
```
## Running the Loader
In a terminal, execute the following commands:
```shell
sbt> assembly
```
Then the panel loader can be run with `java -jar panel/target/scala-2.12/panel.jar path/to/institution_file.csv`
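For example, to load the small dummy file that ships with the repository:

```shell
java -jar panel/target/scala-2.12/panel.jar panel/src/main/resources/inst_data_2017_dummy.csv
```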
## Error codes
There are four ways the panel loader can fail. The exit code and error message should tell you what happened.
3. The call to `institutions/create` didn't return the correct response. This can indicate that you don't have the correct environment variables set, or that something is wrong with the hmda-platform.
4. The loader didn't finish processing all the institutions. This is known to happen when running the real panel file, though the cause has not yet been determined.
## Testing
Once you have run the Panel Loader with an institution file, you can check the HMDA API to see that the data loaded correctly.
Make sure your authorization header is updated with a few real `id_rssd` fields from the given file. These can be found in the API log output (the first field argument in the `InstitutionQuery` object), or in the CSV file (the seventh field).
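One quick way to pull a few `id_rssd` values out of the dummy file, relying only on the pipe delimiter and the seventh-field position described above:

```shell
# Skip the header row, then print the 7th |-delimited field of the first 5 rows
tail -n +2 panel/src/main/resources/inst_data_2017_dummy.csv | head -5 | cut -d'|' -f7
```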
Try out the endpoint `localhost:8080/institutions`, and you should see a response with real panel data.
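A `curl` sketch follows; the header names and values are illustrative assumptions, so substitute whatever authorization headers your deployment actually expects:

```shell
# Hypothetical auth headers shown for illustration only
curl -H "CFPB-HMDA-Username: dev" \
     -H "CFPB-HMDA-Institutions: 123,456" \
     http://localhost:8080/institutions
```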
For more information on HMDA, check out the [About HMDA page](http://www.consumer...).
This repository contains the code for the entirety of the HMDA platform backend. This platform has been designed to accommodate the needs of the HMDA filing process by financial institutions, as well as the data management and publication needs of the HMDA data asset.
The HMDA Platform uses sbt's multi-project builds, each project representing a specific task. The platform is an Akka Cluster application that can be deployed on a single node or as a distributed application. For more information on how Akka Cluster is used, see the documentation [here](Documents/cluster.md).
The HMDA Platform is composed of the following modules:
### Parser (JS/JVM)
The HMDA Platform is written in [Scala](http://www.scala-lang.org/). To build it, you'll need a JDK (the build produces a `fat jar` for JDK8-compliant JVMs, as noted below).
In addition, you'll need Scala's interactive build tool [sbt](http://www.scala-sbt.org/0.13/tutorial/index.html). Please refer to sbt's [installation instructions](http://www.scala-sbt.org/0.13/tutorial/Setup.html) to get started.
### Docker
Though Docker is not a dependency of the Scala project, it is very useful for running and smoke testing locally. Use the following steps to prepare a local environment for running the Platform with Docker:
First, make sure that you have the [Docker Toolbox](https://www.docker.com/docker-toolbox) installed.
If you don't have a Docker machine created, you can create one with the default parameters using the command below. This will be sufficient for running most docker containers (e.g. the dev dependencies for the API), but not for running the entire platform.
```shell
docker-machine create --driver virtualbox dev
```
If you wish to run the entire platform using Docker (currently the only way to run the entire platform), you'll need to dedicate more resources to the Docker machine. We've found that for the full stack to run efficiently, you need approximately:

* 4 CPUs
* 6 GB RAM
* 80 GB Disk space
If your settings are below these suggestions, you should create a new Docker VM. The following will create a VM named `hmda-platform` with the appropriate resources:
```shell
$ docker-machine create \
  --driver virtualbox \
  --virtualbox-disk-size 81920 \
  --virtualbox-cpu-count 4 \
  --virtualbox-memory 6144 \
  hmda-platform
```
After the machine is created, make sure that you connect your shell with the newly created machine:
```shell
$ eval "$(docker-machine env dev)"
```
## Building and Running
### Building the .jar
* To build JVM artifacts (the default, includes all projects), from the sbt prompt:
```shell
> clean assembly
```
This task will create a `fat jar`, which can be executed directly on any JDK8 compliant JVM:
```shell
java -jar target/scala-2.11/hmda.jar
```
### Running the Dependencies

Assuming you have Docker-Compose installed (according to the [Docker](#docker) instructions above), the easiest way to get all of the platform's dependencies up and running is with the provided docker-compose dev setup:
```shell
docker-compose -f docker-dev.yml up
```
When finished, use `docker-compose -f docker-dev.yml down` to gracefully stop the running containers.
#### Running the API
Once the dependencies (above) are running, follow these steps in a separate terminal session to get the API running with sbt:
* For smoke testing locally, add the following two environment variables:
  * `EDITS_DEMO_MODE`: This will allow you to use the sample files in this repo for testing the app. Otherwise, edit S025 will trigger for all files.
  * `HMDA_IS_DEMO`: This uses configuration files that allow running the app locally, instead of in a cluster.
```shell
export EDITS_DEMO_MODE=true
export HMDA_IS_DEMO=true
```
* Start `sbt`
```shell
$ sbt
```
Confirm that the platform is up and running by browsing to http://localhost:8080
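The same check can be scripted from a terminal:

```shell
curl http://localhost:8080
```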
When finished, press enter to get the sbt prompt, then stop the project by entering `reStop`.
### Running the Project with Docker
#### To run only the API
First, ensure there's a compiled jar to create the Docker image with:
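If you haven't built the jar yet, a one-shot build from the shell (equivalent to running `clean assembly` at the sbt prompt, as described above) is:

```shell
sbt clean assembly
```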
The Filing API will run on `$(docker-machine ip):8080`
The Public API will run on `$(docker-machine ip):8082`
By default, the `HMDA Platform` runs with a log level of `INFO`. This can be changed by setting a different log level in the `HMDA_LOGLEVEL` environment variable. For the different logging options, see the [reference.conf](https://github.com/akka/akka/blob/master/akka-actor/src/main/resources/reference.conf#L38) default configuration file for `Akka`.
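For example, to get more verbose output, using Akka's standard `DEBUG` level:

```shell
export HMDA_LOGLEVEL=DEBUG
```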
218
+
222
219
#### To run the entire platform