
Commit d375d6d

Merge branch 'master' of https://github.com/cfpb/hmda-platform
2 parents c369e2e + 1af4ca7 commit d375d6d

File tree

38 files changed

+870
-241
lines changed


Documents/panel.md

Lines changed: 92 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -1,30 +1,106 @@
11
# Panel CSV Loader
22

3-
## Overview
4-
The panel loader is designed to read a CSV file and load the data onto the HMDA-Platform. The CSV file should use the `|` (pipe) delimiter, and should include a header row as the first line.
3+
The panel loader is designed to read a CSV file of institution data and load them onto the HMDA-Platform. It can be used to load data either into a local Cassandra instance or a remote one (e.g. in a cluster).
54

6-
## Environment Variables
7-
There is only one environment variable used by the panel loader. It must be set correctly in order for the data to be sent to the admin API.
5+
## The Panel File
86

9-
For testing on an API running in SBT, no changes need to be made. The default for this variable will point to the correct local admin API.
7+
The CSV file should use the `|` (pipe) delimiter, and should include a header row as the first line.
8+
9+
A small example file (~200 institutions) is located at `panel/src/main/resources/inst_data_2017_dummy.csv`
10+
11+
The real panel file (~160,000 institutions) is located at `panel/src/main/resources/inst_data_2017.csv`
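As a quick sanity check, a pipe-delimited file can be inspected from the shell. The column names below are invented for illustration only; the real header has many more fields:

```shell
# Create a tiny stand-in panel file (column names are invented;
# the real header differs).
printf 'name|agency|year\nBank One|1|2017\nBank Two|2|2017\n' > /tmp/panel_sample.csv

# Show the header row
head -n 1 /tmp/panel_sample.csv

# Count institution records, excluding the header
tail -n +2 /tmp/panel_sample.csv | wc -l
```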
12+
13+
14+
## Loading Institutions Remotely
15+
16+
For loading panel data into a remote system or into a local Docker container, you don't need to have any services running on your local environment as dependencies. You will need to set the `HMDA_HTTP_ADMIN_URL` environment variable.
1017

11-
For loading panel data into a remote system or into a local Docker container, you'll need to set the following environment variable:
1218
```shell
13-
> export HMDA_HTTP_ADMIN_URL={base URL}
19+
> export HMDA_HTTP_ADMIN_URL={admin URL}
1420
```
1521

1622
**IMPORTANT NOTE:** The base URL should *include* `http://` or `https://`, but *exclude* any trailing slash `/`. For example:
1723

24+
To load panel data into the cluster, simply find the URL of the admin API (for the release branch: `https://hmda-ops-api.demo.cfpb.gov/admin`).
25+
26+
To load panel data into a Docker container running locally, the URL will depend on your Docker Machine's IP. If it uses the default IP, this will be the admin API URL:
1827
```shell
1928
> export HMDA_HTTP_ADMIN_URL=http://192.168.99.100:8081
2029
```
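The note above (scheme present, no trailing slash) can be checked with a small shell sketch; the URL here is just the default Docker Machine example:

```shell
HMDA_HTTP_ADMIN_URL=http://192.168.99.100:8081

# The URL must start with http:// or https:// ...
case "$HMDA_HTTP_ADMIN_URL" in
  http://*|https://*) echo "scheme: ok" ;;
  *)                  echo "scheme: missing" ;;
esac

# ... and must not end with a trailing slash
case "$HMDA_HTTP_ADMIN_URL" in
  */) echo "trailing slash: present" ;;
  *)  echo "trailing slash: none" ;;
esac
```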
2130

22-
## Running the parser
23-
A small example file is located at `panel/src/main/resources/inst_data_2017_dummy.csv`
31+
Once that variable is set, use the instructions in [Running the Loader](#running-the-loader) to load the data.
32+
2433

25-
The real panel file is located at `panel/src/main/resources/inst_data_2017.csv`
34+
## Loading Institutions Locally
2635

27-
In order for the panel data to be loaded locally, the API project must be up and running, along with Docker containers running Cassandra and Zookeper, or run the full `docker-compose` setup. To load panel data into the cluster, simply find the URL of the admin api (for the release branch: `https://hmda-ops-api.demo.cfpb.gov/admin`). No other running services are necessary.
36+
In order for the panel data to be loaded locally, the API project must be up and running, along with Docker containers running Cassandra, PostgreSQL, and Zookeeper. Once the dependencies are running, use the instructions in [Running the Loader](#running-the-loader) to load the data.
37+
38+
### Running the Dependencies
39+
40+
#### Cassandra
41+
42+
The easiest way to run a Cassandra server to support this application for testing is to do it through Docker:
43+
44+
```shell
45+
docker run --name cassandra -p 9042:9042 -p 7000:7000 -p 7199:7199 cassandra:3.10
46+
```
47+
48+
If you want to connect to this server, the following `docker` command will give you access to the Cassandra instance started in the previous step:
49+
50+
```shell
51+
docker run -it --link cassandra:cassandra --rm cassandra cqlsh cassandra
52+
```
53+
54+
#### Apache Zookeeper
55+
56+
The `HMDA Platform` is a distributed system that is meant to be run as a clustered application in production.
57+
As such, it needs a mechanism for storing configuration information for additional nodes joining the cluster.
58+
`Apache Zookeeper` is used to store this information. To run the project, ZooKeeper must be running and available in the local network.
59+
An easy way to satisfy this requirement is to launch a docker container with `ZooKeeper`, as follows:
60+
61+
```shell
62+
$ docker run --rm -p 2181:2181 -p 2888:2888 -p 3888:3888 jplock/zookeeper
63+
```
64+
65+
#### PostgreSQL
66+
67+
To run Postgres from a Docker container with the correct ports to connect to the HMDA Platform, use the following command:
68+
69+
```shell
70+
docker run -e POSTGRES_PASSWORD=postgres -e POSTGRES_USER=postgres -e POSTGRES_DB=hmda -p 54321:5432 postgres:9.6.1
71+
```
72+
73+
#### HMDA API
74+
75+
* Set the environment variables for Zookeeper. `ZOOKEEPER_HOST` uses your Docker Machine's IP address. In this example, we use the default Docker Machine IP:
76+
77+
```shell
78+
export ZOOKEEPER_HOST=192.168.99.100
79+
export ZOOKEEPER_PORT=2181
80+
```
81+
82+
* Set the environment variables for the local Cassandra instance. `CASSANDRA_CLUSTER_HOSTS` also uses the Docker Machine IP:
83+
84+
```shell
85+
export CASSANDRA_CLUSTER_HOSTS=192.168.99.100
86+
export CASSANDRA_CLUSTER_PORT=9042
87+
```
88+
89+
* Tell the platform to use Cassandra as its database instead of LevelDB:
90+
91+
```shell
92+
export HMDA_IS_DEMO=false
93+
```
94+
95+
* Start sbt using the command `sbt`, then use these commands at the sbt prompt:
96+
97+
```shell
98+
project api
99+
clean
100+
re-start
101+
```
102+
103+
## Running the Loader
28104

29105
In a terminal, execute the following commands:
30106

@@ -41,6 +117,7 @@ sbt> assembly
41117
```
42118
Then the panel loader can be run with `java -jar panel/target/scala-2.12/panel.jar path/to/institution_file.csv`
43119

120+
44121
## Error codes
45122
There are four ways the panel loader can fail. The exit code and error message should tell you what happened.
46123

@@ -49,7 +126,11 @@ There are four ways the panel loader can fail. The exit code and error message
49126
3. The call to `institutions/create` didn't return the correct response. This can indicate that you don't have the correct environment variables set, or that something is wrong with the hmda-platform.
50127
4. The loader didn't finish processing all the institutions. This is known to happen when running the real panel file, though the cause has not yet been determined.
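A wrapper script can branch on the loader's exit code. The stub below stands in for the real `java -jar` invocation, and the code-to-message mapping assumes the exit codes match the numbered list above:

```shell
# Stand-in for: java -jar panel/target/scala-2.12/panel.jar "$csv_file"
run_loader() {
  return 3   # simulate failure mode 3 for illustration
}

run_loader
case "$?" in
  0) echo "panel loaded successfully" ;;
  3) echo "institutions/create returned an unexpected response" ;;
  4) echo "loader stopped before processing every institution" ;;
  *) echo "loader failed; check the error message" ;;
esac
```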
51128

129+
52130
## Testing
131+
132+
Once you have run the Panel Loader with an institution file, you can check the HMDA API to see that the data loaded correctly.
133+
53134
Make sure your authorization header is updated with a few real `id_rssd` fields from the given file. This can be found in the API log output (first field argument in the `InstitutionQuery` object), or in the CSV file (seventh field).
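For example, the seventh field can be pulled out of a pipe-delimited file with `cut`; the two rows below are placeholders, not real panel data:

```shell
# Two fake rows with the id_rssd in the seventh field
printf 'a|b|c|d|e|f|1234|h\na|b|c|d|e|f|5678|h\n' > /tmp/panel_rows.csv

# Extract the seventh (id_rssd) field from each row
cut -d'|' -f7 /tmp/panel_rows.csv
```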
54135

55136
Try out the endpoint `localhost:8080/institutions`, and you should see a response with real panel data.

README.md

Lines changed: 66 additions & 95 deletions
@@ -14,6 +14,10 @@ For more information on HMDA, checkout the [About HMDA page](http://www.consumer
1414

1515
This repository contains the code for the entirety of the HMDA platform backend. This platform has been designed to accommodate the needs of the HMDA filing process by financial institutions, as well as the data management and publication needs of the HMDA data asset.
1616

17+
The HMDA Platform uses sbt's multi-project builds, each project representing a specific task. The platform is an Akka Cluster
18+
application that can be deployed on a single node or as a distributed application. For more information on how Akka Cluster
19+
is used, see the documentation [here](Documents/cluster.md).
20+
1721
The HMDA Platform is composed of the following modules:
1822

1923
### Parser (JS/JVM)
@@ -70,90 +74,101 @@ The HMDA Platform is written in [Scala](http://www.scala-lang.org/). To build it
7074

7175
In addition, you'll need Scala's interactive build tool [sbt](http://www.scala-sbt.org/0.13/tutorial/index.html). Please refer to sbt's [installation instructions](http://www.scala-sbt.org/0.13/tutorial/Setup.html) to get started.
7276

73-
## Building and Running
77+
### Docker
7478

75-
The HMDA Platform uses sbt's multi-project builds, each project representing a specific task. The platform is an Akka Cluster
76-
application that can be deployed on a single node or as a distributed application. For more information on how Akka Cluster
77-
is used, see the documentation [here](Documents/cluster.md)
79+
Though Docker is not a dependency of the Scala project, it is very useful for running and smoke testing locally.
80+
Use the following steps to prepare a local environment for running the Platform with docker:
7881

79-
### Interactive
82+
First, make sure that you have the [Docker Toolbox](https://www.docker.com/docker-toolbox) installed.
8083

81-
* The write side of this system is supported by either a local `leveldb` database or Cassandra. By default, the local `leveldb` is utilized, and some sample data is loaded automatically.
82-
If using `Cassandra` is desired, the following environment variable needs to be set:
84+
If you don't have a Docker machine created, you can create one with the default parameters using the command below.
85+
This will be sufficient for running most docker containers (e.g. the dev dependencies for the API), but not for running the entire platform.
8386

8487
```shell
85-
export HDMA_IS_DEMO=false
88+
docker-machine create --driver virtualbox dev
8689
```
8790

88-
The easiest way to run a Cassandra server to support this application for testing is to do it through Docker:
91+
If you wish to run the entire platform using Docker (currently the only way to run the entire platform),
92+
you'll need to dedicate more resources to the Docker machine.
93+
We've found that for the full stack to run efficiently, you need approximately:
8994

90-
```shell
91-
docker run --name cassandra -p 9042:9042 -p 7000:7000 -p 7199:7199 cassandra:3.10
92-
```
95+
* 4 CPUs
96+
* 6 GB RAM
97+
* 80 GB Disk space
9398

94-
If you want to connect to this server, the following `docker` command will give you access to the Cassandra instance started in the previous step:
99+
Assuming you are using Docker Machine to provision your Docker
100+
environment, you can check your current settings with the following
101+
(ignore the second `Memory`):
95102

96103
```shell
97-
docker run -it --link cassandra:cassandra --rm cassandra cqlsh cassandra
104+
$ docker-machine inspect | grep 'CPU\|Memory\|DiskSize'
105+
"CPU": 4,
106+
"Memory": 6144,
107+
"DiskSize": 81920,
108+
"Memory": 0,
98109
```
99110

100-
Once the `Cassandra` server is running, set the following environment variable to the appropriate Cassandra host (in this example, the default local docker host for a machine running MacOs X):
111+
If your settings are below these suggestions, you should create a new
112+
Docker VM. The following will create a VM named `hmda-platform` with
113+
the appropriate resources:
101114

102115
```shell
103-
export CASSANDRA_CLUSTER_HOSTS=192.168.99.100
116+
$ docker-machine create \
117+
--driver virtualbox \
118+
--virtualbox-disk-size 81920 \
119+
--virtualbox-cpu-count 4 \
120+
--virtualbox-memory 6144 \
121+
hmda-platform
104122
```
105123

106-
To load data into `Cassandra`, you can run the following (the Cassandra server needs to be running and correct environment variables configured as per the previous instructions):
107-
124+
After the machine is created, make sure that you connect your shell with the newly created machine:
108125
```shell
109-
$ sbt
110-
project panel
111-
run <full local path to sample file>
126+
$ eval "$(docker-machine env dev)"
112127
```
113-
A sample file is located in the following folder: `panel/src/main/resources/inst_data_2017_dummy.csv`
114128

115129

116-
* In order to support the read side, a local PostgreSQL and Cassandra server are needed. Assuming it runs on the default port, on the same machine as the API, the following environment variable needs to be set:
130+
## Building and Running
131+
132+
### Building the .jar
133+
134+
* To build JVM artifacts (the default, includes all projects), from the sbt prompt:
117135

118136
```shell
119-
export JDBC_URL='jdbc:postgresql://localhost/hmda?user=postgres&password=postgres'
137+
> clean assembly
120138
```
121139

122-
where `hmda` is the name of the `PostgreSQL` database, owned by the default user with default password (`postgres`)
123-
124-
For Cassandra, the following environment variables need to be set (assuming Cassandra is running on a docker container as described above):
140+
This task will create a `fat jar`, which can be executed directly on any JDK8 compliant JVM:
125141

126142
```shell
127-
export CASSANDRA_CLUSTER_HOSTS=192.168.99.100
128-
export CASSANDRA_CLUSTER_PORT=9042
143+
java -jar target/scala-2.11/hmda.jar
129144
```
130145

131-
**Note: if you are running the backend only through sbt, the database needs to be created manually in advance, see instructions [here](https://www.postgresql.org/docs/9.1/static/manage-ag-createdb.html)**
132146

133-
* The `HMDA Platform` is a distributed system that is meant to be run as a clustered application in production.
134-
As such, it needs a mechanism for storing configuration information for additional nodes joining the cluster.
135-
`Apache Zookeeper` is used to store this information. To run the project, zookeeper must be running and available in the local network.
136-
An easy way to satisfy this requirement is to launch a docker container with `ZooKeeper`, as follows:
147+
### Running Interactively
137148

138-
```shell
139-
$ docker run --rm -p 2181:2181 -p 2888:2888 -p 3888:3888 jplock/zookeeper
140-
```
149+
#### Running the Dependencies
141150

142-
* Set the environemnet variables for Zookeper
151+
Assuming you have Docker-Compose installed (according to the [Docker](#docker) instructions above),
152+
the easiest way to get all of the platform's dependencies up and running is with the provided docker-compose dev setup:
143153

144154
```shell
145-
export ZOOKEEPER_HOST=192.168.99.100
146-
export ZOOKEEPER_PORT=2181
155+
docker-compose -f docker-dev.yml up
147156
```
148157

149-
Alternatively, these dependencies (`Cassandra`, `Zookeeper` and `PostgreSQL`) can be started from `docker` providing default resources for the `HMDA Platform`:
158+
When finished, use `docker-compose down` to gracefully stop the running containers.
159+
150160

151-
`docker-compose -f docker-dev.yml up`
161+
#### Running the API
152162

153-
* If you want to use the sample files in this repo for testing the app, run the edits in demo mode. Otherwise, edit S025 will trigger for all files.
163+
Once the dependencies (above) are running, follow these steps in a separate terminal session to get the API running with sbt:
164+
165+
* For smoke testing locally, add the following two environment variables:
166+
* `EDITS_DEMO_MODE`: This will allow you to use the sample files in this repo for testing the app. Otherwise, edit S025 will trigger for all files.
167+
* `HMDA_IS_DEMO`: This uses configuration files that allow running the app locally, instead of in a cluster.
154168

155169
```shell
156170
export EDITS_DEMO_MODE=true
171+
export HMDA_IS_DEMO=true
157172
```
158173

159174
* Start `sbt`
@@ -173,38 +188,17 @@ $ sbt
173188

174189
Confirm that the platform is up and running by browsing to http://localhost:8080
175190

176-
* To build JVM artifacts (the default, includes all projects), from the sbt prompt:
177-
178-
```shell
179-
> clean assembly
180-
```
181-
182-
This task will create a `fat jar`, which can be executed directly on any JDK8 compliant JVM:
183-
184-
```shell
185-
java -jar target/scala-2.11/hmda.jar
186-
```
187-
191+
When finished, press enter to get the sbt prompt, then stop the project by entering `reStop`.
188192

189-
### Docker
190193

191-
First, make sure that you have the [Docker Toolbox](https://www.docker.com/docker-toolbox) installed.
194+
### Running the Project with Docker
192195

193-
If you don't have a Docker machine created, you can create one by issuing the following:
194-
```shell
195-
docker-machine create --driver virtualbox dev
196-
```
197-
198-
After the machine is created, make sure that you connect your shell with the newly created machine
199-
```shell
200-
$ eval "(docker-machine env dev)"
201-
```
196+
#### To run only the API
202197

203-
Ensure there's a compiled jar to create the Docker image with:
198+
First, ensure there's a compiled jar to create the Docker image with:
204199
```shell
205200
sbt clean assembly
206201
```
207-
#### To run only the API
208202

209203
Build the docker image
210204
```shell
@@ -219,35 +213,12 @@ docker run -d -p "8080:8080 -p 8082:8082" hmda-api
219213
The Filing API will run on `$(docker-machine ip):8080`
220214
The Public API will run on `$(docker-machine ip):8082`
221215

216+
By default, the `HMDA Platform` runs with a log level of `INFO`. This can be changed by setting a different log level in the `HMDA_LOGLEVEL` environment variable.
217+
For the different logging options, see the [reference.conf](https://github.com/akka/akka/blob/master/akka-actor/src/main/resources/reference.conf#L38) default configuration file for `Akka`.
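For example, a local run can be made more verbose like this (the commented `docker run` line is an illustrative sketch, not a tested invocation):

```shell
# Select a more verbose level; the platform falls back to INFO if unset
export HMDA_LOGLEVEL=DEBUG
echo "log level: ${HMDA_LOGLEVEL:-INFO}"

# For the Docker image, the variable can be passed through instead, e.g.:
#   docker run -d -e HMDA_LOGLEVEL=DEBUG -p 8080:8080 -p 8082:8082 hmda-api
```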
218+
222219
#### To run the entire platform
223220

224-
1. Dedicate appropriate resources to your Docker environment. We've found
225-
that for the full stack to run efficiently, you need approximately:
226-
227-
* 4 CPUs
228-
* 6 GB RAM
229-
* 80 GB Disk space
230-
231-
Assuming you are using Docker Machine to provision your Docker
232-
environment, you can check you current settings with the following
233-
(ignore the second `Memory`):
234-
235-
$ docker-machine inspect | grep 'CPU\|Memory\|DiskSize'
236-
"CPU": 4,
237-
"Memory": 6144,
238-
"DiskSize": 81920,
239-
"Memory": 0,
240-
241-
If your settings are below these suggestions, you should create a new
242-
Docker VM. The following will create a VM named `hmda-platform` with
243-
the appropriate resources:
244-
245-
$ docker-machine create \
246-
--driver virtualbox \
247-
--virtualbox-disk-size 81920 \
248-
--virtualbox-cpu-count 4 \
249-
--virtualbox-memory 6144 \
250-
hmda-platform
221+
1. Ensure you have a Docker Machine with sufficient resources, as described in the [Docker](#docker) section above.
251222

252223
1. Clone [hmda-platform-ui](https://github.com/cfpb/hmda-platform-ui) and
253224
[hmda-platform-auth](https://github.com/cfpb/hmda-platform-auth) into the same

api/src/main/resources/application-dev.conf

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
11
akka {
22
loggers = ["akka.event.slf4j.Slf4jLogger"]
33
loglevel = "INFO"
4+
loglevel = ${?HMDA_LOGLEVEL}
45
logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"
56
http.parsing.max-content-length = 1G
67
http.server.default-host-header = "cfpb.gov"
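The added `loglevel = ${?HMDA_LOGLEVEL}` line uses HOCON's optional substitution: when the environment variable is unset, the line is dropped and the earlier `loglevel = "INFO"` stands; when it is set, the later line overrides the default. A minimal sketch of the pattern:

```hocon
# Default, used when HMDA_LOGLEVEL is not set
loglevel = "INFO"
# Overrides the default only if the environment variable exists
loglevel = ${?HMDA_LOGLEVEL}
```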

api/src/main/resources/application.conf

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
11
akka {
22
loggers = ["akka.event.slf4j.Slf4jLogger"]
33
loglevel = "INFO"
4+
loglevel = ${?HMDA_LOGLEVEL}
45
logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"
56
http.parsing.max-content-length = 1G
67
http.server.default-host-header = "cfpb.gov"

api/src/main/resources/logback.xml

Lines changed: 1 addition & 0 deletions
@@ -18,5 +18,6 @@
1818
<logger name="com.zaxxer.hikari" level="INFO" />
1919
<logger name="com.datastax.driver" level="INFO" />
2020
<logger name="org.apache.zookeeper" level="WARN" />
21+
<logger name="de.heikoseeberger.constructr" level="INFO"/>
2122

2223
</configuration>
