
Commit 9664614

Merge branch 'staging' into unvendor-awkde
2 parents 45c133e + 19e9b8f commit 9664614

3 files changed: +108 -0 lines changed

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -59,3 +59,8 @@ data/src/app/service-account-key.json
 
 # compiled python files
 *.pyc
+
+tmp/
+
+# Local python development files
+.python-version

data/src/new_etl/data_utils/pwd_parcels.py

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ def merge_pwd_parcels_gdf(
     pwd_parcels_gdf_indexed = pwd_parcels_gdf_unique_opa_id.set_index("opa_id")
     merged_gdf_indexed = primary_featurelayer_gdf_unique_opa_id.set_index("opa_id")
 
+    # ISSUE: This update and the other transformations might be incorrect
     merged_gdf_indexed.update(
         pwd_parcels_gdf_indexed[["geometry"]],
     )
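For anyone following up on the `ISSUE` comment, here is a minimal sketch of what `DataFrame.update` does, using plain pandas frames. The `opa_id` values and the placeholder strings standing in for geometries are made up for illustration and are not project data. `update` aligns on the index, overwrites in place with non-missing values from the other frame, and ignores index labels that exist only in the other frame.

```python
import pandas as pd

# Hypothetical stand-ins for the two indexed frames in merge_pwd_parcels_gdf.
primary = pd.DataFrame(
    {
        "opa_id": ["001", "002", "003"],
        "geometry": ["primary point 001", "primary point 002", "primary point 003"],
    }
).set_index("opa_id")

pwd_parcels = pd.DataFrame(
    {
        "opa_id": ["002", "003", "999"],
        "geometry": ["pwd polygon 002", None, "pwd polygon 999"],
    }
).set_index("opa_id")

# update() modifies `primary` in place and returns None.
result = primary.update(pwd_parcels[["geometry"]])

# 001: no match in pwd_parcels, so it keeps its original value.
# 002: overwritten with the pwd_parcels value.
# 003: the pwd_parcels value is missing, so the original value is kept.
# 999: exists only in pwd_parcels, so it is not added to primary.
print(primary)
```

Whether this behavior is what the merge step intends is the open question the comment raises; the sketch only shows the semantics of the call.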

docs/TECHNOLOGIES.md

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
# Project Technologies

## Overview

This document is meant to summarize the technologies and tools used across the Clean & Green Philly project, encompassing the
frontend, backend, ETL pipeline, devops/infrastructure, and miscellaneous tools. It gives a brief description of each technology
and its function within our project, along with links to external resources to help new contributors familiarize themselves.

## Frontend

### Next.js

Our frontend web app is built with Next.js, a React-based framework that adds opinionated solutions for
common web application tasks, including routing, data fetching, optimized serving of media files, and a range of other features. The best place to learn more about Next.js is its [docs](https://nextjs.org/docs).

### React

React is the frontend JavaScript library that Next.js is built around. Its philosophy is to separate a web application's visual design and functionality into chunks known as **components**, which can be combined and reused in modular ways across the application. You can read more about React in its [docs](https://react.dev/reference/react).

### Tailwind

We use Tailwind as our CSS framework, an increasingly popular choice for styling modern web applications. You can read more in their [docs](https://v3.tailwindcss.com/docs/installation). Clean & Green Philly currently uses Tailwind v3 rather than the more recently released v4, which includes some major design differences, so be sure you are referencing the correct version of the docs.

### ESLint and Prettier

For linting and formatting, we use the standard solutions of ESLint and Prettier.

ESLint - https://eslint.org/docs/latest/

Prettier - https://prettier.io/docs/

Because the roles of linters and formatters overlap closely, be sure to reference the integration docs on how the two tools interact [here](https://prettier.io/docs/integrating-with-linters).

### MapLibre

The primary functionality of our web application is the interactive map for viewing vacant properties across Philadelphia. We achieve that using the MapLibre GL library, an open source TypeScript library for rendering maps from vector tile data in the browser. We supply the library with the primary feature information from our data pipeline, and MapLibre GL creates an interactive map with zoom, panning, selection, and other capabilities.

There are two documentation sources to note: the original library, which includes all of the TypeScript classes and objects for interacting with the map, and `react-map-gl`, a library of React components that wrap that functionality so it can be used declaratively in React-based applications.

MapLibre GL - https://maplibre.org/maplibre-gl-js/docs/

React MapLibre - https://visgl.github.io/react-map-gl/docs/api-reference/maplibre/map

## ETL Pipeline

### Python

All our data sourcing and pipeline code is written in Python. It's likely already installed on a contributor's computer, but you can find documentation for installation and other resources on the main site [here](https://www.python.org/doc/). Keep in mind that the project pins a specific Python version, which you will need to match using the tooling mentioned later in this document and in the setup details.

You can find a lightweight refresher on basic syntax and concepts across the language at [w3schools](https://www.w3schools.com/python/), or a more in-depth and comprehensive discussion in the main [docs](https://docs.python.org/3/).

### Geopandas

Because we source and transform geospatial data in various ways, we use the main open source Python library for working with such data, `geopandas`. It is an extension of the popular `pandas` library, which organizes data into flexible `DataFrame` objects for easy manipulation and transformation, and it adds support for geographic objects through the `GeoSeries` and `GeoDataFrame` types. The documentation is available [here](https://geopandas.org/en/stable/docs.html).
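As a rough illustration of how the two layers fit together, the sketch below builds a small `GeoDataFrame` and uses both ordinary `pandas` filtering and a spatial property. The column names and coordinates are made up for the example and are not taken from the project's data.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample data: two made-up parcels in Philadelphia (lon, lat).
gdf = gpd.GeoDataFrame(
    {"opa_id": ["001", "002"], "vacant": [True, False]},
    geometry=[Point(-75.1635, 39.9526), Point(-75.1500, 39.9600)],
    crs="EPSG:4326",
)

# Ordinary pandas operations still work on the attribute columns...
vacant = gdf[gdf["vacant"]]

# ...while the geometry column is a GeoSeries with spatial properties.
bounds = gdf.total_bounds  # array of (minx, miny, maxx, maxy)

print(type(gdf.geometry).__name__, len(vacant), bounds)
```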

### File Formatting - GeoParquet, PMTiles, tippecanoe

We use a variety of file types to store the final dataset constructed by the ETL pipeline. The most common format for encoding geospatial data is GeoJSON, which stores the geometry and other information in a familiar JSON format. However, because GeoJSON is inefficient to fetch and load client-side in our web application, we convert to several other file formats for better performance.
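For readers who have not seen GeoJSON before, the sketch below builds a one-feature collection with Python's standard `json` module. The property names and values are hypothetical. Note that every feature repeats its keys as text, which is part of why the format grows large for datasets with many features.

```python
import json

# A minimal GeoJSON FeatureCollection with one made-up point feature.
# GeoJSON coordinates are ordered [longitude, latitude].
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-75.1635, 39.9526]},
            "properties": {"opa_id": "001", "vacant": True},  # hypothetical fields
        }
    ],
}

# Serialize to text and read it back, as a client would after fetching it.
text = json.dumps(feature_collection)
restored = json.loads(text)

print(len(text), "characters for", len(restored["features"]), "feature")
```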

The first is PMTiles, a single-file vector tile format for geospatial data. Its main benefit is that a single file encompasses all vector tiles at each zoom level for our data, so it can be hosted in our Google Cloud bucket and fetched directly by MapLibre rather than needing a more complex backend or querying solution as an intermediary. More information can be found [here](https://docs.protomaps.com/pmtiles/).
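To make "tiles at each zoom level" concrete, here is a small sketch assuming the standard Web Mercator XYZ tiling scheme that web maps commonly use. The function names are hypothetical helpers written for this example, not part of any PMTiles or MapLibre API.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a lon/lat point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tiles_at_zoom(zoom):
    """Total number of tiles covering the world at a zoom level."""
    return 4 ** zoom


# A point in Philadelphia falls in a different tile index at each zoom level.
for zoom in (0, 2, 12):
    print(zoom, lonlat_to_tile(-75.1635, 39.9526, zoom), tiles_at_zoom(zoom))
```

Each additional zoom level quadruples the number of possible tiles, which is why packing them into one fetchable archive is convenient.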

The second is GeoParquet, a geospatial extension of Apache Parquet. Parquet files are a highly performant way to compress and store columnar data, and they have useful mechanisms for self-describing and partitioning the data within the file itself for faster retrieval and querying. We are in the process of shifting the majority of our storage into this format and phasing out the Postgres-based service we have previously been using, which is mentioned below. You can find out more [here](https://geoparquet.org/releases/v0.2.0/).

### Python version and dependency management - Pipenv - Pyenv

We manage our Python dependencies using `pipenv`, which maintains them through the `Pipfile` and `Pipfile.lock`. Our Python version is managed through Docker when running the main script, but if you are interested in doing local development, you can also pin your Python version to 3.11.4 using `pyenv`.

`pipenv` - https://pipenv.pypa.io/en/latest/

`pyenv` - https://github.com/pyenv/pyenv
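If you develop locally, a quick sanity check that your interpreter matches the pinned version can save some confusion. The sketch below is a hypothetical helper, not something that exists in the repo; the only value taken from this document is the 3.11.4 pin.

```python
import sys

# Version pin stated in this document; adjust if the project's pin changes.
PINNED_VERSION = (3, 11, 4)


def matches_pinned(version_info=sys.version_info, pinned=PINNED_VERSION):
    """Return True when major, minor, and micro all match the pinned version."""
    return tuple(version_info[:3]) == pinned


if not matches_pinned():
    current = ".".join(str(part) for part in sys.version_info[:3])
    wanted = ".".join(str(part) for part in PINNED_VERSION)
    print(f"Running Python {current}, but the project pins {wanted}.")
```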

### Vulture

Vulture is a code checking utility for identifying dead and unused Python code within our repo. It is included as a check in our pre-commit hook for PRs. The GitHub repo for the tool can be found [here](https://github.com/jendrikseipp/vulture).

## Devops - CI/CD - Infrastructure

### Docker - Docker Compose

Docker is a container tool for creating an isolated, controllable environment to run our code that is reproducible across different machines. Our ETL pipeline runs inside what Docker calls a **container**, which manages the versions and dependencies of everything we use in our repo and ensures the code runs against a standardized operating system within the container's virtualized environment.

You can read more in the Docker docs [here](https://docs.docker.com/), and the easiest way to get set up with Docker is Docker Desktop, with installation instructions [here](https://docs.docker.com/desktop/).

### Google Cloud

All of the data produced by our pipeline needs to be stored somewhere our web application can access it. We use a Google Cloud bucket that holds the geospatial data as well as street view images to be served upon request from the site. In addition, we have a Google Cloud Compute Engine instance (VM) set up to run our pipeline on a regularly scheduled interval to update our data.

Google Cloud is a broad platform with a lot of detail, but those interested in learning more or taking on any infrastructure tasks that crop up in the future can start with their [docs](https://cloud.google.com/docs).

### GitHub Actions

For our CI/CD, we use GitHub Actions for a number of workflows that manage incoming pull requests, label issues and assign reviews, and generate automated releases. Some of those workflows need additional correction and revision in their current form, which we are getting to, but they can be found in the `.github` directory of the repo.

Documentation on what GitHub Actions can do and how to write workflows is [here](https://docs.github.com/en/actions).

### Other Useful Links

https://advena.hashnode.dev/heavy-map-visualizations-fundamentals-for-web-developers

A useful blog post discussing architectural approaches to serving geospatial data to frontend applications.
