Commit aaba947

adding map
1 parent 6215dfb

4 files changed (+12262 -106 lines changed)

my-app/src/assets/puertorico.json

Lines changed: 12043 additions & 2 deletions
Large diffs are not rendered by default.
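Because this diff is not rendered, the region names inside puertorico.json are not visible here. Projects.js colors a region green only when `notAtRisk.includes(geo.properties.NAME)` is true, so any spelling difference from the map data (an accent, for example "Bayamón" vs "Bayamon") would silently render that municipality red. Below is a minimal sketch of a consistency check. It assumes the file is either a GeoJSON FeatureCollection or a TopoJSON Topology (react-simple-maps accepts both) and that names live in `properties.NAME`, as Projects.js expects. `regionNames`, `foldName` and `checkNames` are hypothetical helpers, not part of the commit.

```javascript
// Hypothetical helpers (not part of the commit) for checking that the names in
// a react-simple-maps data file line up with a lookup list such as notAtRisk.
// Assumption: each region stores its municipality name in properties.NAME.

function regionNames(mapData) {
  if (mapData.type === "FeatureCollection") {
    // GeoJSON: one Feature per region.
    return mapData.features.map((f) => f.properties?.NAME);
  }
  if (mapData.type === "Topology") {
    // TopoJSON: geometries are grouped under named objects.
    return Object.values(mapData.objects).flatMap((obj) =>
      (obj.geometries ?? []).map((g) => g.properties?.NAME)
    );
  }
  throw new Error(`Unsupported map data type: ${mapData.type}`);
}

// Remove accents and case so "Bayamón" and "Bayamon" compare equal.
function foldName(name) {
  return name.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Report list entries with no exact match in the map data, separating near
// misses (same name once accents and case are ignored) from names that are
// missing entirely.
function checkNames(mapData, wanted) {
  const names = regionNames(mapData).filter((n) => typeof n === "string");
  const exact = new Set(names);
  const folded = new Map(names.map((n) => [foldName(n), n]));
  const missing = [];
  const nearMisses = [];
  for (const name of wanted) {
    if (exact.has(name)) continue;
    const candidate = folded.get(foldName(name));
    if (candidate !== undefined) nearMisses.push({ wanted: name, inMap: candidate });
    else missing.push(name);
  }
  return { missing, nearMisses };
}

// Usage with a tiny inline FeatureCollection (geometry omitted for brevity).
const sample = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", properties: { NAME: "Bayamon" }, geometry: null },
    { type: "Feature", properties: { NAME: "Caguas" }, geometry: null },
  ],
};
console.log(checkNames(sample, ["Bayamón", "Caguas", "Juncos"]));
// Juncos is reported as missing; Bayamón appears only as a near miss of "Bayamon".
```

Run once against the imported map data and the `notAtRisk` array, an empty `missing` list and an empty `nearMisses` list would mean every listed municipality is actually colored green.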

my-app/src/pages/Layout.css

Lines changed: 1 addition & 0 deletions
@@ -13,4 +13,5 @@ a {
 width: 70px;
 height: 40px;
 background-size: cover;
+margin-left: 20px;
 }

my-app/src/pages/Projects.css

Lines changed: 50 additions & 9 deletions
@@ -1,14 +1,55 @@
-.header {
+.projects {
+display: flex;
+flex-direction: column;
+justify-content: center;
+align-items: center;
+color: rgb(154, 157, 158);
+padding: 20px;
 
-}
+.title {
+padding-bottom: 20px;
+}
 
-.text {
-color: rgb(154, 157, 158);
-}
+.body {
+padding: 20px;
+}
+.header {
+font-weight: bold;
+}
+.map {
+width: 75vw;
+background-color: rgb(156, 192, 249);
+position: relative;
+margin-bottom: 20px;
+}
+.legend {
+position: absolute;
+bottom: 0;
+right: 0;
+background: white;
+color: black;
+padding: 10px;
+}
 
-.projects {
+.warning {
+width: 15px;
+height: 15px;
+background-color: red;
+padding-right: 0;
+}
+
+.safe {
+width: 15px;
+height: 15px;
+background-color: green;
+padding-right: 0;
+}
+
+.row {
 display: flex;
-flex-direction: column;
-justify-content: center;
+justify-content: space-between;
+padding-left: 20px;
+padding-right: 20px;
 align-items: center;
-}
+}
+}

my-app/src/pages/Projects.js

Lines changed: 168 additions & 95 deletions
@@ -1,97 +1,126 @@
-import './Projects.css'
+import { ComposableMap, Geographies, Geography } from "react-simple-maps";
+
+import JSON from "../assets/puertorico.json";
+import "./Projects.css";
+
+const width = 800;
+const height = 600;
+
+const notAtRisk = [
+"Bayamón",
+"Caguas",
+"Canóvanas",
+"Carolina",
+"Guaynabo",
+"Gurabo",
+"Juncos",
+"Santa Isabel",
+"Toa Alta",
+"Toa Baja",
+"Trujillo Alto",
+];
 
 const Projects = () => {
 return (
-<div className="projects">
-<h1 className="title">
-Projects
-</h1>
-<h1 className="title">
-Puerto Rico
-</h1>
+<div className="projects bg-dark">
+<h1 className="title">Projects</h1>
+<h1 className="title">Puerto Rico</h1>
 <div className="body">
-Below is an informal write where I used machine learning to identify which municipalities in Puerto Rico are at risk of gentrification.
+Below is an informal write up where I used machine learning to identify
+which municipalities in Puerto Rico are at risk of gentrification.
 </div>
-<h2 className="header">
-Preface
-</h2>
+<h2 className="header">Preface</h2>
 <div className="body">
-I chose Puerto Rico as my focus for my project because mainland Americans often forget it.
-In a 2017 study, a poll conducted by Morning Consult found that 46% of Americans did not know
-that Puerto Ricans are considered U.S. citizens. I knew I wanted to work on gentrification as
-someone who has lived in public housing for most of my life and saw the city of Somerville
-destroy the public housing complex for mixed-income housing. I searched for existing projects
-that focused on identifying gentrification in the U.S., and every project I found
-concentrated on mainland U.S. major cities, excluding Puerto Rico. By combining both topics,
-I aim to raise awareness of displaced Puerto Ricans, create a tool that lawmakers can use
-to see which communities are most vulnerable to being displaced, and encourage the
-artificial intelligence community to look at tackling more diverse issues and datasets.
-In addition to identifying the problem, I think it’s crucial to understand how
-Puerto Rico became at risk of gentrification to address it. Under former Puerto Rican
-governor Luis Fortuño, two controversial acts, Act 22 and Act 60, were passed that
-encouraged wealthy foreigners to move to Puerto Rico through tax incentives.
-The government has been unsuccessful in ensuring new residents meet the
-requirements in exchange for tax incentives, allowing foreigners to take
-advantage of the two acts.
+I chose Puerto Rico as my focus for my project because mainland
+Americans often forget it. In a 2017 study, a poll conducted by Morning
+Consult found that 46% of Americans did not know that Puerto Ricans are
+considered U.S. citizens. I knew I wanted to work on gentrification as
+someone who has lived in public housing for most of my life and saw the
+city of Somerville destroy the public housing complex for mixed-income
+housing. I searched for existing projects that focused on identifying
+gentrification in the U.S., and every project I found concentrated on
+mainland U.S. major cities, excluding Puerto Rico. By combining both
+topics, I aim to raise awareness of displaced Puerto Ricans, create a
+tool that lawmakers can use to see which communities are most vulnerable
+to being displaced, and encourage the artificial intelligence community
+to look at tackling more diverse issues and datasets. In addition to
+identifying the problem, I think it’s crucial to understand how Puerto
+Rico became at risk of gentrification to address it. Under former Puerto
+Rican governor Luis Fortuño, two controversial acts, Act 22 and Act 60,
+were passed that encouraged wealthy foreigners to move to Puerto Rico
+through tax incentives. The government has been unsuccessful in ensuring
+new residents meet the requirements in exchange for tax incentives,
+allowing foreigners to take advantage of the two acts.
 </div>
-<h2 className="header">
-Methodology
-</h2>
+<h2 className="header">Methodology</h2>
 <div className="body">
-I went through the available tables on the United States Census Bureau’s website to look for
-data tables under Puerto Rico that I thought would be relevant to finding which municipalities
-are at risk of or already gentrified. In total, I compiled nine tables from the census website
-with data from 2021, including demographic and housing estimates, age and sex, geographic
-mobility by selected characteristics, households and families, educational attainment, mean
-income, demographic characteristics of occupied housing units, financial characteristics,
-physical housing characteristics, and race. After compiling and cleaning the data, I
-considered the best approach to the problem. I initially viewed unsupervised learning as
-the best way to approach the problem since I didn’t have access to any labeled data. I
-chose the K Means Clustering model and attempted to train it but found that the clusters
-the model created seemed arbitrary and didn’t indicate any pattern. After researching
-K Means Clustering, I discovered that the algorithm struggles with high-dimensional data,
-making it unideal for my dataset.
+I went through the available tables on the United States Census Bureau’s
+website to look for data tables under Puerto Rico that I thought would
+be relevant to finding which municipalities are at risk of or already
+gentrified. In total, I compiled nine tables from the census website
+with data from 2021, including demographic and housing estimates, age
+and sex, geographic mobility by selected characteristics, households and
+families, educational attainment, mean income, demographic
+characteristics of occupied housing units, financial characteristics,
+physical housing characteristics, and race. After compiling and cleaning
+the data, I considered the best approach to the problem. I initially
+viewed unsupervised learning as the best way to approach the problem
+since I didn’t have access to any labeled data. I chose the K Means
+Clustering model and attempted to train it but found that the clusters
+the model created seemed arbitrary and didn’t indicate any pattern.
+After researching K Means Clustering, I discovered that the algorithm
+struggles with high-dimensional data, making it unideal for my dataset.
 </div>
 <div className="body">
-To make K Means Clustering work, I reduced the number of tables used to two and found the model
-was picking up on a pattern. I looked into the categories the model created and noticed it was
-grouping the municipalities with major cities together. It grouped San Juan, Ponce, Caguas, and
-Bayamon. To verify these results, I looked at articles to find some municipalities that are
-currently dealing with gentrification and found that my model misclassified many municipalities
-dealing with gentrification as not at risk. After experimenting with the parameters and getting
-the same results back, I decided to experiment with the algorithm. I went with the Spectral
-Clustering algorithm, which often outperforms K Means Clustering. After training the model on
-both the complete and subsets of the dataset, I found similar results to K Means Clusters,
-where it would group all the municipalities with major cities.
+To make K Means Clustering work, I reduced the number of tables used to
+two and found the model was picking up on a pattern. I looked into the
+categories the model created and noticed it was grouping the
+municipalities with major cities together. It grouped San Juan, Ponce,
+Caguas, and Bayamon. To verify these results, I looked at articles to
+find some municipalities that are currently dealing with gentrification
+and found that my model misclassified many municipalities dealing with
+gentrification as not at risk. After experimenting with the parameters
+and getting the same results back, I decided to experiment with the
+algorithm. I went with the Spectral Clustering algorithm, which often
+outperforms K Means Clustering. After training the model on both the
+complete and subsets of the dataset, I found similar results to K Means
+Clusters, where it would group all the municipalities with major cities.
 </div>
 <div className="body">
-After being unsuccessful with two unsupervised learning algorithms, I assumed my only other option
-would be to use a supervised learning algorithm. I researched which algorithm would work best if
-given a small amount of labeled data. I came across the idea of semi-supervised learning, a
-middle ground between unsupervised and supervised. Semi-supervised learning, also known as weak
-supervision, can learn using large amounts of unlabeled data paired with labeled data.
-Semi-supervised learning circumvents some common issues associated with getting a dataset for
-supervised learning, such as being expensive and time-consuming. The approach requires less
-human oversight while still benefiting from a higher accuracy.
+After being unsuccessful with two unsupervised learning algorithms, I
+assumed my only other option would be to use a supervised learning
+algorithm. I researched which algorithm would work best if given a small
+amount of labeled data. I came across the idea of semi-supervised
+learning, a middle ground between unsupervised and supervised.
+Semi-supervised learning, also known as weak supervision, can learn
+using large amounts of unlabeled data paired with labeled data.
+Semi-supervised learning circumvents some common issues associated with
+getting a dataset for supervised learning, such as being expensive and
+time-consuming. The approach requires less human oversight while still
+benefiting from a higher accuracy.
 </div>
 <div className="body">
-To create a small amount of labeled data, I searched for articles and personal posts about
-gentrifying parts of Puerto Rico. I found posts on Reddit of users describing the gentrification
-they’re witnessing while also categorizing the municipalities people moved to after being
-displaced as less likely to be gentrified. After labeling a small subset of data, I looked at
-the available unsupervised learning algorithms. I decided to use the SelfTrainingClassifier
-from Scikit Learn, which is based on Yarowsky’s algorithm and allows for a supervised classifier
-to work like a semi-supervised classifier. According to Scikit Learn, “It does this by
-iteratively predicting pseudo-labels for the unlabeled data and adding them to the training set.” For my base supervised learning algorithm to pair with the SelfTrainingClassifer, I used the Support Vector Classification (SVC) algorithm.
+To create a small amount of labeled data, I searched for articles and
+personal posts about gentrifying parts of Puerto Rico. I found posts on
+Reddit of users describing the gentrification they’re witnessing while
+also categorizing the municipalities people moved to after being
+displaced as less likely to be gentrified. After labeling a small subset
+of data, I looked at the available unsupervised learning algorithms. I
+decided to use the SelfTrainingClassifier from Scikit Learn, which is
+based on Yarowsky’s algorithm and allows for a supervised classifier to
+work like a semi-supervised classifier. According to Scikit Learn, “It
+does this by iteratively predicting pseudo-labels for the unlabeled data
+and adding them to the training set.” For my base supervised learning
+algorithm to pair with the SelfTrainingClassifer, I used the Support
+Vector Classification (SVC) algorithm.
 </div>
-<h2 className="header">
-Results
-</h2>
+<h2 className="header">Results</h2>
 <div className="body">
-After training the algorithm and creating a model to predict all municipalities in Puerto Rico, it
-found that out of the 77 municipalities in Puerto Rico, all but 11 are classified as gentrifying
-or at risk of gentrification. The following municipalities are considered low risk for
-gentrification:
+After training the algorithm and creating a model to predict all
+municipalities in Puerto Rico, it found that out of the 77
+municipalities in Puerto Rico, all but 11 are classified as gentrifying
+or at risk of gentrification. The following municipalities are
+considered low risk for gentrification:
 </div>
 <ul>
 <li>Bayamón </li>
@@ -107,27 +136,71 @@ const Projects = () => {
 <li>Trujillo Alto</li>
 </ul>
 <div className="body">
-Below is a map of the results of the model’s prediction for every municipality.
+Below is a map of the results of the model’s prediction for every
+municipality.
 </div>
-<h2 className="header">
-Future Research
-</h2>
+<div className="map">
+<ComposableMap
+width={width}
+height={height}
+projectionConfig={{
+scale: 16000,
+center: [-66.25, 18],
+}}
+style={{
+width: "100%",
+height: "auto",
+}}
+>
+<Geographies geography={JSON}>
+{({ geographies }) =>
+geographies.map((geo) => (
+<Geography
+key={geo.rsmKey}
+geography={geo}
+fill={
+notAtRisk.includes(geo.properties.NAME) ? "green" : "red"
+}
+strokeWidth="1"
+stroke="white"
+/>
+))
+}
+</Geographies>
+</ComposableMap>
+<div className="legend">
+<div className="row">
+<div className="warning"></div>
+At risk of gentrification
+</div>
+<div className="row">
+<div className="safe"></div>
+Currently not at risk of gentrification
+</div>
+</div>
+</div>
+
+<h2 className="header">Future Research</h2>
 <div className="body">
-To improve my project, I’d like to ensure the initial labels I used for the SelfTrainingClassifier
-are as accurate as possible. For my initial labels, I researched personal stories of individuals
-facing gentrification in their neighborhoods. Although these stories are helpful, there are
-plenty of other Puerto Ricans whose stories may not have appeared in my research or had the
-opportunity to share their story. To ensure my labeled data is more accurate, I’d like to work
-with a sociologist or nonprofit that focuses on gentrification in Puerto Rico to verify my
-initial labels and results. I’d also be interested in conducting interviews with residents of
-affected municipalities or conducting a survey to see how accurate my model is.
+To improve my project, I’d like to ensure the initial labels I used for
+the SelfTrainingClassifier are as accurate as possible. For my initial
+labels, I researched personal stories of individuals facing
+gentrification in their neighborhoods. Although these stories are
+helpful, there are plenty of other Puerto Ricans whose stories may not
+have appeared in my research or had the opportunity to share their
+story. To ensure my labeled data is more accurate, I’d like to work with
+a sociologist or nonprofit that focuses on gentrification in Puerto Rico
+to verify my initial labels and results. I’d also be interested in
+conducting interviews with residents of affected municipalities or
+conducting a survey to see how accurate my model is.
 </div>
 <div className="body">
-Another way I could improve my project would be by combining the data I collected from the U.S. Census
-with housing data from Zillow or AirBnB. One of the limitations of using census data is that it
-isn’t up to date with current events. The data I collected is from 2021, so the status of
-gentrification in Puerto Rico could look different now compared to back then. Using current
-housing data could circumvent that issue.
+Another way I could improve my project would be by combining the data I
+collected from the U.S. Census with housing data from Zillow or AirBnB.
+One of the limitations of using census data is that it isn’t up to date
+with current events. The data I collected is from 2021, so the status of
+gentrification in Puerto Rico could look different now compared to back
+then. Using current housing data could circumvent that issue.
 </div>
 </div>
 );
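The Methodology text attributes the arbitrary-looking K Means clusters to high-dimensional data. One commonly cited mechanism is distance concentration: as the number of features grows, the nearest and farthest points from any given point end up at nearly the same Euclidean distance, so assigning points to the closest centroid (which is what K-Means does) carries less signal. The following sketch measures this on synthetic uniform data only. It does not use the author's census tables, and `mulberry32`, `relativeContrast`, the point count and the seed are illustrative choices.

```javascript
// Illustration on synthetic data (not the author's census tables): as the
// number of features grows, Euclidean distances from one point to all others
// bunch together, so "nearest" and "farthest" become hard to tell apart.

// mulberry32: a tiny seeded PRNG returning floats in [0, 1), for reproducibility.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function euclidean(p, q) {
  let sum = 0;
  for (let i = 0; i < p.length; i++) sum += (p[i] - q[i]) ** 2;
  return Math.sqrt(sum);
}

// Relative contrast = (farthest - nearest) / nearest, measured from one query
// point to n other points drawn uniformly from the unit hypercube.
function relativeContrast(dim, n = 200, seed = 42) {
  const rand = mulberry32(seed);
  const point = () => Array.from({ length: dim }, () => rand());
  const query = point();
  const dists = Array.from({ length: n }, () => euclidean(query, point()));
  const nearest = Math.min(...dists);
  const farthest = Math.max(...dists);
  return (farthest - nearest) / nearest;
}

for (const dim of [2, 10, 100, 1000]) {
  console.log(`${dim} features: relative contrast ${relativeContrast(dim).toFixed(3)}`);
}
```

The printed contrast should fall sharply between 2 and 1000 features. That is consistent with the difficulty described above, and with reducing the feature set (as the author did by cutting to two tables) making the clusters easier to interpret, but it is an illustration rather than a diagnosis of this dataset.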

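The Methodology text pairs scikit-learn's SelfTrainingClassifier with an SVC but the commit contains no model code. The following is only a minimal JavaScript sketch of the self-training loop that SelfTrainingClassifier describes: fit on the labeled rows, predict the unlabeled rows, promote predictions whose confidence exceeds a threshold (0.75 is scikit-learn's default) to pseudo-labels, and repeat until nothing new is added or an iteration cap is reached. Unlabeled rows are marked -1, as in scikit-learn. A nearest-centroid classifier with softmax confidences stands in for SVC; that is a swapped-in technique, not the one the author used. `fitCentroids`, `predictProba`, `selfTrain` and the toy data are hypothetical.

```javascript
// Minimal self-training sketch (the idea behind SelfTrainingClassifier), not
// the author's code. Labels: 0 = not at risk, 1 = at risk, -1 = unlabeled.
// Base classifier: nearest centroid (a stand-in for SVC).

function fitCentroids(X, y) {
  const sums = new Map();
  X.forEach((x, i) => {
    if (y[i] === -1) return; // only labeled rows define centroids
    const entry = sums.get(y[i]) ?? { sum: new Array(x.length).fill(0), count: 0 };
    x.forEach((v, j) => (entry.sum[j] += v));
    entry.count += 1;
    sums.set(y[i], entry);
  });
  return [...sums].map(([label, { sum, count }]) => ({
    label,
    centroid: sum.map((s) => s / count),
  }));
}

// Confidence per class: softmax over negative squared distances to centroids.
function predictProba(model, x) {
  const scores = model.map(({ centroid }) =>
    -centroid.reduce((acc, c, j) => acc + (c - x[j]) ** 2, 0)
  );
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return model.map(({ label }, k) => ({ label, p: exps[k] / total }));
}

function selfTrain(X, y, { threshold = 0.75, maxIter = 10 } = {}) {
  const labels = [...y]; // copy so the caller's labels are not mutated
  for (let iter = 0; iter < maxIter; iter++) {
    const model = fitCentroids(X, labels);
    let added = 0;
    X.forEach((x, i) => {
      if (labels[i] !== -1) return;
      const best = predictProba(model, x).reduce((a, b) => (b.p > a.p ? b : a));
      if (best.p > threshold) {
        labels[i] = best.label; // confident prediction becomes a pseudo-label
        added += 1;
      }
    });
    if (added === 0) break; // nothing new was confident enough
  }
  return { model: fitCentroids(X, labels), labels };
}

// Toy data: two hypothetical features per municipality, one label per class.
const X = [[0, 0], [0.2, 0.1], [0.1, 0.3], [5, 5], [5.2, 4.9], [4.8, 5.3]];
const y = [0, -1, -1, 1, -1, -1];
const { labels } = selfTrain(X, y);
console.log(labels); // [ 0, 0, 0, 1, 1, 1 ]
```

Because the confidences come from raw squared distances, features would need to be on comparable scales before a loop like this is meaningful; SVC is similarly sensitive to feature scaling. The quality of the result also depends heavily on the few seed labels, which is the concern raised under Future Research.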