Commit 6215dfb

csmartinez22 authored and committed
initial projects page
1 parent ba567b6 commit 6215dfb

2 files changed

+146
-1
lines changed

my-app/src/pages/Projects.css

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
1+
.header {
2+
3+
}
4+
5+
.text {
6+
color: rgb(154, 157, 158);
7+
}
8+
9+
.projects {
10+
display: flex;
11+
flex-direction: column;
12+
justify-content: center;
13+
align-items: center;
14+
}

my-app/src/pages/Projects.js

Lines changed: 132 additions & 1 deletion
@@ -1,5 +1,136 @@
1+
import './Projects.css'
2+
13
const Projects = () => {
2-
return <h1>Personal Projects</h1>;
4+
return (
5+
<div className="projects">
6+
<h1 className="title">
7+
Projects
8+
</h1>
9+
<h1 className="title">
10+
Puerto Rico
11+
</h1>
12+
<div className="body">
13+
Below is an informal write-up in which I used machine learning to identify which municipalities in Puerto Rico are at risk of gentrification.
14+
</div>
15+
<h2 className="header">
16+
Preface
17+
</h2>
18+
<div className="body">
19+
I chose Puerto Rico as the focus of my project because mainland Americans often overlook it.
20+
A 2017 poll conducted by Morning Consult found that 46% of Americans did not know
21+
that Puerto Ricans are considered U.S. citizens. I knew I wanted to work on gentrification as
22+
someone who has lived in public housing for most of my life and watched the city of Somerville
23+
demolish the public housing complex to make way for mixed-income housing. I searched for existing projects
24+
that focused on identifying gentrification in the U.S., and every project I found
25+
concentrated on mainland U.S. major cities, excluding Puerto Rico. By combining both topics,
26+
I aim to raise awareness of displaced Puerto Ricans, create a tool that lawmakers can use
27+
to see which communities are most vulnerable to being displaced, and encourage the
28+
artificial intelligence community to look at tackling more diverse issues and datasets.
29+
In addition to identifying the problem, I think it’s crucial to understand how
30+
Puerto Rico became at risk of gentrification in order to address it. Two controversial laws
31+
encouraged wealthy foreigners to move to Puerto Rico through tax incentives: Act 22, passed in 2012
32+
under former governor Luis Fortuño, and Act 60, which consolidated those incentives in 2019.
33+
The government has been unsuccessful in ensuring new residents meet the
34+
requirements in exchange for tax incentives, allowing foreigners to take
35+
advantage of the two acts.
36+
</div>
37+
<h2 className="header">
38+
Methodology
39+
</h2>
40+
<div className="body">
41+
I went through the available tables on the United States Census Bureau’s website to look for
42+
data tables under Puerto Rico that I thought would be relevant to finding which municipalities
43+
are at risk of gentrification or already gentrified. In total, I compiled nine tables from the census website
44+
with data from 2021, including demographic and housing estimates, age and sex, geographic
45+
mobility by selected characteristics, households and families, educational attainment, mean
46+
income, demographic characteristics of occupied housing units, financial characteristics,
47+
physical housing characteristics, and race. After compiling and cleaning the data, I
48+
considered the best approach to the problem. I initially viewed unsupervised learning as
49+
the best way to approach the problem since I didn’t have access to any labeled data. I
50+
chose the K Means Clustering model and attempted to train it but found that the clusters
51+
the model created seemed arbitrary and didn’t indicate any pattern. After researching
52+
K Means Clustering, I discovered that the algorithm struggles with high-dimensional data,
53+
making it a poor fit for my dataset.
54+
</div>
55+
<div className="body">
56+
To make K Means Clustering work, I reduced the number of tables used to two and found the model
57+
was picking up on a pattern. I looked into the categories the model created and noticed it was
58+
grouping the municipalities with major cities together. It grouped San Juan, Ponce, Caguas, and
59+
Bayamón. To verify these results, I looked at articles to find some municipalities that are
60+
currently dealing with gentrification and found that my model misclassified many municipalities
61+
dealing with gentrification as not at risk. After experimenting with the parameters and getting
62+
the same results back, I decided to try a different algorithm. I went with the Spectral
63+
Clustering algorithm, which can outperform K Means Clustering when clusters are not compact and convex. After training the model on
64+
both the complete dataset and subsets of it, I found results similar to K Means Clustering,
65+
where it would group all the municipalities with major cities together.
66+
</div>
67+
<div className="body">
68+
After being unsuccessful with two unsupervised learning algorithms, I assumed my only other option
69+
would be to use a supervised learning algorithm. I researched which algorithm would work best if
70+
given a small amount of labeled data. I came across the idea of semi-supervised learning, a
71+
middle ground between unsupervised and supervised learning. Semi-supervised learning, a form of weak
72+
supervision, learns from a large amount of unlabeled data paired with a small amount of labeled data.
73+
Semi-supervised learning circumvents some common issues associated with getting a dataset for
74+
supervised learning, such as labeling being expensive and time-consuming. The approach requires less
75+
human labeling effort while often achieving higher accuracy than training on the labeled data alone.
76+
</div>
77+
<div className="body">
78+
To create a small amount of labeled data, I searched for articles and personal posts about
79+
gentrifying parts of Puerto Rico. I found posts on Reddit of users describing the gentrification
80+
they’re witnessing while also categorizing the municipalities people moved to after being
81+
displaced as less likely to be gentrified. After labeling a small subset of data, I looked at
82+
the available semi-supervised learning algorithms. I decided to use the SelfTrainingClassifier
83+
from scikit-learn, which is based on Yarowsky’s algorithm and allows a supervised classifier
84+
to work like a semi-supervised classifier. According to the scikit-learn documentation, “It does this by
85+
iteratively predicting pseudo-labels for the unlabeled data and adding them to the training set.” As the base supervised learning algorithm to pair with the SelfTrainingClassifier, I used the Support Vector Classification (SVC) algorithm.
86+
</div>
87+
<h2 className="header">
88+
Results
89+
</h2>
90+
<div className="body">
91+
After training the algorithm, I used the resulting model to classify every municipality in Puerto Rico. It
92+
classified all but 11 of the 77 municipalities as gentrifying
93+
or at risk of gentrification. The following municipalities are considered low risk for
94+
gentrification:
95+
</div>
96+
<ul>
97+
<li>Bayamón </li>
98+
<li>Caguas </li>
99+
<li>Canóvanas </li>
100+
<li>Carolina </li>
101+
<li>Guaynabo </li>
102+
<li>Gurabo </li>
103+
<li>Juncos </li>
104+
<li>Santa Isabel</li>
105+
<li>Toa Alta</li>
106+
<li>Toa Baja</li>
107+
<li>Trujillo Alto</li>
108+
</ul>
109+
<div className="body">
110+
Below is a map of the results of the model’s prediction for every municipality.
111+
</div>
112+
<h2 className="header">
113+
Future Research
114+
</h2>
115+
<div className="body">
116+
To improve my project, I’d like to ensure the initial labels I used for the SelfTrainingClassifier
117+
are as accurate as possible. For my initial labels, I researched personal stories of individuals
118+
facing gentrification in their neighborhoods. Although these stories are helpful, there are
119+
plenty of other Puerto Ricans whose stories may not have appeared in my research or who have not had the
120+
opportunity to share their stories. To ensure my labeled data is more accurate, I’d like to work
121+
with a sociologist or nonprofit that focuses on gentrification in Puerto Rico to verify my
122+
initial labels and results. I’d also be interested in conducting interviews with residents of
123+
affected municipalities or conducting a survey to see how accurate my model is.
124+
</div>
125+
<div className="body">
126+
Another way I could improve my project would be by combining the data I collected from the U.S. Census
127+
with housing data from Zillow or Airbnb. One of the limitations of using census data is that it
128+
isn’t up to date with current events. The data I collected is from 2021, so the status of
129+
gentrification in Puerto Rico could look different now compared to back then. Using current
130+
housing data could circumvent that issue.
131+
</div>
132+
</div>
133+
);
3134
};
4135

5136
export default Projects;
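
The Methodology text in Projects.js describes wrapping an SVC in scikit-learn's SelfTrainingClassifier. A minimal sketch of that setup might look like the following. This is not the author's actual code: the data is synthetic (`make_classification` stands in for the cleaned census tables), and the feature count, the number of labeled rows, the scaling step, and the `threshold` value are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the census features: 77 rows, one per municipality.
# The real project would load the cleaned census tables here instead.
X, y_true = make_classification(
    n_samples=77, n_features=10, n_informative=5, random_state=0
)

# Pretend only 16 municipalities were hand-labeled (8 per class).
# scikit-learn marks unlabeled samples with the label -1.
labeled_idx = np.concatenate(
    [np.flatnonzero(y_true == c)[:8] for c in (0, 1)]
)
y = np.full(len(y_true), -1)
y[labeled_idx] = y_true[labeled_idx]

# The base estimator must expose predict_proba, so SVC needs probability=True.
# Scaling is an added assumption; SVMs are sensitive to feature scale.
base = make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))

# Pseudo-labels are only accepted when the predicted probability
# reaches the threshold (0.75 is the library default).
model = SelfTrainingClassifier(base, threshold=0.75)
model.fit(X, y)

# Predict a class (0 = low risk, 1 = at risk, in this toy encoding)
# for every row, labeled or not.
predictions = model.predict(X)
print(predictions)
```

After fitting, `model.labeled_iter_` records the iteration at which each row received its label, which can help spot pseudo-labels worth reviewing by hand.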
