Below is an informal write where I used machine learning to identify which municipalities in Puerto Rico are at risk of gentrification.
29
+
Below is an informal write-up in which I used machine learning to identify
30
+
which municipalities in Puerto Rico are at risk of gentrification.
14
31
</div>
15
-
<h2 className="header">
16
-
Preface
17
-
</h2>
32
+
<h2 className="header">Preface</h2>
18
33
<div className="body">
19
-
I chose Puerto Rico as my focus for my project because mainland Americans often forget it.
20
-
In a 2017 study, a poll conducted by Morning Consult found that 46% of Americans did not know
21
-
that Puerto Ricans are considered U.S. citizens. I knew I wanted to work on gentrification as
22
-
someone who has lived in public housing for most of my life and saw the city of Somerville
23
-
destroy the public housing complex for mixed-income housing. I searched for existing projects
24
-
that focused on identifying gentrification in the U.S., and every project I found
25
-
concentrated on mainland U.S. major cities, excluding Puerto Rico. By combining both topics,
26
-
I aim to raise awareness of displaced Puerto Ricans, create a tool that lawmakers can use
27
-
to see which communities are most vulnerable to being displaced, and encourage the
28
-
artificial intelligence community to look at tackling more diverse issues and datasets.
29
-
In addition to identifying the problem, I think it’s crucial to understand how
30
-
Puerto Rico became at risk of gentrification to address it. Under former Puerto Rican
31
-
governor Luis Fortuño, two controversial acts, Act 22 and Act 60, were passed that
32
-
encouraged wealthy foreigners to move to Puerto Rico through tax incentives.
33
-
The government has been unsuccessful in ensuring new residents meet the
34
-
requirements in exchange for tax incentives, allowing foreigners to take
35
-
advantage of the two acts.
34
+
I chose Puerto Rico as my focus for my project because mainland
35
+
Americans often forget it. A 2017 poll conducted by Morning
36
+
Consult found that 46% of Americans did not know that Puerto Ricans are
37
+
U.S. citizens. I knew I wanted to work on gentrification as
38
+
someone who has lived in public housing for most of my life and saw the
39
+
city of Somerville destroy the public housing complex for mixed-income
40
+
housing. I searched for existing projects that focused on identifying
41
+
gentrification in the U.S., and every project I found concentrated on
42
+
major mainland U.S. cities, excluding Puerto Rico. By combining both
43
+
topics, I aim to raise awareness of displaced Puerto Ricans, create a
44
+
tool that lawmakers can use to see which communities are most vulnerable
45
+
to being displaced, and encourage the artificial intelligence community
46
+
to look at tackling more diverse issues and datasets. In addition to
47
+
identifying the problem, I think it’s crucial to understand how Puerto
48
+
Rico became at risk of gentrification in order to address it. Under former Puerto
49
+
Rican governor Luis Fortuño, two controversial acts, Act 20 and Act 22 (later folded into Act 60),
50
+
were passed that encouraged wealthy foreigners to move to Puerto Rico
51
+
through tax incentives. The government has been unsuccessful in ensuring
52
+
new residents meet the requirements attached to the tax incentives,
53
+
allowing foreigners to take advantage of the two acts.
36
54
</div>
37
-
<h2 className="header">
38
-
Methodology
39
-
</h2>
55
+
<h2 className="header">Methodology</h2>
40
56
<div className="body">
41
-
I went through the available tables on the United States Census Bureau’s website to look for
42
-
data tables under Puerto Rico that I thought would be relevant to finding which municipalities
43
-
are at risk of or already gentrified. In total, I compiled nine tables from the census website
44
-
with data from 2021, including demographic and housing estimates, age and sex, geographic
45
-
mobility by selected characteristics, households and families, educational attainment, mean
46
-
income, demographic characteristics of occupied housing units, financial characteristics,
47
-
physical housing characteristics, and race. After compiling and cleaning the data, I
48
-
considered the best approach to the problem. I initially viewed unsupervised learning as
49
-
the best way to approach the problem since I didn’t have access to any labeled data. I
50
-
chose the K Means Clustering model and attempted to train it but found that the clusters
51
-
the model created seemed arbitrary and didn’t indicate any pattern. After researching
52
-
K Means Clustering, I discovered that the algorithm struggles with high-dimensional data,
53
-
making it unideal for my dataset.
57
+
I went through the available tables on the United States Census Bureau’s
58
+
website to look for data tables under Puerto Rico that I thought would
59
+
be relevant to finding which municipalities are at risk of gentrification or
60
+
already gentrified. In total, I compiled nine tables from the census website
61
+
with data from 2021, including demographic and housing estimates, age
62
+
and sex, geographic mobility by selected characteristics, households and
63
+
families, educational attainment, mean income, demographic
64
+
characteristics of occupied housing units, financial characteristics,
65
+
physical housing characteristics, and race. After compiling and cleaning
66
+
the data, I considered the best approach to the problem. I initially
67
+
viewed unsupervised learning as the best way to approach the problem
68
+
since I didn’t have access to any labeled data. I chose the K Means
69
+
Clustering model and attempted to train it but found that the clusters
70
+
the model created seemed arbitrary and didn’t indicate any pattern.
71
+
After researching K Means Clustering, I discovered that the algorithm
72
+
struggles with high-dimensional data, making it a poor fit for my dataset.
54
73
</div>
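One way to see why a distance-based method such as K Means can struggle with many features is distance concentration: as the number of dimensions grows, the nearest and farthest points from any given point end up almost equally far away. The sketch below is only an illustration on uniformly random points, not the census data; the function name `relative_contrast` and every parameter value are assumptions made for this example.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    # Euclidean distance from the query to each of n_points random points.
    distances = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(distances) - min(distances)) / min(distances)


# A large value means "near" and "far" are clearly different;
# a value close to zero means every point looks about equally distant.
low_dim = relative_contrast(dim=2)
high_dim = relative_contrast(dim=500)
print(f"relative contrast in 2 dimensions:   {low_dim:.2f}")
print(f"relative contrast in 500 dimensions: {high_dim:.2f}")
```

If the combined census tables have far more columns than there are municipalities, a small contrast like this would be consistent with the arbitrary-looking clusters described above.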
55
74
<div className="body">
56
-
To make K Means Clustering work, I reduced the number of tables used to two and found the model
57
-
was picking up on a pattern. I looked into the categories the model created and noticed it was
58
-
grouping the municipalities with major cities together. It grouped San Juan, Ponce, Caguas, and
59
-
Bayamon. To verify these results, I looked at articles to find some municipalities that are
60
-
currently dealing with gentrification and found that my model misclassified many municipalities
61
-
dealing with gentrification as not at risk. After experimenting with the parameters and getting
62
-
the same results back, I decided to experiment with the algorithm. I went with the Spectral
63
-
Clustering algorithm, which often outperforms K Means Clustering. After training the model on
64
-
both the complete and subsets of the dataset, I found similar results to K Means Clusters,
65
-
where it would group all the municipalities with major cities.
75
+
To make K Means Clustering work, I reduced the number of tables used to
76
+
two and found the model was picking up on a pattern. I looked into the
77
+
categories the model created and noticed it was grouping the
78
+
municipalities with major cities together. It grouped San Juan, Ponce,
79
+
Caguas, and Bayamón. To verify these results, I looked at articles to
80
+
find some municipalities that are currently dealing with gentrification
81
+
and found that my model misclassified many municipalities dealing with
82
+
gentrification as not at risk. After experimenting with the parameters
83
+
and getting the same results back, I decided to experiment with the
84
+
algorithm. I went with the Spectral Clustering algorithm, which often
85
+
outperforms K Means Clustering. After training the model on both the
86
+
complete dataset and subsets of it, I found similar results to K Means
87
+
Clustering: it again grouped the municipalities with major cities together.
66
88
</div>
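A comparison like the one described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn is installed; it runs on a synthetic stand-in table from `make_blobs` rather than the census data, and the cluster count, scaling step, and random seeds are assumptions rather than the settings used in the project.

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the municipality feature table (NOT the census data).
X, _ = make_blobs(
    n_samples=77, centers=[[-5, -5], [5, 5]], cluster_std=1.0, random_state=0
)

# Standardize so that no single column dominates the distance computation.
X_scaled = StandardScaler().fit_transform(X)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
sc_labels = SpectralClustering(
    n_clusters=2, affinity="rbf", random_state=0
).fit_predict(X_scaled)

# Adjusted Rand index: 1.0 means the two partitions are identical
# (ignoring label names); values near 0 mean chance-level agreement.
ari = adjusted_rand_score(km_labels, sc_labels)
print(f"agreement between K Means and spectral clustering (ARI): {ari:.2f}")
```

A high agreement score between the two algorithms suggests that switching algorithms alone will not change which municipalities end up grouped together.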
67
89
<div className="body">
68
-
After being unsuccessful with two unsupervised learning algorithms, I assumed my only other option
69
-
would be to use a supervised learning algorithm. I researched which algorithm would work best if
70
-
given a small amount of labeled data. I came across the idea of semi-supervised learning, a
71
-
middle ground between unsupervised and supervised. Semi-supervised learning, also known as weak
72
-
supervision, can learn using large amounts of unlabeled data paired with labeled data.
73
-
Semi-supervised learning circumvents some common issues associated with getting a dataset for
74
-
supervised learning, such as being expensive and time-consuming. The approach requires less
75
-
human oversight while still benefiting from a higher accuracy.
90
+
After being unsuccessful with two unsupervised learning algorithms, I
91
+
assumed my only other option would be to use a supervised learning
92
+
algorithm. I researched which algorithm would work best if given a small
93
+
amount of labeled data. I came across the idea of semi-supervised
94
+
learning, a middle ground between unsupervised and supervised.
95
+
Semi-supervised learning, a form of weak supervision, can learn
96
+
using large amounts of unlabeled data paired with a small amount of labeled data.
97
+
Semi-supervised learning circumvents some common issues associated with
98
+
getting a dataset for supervised learning, such as being expensive and
99
+
time-consuming. The approach requires less human labeling effort and can still
100
+
reach higher accuracy than a purely unsupervised method.
76
101
</div>
77
102
<div className="body">
78
-
To create a small amount of labeled data, I searched for articles and personal posts about
79
-
gentrifying parts of Puerto Rico. I found posts on Reddit of users describing the gentrification
80
-
they’re witnessing while also categorizing the municipalities people moved to after being
81
-
displaced as less likely to be gentrified. After labeling a small subset of data, I looked at
82
-
the available unsupervised learning algorithms. I decided to use the SelfTrainingClassifier
83
-
from Scikit Learn, which is based on Yarowsky’s algorithm and allows for a supervised classifier
84
-
to work like a semi-supervised classifier. According to Scikit Learn, “It does this by
85
-
iteratively predicting pseudo-labels for the unlabeled data and adding them to the training set.” For my base supervised learning algorithm to pair with the SelfTrainingClassifer, I used the Support Vector Classification (SVC) algorithm.
103
+
To create a small amount of labeled data, I searched for articles and
104
+
personal posts about gentrifying parts of Puerto Rico. I found Reddit posts in
105
+
which users described the gentrification they were witnessing. I labeled those
106
+
municipalities as gentrifying and the municipalities people moved to after being
107
+
displaced as less likely to be gentrified. After labeling a small subset
108
+
of data, I looked at the available semi-supervised learning algorithms. I
109
+
decided to use the SelfTrainingClassifier from scikit-learn, which is
110
+
based on Yarowsky’s algorithm and allows for a supervised classifier to
111
+
work like a semi-supervised classifier. According to the scikit-learn documentation, “It
112
+
does this by iteratively predicting pseudo-labels for the unlabeled data
113
+
and adding them to the training set.” For my base supervised learning
114
+
algorithm to pair with the SelfTrainingClassifier, I used the Support
115
+
Vector Classification (SVC) algorithm.
86
116
</div>
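The self-training setup described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn and NumPy are installed; the data is synthetic (`make_blobs`), and the number of hand-labeled rows, the confidence threshold, and the kernel are assumptions rather than the values used in the project. In scikit-learn, unlabeled samples are marked with `-1`, and the base estimator must expose `predict_proba`, which for `SVC` means passing `probability=True`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the municipality features (NOT the census data).
# Hypothetical encoding: 1 = gentrifying or at risk, 0 = low risk.
X, y_true = make_blobs(
    n_samples=77, centers=[[-5, -5], [5, 5]], cluster_std=1.0, random_state=0
)

# Keep labels for only 10 rows per class; mark everything else as unlabeled (-1).
y_partial = np.full(len(y_true), -1)
for cls in (0, 1):
    labeled_idx = np.where(y_true == cls)[0][:10]
    y_partial[labeled_idx] = cls

# SelfTrainingClassifier relies on predict_proba, so SVC needs probability=True.
base = SVC(kernel="rbf", probability=True, random_state=0)
model = SelfTrainingClassifier(base, threshold=0.75)
model.fit(X, y_partial)

pred = model.predict(X)
n_unlabeled = int((y_partial == -1).sum())
print(f"rows left unlabeled before training: {n_unlabeled}")
print(f"rows predicted as class 1: {int((pred == 1).sum())}")
```

Because pseudo-labels are only added when the predicted probability clears `threshold`, a higher threshold trades coverage for caution, which matters when the hand-labeled seed set is small.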
87
-
<h2 className="header">
88
-
Results
89
-
</h2>
117
+
<h2 className="header">Results</h2>
90
118
<div className="body">
91
-
After training the algorithm and creating a model to predict all municipalities in Puerto Rico, it
92
-
found that out of the 77 municipalities in Puerto Rico, all but 11 are classified as gentrifying
93
-
or at risk of gentrification. The following municipalities are considered low risk for
94
-
gentrification:
119
+
After training the algorithm, I used the resulting model to classify every
120
+
municipality in Puerto Rico. Of the 77 municipalities, all but 11 were
121
+
classified as gentrifying or at risk of gentrification. The following
122
+
municipalities are considered low risk for
123
+
gentrification:
95
124
</div>
96
125
<ul>
97
126
<li>Bayamón</li>
@@ -107,27 +136,71 @@ const Projects = () => {
107
136
<li>Trujillo Alto</li>
108
137
</ul>
109
138
<div className="body">
110
-
Below is a map of the results of the model’s prediction for every municipality.
139
+
Below is a map of the results of the model’s prediction for every