|
| 1 | +import './Projects.css' |
| 2 | + |
1 | 3 | const Projects = () => {
|
2 |
| - return <h1>Personal Projects</h1>; |
| 4 | + return ( |
| 5 | + <div className="projects"> |
| 6 | + <h1 className="title"> |
| 7 | + Projects |
| 8 | + </h1> |
| 9 | + <h1 className="title"> |
| 10 | + Puerto Rico |
| 11 | + </h1> |
| 12 | + <div className="body"> |
| 13 | + Below is an informal write-up in which I used machine learning to identify which municipalities in Puerto Rico are at risk of gentrification. |
| 14 | + </div> |
| 15 | + <h2 className="header"> |
| 16 | + Preface |
| 17 | + </h2> |
| 18 | + <div className="body"> |
| 19 | + I chose Puerto Rico as the focus of my project because mainland Americans often overlook it. |
| 20 | + A 2017 poll conducted by Morning Consult found that 46% of Americans did not know |
| 21 | + that Puerto Ricans are U.S. citizens. As someone who has lived in public housing for most |
| 22 | + of my life and watched the city of Somerville demolish a public housing complex to make way |
| 23 | + for mixed-income housing, I knew I wanted to work on gentrification. I searched for existing projects |
| 24 | + that focused on identifying gentrification in the U.S., and every project I found |
| 25 | + concentrated on major cities in the mainland U.S., excluding Puerto Rico. By combining both topics, |
| 26 | + I aim to raise awareness of displaced Puerto Ricans, create a tool that lawmakers can use |
| 27 | + to see which communities are most vulnerable to being displaced, and encourage the |
| 28 | + artificial intelligence community to look at tackling more diverse issues and datasets. |
| 29 | + In addition to identifying the problem, I think addressing it requires understanding how |
| 30 | + Puerto Rico became at risk of gentrification. Two controversial laws used tax incentives to |
| 31 | + encourage wealthy foreigners to move to Puerto Rico: Act 22, passed in 2012 under former |
| 32 | + governor Luis Fortuño, and Act 60, which consolidated it with other incentives in 2019. |
| 33 | + The government has been unsuccessful in ensuring new residents meet the |
| 34 | + requirements attached to those incentives, allowing foreigners to take |
| 35 | + advantage of the two acts. |
| 36 | + </div> |
| 37 | + <h2 className="header"> |
| 38 | + Methodology |
| 39 | + </h2> |
| 40 | + <div className="body"> |
| 41 | + I went through the tables available on the United States Census Bureau’s website to look for |
| 42 | + Puerto Rico data tables that I thought would be relevant to finding which municipalities |
| 43 | + are at risk of gentrification or have already gentrified. In total, I compiled nine tables from the census website |
| 44 | + with data from 2021, including demographic and housing estimates, age and sex, geographic |
| 45 | + mobility by selected characteristics, households and families, educational attainment, mean |
| 46 | + income, demographic characteristics of occupied housing units, financial characteristics, |
| 47 | + physical housing characteristics, and race. After compiling and cleaning the data, I |
| 48 | + considered the best approach to the problem. I initially viewed unsupervised learning as |
| 49 | + the best fit since I didn’t have access to any labeled data. I |
| 50 | + chose K-Means clustering and attempted to train it but found that the clusters |
| 51 | + the model created seemed arbitrary and didn’t indicate any pattern. After researching |
| 52 | + K-Means, I discovered that the algorithm struggles with high-dimensional data because Euclidean |
| 53 | + distances become less informative as the number of features grows, making it a poor fit for my dataset. |
| 54 | + </div> |
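The difficulty K-Means has with many features can be illustrated with a small experiment. The sketch below is not the author's code: it draws random points (77 rows and the feature counts are arbitrary choices, and `relative_contrast` is a hypothetical helper) and compares how much the nearest and farthest neighbors of one point differ in 2 versus 500 dimensions. As dimensionality grows, that gap tends to shrink, which leaves a distance-based method with little signal for separating groups.

```python
import math
import random


def relative_contrast(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Return (farthest - nearest) / nearest distance from the first point."""
    rng = random.Random(seed)
    # Uniform random points in the unit hypercube (synthetic, not census data).
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    reference, others = points[0], points[1:]
    distances = [math.dist(reference, p) for p in others]
    return (max(distances) - min(distances)) / min(distances)


low_dim = relative_contrast(n_points=77, n_dims=2)
high_dim = relative_contrast(n_points=77, n_dims=500)
print(f"relative contrast with 2 features:   {low_dim:.2f}")
print(f"relative contrast with 500 features: {high_dim:.2f}")
```

A smaller contrast means the nearest and farthest points are almost equally far away, which is one reason to reduce the number of features before clustering.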
| 55 | + <div className="body"> |
| 56 | + To make K-Means clustering work, I reduced the number of tables used to two and found the model |
| 57 | + was picking up on a pattern. I looked into the clusters the model created and noticed it was |
| 58 | + grouping the municipalities with major cities together: San Juan, Ponce, Caguas, and |
| 59 | + Bayamón. To verify these results, I looked for articles about municipalities that are |
| 60 | + currently dealing with gentrification and found that my model misclassified many of them |
| 61 | + as not at risk. After experimenting with the parameters and getting |
| 62 | + the same results back, I decided to try a different algorithm. I went with |
| 63 | + spectral clustering, which can outperform K-Means when clusters are not compact, convex groups. After training the model on |
| 64 | + both the complete dataset and subsets of it, I found results similar to K-Means clustering: |
| 65 | + it would group all the municipalities with major cities. |
| 66 | + </div> |
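K-Means compares rows by Euclidean distance, so a feature measured on a large scale, such as raw population, can dominate the result. The write-up does not say whether this contributed to the major cities being grouped together, so treat the following only as a possible check. The sketch uses three made-up rows (the names and numbers are hypothetical, not census values) and shows how z-score standardization can change which row counts as the nearest neighbor.

```python
import math
import statistics

# Hypothetical rows: (population, median household income in USD, % renters).
rows = {
    "big_city_a": (320_000, 24_000, 45.0),
    "big_city_b": (180_000, 19_000, 30.0),
    "small_town_c": (25_000, 24_500, 44.0),
}


def zscore_columns(table):
    """Rescale every column to mean 0 and standard deviation 1."""
    names = list(table)
    scaled_columns = []
    for column in zip(*table.values()):
        mean = statistics.mean(column)
        std = statistics.pstdev(column)
        scaled_columns.append([(value - mean) / std for value in column])
    return {
        name: tuple(column[i] for column in scaled_columns)
        for i, name in enumerate(names)
    }


def nearest(table, target):
    """Name of the row closest to `target` by Euclidean distance."""
    candidates = (name for name in table if name != target)
    return min(candidates, key=lambda name: math.dist(table[name], table[target]))


raw_neighbor = nearest(rows, "big_city_a")
scaled_neighbor = nearest(zscore_columns(rows), "big_city_a")
print("nearest on raw features:   ", raw_neighbor)
print("nearest on scaled features:", scaled_neighbor)
```

On the raw values, population decides the neighbor; after scaling, income and renter share carry comparable weight.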
| 67 | + <div className="body"> |
| 68 | + After being unsuccessful with two unsupervised learning algorithms, I assumed my only other option |
| 69 | + would be to use a supervised learning algorithm. I researched which algorithm would work best if |
| 70 | + given a small amount of labeled data. I came across the idea of semi-supervised learning, a |
| 71 | + middle ground between unsupervised and supervised learning. Semi-supervised learning, a form of weak |
| 72 | + supervision, learns from a large amount of unlabeled data paired with a small amount of labeled data. |
| 73 | + It sidesteps a common obstacle to building a dataset for |
| 74 | + supervised learning: labeling every example is expensive and time-consuming. The approach requires less |
| 75 | + human labeling effort while often achieving higher accuracy than training on the small labeled set alone. |
| 76 | + </div> |
| 77 | + <div className="body"> |
| 78 | + To create a small amount of labeled data, I searched for articles and personal posts about |
| 79 | + gentrifying parts of Puerto Rico. I found Reddit posts from users describing the gentrification |
| 80 | + they’re witnessing, and I categorized the municipalities people moved to after being |
| 81 | + displaced as less likely to be gentrified. After labeling a small subset of data, I looked at |
| 82 | + the available semi-supervised learning algorithms. I decided to use the SelfTrainingClassifier |
| 83 | + from scikit-learn, which is based on Yarowsky’s algorithm and allows a supervised classifier |
| 84 | + to work like a semi-supervised classifier. According to the scikit-learn documentation, “It does this by |
| 85 | + iteratively predicting pseudo-labels for the unlabeled data and adding them to the training set.” For the base supervised learning algorithm to pair with the SelfTrainingClassifier, I used the Support Vector Classification (SVC) algorithm. |
| 86 | + </div> |
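The setup described above can be sketched roughly as follows. This is not the author's code: the data is synthetic (generated with `make_classification` as a stand-in for the census features), and the number of labeled rows, the scaling step, and the `threshold` value are assumptions chosen for illustration. Two details come from scikit-learn itself: unlabeled rows are marked with `-1`, and the base estimator must expose `predict_proba`, which for `SVC` means passing `probability=True`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the municipality features (hypothetical shape).
X, y_true = make_classification(n_samples=77, n_features=10, random_state=0)

# Keep labels for only six rows per class; mark every other row as unlabeled (-1).
y_partial = np.full_like(y_true, -1)
for label in (0, 1):
    keep = np.flatnonzero(y_true == label)[:6]
    y_partial[keep] = label

# SVC needs probability=True so self-training can threshold on predict_proba.
base = make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))

# The base estimator is passed positionally because its keyword name differs
# between scikit-learn releases.
model = SelfTrainingClassifier(base, threshold=0.75)
model.fit(X, y_partial)

# Predict a class for every row, labeled or not.
predictions = model.predict(X)
print("rows labeled by hand:", int((y_partial != -1).sum()))
print("rows per predicted class:", np.bincount(predictions))
```

With so few hand labels, the pseudo-labels inherit any bias in the starting set, so the quality of the initial labels matters more than the choice of `threshold`.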
| 87 | + <h2 className="header"> |
| 88 | + Results |
| 89 | + </h2> |
| 90 | + <div className="body"> |
| 91 | + After training the classifier, I used the resulting model to predict a label for every municipality in Puerto Rico. The model |
| 92 | + found that out of the 77 municipalities in Puerto Rico, all but 11 are classified as gentrifying |
| 93 | + or at risk of gentrification. The following municipalities are considered low risk for |
| 94 | + gentrification: |
| 95 | + </div> |
| 96 | + <ul> |
| 97 | + <li>Bayamón</li> |
| 98 | + <li>Caguas</li> |
| 99 | + <li>Canóvanas</li> |
| 100 | + <li>Carolina</li> |
| 101 | + <li>Guaynabo</li> |
| 102 | + <li>Gurabo</li> |
| 103 | + <li>Juncos</li> |
| 104 | + <li>Santa Isabel</li> |
| 105 | + <li>Toa Alta</li> |
| 106 | + <li>Toa Baja</li> |
| 107 | + <li>Trujillo Alto</li> |
| 108 | + </ul> |
| 109 | + <div className="body"> |
| 110 | + Below is a map of the results of the model’s prediction for every municipality. |
| 111 | + </div> |
| 112 | + <h2 className="header"> |
| 113 | + Future Research |
| 114 | + </h2> |
| 115 | + <div className="body"> |
| 116 | + To improve my project, I’d like to ensure the initial labels I used for the SelfTrainingClassifier |
| 117 | + are as accurate as possible. For my initial labels, I researched personal stories of individuals |
| 118 | + facing gentrification in their neighborhoods. Although these stories are helpful, there are |
| 119 | + plenty of other Puerto Ricans whose stories did not appear in my research or who have not had the |
| 120 | + opportunity to share them. To ensure my labeled data is more accurate, I’d like to work |
| 121 | + with a sociologist or nonprofit that focuses on gentrification in Puerto Rico to verify my |
| 122 | + initial labels and results. I’d also be interested in conducting interviews with residents of |
| 123 | + affected municipalities or conducting a survey to see how accurate my model is. |
| 124 | + </div> |
| 125 | + <div className="body"> |
| 126 | + Another way I could improve my project would be by combining the data I collected from the U.S. Census |
| 127 | + with housing data from Zillow or Airbnb. One of the limitations of using census data is that it |
| 128 | + isn’t up to date with current events. The data I collected is from 2021, so the status of |
| 129 | + gentrification in Puerto Rico could look different now compared to back then. Using current |
| 130 | + housing data could circumvent that issue. |
| 131 | + </div> |
| 132 | + </div> |
| 133 | + ); |
3 | 134 | };
|
4 | 135 |
|
5 | 136 | export default Projects;
|