-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathcode.Rmd
More file actions
441 lines (316 loc) · 21.8 KB
/
code.Rmd
File metadata and controls
441 lines (316 loc) · 21.8 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
---
title: 'How to Manufacture Your Personality to Gain Friends and Second Dates: A Data-Driven
Field Guide for Pathetic People by Pathetic People'
output:
pdf_document: default
html_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Group members: Beau, Richard, Saad, Kevin
```{r}
library(data.table)
```
# Importing the data
(credit: Beau and Richard)
Speed Dating dataset: https://www.kaggle.com/annavictoria/speed-dating-experiment
Each row of the data represents one date between two people. The primary key is one of the participants in the date, so for each date there are two rows in the dataset.
We imported the data, and then turned the appropriate columns into factors. The missing data came as blank strings, which we encoded as NAs.
For now, we are filtering out rows which do not have complete data (some participants did not include their hometown or undergraduate school, which were used as proxies to quantify wealth and education level). We are probably going to add these partially-full rows back into our analysis as we drop features from our model. TODO check if true
```{r}
# This works if you do the following:
# stick the csv file in the same directory as this Rmd
# in RStudio, go to Session -> Set Work Directory -> To Source File location
# this expects only one csv to be in the directory
project.dir <- getwd()
setwd(project.dir)
csv_file <- list.files(file.path(project.dir), pattern="*.csv", full.names=TRUE)
full_data <- fread(file.path(csv_file))
full_data[,c("race_o","field_cd", "race", "goal", "date", "go_out", "career_c", "met", "dec", "length", "numdat_2", "gender", "condtn")] <- lapply(full_data[,c("race_o","field_cd", "race", "goal", "date", "go_out", "career_c", "met", "dec", "length", "numdat_2", "gender", "condtn")], factor)
#income, tuition, and mn_sat have NA values so they aren't numeric
nrow(full_data[complete.cases(full_data[ , c("income","tuition","mn_sat")]),])
nrow(full_data)
#we could look at subset? is there a reason that people left off this information?
#full_data <- nrow(full_data[complete.cases(full_data[ , c("income")]),])
#nrow(full_data[complete.cases(full_data[ , c("mn_sat", "income")]),])
#nrow(full_data[complete.cases(full_data[ , c("tuition", "income")]),])
```
Save/Load current state for easy continuation
(credit: Kev)
```{r}
#setwd()
#saveRDS(full_data, file="full_speed_dating.rds")
#full_data <- readRDS("full_speed_dating.rds")
```
# Predicting rate of people liking you with constant fields
(credit: Beau and Richard)
The variable 'dec_o' encodes whether the other person in the date decided they would like a second date with you (1 for yes, 0 for no). We decided to use this as the response variable instead of 'match' because 'match' means that both people mutually liked each other. In particular, for each participant, we took the mean of 'dec_o' as a proxy for desireable-ness.
We defined a new table 'const_attribs' which represents the constant attributes of the participants. For example, from one date to another your age, field of study, and gender do not change. We wanted to see whether we can predict desirable-ness from your non-subjective attributes.
However, we could not predict anything with this lm (R^2 = 0.14).
```{r}
# rate that people decide they wanted to date you after the speed date
# aggregate certain variables for each individual (avg's of attracitness, intelligence score ... etc. that others gave after meeting)
desirableness <- aggregate(dec_o ~ iid, full_data, mean)
# create subset with the fields that remain constant for an individual (regardless of the meeting)
const_attribs <- full_data[, c("iid", "gender","age", "field_cd", "mn_sat", "tuition", "imprace", "income", "goal", "date", "go_out", "career_c", "match_es","sports", "tvsports", "exercise", "dining", "museums", "art", "hiking", "gaming", "clubbing", "reading", "tv", "theater", "movies", "concerts", "music", "shopping", "yoga")]
# this resulting table has lots of duplicated data
# remove duplicates so there is only one obs per person
const_attribs <- const_attribs[!duplicated(const_attribs),]
# binding back together w desirableness
const_attribs <- merge(desirableness, const_attribs, by="iid")
# merges list of dataframes into a single df. removes iid as a predictor.
full_const_fit <- lm(dec_o ~ .-iid, data = na.omit(const_attribs)) # TODO find reduced model in git commit history
```
After we reduced the insignificant predictors in this model, it had an adjusted R^2 of just 0.14.
# Predicting rate of people liking you with constant fields and partner-rankings
We augmented this linear model using the rankings from your partners: how attractive, sincere, intelligent, fun, and amibiotous your partners percieved you to be durin the 4 minute speed-date. In particular, for each individual we took the mean rating by their partners.
```{r}
#additional explanitory variables. note that after this point, attr_o, sinc_o, etc are
# now aggregated means
attractive <- aggregate(attr_o ~ iid, full_data, mean)
sincere <- aggregate(sinc_o ~ iid, full_data, mean)
intelligent <- aggregate(intel_o ~ iid, full_data, mean)
fun <- aggregate(fun_o ~ iid, full_data, mean)
ambitious <- aggregate( amb_o ~ iid, full_data, mean) # this was assigned incorrectly.. do any of the analyses change in model prediction section?
attractive <- aggregate(attr_o ~ iid, full_data, mean)
likeability <- aggregate(like_o ~ iid,full_data, mean)
augmented_attribs <- Reduce(function(x, y) merge(x, y, all=TRUE), list(attractive, sincere, intelligent, fun, ambitious, const_attribs, desirableness))
```
We removed a few columns that had pesky numerical values. We will come back and probably add these back in after fixing that.
```{r}
# remove fields that consist of mostly NA's
# these are tuition, SAT, and predicted income
augmented_attribs <- augmented_attribs[, -which(names(augmented_attribs) %in% c("tuition", "mn_sat", "income")) ]
# fit a full model
# NOTE: we removed NAs, remember to add these rows back later if we reduce the model
full_augmented_fit <- lm(dec_o ~ .-iid, data = na.omit(augmented_attribs))
summary(full_augmented_fit)
```
The full model has an adj-R^2 of 0.65 on 75 predictor variables. In order to reduce our model, we ran a stepwise AIC.
```{r}
library(MASS)
step <- stepAIC(full_augmented_fit, direction="both")
step$anova # display results
```
Reducing our desirableness model based on our stepAIC:
```{r}
# the stepAIC returned this model as these as our optimal predictors
# dec_o ~ attr_o + fun_o + gender + age + career_c + tvsports + yoga
# dec_o, attr_o, fun_o here is the mean of dec_o, attr_o, fun_o reported by partners
reduced_augmented_fit <- lm(dec_o ~ attr_o + fun_o + gender + age + career_c + tvsports + yoga, data = na.omit(augmented_attribs))
summary(reduced_augmented_fit)
```
Our model stayed with approximately the same adjusted R^2 (0.657), but now we have only 24 predictors. We noticed that career_c did not improve our R^2 by much, so we ran an F-test
```{r}
anova(reduced_augmented_fit, lm(dec_o ~ attr_o + fun_o + gender + age + tvsports + yoga, data = na.omit(augmented_attribs)) )
```
The p-value of the F test indicates the model with career significantly reduces the sum of squares error, so we will keep them it in model.
# Checking for associations between variables
```{r}
pairs(augmented_attribs[,c("dec_o", "attr_o", "fun_o", "gender","age", "career_c", "tvsports","yoga")])
```
We saw some multicollinearity with attr_o and fun_o so we decided to include an interaction term in our model and compared it inside ANOVA.
```{r}
anova(reduced_augmented_fit, lm(dec_o ~ attr_o + fun_o + career_c + gender + age + tvsports + yoga + attr_o*fun_o, data = na.omit(augmented_attribs)) )
# it appears that fun_o and attractiveness_o have a linear relationship with our response AND (maybe) each other
```
Here, we find that the model with the interaction term reduces the sums of squares better than the reduced model without it.
In the following chunk, we rerun the Step AIC and it added the concerts and music variable. We then did another ANOVA model comparison and found that adding those variables reduced the sums of squares better than without it. We saw from the summary that age was not significant, so we did another ANOVA comparison without age, and found that this new reduced model reduced the sums of squares better than our full model.
```{r}
reduced_augmented_fit<-lm(dec_o ~ attr_o + fun_o + career_c + gender + age + tvsports + yoga + attr_o*fun_o, data = na.omit(augmented_attribs))
summary(reduced_augmented_fit)
summary(lm(dec_o ~ attr_o + fun_o + gender + age + career_c + tvsports +
concerts + music + yoga + attr_o:fun_o, data = na.omit(augmented_attribs)))
par(mfrow = c(2,2))
plot(reduced_augmented_fit)
# QQ plot indicates normally distributed errors. The scale-location plot show that there and possibly-maybe some heteroskedasticity in our variance.
#next step is to maybe introduce interaction terms?
full_augmented_fit <- lm(dec_o ~ . + attr_o*fun_o -iid, data = na.omit(augmented_attribs))
step <- stepAIC(full_augmented_fit, direction="both")
step$anova # display results
anova(lm(dec_o ~ attr_o + fun_o + gender + age + career_c + tvsports +
concerts + music + yoga + attr_o:fun_o, data = na.omit(augmented_attribs)), reduced_augmented_fit)
anova(lm(dec_o ~ attr_o + fun_o + gender + age + career_c + tvsports + yoga + attr_o:fun_o, data = na.omit(augmented_attribs)), lm(dec_o ~ attr_o + fun_o + gender + career_c + tvsports + yoga + attr_o:fun_o, data = na.omit(augmented_attribs)))
summary(lm(dec_o ~ attr_o + fun_o + gender + career_c + tvsports + yoga + attr_o:fun_o, data = na.omit(augmented_attribs)))
final_reduced<-lm(dec_o ~ attr_o + fun_o + gender + career_c + tvsports + yoga + attr_o:fun_o, data = na.omit(augmented_attribs))
```
The residual plot appears to indicate some heteroskedasticity. Let's check it out.
```{r}
library(lmtest)
bptest(dec_o ~ attr_o + fun_o + gender + career_c + tvsports + yoga + attr_o:fun_o, data = na.omit(augmented_attribs))
```
The p-value of the BP test is above 0.05 so we are kosher.
# How can we get dates?
(credit: Kev)
Based on the model identified above, we want to help you get a date.
What attributes predict fun or attractiveness?
Are we any good at knowing how fun/attractive we are?
# Predicting likeability
This question is "can we predict how fun someone is perceived based on self-reported attributes" not other's subjective opinions (i.e. can we predict if you will be perceived as fun based on your favorite activites, not on if someone else thinks you are attractive).
# So what makes us fun?
```{r}
fun_augmented_fit <- lm(fun_o ~ .-iid -attr_o -sinc_o -intel_o -dec_o -amb_o, data = na.omit(augmented_attribs))
summary(fun_augmented_fit)
```
```{r}
# reduce model with stepAIC
step <- stepAIC(fun_augmented_fit, direction="both")
step$anova # display results
```
Reducing our model based on our anova results
```{r}
# fun_o ~ gender + go_out + career_c + sports + exercise + dining + gaming + clubbing + tv + concerts + music
fun_reduced_augmented_fit <- lm(fun_o ~ gender + go_out + career_c + sports + exercise + dining +
gaming + clubbing + tv + concerts + music, data = na.omit(augmented_attribs))
summary(fun_reduced_augmented_fit)
```
The R^2 is only 0.1557.
Does the model come out any better if we go forwards?
```{r}
# I don't want the forward approach to grab the perceived data
smaller_augmented_attribs <- augmented_attribs[, -which(names(augmented_attribs) %in% c("attr_o", "sinc_o", "intel_o", "dec_o")) ]
# since we have a bunch of predictor variables I'm going to run a stepwaise AIC
fun_augmented_fit_2 <- lm(fun_o ~ 1, data = na.omit(smaller_augmented_attribs))
summary(fun_augmented_fit_2)
step <- stepAIC(fun_augmented_fit_2, direction="forward", scope = list(upper = fun_augmented_fit, lower = fun_augmented_fit_2))
step$anova # display results
```
Reducing our model based on our anova results
```{r}
# Forward AIC = fun_o ~ go_out + exercise + clubbing + gender + field_cd + tv
fun_reduced_augmented_fit_2 <- lm(fun_o ~ go_out + exercise + clubbing + gender + field_cd + tv, data = na.omit(smaller_augmented_attribs))
summary(fun_reduced_augmented_fit_2)
```
The adjusted R^2 is quite low, at 0.1403 with 27 variables.
Although these appear statistically significant, the Adj. R. Square for this model is very low and there is not much linear relationship from the predicted values
```{r}
plot(augmented_attribs$fun_o ~ augmented_attribs$go_out)
plot(augmented_attribs$fun_o ~ as.factor(augmented_attribs$clubbing))
plot(augmented_attribs$fun_o ~ augmented_attribs$career_c)
```
Where should we go from here with this idea?
- real estate agents are more fun? (but they showed up as statistically significant with a negative coefficient in the decision model above!)
- visually, it is possible that the less you go out (7) the less fun you are
- it could be possible that the more important clubbing is to you, the more fun you are
# How accurate is our self-perception?
We see that personality traits like fun and attractiveness are correlated with if your speed date decides to say "yes" to you. What does this mean for us as far as what we think about ourselves? Do we even know how fun/attractive we are to others?
Alternatively, are others able to pick up on our less obvious traits, like sincerity and ambition, in 4 minutes?
plot self ratings vs. avg of trait
```{r}
# rate that people decide they like you after the speed date
desirableness <- aggregate(dec_o ~ iid, full_data, mean)
#additional explanitory variables
attractive <- aggregate(attr_o ~ iid, full_data, mean)
sincere <- aggregate(sinc_o ~ iid, full_data, mean)
intelligent <- aggregate( intel_o ~ iid, full_data, mean)
fun <- aggregate( fun_o ~ iid, full_data, mean)
ambitious <- aggregate( amb_o ~ iid, full_data, mean)
personality_attribs <- full_data[, c("iid", "attr3_1", "sinc3_1", "fun3_1", "intel3_1", "amb3_1")]
personality_attribs <- personality_attribs[!duplicated(personality_attribs),]
# binding back together
personality_attribs <- merge(desirableness, personality_attribs,by="iid")
# we end up with three columns of dec_o here
personality_attribs <- Reduce(function(x, y) merge(x, y, all=TRUE), list(attractive, sincere, intelligent, fun, ambitious, personality_attribs, desirableness ))
attr_lm <- lm(personality_attribs$attr_o ~ personality_attribs$attr3_1)
summary(attr_lm)
plot(personality_attribs$attr_o ~ personality_attribs$attr3_1, xlab="Self-Reported", ylab = "Partner Rating", main = "Perceived Attractiveness")
abline(attr_lm)
library(ggplot2)
personality_attribs$attr3_1 <- as.factor(personality_attribs$attr3_1)
p <- ggplot(na.omit(personality_attribs), aes(x=attr3_1, y=attr_o, fill=attr3_1)) +
geom_violin(trim=FALSE)
# plot with median and quartile
p + geom_boxplot(width=0.1, fill="white") + labs(title="Perceived Attractiveness", x="Self-Reported", y="Partner Rating") + scale_fill_brewer(palette="Spectral", direction = -1) + guides(fill= guide_legend(reverse = TRUE))
# is it possible to add the lm to this plot?
```
```{r}
sinc_lm <- lm(personality_attribs$sinc_o ~ as.numeric(personality_attribs$sinc3_1))
intel_lm <- lm(personality_attribs$intel_o ~ as.numeric(personality_attribs$intel3_1))
fun_lm <- lm(personality_attribs$fun_o ~ as.numeric(personality_attribs$fun3_1))
amb_lm <- lm(personality_attribs$amb_o ~ as.numeric(personality_attribs$amb3_1))
summary(sinc_lm)
summary(intel_lm)
summary(fun_lm)
summary(amb_lm)
plot(personality_attribs$sinc_o ~ as.numeric(personality_attribs$sinc3_1))
abline(sinc_lm)
plot(personality_attribs$fun_o ~ as.numeric(personality_attribs$fun3_1))
abline(fun_lm)
plot(personality_attribs$intel_o ~ as.numeric(personality_attribs$intel3_1))
abline(intel_lm)
plot(personality_attribs$amb_o ~ as.numeric(personality_attribs$amb3_1))
abline(amb_lm)
```
```{r}
personality_attribs$sinc3_1 <- as.factor(personality_attribs$sinc3_1)
personality_attribs$fun3_1 <- as.factor(personality_attribs$fun3_1)
personality_attribs$intel3_1 <- as.factor(personality_attribs$intel3_1)
personality_attribs$amb3_1 <- as.factor(personality_attribs$amb3_1)
ggplot(na.omit(personality_attribs), aes(x=sinc3_1, y=sinc_o, fill=sinc3_1)) +
geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + labs(title="Perceived Sincerity", x="Self-Reported", y="Partner Rating") + scale_fill_brewer(palette="Spectral", direction = -1) + guides(fill= guide_legend(reverse = TRUE))
ggplot(na.omit(personality_attribs), aes(x=fun3_1, y=fun_o, fill=fun3_1)) +
geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + labs(title="Perceived Fun", x="Self-Reported", y="Partner Rating") + scale_fill_brewer(palette="Spectral", direction = -1) + guides(fill= guide_legend(reverse = TRUE))
ggplot(na.omit(personality_attribs), aes(x=intel3_1, y=intel_o, fill=intel3_1)) +
geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + labs(title="Perceived Intelligence", x="Self-Reported", y="Partner Rating") + scale_fill_brewer(palette="Spectral", direction = -1) + guides(fill= guide_legend(reverse = TRUE))
ggplot(na.omit(personality_attribs), aes(x=amb3_1, y=sinc_o, fill=amb3_1)) +
geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + labs(title="Perceived Ambition", x="Self-Reported", y="Partner Rating") + scale_fill_brewer(palette="Spectral", direction = -1) + guides(fill= guide_legend(reverse = TRUE))
```
```{r}
#library(tidyverse)
library(dplyr)
sincerity <- full_data[,c("sinc_o", "sinc3_1")]
sincerity <- sincerity %>% count(sinc3_1, sinc_o)
# round very few decimal ratings to integers
sincerity$sinc_o <- as.integer(sincerity$sinc_o)
sincerity$sinc3_1 <- as.factor(sincerity$sinc3_1)
sincerity$sinc_o <- as.factor(sincerity$sinc_o)
fun <- full_data[,c("fun_o", "fun3_1")]
fun <- fun %>% count(fun3_1, fun_o)
fun$fun3_1 <- as.factor(fun$fun3_1)
fun$fun_o <- as.factor(as.integer(fun$fun_o))
ambition <- full_data[,c("amb_o", "amb3_1")]
ambition <- ambition %>% count(amb3_1, amb_o)
ambition$amb3_1 <- as.factor(ambition$amb3_1)
ambition$amb_o <- as.factor(as.integer(ambition$amb_o))
intelligence <- full_data[,c("intel_o", "intel3_1")]
intelligence <- intelligence %>% count(intel3_1, intel_o)
intelligence$intel3_1 <- as.factor(intelligence$intel3_1)
intelligence$intel_o <- as.factor(as.integer(intelligence$intel_o))
hotness <- full_data[,c("attr_o", "attr3_1")]
hotness <- hotness %>% count(attr3_1, attr_o)
hotness$attr3_1 <- as.factor(hotness$attr3_1)
hotness$attr_o <- as.factor(as.integer(hotness$attr_o))
ggplot(data = na.omit(hotness), aes(x=attr3_1, y=n, fill=attr_o)) +
geom_bar(position=position_fill(reverse = TRUE), stat="identity") + scale_fill_brewer(palette="Spectral", direction = -1) + guides(fill= guide_legend(reverse = TRUE))+labs(title="Perceived Attractiveness", x="Self-Reported", y="Partner Rating")
ggplot(data = na.omit(intelligence), aes(x=intel3_1, y=n, fill=intel_o)) +
geom_bar(position=position_fill(reverse = TRUE), stat="identity") + scale_fill_brewer(palette="Spectral", direction = -1) + guides(fill= guide_legend(reverse = TRUE))
ggplot(data = na.omit(ambition), aes(x=amb3_1, y=n, fill=amb_o)) +
geom_bar(position=position_fill(reverse = TRUE), stat="identity") + scale_fill_brewer(palette="Spectral", direction = -1) + guides(fill= guide_legend(reverse = TRUE))
ggplot(data = na.omit(fun), aes(x=fun3_1, y=n, fill=fun_o)) +
geom_bar(position=position_fill(reverse = TRUE), stat="identity") + scale_fill_brewer(palette="Spectral", direction = -1) + guides(fill= guide_legend(reverse = TRUE))
ggplot(data = na.omit(sincerity), aes(x=sinc3_1, y=n, fill=sinc_o)) +
geom_bar(position=position_fill(reverse = TRUE), stat="identity") + scale_fill_brewer(palette="Spectral", direction = -1) + guides(fill= guide_legend(reverse = TRUE))
```
```{r}
# show with frequency
ggplot(data = na.omit(hotness), aes(x=attr3_1, y=n, fill=attr_o)) +
geom_bar(position=position_stack(reverse = TRUE), stat="identity") + scale_fill_brewer(palette="Spectral", direction = -1) + guides(fill= guide_legend(reverse = TRUE))+labs(title="Perceived Attractiveness", x="Self-Reported", y="Partner Rating")
ggplot(data = na.omit(intelligence), aes(x=intel3_1, y=n, fill=intel_o)) +
geom_bar(position=position_stack(reverse = TRUE), stat="identity") + scale_fill_brewer(palette="Spectral", direction = -1) + guides(fill= guide_legend(reverse = TRUE))
ggplot(data = na.omit(ambition), aes(x=amb3_1, y=n, fill=amb_o)) +
geom_bar(position=position_stack(reverse = TRUE), stat="identity") + scale_fill_brewer(palette="Spectral", direction = -1) + guides(fill= guide_legend(reverse = TRUE))
ggplot(data = na.omit(fun), aes(x=fun3_1, y=n, fill=fun_o)) +
geom_bar(position=position_stack(reverse = TRUE), stat="identity") + scale_fill_brewer(palette="Spectral", direction = -1) + guides(fill= guide_legend(reverse = TRUE))
ggplot(data = na.omit(sincerity), aes(x=sinc3_1, y=n, fill=sinc_o)) +
geom_bar(position=position_stack(reverse = TRUE), stat="identity") + scale_fill_brewer(palette="Spectral", direction = -1) + guides(fill= guide_legend(reverse = TRUE))
```
#General Takeaway:
Attractiveness - the linear model does not explain much of the variation in peer rating
There may be some trend in increased perceived attractiveness with increased self-rating, however sample sizes of self-ratings <= 4 are small.
Most people rated themselves a 7 or 8 in attractiveness. No one's.
Fun - very similar. Most people rated themselves a 7-9 in fun. A handful of one's.
Perhaps it is hard to determine traits like ambition, sincerity and intelligence in a 4 minute speed date. But it appears that people default to giving a peer review of 7 in all cases.
This is interesting because most people give themselves around a 7 as well. I've seen this trend in other 0-10 rating scales in other datasets. Is there discussion in the stats field about this phenomenon?
Finally:
Turns out.. we kind of suck at predicting how attractive we are to others. What we think about ourselves is not correlated to what others see in us. Maybe there is no point to worrying about how strangers see us, because either they don't or we don't.