Person type fix #27

Merged
nick-fournier merged 19 commits into develop from person_type_fix
Feb 26, 2026

Conversation

@nick-fournier nick-fournier commented Feb 5, 2026

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Code refactoring
  • Performance improvement

Testing

  • All existing tests pass
  • Added new tests to cover changes
  • Manually tested the changes

Checklist

  • My code follows the style guidelines of this project (runs ruff check . without errors)
  • My code is properly formatted (runs ruff format .)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published

Related Issues

Closes #

Additional Notes

@nick-fournier nick-fournier self-assigned this Feb 5, 2026
@nick-fournier
Author

Okay @mtsao96, I think I fixed it. I'm just trying to come up with a whole range of edge cases to test against, so if you've got any, let me know and I can try to add them. Or try adding some yourself.

mtsao96 commented Feb 6, 2026

Do we re-categorize PersonType for MandatoryLocation? I re-ran the pipeline with the fixes, and the StudentCategory for MandatoryLocation still shows people aged 10 and 16 as not a student.

@nick-fournier
Author

Hmm. I'll take another look. Can you get me some offending person IDs?

mtsao96 commented Feb 6, 2026

Yup, here are some person IDs. They're categorized as not employed and not a student even though person age is either 10 or 16:

  • 2300104902
  • 2300171402
  • 2300366003
  • 2300442603
  • 2300462002
  • 2300584005
  • 2300584006
  • 2301259203
  • 2301325603
  • 2301500702

@nick-fournier
Author

Wait, are we working with the same codebook?

class CTRAMPPersonType(LabeledEnum):

Because here's what I'm getting

person_id age employment student type
i64 i64 i64 i64 str
------------ ----- ------------ --------- --------------------------
2300104902 10 995 995 Child of non-driving age
2300171402 16 5 995 Child of driving age
2300366003 10 995 995 Child of non-driving age
2300442603 10 995 995 Child of non-driving age
2300462002 10 995 995 Child of non-driving age
2300584005 10 995 995 Child of non-driving age
2300584006 10 995 995 Child of non-driving age
2301259203 16 5 995 Child of driving age
2301325603 10 995 995 Child of non-driving age
2301500702 16 5 995 Child of driving age

mtsao96 commented Feb 6, 2026

Yeah that was the weird thing I noticed. The PersonData had the correct categorization but the MandatoryLocation had different results.

HHID HomeTAZ Income PersonID PersonNum PersonType PersonAge EmploymentCategory StudentCategory WorkLocation SchoolLocation work_distance_survey school_distance_survey
i64 i64 i64 i64 i64 i64 i64 str str i64 i64 f64 f64
23001049 150 3 2300104902 2 5 10 Not employed Not a student 0 39 null null
23001714 990 14 2300171402 2 5 16 Not employed Not a student 0 1018 null null
23003660 513 86 2300366003 3 5 10 Not employed Not a student 0 511 null null
23004426 1179 111 2300442603 3 5 10 Not employed Not a student 0 1179 null 1.014178
23004620 25 9 2300462002 2 5 10 Not employed Not a student 0 44 null null
23005840 614 138 2300584005 5 5 10 Not employed Not a student 0 760 null null
23005840 614 138 2300584006 6 5 10 Not employed Not a student 0 760 null null
23012592 778 20 2301259203 3 5 16 Not employed Not a student 0 776 null 2.031946
23013256 1291 111 2301325603 3 5 10 Not employed Not a student 0 1292 null 1.870266
23015007 1067 3 2301500702 2 5 16 Not employed Not a student 0 1065 null null

@nick-fournier
Author

Ah I see, I was focused on the person table. I think your hunch might be right: the re-categorizing within the mandatory location format is the problem, because it reuses the formatted person table, but by then we've already converted the age code to a continuous age! I'll make the fix and push it up.
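
To illustrate the failure mode described here, below is a minimal sketch. It assumes a made-up age codebook (`AGE_CODE_RANGES`) and hypothetical helper names; none of these come from the project, and the real codes and classification logic may look quite different. The point is only that a rule written against coded ages can misfire once the column holds ages in years.

```python
# Hypothetical age codebook: survey code -> (min_age, max_age).
# These are NOT the project's real codes; they only illustrate the mismatch.
AGE_CODE_RANGES = {
    1: (0, 4),
    2: (5, 15),
    3: (16, 17),
    4: (18, 64),
    5: (65, 120),
}


def code_to_years(age_code: int) -> int:
    """Convert an age code to a representative age in years (range midpoint)."""
    low, high = AGE_CODE_RANGES[age_code]
    return (low + high) // 2


def is_school_age_from_code(age_code: int) -> bool:
    """Rule written for *coded* ages: codes 2 and 3 cover ages 5-17."""
    return age_code in (2, 3)


def is_school_age_from_years(age_years: int) -> bool:
    """Rule written for *continuous* ages in years."""
    return 5 <= age_years <= 17


# The formatted person table already stores years, not codes.
age_years = code_to_years(2)

# Re-running the coded rule on the continuous column misclassifies the child.
print(is_school_age_from_code(age_years))

# The rule that matches the column's actual units classifies them correctly.
print(is_school_age_from_years(age_years))
```

One way to avoid this kind of mismatch is to categorize once on the raw table and carry the result forward, rather than re-deriving it from an already transformed column.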

@nick-fournier
Author

Bingo. Just going to clean up and push it up.

PersonID PersonType PersonTypeStr PersonAge EmploymentCategory WorkLocation SchoolLocation
i64 i32 str i64 str i64 i64
------------ ------------ -------------------------- ----------- -------------------- -------------- ----------------
2300104902 6 Child of non-driving age 10 Not employed 0 39
2300171402 7 Child of driving age 16 Not employed 0 1018
2300366003 6 Child of non-driving age 10 Not employed 0 511
2300442603 6 Child of non-driving age 10 Not employed 0 1179
2300462002 6 Child of non-driving age 10 Not employed 0 44
2300584005 6 Child of non-driving age 10 Not employed 0 760
2300584006 6 Child of non-driving age 10 Not employed 0 760
2301259203 7 Child of driving age 16 Not employed 0 776
2301325603 6 Child of non-driving age 10 Not employed 0 1292
2301500702 7 Child of driving age 16 Not employed 0 1065

@nick-fournier nick-fournier marked this pull request as ready for review February 6, 2026 19:48

mtsao96 commented Feb 10, 2026

I tried to re-run the pipeline, and I think there's an error with the student_category field. The type field is correct, but student_category is categorizing most of them as "Not a student".

person_id hh_id age type student_category
21 2300104902 23001049 10 Child of non-driving age Not a student
31 2300171402 23001714 16 Child of driving age Not a student
61 2300366003 23003660 10 Child of non-driving age Not a student
76 2300442603 23004426 10 Child of non-driving age Not a student
79 2300462002 23004620 10 Child of non-driving age Not a student
... ... ... ... ... ...
15890 2380045304 23800453 10 Child of non-driving age Not a student
15891 2380045305 23800453 10 Child of non-driving age Not a student
15908 2380099903 23800999 10 Child of non-driving age Not a student
15920 2380148702 23801487 10 Child of non-driving age Not a student
15930 2380229802 23802298 10 Child of non-driving age Not a student

@nick-fournier
Author

Okay, about to push up a fix. I think it's because they had NA in student or school_type, but if they're under 16 we can assume student.
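
A minimal sketch of that fallback, assuming missing survey responses arrive as `None` and that the reported student status is a simple boolean. The function name, argument shapes, and the `cutoff` parameter are hypothetical; the merged fix may encode missing values and categories differently.

```python
from typing import Optional


def is_student(
    age: int,
    reported_student: Optional[bool],
    school_type: Optional[str],
    cutoff: int = 16,  # "under 16" per the comment above; an assumption here
) -> bool:
    """Return True if the person should be treated as a student.

    When either survey field is missing, fall back to age: anyone under
    `cutoff` is assumed to be a student.
    """
    if reported_student is None or school_type is None:
        return age < cutoff
    return reported_student


# A 10-year-old with missing responses is assumed to be a student.
print(is_student(10, None, None))

# An adult with missing responses is not.
print(is_student(30, None, None))
```

With a strict `age < 16` cutoff, a 16-year-old with missing responses would still fall through as not a student, so the exact threshold is worth checking against the edge cases above.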

mtsao96 commented Feb 10, 2026

I believe it worked on my end! I had a follow up question looking at the outputs: There is a small number of people (59) who are categorized as full-time/part-time work, nonworker, retired, and child too young for school but still have a School Location TAZ. Are these people who are not full-time students but are taking classes somewhere?

mtsao96 commented Feb 11, 2026

Oh something else I noticed. In the MandatoryLocation file, the income output was a bit weird. Since we're pulling the income already from the formatted CTRAMP household DataFrame, do we need to include this calculation when mapping the results to CTRAMP column names?

(pl.col("income") / config.income_base_year_dollars).cast(pl.Int64).alias("Income"),
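
As a rough illustration of the concern, assuming the household formatter has already applied the same conversion: dividing by the base-year factor a second time would shrink the value again. The factor below is a made-up placeholder, since the real `config.income_base_year_dollars` value is not shown in this thread, and `to_base_year` is a hypothetical stand-in for the expression above.

```python
# Placeholder factor for illustration only; NOT the project's configured value.
INCOME_BASE_YEAR_DOLLARS = 2.0


def to_base_year(income: float, factor: float = INCOME_BASE_YEAR_DOLLARS) -> int:
    """Scale an income by the base-year factor and truncate to an integer."""
    return int(income / factor)


raw_income = 100_000

# Conversion applied once, as in the formatted household table.
household_income = to_base_year(raw_income)

# Same conversion applied again while mapping to CTRAMP column names.
double_scaled = to_base_year(household_income)

print(household_income, double_scaled)
```

If the household table already holds converted income, the mapping step could pass the column through unchanged.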

@nick-fournier
Author

Hmm. Can you find me a couple of offending person IDs? I'll take a look. This is good stuff though, catching all these weird edge cases.

mtsao96 commented Feb 11, 2026

Yup! Here are some of the IDs with school locations but categorized as not a student. PersonType = 8 (Child Too Young for School) makes up the majority of these people (339 of the 419).

HHID HomeTAZ Income PersonID PersonNum PersonType PersonAge EmploymentCategory StudentCategory WorkLocation SchoolLocation work_distance_survey school_distance_survey
0 23000098 258 138.0 2300009803 3 8 2 Not employed Not a student 0 265 NaN NaN
47 23006955 934 61.0 2300695502 2 4 39 Not employed Not a student 0 955 NaN 3.326465
56 23007137 488 86.0 2300713702 2 8 2 Not employed Not a student 0 452 NaN 3.179811
57 23007137 488 86.0 2300713704 4 8 2 Not employed Not a student 0 452 NaN 3.392626
96 23012895 778 NaN 2301289502 2 8 2 Not employed Not a student 0 778 NaN 0.262782
... ... ... ... ... ... ... ... ... ... ... ... ... ...
8389 23757951 794 NaN 2375795104 4 8 2 Not employed Not a student 0 777 NaN 7.124364
8393 23758158 1082 61.0 2375815803 3 7 21 Not employed Not a student 0 1072 NaN 4.587002
8426 23800999 113 NaN 2380099901 1 4 39 Not employed Not a student 0 1021 NaN NaN
8446 23802914 29 NaN 2380291402 2 1 59 Full-time employed Not a student 0 24 NaN 0.627584
8447 23802914 29 NaN 2380291403 3 1 59 Full-time employed Not a student 0 24 NaN 0.906157

419 rows × 13 columns

nick-fournier commented Feb 13, 2026

@mtsao96 So if they are too young for school, should they be "not a student"? I think a lot of them are maybe daycare cases.

We can either:

  • leave them as not a student
  • lump them in grade or higher
  • remove their school TAZ
  • modify CTRAMP to have pre-school (lol)

mtsao96 commented Feb 13, 2026

I think we can just leave it as is, i.e. leave them as not a student and keep the school TAZ. We're not summarizing them in the UsualWorkSchoolLocation model, so it doesn't impact anything there. (No CTRAMP modification please 😭 )

@nick-fournier
Author

Okay, sounds good. I think school type includes daycare/preschool, so they can at least be identified.

But I'll fix the cases where people older than 5 are getting "not a student". I think that leaves about 336 or so people under 5.
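
A small consistency check along these lines could help track the remaining cases. This is only a sketch: the helper name is hypothetical, it assumes a location of 0 means "no TAZ assigned" (as the WorkLocation column suggests), and the three sample rows are copied from the output pasted earlier in this thread.

```python
NO_LOCATION = 0  # assumption: 0 means no TAZ was assigned


def flag_school_without_student(rows, min_age_exclusive=5):
    """Return PersonIDs older than the threshold that have a school TAZ
    but are categorized as "Not a student"."""
    return [
        row["PersonID"]
        for row in rows
        if row["PersonAge"] > min_age_exclusive
        and row["StudentCategory"] == "Not a student"
        and row["SchoolLocation"] != NO_LOCATION
    ]


# Sample rows taken from the MandatoryLocation output shown above.
rows = [
    {"PersonID": 2300009803, "PersonAge": 2, "StudentCategory": "Not a student", "SchoolLocation": 265},
    {"PersonID": 2300695502, "PersonAge": 39, "StudentCategory": "Not a student", "SchoolLocation": 955},
    {"PersonID": 2375815803, "PersonAge": 21, "StudentCategory": "Not a student", "SchoolLocation": 1072},
]

flagged = flag_school_without_student(rows)
print(flagged)
```

The 2-year-old is left alone, matching the decision to keep children too young for school as not a student while retaining their school TAZ.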

mtsao96 commented Feb 13, 2026

Found another edge case now 😅 where there are ~41 people with the status of full-time employee and college or higher student, but they only have a School Location and no Work Location.

HHID HomeTAZ Income PersonID PersonNum PersonType PersonAge EmploymentCategory StudentCategory WorkLocation SchoolLocation work_distance_survey school_distance_survey
2 23000432 1009 61.0 2300043202 2 1 29 Full-time employed College or higher 0 1019 NaN 0.817120
7 23001501 1069 138.0 2300150101 1 1 29 Full-time employed College or higher 0 15 NaN 21.116913
532 23105808 607 NaN 2310580801 1 1 49 Full-time employed College or higher 0 300 NaN 29.649032
568 23107792 1026 138.0 2310779201 1 1 49 Full-time employed College or higher 0 946 NaN 6.025422
706 23119223 100 43.0 2311922302 2 1 29 Full-time employed College or higher 0 354 NaN NaN
908 23135021 862 61.0 2313502101 1 5 69 Full-time employed College or higher 0 840 NaN NaN
1108 23147838 1361 61.0 2314783801 1 1 29 Full-time employed College or higher 0 133 NaN 57.451974
1111 23148134 145 74.0 2314813401 1 5 69 Full-time employed College or higher 0 133 NaN NaN
1248 23156938 137 138.0 2315693801 1 1 39 Full-time employed College or higher 0 100 NaN 3.090014
1302 23161983 378 138.0 2316198301 1 1 39 Full-time employed College or higher 0 371 NaN 2.140610
1428 23172799 969 3.0 2317279901 1 1 29 Full-time employed College or higher 0 991 NaN NaN
1435 23173392 86 30.0 2317339201 1 1 59 Full-time employed College or higher 0 86 NaN 0.402032
1565 23181891 135 30.0 2318189101 1 1 49 Full-time employed College or higher 0 1427 NaN NaN
1730 23196223 549 138.0 2319622301 1 1 59 Full-time employed College or higher 0 533 NaN NaN
2157 23228052 355 138.0 2322805202 2 1 29 Full-time employed College or higher 0 354 NaN 1.135920
2430 23251383 1225 30.0 2325138301 1 1 29 Full-time employed College or higher 0 1221 NaN NaN
2434 23251639 1231 86.0 2325163902 2 1 39 Full-time employed College or higher 0 13 NaN NaN
2586 23262420 586 9.0 2326242001 1 1 39 Full-time employed College or higher 0 653 NaN NaN
3074 23296003 174 9.0 2329600301 1 1 59 Full-time employed College or higher 0 91 NaN NaN
3526 23327263 1362 14.0 2332726301 1 1 29 Full-time employed College or higher 0 1363 NaN NaN
3562 23330954 279 138.0 2333095402 2 1 49 Full-time employed College or higher 0 291 NaN NaN
3760 23345933 132 30.0 2334593301 1 1 59 Full-time employed College or higher 0 133 NaN 0.560996
4263 23382251 150 20.0 2338225102 2 1 21 Full-time employed College or higher 0 150 NaN NaN
4464 23395828 297 138.0 2339582801 1 5 87 Full-time employed College or higher 0 373 NaN 10.960887
4697 23410961 58 NaN 2341096101 1 1 39 Full-time employed College or higher 0 178 NaN 6.138203
4956 23427876 1387 20.0 2342787602 2 5 79 Full-time employed College or higher 0 1363 NaN NaN
5130 23440145 132 NaN 2344014502 2 1 39 Full-time employed College or higher 0 133 NaN 0.827502
5266 23450810 47 61.0 2345081001 1 1 21 Full-time employed College or higher 0 133 NaN 9.043189
6023 23517993 1263 30.0 2351799303 3 1 21 Full-time employed College or higher 0 1249 NaN 4.911290
6077 23525768 331 138.0 2352576802 2 1 39 Full-time employed College or higher 0 1019 NaN NaN
6161 23535548 976 3.0 2353554801 1 1 21 Full-time employed College or higher 0 133 NaN 14.575813
6308 23554022 334 30.0 2355402201 1 1 29 Full-time employed College or higher 0 291 NaN NaN
6546 23578556 1167 43.0 2357855602 2 1 59 Full-time employed College or higher 0 1161 NaN NaN
6981 23624602 1074 61.0 2362460201 1 1 39 Full-time employed College or higher 0 885 NaN 19.883012
7019 23630850 269 138.0 2363085002 2 1 21 Full-time employed College or higher 0 291 NaN 6.791110
7414 23665259 748 138.0 2366525902 2 1 39 Full-time employed College or higher 0 748 NaN 0.020535
7449 23668834 1236 30.0 2366883401 1 1 29 Full-time employed College or higher 0 1294 NaN NaN
7565 23680087 1175 43.0 2368008701 1 1 29 Full-time employed College or higher 0 946 NaN NaN
7614 23685735 729 61.0 2368573501 1 1 39 Full-time employed College or higher 0 1127 NaN NaN
8446 23802914 29 NaN 2380291402 2 1 59 Full-time employed College or higher 0 24 NaN 0.627584
8447 23802914 29 NaN 2380291403 3 1 59 Full-time employed College or higher 0 24 NaN 0.906157

@nick-fournier
Author

Well wait, people can have no work location and still be workers; they just don't have a fixed location, right?

mtsao96 commented Feb 13, 2026

Yeah, people can be a full-time employee and not have a work location. But I think I only called this out because all the people who are full-time workers with no work location were also classified as college students.

@nick-fournier
Author

Ah, I see. PersonType is full-time. Okay. I think we can use the canonical student vs. employment status as a tie-breaker:
If they are a full-time student & full-time worker -> student
If they are a part-time student & full-time worker -> full-time worker
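
The two rules above could be sketched as a lookup table. The status strings and the `resolve_role` name are hypothetical, and combinations not discussed in this thread deliberately return `None` rather than guessing.

```python
from typing import Optional

# (student_status, employment_status) -> resolved role.
# Only the two cases named above are covered.
TIE_BREAK = {
    ("full-time", "full-time"): "student",
    ("part-time", "full-time"): "full-time worker",
}


def resolve_role(student_status: str, employment_status: str) -> Optional[str]:
    """Resolve a student/worker conflict, or return None if the
    combination is not covered by the tie-breaker rules."""
    return TIE_BREAK.get((student_status, employment_status))


print(resolve_role("full-time", "full-time"))
print(resolve_role("part-time", "full-time"))
```

Keeping the rules in one table makes it easy to add further combinations once they are agreed on.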

@nick-fournier nick-fournier merged commit beb9283 into develop Feb 26, 2026
7 checks passed
@nick-fournier nick-fournier deleted the person_type_fix branch February 27, 2026 22:03