|
305 | 305 | "\n", |
306 | 306 | "Then, it's time to prepare our data. We'll create a `DataTree`\n", |
307 | 307 | "that defines the relationships among all the datasets we're working\n", |
308 | | - "with. This is a tree in the mathematical sense, with nodes referencing\n", |
309 | | - "the datasets and edges representing the relationships." |
| 308 | + "with. This is a tree roughly in the mathematical sense, with nodes referencing\n", |
| 309 | + "the dataset dimensions and edges representing the relationships." |
310 | 310 | ] |
311 | 311 | }, |
312 | 312 | { |
|
355 | 355 | "source": [ |
356 | 356 | "The first named dataset we include, `tour`, is by default the root node of this data tree.\n", |
357 | 357 | "We then can define an arbitrary number of other named data nodes. Here, we add `person`, `hh`,\n", |
358 | | - "`odt_skims` and `odt_skims`. Note that these last two are actually two different names for the\n", |
| 358 | + "`odt_skims` and `dot_skims`. Note that these last two are actually two different names for the\n", |
359 | 359 | "same underlying dataset, and for each name we will next define a unique set of relationships.\n", |
| 360 | + "For each of these other data nodes, we will need to define some way to link each dimension of\n", |
| 361 | + "them back to the root node, so that for any position in the root node's arrays, we can find\n", |
| 362 | + "one corresponding value in each of the other datasets' variables.\n", |
360 | 363 | "\n", |
361 | 364 | "All data nodes in this tree are stored as `Dataset` objects. We can give a pandas DataFrame\n", |
362 | | - "in this contructor instead, but it will be automatically converted into a one-dimension `Dataset`.\n", |
| 365 | + "in this constructor instead, but it will be automatically converted into a one-dimensional `Dataset`.\n", |
363 | 366 | "The conversion is no-copy if possible (and it is usually possible) so no additional memory is\n", |
364 | 367 | "consumed in the conversion.\n", |
365 | 368 | "\n", |
366 | 369 | "The `relationships` argument defines the links of the data tree. Each relationship maps a particular variable\n", |
367 | 370 | "in a named upstream dataset to a particular dimension of a named downstream dataset. For example,\n", |
368 | 371 | "`\"person.household_id @ hh.HHID\"` tells the tree that the `household_id` variable in the `person` \n", |
369 | | - "dataset contains labels (`@`) that map to the `HHID` dimension of the `hh` dataset.\n", |
| 372 | + "dataset contains labels (`@`) that map to the `HHID` dimension of the `hh` dataset. Similarly,\n", |
| 373 | + "`\"tour.PERID @ person.PERID\"` tells the tree that the `PERID` variable in the `tour` dataset\n", |
| 374 | + "contains labels that map to the `PERID` dimension of the `person` dataset. From this, we can\n", |
| 375 | + "see that any position in the \"tour\" dataset can be mapped to a position in the \"person\" dataset,\n", |
| 376 | + "in a many-to-one manner, and from there to a position in the \"hh\" dataset, also in a many-to-one\n", |
| 377 | + "manner. Unlike tours, persons, and households, the `skims` datasets are multi-dimensional, so we need to\n", |
| 378 | + "map multiple dimensions. For the `odt_skims` dataset, we map the origin TAZ dimension (`otaz`)\n", |
| 379 | + "to the household TAZ (`hh.TAZ`), and the destination TAZ dimension (`dtaz`) to the tour\n", |
| 380 | + "destination TAZ (`tour.dest_taz_idx`), and the time period dimension (`time_period`) to the\n", |
| 381 | + "tour outbound time period (`tour.out_time_period`). This way, even though the skims dataset\n", |
| 382 | + "is multi-dimensional, we can still find one unique position in the skims dataset for each\n", |
| 383 | + "position in the tours dataset. The same is done for the `dot_skims` dataset, which actually\n", |
| 384 | + "contains the same data as `odt_skims`, but the mapping of the dimensions is different, so a\n", |
| 385 | + "different unique position in the skims dataset is found for each position in the tours dataset.\n", |
370 | 386 | "\n", |
371 | 387 | "In addition to mapping by label, we can also map by position, by using the `->` operator in the\n", |
372 | 388 | "relationship string instead of `@`. In the example above, we map the tour destination TAZs in\n", |
373 | 389 | "this manner, as the `dest_taz_idx` variable in the `tour` dataset contains positional references\n", |
374 | 390 | "instead of labels.\n", |
375 | 391 | "\n", |
376 | | - "A special case for the relationship mapping is available when the source varibable\n", |
| 392 | + "A special case for the relationship mapping is available when the source variable\n", |
377 | 393 | "in the upstream dataset is explicitly categorical. In this case, sharrow checks that\n", |
378 | 394 | "the categories exactly match the labels in the referenced downstream dataset dimension,\n", |
379 | 395 | "and that there are no missing categorical values. If they do match and there are no\n", |
|
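As a rough illustration of the two relationship operators described in the hunk above, the following plain-Python sketch resolves a label-based (`@`) link and a position-based (`->`) link by hand. It is not sharrow's implementation and does not call sharrow. All data values and the helper names `resolve_by_label` and `resolve_by_position` are hypothetical, and positions are assumed to be zero-based offsets.

```python
# Hypothetical toy data; none of these values come from the notebook.
person_PERID = [101, 102, 103]      # labels along the person dimension
person_household_id = [7, 7, 9]     # variable in `person`, holds hh labels
hh_HHID = [7, 9]                    # labels along the hh dimension
hh_income = [50_000, 82_000]        # an example variable in `hh`

tour_PERID = [103, 101, 101, 102]   # variable in `tour`, holds person labels
tour_dest_taz_idx = [2, 0, 1, 2]    # variable in `tour`, holds positions
skim_dtaz_labels = [11, 12, 13]     # labels along one skims dimension


def resolve_by_label(values, labels):
    """Mimic `@`: find the position of each value among the target labels."""
    position_of = {label: pos for pos, label in enumerate(labels)}
    return [position_of[v] for v in values]


def resolve_by_position(values, labels):
    """Mimic `->`: values are assumed to be zero-based offsets already,
    so the only work left is a bounds check."""
    for v in values:
        if not 0 <= v < len(labels):
            raise IndexError(f"position {v} is out of range")
    return list(values)


# Chain two many-to-one links: tour -> person -> hh.
tour_to_person = resolve_by_label(tour_PERID, person_PERID)
person_to_hh = resolve_by_label(person_household_id, hh_HHID)
tour_to_hh = [person_to_hh[p] for p in tour_to_person]

# Every tour now has exactly one corresponding household value.
tour_income = [hh_income[h] for h in tour_to_hh]

# Positional link: no label lookup is needed.
tour_to_dtaz = resolve_by_position(tour_dest_taz_idx, skim_dtaz_labels)
```

The point of the sketch is that each link yields exactly one downstream position per upstream row, which is what lets every position in the root `tour` node reach a single value in `person`, `hh`, or a skims dimension.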