content/docs/recipes/relationship-between-fields.md
43 additions & 39 deletions
Original file line number
Diff line number
Diff line change
@@ -5,13 +5,13 @@ title: Relationship between Fields
5
5
<table>
6
6
<tr>
7
7
<th>Authors</th>
8
-
<td>Philippe Thomy, Peter Desmet</td>
8
+
<td>Philippe Thomy</td>
9
9
</tr>
10
10
</table>
11
11
12
-
The structure of tabular datasets is simple: a set of Fields grouped in a table.
12
+
The structure of tabular datasets is simple: a set of fields grouped in a table.
13
13
14
-
However, the data present is often complex and reflects an interdependence between Fields (see explanations in the Internet-Draft [NTV tabular format (NTV-TAB)](https://www.ietf.org/archive/id/draft-thomy-ntv-tab-00.html#section-2)).
14
+
However, the data present is often complex and reflects an interdependence between fields (see explanations in the Internet-Draft [NTV tabular format (NTV-TAB)](https://www.ietf.org/archive/id/draft-thomy-ntv-tab-00.html#section-2)).
15
15
16
16
Let's take the example of the following dataset:
17
17
@@ -22,15 +22,15 @@ Let's take the example of the following dataset:
22
22
| Estonia | European Union | ES | 449 |
23
23
| Nigeria | Africa | NI | 1460 |
24
24
25
-
The data schema for this dataset indicates in the Field Descriptor "description":
25
+
The data schema for this dataset has the following `description`:
26
26
27
-
- for the "code" Field : "country code alpha-2"
28
-
- for the "population" Field: "region population in 2022 (millions)"
27
+
- for the `code` field : "country code alpha-2"
28
+
- for the `population` field: "region population in 2022 (millions)"
29
29
30
30
If we now look at the data we see that this dataset is not consistent because it contains two structural errors:
31
31
32
-
- The value of the "code" Field must be unique for each country, we cannot therefore have "ES" for "Spain" and "Estonia",
33
-
- The value of the "population" Field of "European Union" cannot have two different values (449 and 48)
32
+
- The value of the `code` field must be unique for each country, we cannot therefore have "ES" for "Spain" and "Estonia",
33
+
- The value of the `population` field of "European Union" cannot have two different values (449 and 48)
34
34
35
35
These structural errors make the data unusable and yet they are not detected in the validation of the dataset (in the current version of Table Schema, there are no Descriptors to express this dependency between two fields).
36
36
@@ -70,92 +70,96 @@ Two aspects need to be addressed:
70
70
71
71
A relationship is defined by the following information:
72
72
73
-
- the two Fields involved (the order of the Fields is important with the "derived" link),
73
+
- the two fields involved (the order of the fields is important with the `derived` link),
74
74
- the textual representation of the relationship,
75
75
- the nature of the relationship
76
76
77
77
Three proposals for extending Table Schema are being considered:
78
78
79
-
- New Field Descriptor
80
-
- New Constraint Property
81
-
- New Table Descriptor
79
+
- New field descriptor
80
+
- New constraint property
81
+
- New table descriptor
82
82
83
-
After discussions only the third is retained (a relationship between fields associated to a Field) and presented below:
83
+
After discussions only the third is retained (a relationship between fields associated to a field) and presented below:
84
84
85
-
- **New Table Descriptor**:
85
+
- **New table descriptor**:
86
86
87
-
A `relationships` Table Descriptor is added.
88
-
The properties associated with this Descriptor could be:
87
+
A `relationships` table descriptor is added.
88
+
The properties associated with this descriptor could be:
89
89
90
-
- `fields`: array with the names of the two Fields involved
90
+
- `fields`: array with the names of the two fields involved
91
91
- `description`: description string (optional)
92
92
- `link`: nature of the relationship
93
93
94
94
Pros:
95
95
96
-
- No mixing with Fields descriptors
96
+
- No mixing with field descriptors
97
97
98
98
Cons:
99
99
100
-
- Need to add a new Table Descriptor
101
-
- The order of the Fields in the array is important with the "derived" link
100
+
- Need to add a new table descriptor
101
+
- The order of the fields in the array is important with the `derived` link
102
102
103
103
Example:
104
104
105
105
```json
106
-
{ "fields": [ ],
106
+
{
107
+
"fields": [ ],
107
108
"relationships": [
108
-
{ "fields" : [ "country", "code"],
109
+
{
110
+
"fields" : ["country", "code"],
109
111
"description" : "is the country code alpha-2 of",
110
112
"link" : "coupled"
111
113
},
112
-
{ "fields" : [ "region", "population"],
114
+
{
115
+
"fields" : ["region", "population"],
113
116
"description" : "is the population of",
114
-
"link" : "derived"}
117
+
"link" : "derived"
118
+
}
115
119
]
116
120
}
117
121
```
118
122
119
123
## Specification
120
124
121
-
Assuming solution 3 (Table Descriptor), the specification could be as follows:
125
+
Assuming solution 3 (table descriptor), the specification could be as follows:
122
126
123
-
The `relationships` Descriptor MAY be used to define the dependency between fields.
127
+
The `relationships` descriptor MAY be used to define the dependency between fields.
124
128
125
-
The `relationships` Descriptor, if present, MUST be an array where each entry in the array is an object and MUST contain two required properties and one optional:
129
+
The `relationships` descriptor, if present, MUST be an array where each entry in the array is an object and MUST contain two required properties and one optional:
126
130
127
131
- `fields`: Array with the property `name` of the two fields linked (required)
128
132
- `link` : String with the nature of the relationship between them (required)
129
-
- `description` : String with the description of the relationship between the two Fields (optional)
133
+
- `description` : String with the description of the relationship between the two fields (optional)
130
134
131
135
The `link` property value MUST be one of the following three:
132
136
133
-
- `derived`:
137
+
- `derived`:
134
138
135
139
- The values of the child (second array element) field are dependent on the values of the parent (first array element) field (i.e. a value in the parent field is associated with a single value in the child field).
136
-
- e.g. The "name" field ["john", "paul", "leah", "paul"] and the "Nickname" field ["jock", "paulo", "lili", "paulo"] are derived,
137
-
- i.e. if a new entry "leah" is added, the corresponding "nickname" value must be "lili".
140
+
- e.g. The `name` field ["john", "paul", "leah", "paul"] and the `nickname` field ["jock", "paulo", "lili", "paulo"] are derived,
141
+
- i.e. if a new entry "leah" is added, the corresponding `nickname` value must be "lili".
138
142
139
-
- `coupled`:
143
+
- `coupled`:
140
144
141
145
- The values of one field are associated to the values of the other field.
142
-
- e.g. The "Country" field ["france", "spain", "estonia", "spain"] and the "code alpha-2" field ["FR", "ES", "EE", "ES"] are coupled,
143
-
- i.e. if a new entry "estonia" is added, the corresponding "code alpha-2" value must be "EE" just as if a new entry "EE" is added, the corresponding "Country" value must be "estonia".
146
+
- e.g. The `Country` field ["france", "spain", "estonia", "spain"] and the `code alpha-2` field ["FR", "ES", "EE", "ES"] are coupled,
147
+
- i.e. if a new entry "estonia" is added, the corresponding `code alpha-2` value must be "EE" just as if a new entry "EE" is added, the corresponding `Country` value must be "estonia".
144
148
145
-
- `crossed`:
149
+
- `crossed`:
146
150
147
151
- This relationship means that all the different values of one field are associated with all the different values of the other field.
148
-
- e.g. the "Year" Field [2020, 2020, 2021, 2021] and the "Population" Field ["estonia", "spain", "estonia", "spain"] are crossed
152
+
- e.g. the `Year` field [2020, 2020, 2021, 2021] and the `Population` field ["estonia", "spain", "estonia", "spain"] are crossed
149
153
- i.e. the year 2020 is associated with the population of "spain" and "estonia", just as the population of "estonia" is associated with years 2020 and 2021
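
The descriptor rules in this section could be checked along the following lines. This is only a minimal sketch, not part of the proposal: the function name `relationship_errors`, the wording of the messages, and the lookup of both names in the schema's own `fields` array are assumptions made for illustration.

```python
ALLOWED_LINKS = {"derived", "coupled", "crossed"}


def relationship_errors(schema):
    """Return a list of problems found in schema["relationships"] (empty if valid)."""
    relationships = schema.get("relationships")
    if relationships is None:
        return []  # the descriptor is optional (MAY)
    if not isinstance(relationships, list):
        return ["relationships must be an array"]

    # Assumption: the two names must match the "name" of fields in this schema.
    known_names = {
        field["name"]
        for field in schema.get("fields", [])
        if isinstance(field, dict) and isinstance(field.get("name"), str)
    }

    errors = []
    for index, entry in enumerate(relationships):
        where = f"relationships[{index}]"
        if not isinstance(entry, dict):
            errors.append(f"{where} must be an object")
            continue

        # "fields" is required: an array holding exactly two field names.
        names = entry.get("fields")
        if not (
            isinstance(names, list)
            and len(names) == 2
            and all(isinstance(name, str) for name in names)
        ):
            errors.append(f"{where}.fields must be an array of two field names")
        else:
            for name in names:
                if name not in known_names:
                    errors.append(f"{where}.fields: unknown field {name!r}")

        # "link" is required and restricted to the three values listed above.
        link = entry.get("link")
        if not (isinstance(link, str) and link in ALLOWED_LINKS):
            errors.append(f"{where}.link must be one of {sorted(ALLOWED_LINKS)}")

        # "description" is optional, but must be a string when present.
        if "description" in entry and not isinstance(entry["description"], str):
            errors.append(f"{where}.description must be a string")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a validator report every malformed entry in one pass.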
150
154
151
155
## Implementations
152
156
153
-
The implementation of a new Descriptor is not discussed here (no particular point to address).
157
+
The implementation of a new descriptor is not discussed here (no particular point to address).
154
158
155
159
The control implementation is based on the following principles:
156
160
157
-
- calculation of the number of different values for the two Fields,
158
-
- calculation of the number of different values for the virtual Field composed of tuples of each of the values of the two Fields
161
+
- calculation of the number of different values for the two fields,
162
+
- calculation of the number of different values for the virtual field composed of tuples of each of the values of the two fields
159
163
- comparison of these three values to deduce the type of relationship
160
164
- comparison of the calculated relationship type with that defined in the data schema
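
The counting principle above can be sketched as follows. This is a hedged illustration rather than the reference implementation: the function names are hypothetical, rows are assumed to be dictionaries keyed by field name, and the mapping from counts to link types (pairs equal to parent count for `derived`, all three counts equal for `coupled`, pairs equal to the product for `crossed`) is an interpretation of the definitions in the Specification section.

```python
def count_distinct(rows, fields):
    """Count distinct values of each field and of the virtual field of pairs."""
    first, second = fields
    n_first = len({row[first] for row in rows})
    n_second = len({row[second] for row in rows})
    n_pairs = len({(row[first], row[second]) for row in rows})
    return n_first, n_second, n_pairs


def satisfied_links(rows, fields):
    """Return the set of link types that the data is consistent with."""
    n_first, n_second, n_pairs = count_distinct(rows, fields)
    links = set()
    if n_pairs == n_first:
        # Each parent (first field) value appears with a single child value.
        links.add("derived")
    if n_pairs == n_first == n_second:
        # One-to-one association between the two fields.
        links.add("coupled")
    if n_pairs == n_first * n_second:
        # Every value of one field appears with every value of the other.
        links.add("crossed")
    return links


def check_relationship(rows, relationship):
    """Compare the link declared in the schema with what the data supports."""
    return relationship["link"] in satisfied_links(rows, relationship["fields"])


# The inconsistent rows from the example dataset: "ES" used for two countries.
rows = [
    {"country": "Spain", "code": "ES"},
    {"country": "Estonia", "code": "ES"},
]
print(check_relationship(rows, {"fields": ["country", "code"], "link": "coupled"}))  # False
```

Note that data satisfying `coupled` also satisfies `derived` in this sketch, so the check tests whether the declared link is among the compatible ones instead of requiring a single deduced type.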