* An alternative method to test relations
Depends on a commit in datapackage-py
Proof of concept to be discussed
* Refactor: pass the FK index optimization into tableschema
* Add documentation for the new method and params
Iterates through the table data and emits rows cast based on table schema. Data casting can be disabled.
- `keyed (bool)` - iterate keyed rows
- `extended (bool)` - iterate extended rows
- `cast (bool)` - disable data casting if false
- - `relations (dict)` - dictionary of foreign key references in a form of `{resource1: [{field1: value1, field2: value2}, ...], ...}`. If provided, foreign key fields will checked and resolved to their references
+ - `relations (dict)` - dictionary of foreign key references in the form `{resource1: [{field1: value1, field2: value2}, ...], ...}`. If provided, foreign key fields will be checked and resolved to one of their references (warning: one-to-many foreign keys are not completely resolved).
+ - `foreign_keys_values (dict)` - three-level dictionary of foreign key references, optimized to speed up validation, in the form `{resource1: {(foreign_key_field1, foreign_key_field2): {(value1, value2): {one_keyedrow}, ...}}}`. If not provided but `relations` is, it is created before validation by the `index_foreign_keys_values` method.
- `(exceptions.TableSchemaException)` - raises any error that occurs during this process
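
To illustrate why the `foreign_keys_values` shape helps, here is a minimal plain-Python sketch comparing a linear scan over `relations` with a dictionary lookup in an index of the documented form. The helper names `resolve_by_scan` and `resolve_by_index` and the sample data are hypothetical. They are not part of tableschema and do not reproduce its internals.

```python
# Hypothetical sample data: one referenced resource, in the documented `relations` shape.
relations = {
    "authors": [
        {"id": 1, "name": "Ada"},
        {"id": 2, "name": "Grace"},
    ]
}

# A hand-written index in the documented `foreign_keys_values` shape:
# {resource: {(referenced fields): {(values): one keyed row}}}
foreign_keys_values = {
    "authors": {
        ("id",): {
            (1,): {"id": 1, "name": "Ada"},
            (2,): {"id": 2, "name": "Grace"},
        }
    }
}


def resolve_by_scan(relations, resource, ref_fields, values):
    """Hypothetical helper: scan every referenced row, O(n) per lookup."""
    for row in relations[resource]:
        if tuple(row[name] for name in ref_fields) == values:
            return row
    return None


def resolve_by_index(foreign_keys_values, resource, ref_fields, values):
    """Hypothetical helper: a single hash lookup per foreign key value."""
    return foreign_keys_values[resource][ref_fields].get(values)


scan_hit = resolve_by_scan(relations, "authors", ("id",), (2,))
index_hit = resolve_by_index(foreign_keys_values, "authors", ("id",), (2,))
missing = resolve_by_index(foreign_keys_values, "authors", ("id",), (99,))
```

Both lookups return the same keyed row. The indexed version avoids rescanning the referenced resource for every row being validated.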
Reads the whole table and returns it as an array of rows. The number of rows can be limited.
@@ -229,6 +230,7 @@ Read the whole table and returns as array of rows. Count of rows could be limite
- `cast (bool)` - flag to disable data casting if false
- `relations (dict)` - dict of foreign key references in the form `{resource1: [{field1: value1, field2: value2}, ...], ...}`. If provided, foreign key fields will be checked and resolved to their references
- `limit (int)` - integer limit of rows to return
+ - `foreign_keys_values (dict)` - three-level dictionary of foreign key references, optimized to speed up validation, in the form `{resource1: {(foreign_key_field1, foreign_key_field2): {(value1, value2): {one_keyedrow}, ...}}}`
- `(exceptions.TableSchemaException)` - raises any error that occurs during this process
- `(list[])` - returns array of rows (see `table.iter`)
@@ -252,6 +254,18 @@ Save data source to file locally in CSV format with `,` (comma) delimiter
- `(exceptions.TableSchemaException)` - raises an error if there is a saving problem
- `(True/Storage)` - returns true or storage instance

+ #### `table.index_foreign_keys_values(relations)`
+
+ Creates a three-level dictionary of foreign key references, optimized to speed up validation, in the form `{resource1: {(foreign_key_field1, foreign_key_field2): {(value1, value2): {one_keyedrow}, ...}}}`.
+ For each foreign key of the schema, it iterates through the corresponding `relations['resource']` to create an index (i.e. a dict) of existing values for the foreign fields and stores one keyed row for each value combination.
+ The optimization relies on indexing the possible values for each foreign key in a hashmap to speed up later resolution.
+ This method is public so that the index can be created once and applied to multiple tables sharing the same schema (typically [grouped resources in datapackage](https://github.com/frictionlessdata/datapackage-py#group)).
+ Note 1: the second key of the output is a tuple of the foreign fields, which serves as a proxy identifier of the foreign key.
+ Note 2: the same relation resource can be indexed multiple times, as a schema can contain more than one foreign key pointing to the same resource.
+
+ - `relations (dict)` - dict of foreign key references in the form `{resource1: [{field1: value1, field2: value2}, ...], ...}`. It must contain all resources referenced by the foreign keys in the schema definition.
+ - `({resource1: {(foreign_key_field1, foreign_key_field2): {(value1, value2): {one_keyedrow}, ...}}})` - returns a three-level dictionary of foreign key references optimized to speed up validation
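
The index described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the method added by this PR: `build_fk_index` is a hypothetical helper, the foreign key descriptors follow the Table Schema `foreignKeys` layout with list-valued `fields`, and keeping the first row seen for each value combination is a choice made for this sketch.

```python
from collections import defaultdict


def build_fk_index(foreign_keys, relations):
    """Hypothetical helper: index `relations` by the referenced fields of each foreign key.

    Returns {resource: {(referenced fields): {(values): one keyed row}}}.
    """
    index = defaultdict(dict)
    for fk in foreign_keys:
        resource = fk["reference"]["resource"]
        # The tuple of referenced fields acts as a proxy identifier of the foreign key.
        ref_fields = tuple(fk["reference"]["fields"])
        values = index[resource].setdefault(ref_fields, {})
        for row in relations[resource]:
            key = tuple(row[name] for name in ref_fields)
            # Assumption of this sketch: keep the first row seen per value combination.
            values.setdefault(key, row)
    return dict(index)


# Hypothetical schema fragment: two foreign keys pointing to the same resource.
foreign_keys = [
    {"fields": ["author_id"], "reference": {"resource": "authors", "fields": ["id"]}},
    {"fields": ["editor_name"], "reference": {"resource": "authors", "fields": ["name"]}},
]
relations = {
    "authors": [
        {"id": 1, "name": "Ada"},
        {"id": 2, "name": "Grace"},
    ]
}

index = build_fk_index(foreign_keys, relations)
```

Because the two foreign keys reference different fields of `authors`, the resource is indexed twice, once under `("id",)` and once under `("name",)`, which matches Note 2. The resulting dictionary could be built once and reused for every table that shares the schema.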
+
### Schema
A model of a schema with helpful methods for working with the schema and supported data. Schema instances can be initialized with a schema source given as a URL to a JSON file or as a JSON object. The schema is initially validated (see [validate](#validate) below). By default, validation errors are stored in `schema.errors`, but in strict mode they are raised immediately.