Commit de8cd2e

Compare with Wikibase tabular data model (#1059)
Co-authored-by: Peter Desmet <[email protected]>
1 parent 2400e3d commit de8cd2e

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
---
title: Comparison with MediaWiki Tabular Data
sidebar:
  order: 3
---

<table>
  <tr>
    <th>Authors</th>
    <td>Jakob Voß</td>
  </tr>
</table>

[MediaWiki](https://www.mediawiki.org/) is the software used to run Wikipedia and related projects of the Wikimedia Foundation, including the media file repository [Wikimedia Commons](https://commons.wikimedia.org/). Commons hosts mostly images but also some records with tabular data. The [MediaWiki Tabular Data Model](https://www.mediawiki.org/wiki/Help:Tabular_data) was inspired by Data Package version 1, but it differs slightly from the current Data Package specification, as described below.

## Property Comparison

A [MediaWiki tabular data page](https://www.mediawiki.org/wiki/Help:Tabular_data) describes and contains an individual table of data, similar to a [Data Resource](/standard/data-resource/) with inline tabular data. Both are serialized as JSON objects, but the former comes as a page with a unique name in a MediaWiki instance (such as Wikimedia Commons).

### Top-level Properties

MediaWiki Tabular Data has three required and two optional top-level properties. Most of these properties map to corresponding properties of a Data Resource:

| MediaWiki Tabular Data | Data Package Data Resource |
| --- | --- |
| - (implied by page name) | [name](/standard/data-resource/#name) (required) is a string |
| [description](https://www.mediawiki.org/wiki/Help:Tabular_data#Top-level_fields) (optional) is a localized string | [description](/standard/data-resource/#description) (optional) is a CommonMark string |
| [data](https://www.mediawiki.org/wiki/Help:Tabular_data#Top-level_fields) (required) | [data](/standard/data-resource/#name) (optional) |
| [license](https://www.mediawiki.org/wiki/Help:Tabular_data#Top-level_fields) (required) is the string `CC0-1.0` or another known identifier | [licenses](/standard/data-resource/#licenses) (optional) is an array |
| [schema](https://www.mediawiki.org/wiki/Help:Tabular_data#Top-level_fields) (required) as [described below](#schema-properties) | [schema](/standard/data-resource/#schema) (optional) can have multiple forms |
| [sources](https://www.mediawiki.org/wiki/Help:Tabular_data#Top-level_fields) (optional) is a string with Wiki markup | [sources](/standard/data-resource/#sources) (optional) is an array of objects |

The differences are:

- property `name` does not exist, but it can be inferred from the page name
- properties `description` and `sources` have a different format
- property `data` is always an array of arrays, and the [data types](#data-types) of individual values can differ
- property `schema` is required, but it differs in the definition of its [schema properties](#schema-properties)
- there is no property `licenses`, but a property `license` that is fixed to the plain string value `CC0-1.0` (other license identifiers may be possible)

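The mapping above can be sketched in Python. This is a minimal illustration, not an official converter: the function names `pick_localized` and `page_to_resource` are hypothetical, the rule for deriving `name` (dropping a `Data:` prefix and a `.tab` suffix from the page name) is an assumption, and so is the choice to store the wiki markup of `sources` unparsed in a single source `title`. Conversion of `schema` is left out here.

```python
def pick_localized(value, lang="en"):
    """Pick one language from a localized string object (language code -> string).

    Falls back to any available language if `lang` is missing.
    """
    if not value:
        return None
    if lang in value:
        return value[lang]
    return next(iter(value.values()))


def page_to_resource(page_name, page, lang="en"):
    """Map a MediaWiki tabular data page (a dict) to a Data Resource descriptor."""
    # Assumption: derive the resource name from the page name.
    name = page_name
    if name.startswith("Data:"):
        name = name[len("Data:"):]
    if name.endswith(".tab"):
        name = name[: -len(".tab")]

    # `data` is always an array of arrays; here each row becomes an object
    # keyed by the field names from the (required) schema.
    field_names = [field["name"] for field in page["schema"]["fields"]]
    resource = {
        "name": name,
        "data": [dict(zip(field_names, row)) for row in page["data"]],
        # A single `license` string becomes a one-element `licenses` array.
        "licenses": [{"name": page["license"]}],
    }
    if "description" in page:
        # Localized string -> plain string in one language.
        resource["description"] = pick_localized(page["description"], lang)
    if "sources" in page:
        # Assumption: keep the wiki markup as-is in a source title.
        resource["sources"] = [{"title": page["sources"]}]
    return resource
```

Picking a single language discards the other translations of `description`; a converter that needs to keep them would have to store them elsewhere, for example in a custom property.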
### Data Types

MediaWiki Tabular Data supports four data types, which overlap with [Table Schema data types](/standard/table-schema/#field-types):

- `number` is a subset of Table Schema [number](/standard/table-schema/#number) (no `NaN`, `INF`, or `-INF`)
- `boolean` is the same as Table Schema [boolean](/standard/table-schema/#boolean)
- `string` is a subset of Table Schema [string](/standard/table-schema/#string) (limited to 400 characters at most and must not include `\n` or `\t`)
- `localized` refers to an object that maps language codes to strings with the same limitations as the `string` type.
  This type is not supported in Table Schema.

Individual values in a MediaWiki Tabular Data table can always be `null`, while in Table Schema you need to explicitly list values that should be considered missing in [schema.missingValues](/standard/table-schema/#missingValues).

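The constraints on the four types can be expressed as a small check. The following is a sketch under assumptions, not the validation MediaWiki itself performs: the function names are hypothetical, the 400-character limit is counted with Python's `len` (Unicode code points), and language codes are only checked to be strings.

```python
import math


def is_valid_string(value):
    """A `string` value: at most 400 characters, no newline or tab."""
    return (
        isinstance(value, str)
        and len(value) <= 400
        and "\n" not in value
        and "\t" not in value
    )


def is_valid_value(value, type_name):
    """Check one cell value against a MediaWiki Tabular Data type name."""
    if value is None:
        return True  # null is always allowed, whatever the type
    if type_name == "number":
        if isinstance(value, bool):
            return False  # bool is a subclass of int in Python
        if isinstance(value, int):
            return True
        # Floats must be finite: no NaN, INF, or -INF.
        return isinstance(value, float) and math.isfinite(value)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "string":
        return is_valid_string(value)
    if type_name == "localized":
        return isinstance(value, dict) and all(
            isinstance(code, str) and is_valid_string(text)
            for code, text in value.items()
        )
    raise ValueError(f"unknown type: {type_name!r}")
```

For example, `is_valid_value(float("nan"), "number")` is rejected, while `is_valid_value(None, "number")` is accepted because `null` is allowed everywhere.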
### Schema Properties

The `schema` property of MediaWiki Tabular Data contains an object with the property `fields`, just like [Table Schema](/standard/table-schema/), but no other properties are allowed. Elements of the `fields` array are like Table Schema [field descriptors](/standard/table-schema/#field), but limited to three properties and with different value spaces:

| MediaWiki Tabular Data | Data Package Table Schema |
| --- | --- |
| [name](https://www.mediawiki.org/wiki/Help:Tabular_data#Top-level_fields) (required) must be a string matching `^[a-zA-Z_][a-zA-Z_0-9]*` | [name](/standard/table-schema/#name) (required) can be any string |
| [type](https://www.mediawiki.org/wiki/Help:Tabular_data#Top-level_fields) (required) is one of the [Data Types above](#data-types) | [type](/standard/table-schema/#type) (optional) with [different data types](#data-types) |
| [title](https://www.mediawiki.org/wiki/Help:Tabular_data#Top-level_fields) (optional) is a localized string | [title](/standard/table-schema/#title) (optional) is a plain string |
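
As an illustration of the field mapping, the sketch below converts a MediaWiki Tabular Data `schema` object into a Table Schema descriptor. The function name `to_table_schema` is hypothetical. Two choices are assumptions rather than part of either specification: the name pattern is applied to the whole field name (via `fullmatch`), and `localized` fields are mapped to the Table Schema `object` type because Table Schema has no direct equivalent.

```python
import re

# Pattern for field names as given in the comparison table.
NAME_PATTERN = re.compile(r"^[a-zA-Z_][a-zA-Z_0-9]*")

# Assumption: `localized` has no Table Schema counterpart, so fall back to `object`.
TYPE_MAP = {
    "number": "number",
    "boolean": "boolean",
    "string": "string",
    "localized": "object",
}


def to_table_schema(schema, lang="en"):
    """Convert a MediaWiki Tabular Data schema (a dict) to a Table Schema descriptor."""
    fields = []
    for field in schema["fields"]:
        name = field["name"]
        # Assumption: the whole name must match the pattern.
        if not NAME_PATTERN.fullmatch(name):
            raise ValueError(f"invalid field name: {name!r}")
        descriptor = {"name": name, "type": TYPE_MAP[field["type"]]}
        title = field.get("title")
        if title:
            # Localized title -> plain string, preferring `lang`.
            descriptor["title"] = title.get(lang) or next(iter(title.values()))
        fields.append(descriptor)
    return {"fields": fields}
```

The conversion in this direction always succeeds for valid input. The reverse direction does not, since Table Schema allows arbitrary field names and more field types than MediaWiki Tabular Data.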
