Merged
Commits
50 commits
d6241c9
remove references to dataclass objects
kmoscoe Jun 12, 2025
101c5fd
Fix a copy-paste error.
kmoscoe Jun 12, 2025
504d852
remove extra file
kmoscoe Jun 12, 2025
1697b22
Merge branch 'datacommonsorg:master' into master
kmoscoe Jun 12, 2025
426a81c
Merge branch 'master' of https://github.com/datacommonsorg/docsite
kmoscoe Jun 24, 2025
a40c7c8
Merge branch 'master' of https://github.com/kmoscoe/docsite
kmoscoe Jun 24, 2025
4c69c15
Merge branch 'datacommonsorg:master' into master
kmoscoe Sep 17, 2025
dd5b50f
Merge branch 'datacommonsorg:master' into master
kmoscoe Sep 23, 2025
23cb4c4
Merge branch 'datacommonsorg:master' into master
kmoscoe Sep 23, 2025
1157311
Merge branch 'datacommonsorg:master' into master
kmoscoe Sep 24, 2025
23d3429
Merge branch 'datacommonsorg:master' into master
kmoscoe Sep 25, 2025
2a3409f
Merge branch 'datacommonsorg:master' into master
kmoscoe Sep 30, 2025
516ed75
Merge branch 'master' of https://github.com/datacommonsorg/docsite
kmoscoe Oct 7, 2025
564457c
Merge branch 'datacommonsorg:master' into master
kmoscoe Oct 7, 2025
7052453
Merge branch 'datacommonsorg:master' into master
kmoscoe Oct 7, 2025
5da50d1
Merge branch 'datacommonsorg:master' into master
kmoscoe Oct 8, 2025
a011388
Merge branch 'master' of https://github.com/kmoscoe/docsite
kmoscoe Oct 8, 2025
12b9749
Merge branch 'datacommonsorg:master' into master
kmoscoe Oct 14, 2025
f4861e4
Merge branch 'datacommonsorg:master' into master
kmoscoe Oct 15, 2025
33234d2
Fix a copy-paste error.
kmoscoe Jun 12, 2025
169781f
Merge branch 'datacommonsorg:master' into master
kmoscoe Oct 24, 2025
f3a9005
Merge branch 'master' of https://github.com/datacommonsorg/docsite
kmoscoe Oct 27, 2025
205cb04
Merge branch 'datacommonsorg:master' into master
kmoscoe Nov 3, 2025
3daf24f
Merge branch 'master' of https://github.com/datacommonsorg/docsite
kmoscoe Nov 5, 2025
7599eff
Merge branch 'datacommonsorg:master' into master
kmoscoe Nov 5, 2025
4dd0251
Merge branch 'master' of https://github.com/kmoscoe/docsite
kmoscoe Nov 5, 2025
54ca4cf
Remove unused file
kmoscoe Nov 5, 2025
5ab6c5c
Merge branch 'master' of https://github.com/datacommonsorg/docsite
kmoscoe Nov 11, 2025
1c2a36c
Merge branch 'datacommonsorg:master' into master
kmoscoe Nov 24, 2025
5d2800b
Merge branch 'datacommonsorg:master' into master
kmoscoe Nov 25, 2025
b2cbfd4
Merge branch 'datacommonsorg:master' into master
kmoscoe Dec 3, 2025
494375d
Merge branch 'datacommonsorg:master' into master
kmoscoe Dec 9, 2025
f4da5c3
Merge branch 'datacommonsorg:master' into master
kmoscoe Dec 9, 2025
73c0d41
Merge branch 'datacommonsorg:master' into master
kmoscoe Dec 9, 2025
735db87
Merge branch 'datacommonsorg:master' into master
kmoscoe Dec 16, 2025
d165d9e
Merge branch 'datacommonsorg:master' into master
kmoscoe Dec 17, 2025
4559295
Merge branch 'master' of https://github.com/datacommonsorg/docsite
kmoscoe Dec 17, 2025
85d15a4
Merge branch 'master' of https://github.com/kmoscoe/docsite
kmoscoe Dec 17, 2025
d80ed79
Merge branch 'master' of https://github.com/datacommonsorg/docsite
kmoscoe Jan 13, 2026
ef9fe1f
Update Quickstart to use explicit schema
kmoscoe Jan 14, 2026
5a29614
Update custom_dc/quickstart.md
kmoscoe Jan 14, 2026
e291b91
Changes from Keyur
kmoscoe Jan 14, 2026
10f1329
Merge branch 'explicit' of https://github.com/kmoscoe/docsite into ex…
kmoscoe Jan 14, 2026
de7d36b
Remove references to implicit schema in the custom data page
kmoscoe Jan 14, 2026
9859714
Remove implicit schema from config reference
kmoscoe Jan 14, 2026
bd73db0
More changes
kmoscoe Jan 14, 2026
90b24f7
Merge branch 'datacommonsorg:master' into explicit
kmoscoe Jan 14, 2026
cbf7ccd
Update custom_dc/config.md
kmoscoe Jan 14, 2026
63c8332
Update custom_dc/custom_data.md
kmoscoe Jan 14, 2026
5a5563f
add explanation of format option
kmoscoe Jan 27, 2026
173 changes: 14 additions & 159 deletions custom_dc/config.md
@@ -16,22 +16,15 @@ Here is the general spec for the `config.json` file.

"inputFiles": {
"<var>CSV_FILE_EXPRESSION1</var>": {

"format": "variablePerColumn" | "variablePerRow",
"format": "variablePerRow",
"provenance": "<var>NAME</var>",

# For implicit schema only
"importType": "variables" | "entities",
"ignoreColumns": ["<var>COLUMN_HEADING1</var>", "<var>COLUMN_HEADING2</var>", ...],
# Variables only
"entityType": "<var>ENTITY_TYPE_DCID</var>",

# For implicit schema only, custom entities only
# For entities only
"rowEntityType": "<var>ENTITY_TYPE_DCID</var>",
"idColumn": "<var>COLUMN_HEADING</var>",
"entityColumns": ["<var>COLUMN_HEADING_DCID1</var>", "<var>COLUMN_HEADING_DCID2</var>", ...],

# For explicit schema only
# For variables only
"entityType": "<var>ENTITY_TYPE_DCID</var>",
"columnMappings": {
"variable": "<var>NAME</var>",
"entity": "<var>NAME</var>",
@@ -42,49 +35,12 @@ Here is the general spec for the `config.json` file.
"measurementMethod": "<var>NAME</var>",
"observationPeriod": "<var>NAME</var>"
}

# For implicit schema only
"observationProperties" {
"unit": "<var>MEASUREMENT_UNIT</var>",
"observationPeriod": "<var>OBSERVATION_PERIOD</var>",
"scalingFactor": "<var>DENOMINATOR_VALUE</var>",
"measurementMethod": "<var>METHOD</var>"
}

"<var>CSV_FILE_EXPRESSION2</var>": {
...
}
},
...

# For implicit schema only, custom entities only
"entities": {
"<var>ENTITY_TYPE_DCID</var>: {
"name": "<var>ENTITY_TYPE_NAME</var>",
"description: "<var>ENTITY_TYPE_DESCRIPTION</var>"
}
...
},
},

# For implicit schema only
"variables": {
"<var>VARIABLE1</var>": {
"group": "<var>GROUP_NAME1</var>"},
"name": "<var>DISPLAY_NAME</var>",
"description": "<var>DESCRIPTION</var>",
"searchDescriptions": ["<var>SENTENCE1</var>", "<var>SENTENCE2</var>", ...],
"properties": {
"<var>PROPERTY_NAME1</var>":"<var>VALUE</var>",
"<var>PROPERTY_NAME2</var>":"<var>VALUE</var>",
},
},
"<var>VARIABLE2</var>": {"group": "<var>GROUP_NAME1</var>", ...},
"<var>VARIABLE3</var>": {"group": "<var>GROUP_NAME2</var>", ...},
...
},
},

# For explicit schema only
"groupStatVarsByProperty": false | true,

"sources": {
@@ -141,39 +97,27 @@ The first set of parameters only applies to `foo.csv`. The second set of paramet

format

: Only needed to specify `variablePerRow` for explicit schemas. The assumed default is `variablePerColumn` (implicit schema).
: Required: Specify `variablePerRow`. The other option, `variablePerColumn`, is now deprecated.

provenance

: Required: The provenance (named source) of this input file. Provenances map from a source to a dataset. The name here must correspond to the name defined as a `provenance` in the `sources` section. For example, `WorldDevelopmentIndicators` provenance (or dataset) is from the `WorldBank` source.

You must specify the provenance details under `sources.provenances`; this field associates one of the provenances defined there to this file.
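
The link between an input file's `provenance` and the `sources` section can be checked mechanically. The following is a minimal Python sketch, not part of the Data Commons tooling: the function name `check_input_files` is hypothetical, the `bar*.csv` entry and the URL are placeholders, and the layout assumed for `sources` (each source holding a `provenances` object keyed by provenance name) is an assumption based on the `sources.provenances` path mentioned above.

```python
def check_input_files(config):
    """Return a list of problems found in the inputFiles section.

    Assumption: config["sources"] maps each source name to an object
    whose "provenances" member is keyed by provenance name.
    """
    # Collect every provenance name declared under any source.
    declared = set()
    for source in config.get("sources", {}).values():
        declared.update(source.get("provenances", {}))

    problems = []
    for pattern, options in config.get("inputFiles", {}).items():
        # The format description above requires variablePerRow.
        if options.get("format") != "variablePerRow":
            problems.append(f"{pattern}: format must be 'variablePerRow'")
        provenance = options.get("provenance")
        if provenance is None:
            problems.append(f"{pattern}: missing required 'provenance'")
        elif provenance not in declared:
            problems.append(
                f"{pattern}: provenance '{provenance}' is not defined in sources"
            )
    return problems


# Hypothetical config: foo.csv is valid, bar*.csv has two problems.
example = {
    "inputFiles": {
        "foo.csv": {
            "format": "variablePerRow",
            "provenance": "WorldDevelopmentIndicators",
        },
        "bar*.csv": {"format": "variablePerColumn", "provenance": "Unknown"},
    },
    "sources": {
        "WorldBank": {
            "provenances": {"WorldDevelopmentIndicators": "https://example.com/wdi"}
        },
    },
}

for problem in check_input_files(example):
    print(problem)
```

A check like this only catches naming mismatches before an import is attempted; it does not validate the CSV contents.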

ignoreColumns (implicit schema only)

: Optional: A list of headings representing columns that should be ignored by the importer, if any.

importType (implicit schema only)
importType

: Only needed to specify `entities` for custom entity imports. The assumed default is `variables`.
: Specify `entities` for custom entity imports. Otherwise defaults to `variables`.

entityType (implicit schema only, variables only)
entityType (variables only)

: Required for CSV files containing observations: All entities in a given file must be of a specific type. The importer tries to resolve entities to DCIDs of that type. In most cases, the `entityType` will be a supported place type; see [Place types](../place_types.html) for a list. For CSV files containing custom entities, use the `rowEntityType` option instead.

rowEntityType (implicit schema only, entities only)

: Required for CSV files containing custom entities: The DCID of the entity type (new or existing) of the custom entities you are importing. It must match the DCID specified in the `entities` section(s). For example, if you are importing a set of hospital entities, the entity type could be the existing entity type [`Hospital`](https://datacommons.org/browser/Hospital){: target="_blank"}.

idColumn (implicit schema only, entities only)

: Optional: The heading of the column representing DCIDs of custom entities that the importer should create. If you don't specify this, the importer will auto-generate DCIDs for each row in the file. It is strongly recommended that you use specify this to define your own DCIDs.
rowEntityType (entities only)

entityColumns (implicit schema only, entities only)
: Required for CSV files containing custom entities: The DCID of the entity type (new or existing) of the custom entities you are importing. For example, if you are importing a set of hospital entities, the entity type could be the existing entity type [`Hospital`](https://datacommons.org/browser/Hospital){: target="_blank"}.

: Optional: A list of headings of columns that represent existing DCIDs in the knowledge graph. The heading must be the DCID of the entity type of the column (e.g. `City`, `Country`) and each row must be the DCID of the entity (e.g. `country/CAN`, `country/PAN`).

columnMappings (explicit schema only)
columnMappings

: Optional: If headings in the observations CSV file do not use the required names for these columns (`variable`, `entity`, etc.), provide the equivalent names for each column. For example, if your headings are `SERIES`, `GEOGRAPHY`, `TIME_PERIOD`, `OBS_VALUE`, you would specify:
```
@@ -183,96 +127,7 @@ columnMappings (explicit schema only)
"value": "OBS_VALUE"
```
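
To see what a mapping does, it can help to apply it by hand. The sketch below is illustrative only and does not reflect how the importer is implemented: `rename_headings` is a hypothetical helper that rewrites a CSV header row to the required column names by inverting `columnMappings`. The pairing of `SERIES` with `variable` and `GEOGRAPHY` with `entity` is inferred from the order of the headings in the example; the mapping for `TIME_PERIOD` is not visible in this diff, so that heading is left unchanged. The data row is made up.

```python
import csv
import io


def rename_headings(csv_text, column_mappings):
    """Rewrite the header row so mapped headings use the required names."""
    # columnMappings goes required name -> CSV heading; invert it.
    to_required = {heading: required for required, heading in column_mappings.items()}
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    # Headings without a mapping are passed through untouched.
    rows[0] = [to_required.get(heading, heading) for heading in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Assumed pairings for SERIES and GEOGRAPHY; "value" is from the example above.
mappings = {"variable": "SERIES", "entity": "GEOGRAPHY", "value": "OBS_VALUE"}
sample = "SERIES,GEOGRAPHY,TIME_PERIOD,OBS_VALUE\nvar1,country/CAN,2020,1.5\n"
print(rename_headings(sample, mappings))
```

With `columnMappings` in the config, no such preprocessing is needed: the CSV file keeps its original headings and the config tells the importer how to read them.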

{: #observation-properties}
observationProperties (implicit schema only)

: Optional: Additional information about each observation contained in the CSV file. Whatever setting(s) you specify will apply to all observations in the file.

Currently, the following properties are supported:
- [`unit`](/glossary.html#unit): The unit of measurement used in the observations. This is a string representing a currency, area, weight, volume, etc. For example, `SquareFoot`, `USD`, `Barrel`, etc.
- [`observationPeriod`](/glossary.html#observation-period): The period of time in which the observations were recorded. This must be in ISO duration format, namely `P[0-9][Y|M|D|h|m|s]`. For example, `P1Y` is 1 year, `P3M` is 3 months, `P3h` is 3 hours.
- [`measurementMethod`](/glossary.html#measurement-method): The method used to gather the observations. This can be a random string or an existing DCID of [`MeasurementMethodEnum`](https://datacommons.org/browser/MeasurementMethodEnum){: target="_blank"} type; for example, `EDA_Estimate` or `WorldBankEstimate`.
- [`scalingFactor`](/glossary.html#scaling-factor): An integer representing the denominator used in measurements involving ratios or percentages. For example, for percentages, the denominator would be `100`.

Note that you cannot mix different property values in a single CSV file. If you have observations using different properties, you must put them in separate CSV files.

## Entities (implicit schema only)

This is required for custom entity imports. Whether you are referencing an existing entity type or a creating a new entity type, specify its DCID here. Note that it must match the DCID specified in the input files `rowEntityType` field.

### Entity parameters

name

: If you are creating a new entity type, provide a human-readable name for it. If you are referencing an existing entity type, omit this parameter.

description

: If you are creating a new entity type, provide a longer description for it. If you are referencing an existing entity type, omit this parameter.

## Variables (implicit schema only)

The `variables` section is optional. You can use it to define names and associate additional properties with the statistical variables in the files, using the parameters described below. All parameters are optional. If you don't provide this section, the importer will automatically derive the variable names from the CSV file headings.

### Variable parameters {#varparams}

name

: The display name of the variable, which will show up throughout the UI. If not specified, the column name is used as the display name.
The name should be concise and precise; that is, the shortest possible name that allow humans to uniquely identify a given variable. The name is used to generate NL embeddings.

description

: A long-form description of the variable.

{: #varprops}
properties

: Additional Data Commons properties associated with this variable. The properties are any property required or optional in the [MCF Node definition](custom_data.md#mcf) of a variable. The value of the property must be a DCID.

Each property is specified as a key:value pair. Here are some examples:

```json
{
"populationType": "schema:Person",
"measuredProperty": "age",
"statType": "medianValue",
"gender": "Female"
}
```

Note that the `measuredProperty` property has an effect on the display: if it is not set for any variable, the importer assumes that it is different for every defined variable, so that each variable will be shown in a different chart in the UI tools. If you would like multiple variables to show up in the same chart, be sure to set this property on all of the relevant variables, to the same (DCID) value. For example, if you wanted `Adult_curr_cig_smokers_female` and `Adult_curr_cig_smokers_male` to appear on the same Timeline chart, set `measuredProperty` to a common property of the two variables, for example [`percent`](https://datacommons.org/browser/percent){: target="_blank"}.

```json
"variables": {
"Adult_curr_cig_smokers": {
"properties": {
"measuredProperty": "percent"
}
},
"Adult_curr_cig_smokers_female": {
"properties": {
"measuredProperty": "percent"
}
}
}
```

group

: By default, the Statistical Variables Explorer will display all custom variables as a group called "Custom Variables". You can use this option to create one or more custom group names and assign different variables to groups. The value of the `group` option is used as the heading of the group. For example, in the sample data, the group name `OECD` is used to group together the two variables from the two CSV files:

![group_screenshot](/assets/images/custom_dc/customdc_screenshot5.png){: width="400"}

You can have a multi-level group hierarchy by using `/` as a separator between each group.

> Note: You can only assign a variable to one group. If you would like to assign the same variable to multiple groups, you will need to define the groups as nodes in MCF; see [Define a statistical variable group node](custom_data.md#statvar-group) for details.

searchDescriptions

: An array of descriptions to be used for creating more NL embeddings for the variable. This is only needed if the variable `name` is not sufficient for generating embeddings.

## groupStatVarsByProperty (explicit schema only)
## groupStatVarsByProperty

Optional: When set to `true`, causes the Statistical Variable Explorer to display a top-level category called "Custom Variables", and groups together variables with the same population types and measured properties. For example:
