Database

UML extract of database (as of 25/05/21)

Data model

For readability purpose, only the columns that define the entities can be found below in the model.

Tables

`asset_kinds`

Description

An asset kind is the extension of the file (aka. asset). For example, it can be txt, csv among other values.

Columns

Name	Description	PK
id	Auto-incremented identifier of the asset kind.	✅
name	Name of the file (aka. asset) extension like `pdf` or `doc`.
created_at	Date and time when the asset kind was created in the table.
updated_at	Date and time when the asset kind was updated for the last time in the table.

`asset_resource`

Description

This is an intermediary table that represents the many-to-many relationship between assets and resources. Use this table to join them.

Columns

Name	Description	PK
id	Auto-incremented identifier asset-resource relationship.	✅
asset_id	Identifier of the asset in the relationship. Use this column to join with the `assets` on `assets.id`.
resource_id	Identifier of the resource in the relationship. Use this column to join with the `resources` on `resources.id`.
created_at	Date and time when the asset-resource relationship was created in the table.
updated_at	Date and time when the asset-resource relationship was updated for the last time in the table.

`assets`

Description

An asset is a file we want to rescue, it is included in a dataset and reachable through one or several resources. It is located at an URL, it can be either core data, sample or documentation and must be downloadable.

Columns

Name	Description	PK	Nullable
id	Auto-incremented identifier of the asset.	✅
url	URL where the asset can be found and downloaded.
kind_id	ID of the asset kind. Use this column to join with the `asset_kinds` on `asset_kinds.id`.
created_at	Date and time when the asset was created in the table.
updated_at	Date and time when the asset was updated for the last time in the table.
size	Size of the asset in octets.		✅
mtime	Date and time when the asset was updated for the last time at the URL.		✅

`dataset_rankings`

Description

Rankings of datasets. The detail of theses rankings can be found in dataset_ranks.

Columns

Name	Description	PK
id	Auto-incremented identifier of the dataset's ranking.	✅
name	Name of the dataset's ranking.
comment	Comment about the dataset's ranking.
ranking_date	Date and time when the dataset's ranking was made.
type	Type of the dataset's ranking.
created_at	Date and time when the dataset's ranking was created in the table.
updated_at	Date and time when the dataset's ranking was updated for the last time in the table.

`dataset_ranks`

Description

Dataset ranking details. Name of associated rankings can be found in dataset_rankings table.

Columns

Name	Description	PK
id	Auto-incremented identifier of the dataset's rank.	✅
dataset_id	ID of the dataset that was ranked. Use this column to join with the `datasets` on `datasets.id`.
ranking_id	ID of the dataset's ranking that was made. Use this column to join with the `dataset_rankings` on `dataset_rankings.id`.
rank	Rank of the dataset.
event_count	ADD DOCS FOR COLUMN.
created_at	Date and time when the dataset's rank was created in the table.
updated_at	Date and time when the dataset's rank was updated for the last time in the table.

`datasets`

Description

A dataset is a parcel of data - for example, it could be the crime statistics for a region, the spending figures for a government department, or temperature readings from various weather stations. A dataset is owned by an organization and contains two things: information or metadata about the data like the title, the publisher or the date and a number of resources that contain the data like files or links to other web pages.

Columns

Name	Description	PK
id	Auto-incremented identifier of the dataset.	✅
dg_id	ID of the dataset within data.gov.
dg_name	Name of the dataset in data.gov.
dg_title	Title of the dataset in data.gov.
dg_notes	Notes that describe the dataset in data.gov.
dg_metadata_created	Date and time when the dataset's metadata was created in data.gov.
dg_metadata_modified	Date and time when the dataset's metadata was updated for the last time in data.gov.
access_direct_dl_count	ADD DOCS FOR COLUMN.
access_total_count	ADD DOCS FOR COLUMN.
organization_id	ID of the organization. Use this column to join with the `organizations` on `organizations.id`.
created_at	Date and time when the dataset was created in the table.
updated_at	Date and time when the dataset was updated for the last time in the table.

`datasets_json`

Description

Raw data of datasets extracted from the CKAN API. It contains all the data related to the organization that owns the dataset and related to the dataset's resources. The tables organizations, datasets and resources are built from this table.

Columns

Name	Description	PK
id	Auto-incremented identifier of the dataset.	✅
dataset_id	It is a duplicate of `id`. This column should be removed.
content	Dataset's data structured in a JSON format, dataset's metadata, resources' and organization's data can be found here.
created_at	Date and time when the dataset was created in the table.
updated_at	Date and time when the dataset was updated for the last time in the table.

`organizations`

Description

In the context of rescuing data from data.gov, an organization can be a federal agency like the NOAA (National Oceanic and Atmospheric Administration, Department of Commerce), a state like the state of California or a city like the city of Austin. An organization owns datasets.

Columns

Name	Description	PK
id	Auto-incremented identifier of the organization.	✅
dg_id	ID of the organization within data.gov.
dg_name	Code of the organization in data.gov.
dg_title	Full name of the organization in data.gov.
dg_created	Date and time when the organization was created in data.gov.
created_at	Date and time when the organization was created in the table.
updated_at	Date and time when the organization was updated for the last time in the table.

`resources`

Description

A resource is contained in a dataset and holds its data itself. It can be of various formats such as CSV or Excel spreadsheet, XML file, PDF document, image file, linked data in RDF format or even a page that is elsewhere on the web. When a resource is a file to be rescued, it is also an asset.

Columns

Name	Description	PK	Nullable
id	Auto-incremented identifier of the resource.	✅
dg_id	ID of the resource within data.gov.
dg_name	Name of the resource in data.gov.
dg_description	Description of the resource in data.gov.
dg_resource_locator_function	ADD DOCS FOR COLUMN.		✅
dg_resource_locator_protocol	ADD DOCS FOR COLUMN.		✅
dg_mimetype	ADD DOCS FOR COLUMN.		✅
dg_state	ADD DOCS FOR COLUMN.
dg_created	Date and time when the resource was created in data.gov.
dg_metadata_modified	Date and time when the resource's metadata was modified for the last time in data.gov.
dg_url	URL provided in data.gov where the resource can be found.
resource_type	Type of the resource that is the file extension if the resource is a file or a custom type if it is a web page.
dataset_id	ID of the dataset in which the resource is included. Use this column to join with the `datasets` on `datasets.id`.
created_at	Date and time when the resource was created in the table.
updated_at	Date and time when the resource was updated for the last time in the table.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Database

UML extract of database (as of 25/05/21)

Data model

Tables

`asset_kinds`

`asset_resource`

`assets`

`dataset_rankings`

`dataset_ranks`

`datasets`

`datasets_json`

`organizations`

`resources`

Uh oh!

Clone this wiki locally