Skip to content

Database

Charles Aracil edited this page Aug 21, 2025 · 9 revisions

UML extract of database (as of 25/05/21)

Database Schema - Offseason Shelter for Science

Data model

For readability purpose, only the columns that define the entities can be found below in the model.


rescue_db_entity_relationships_diagram

Tables

asset_kinds

Description

An asset kind is the extension of the file (aka. asset). For example, it can be txt, csv among other values.

Columns

Name Description PK Nullable
id Auto-incremented identifier of the asset kind.
name Name of the file (aka. asset) extension like pdf or doc.
created_at Date and time when the asset kind was created in the table.
updated_at Date and time when the asset kind was updated for the last time in the table.

asset_resource

Description

This is an intermediary table that represents the many-to-many relationship between assets and resources. Use this table to join them.

Columns

Name Description PK Nullable
id Auto-incremented identifier asset-resource relationship.
asset_id Identifier of the asset in the relationship. Use this column to join with the assets on assets.id.
resource_id Identifier of the resource in the relationship. Use this column to join with the resources on resources.id.
created_at Date and time when the asset-resource relationship was created in the table.
updated_at Date and time when the asset-resource relationship was updated for the last time in the table.

assets

Description

An asset is a file we want to rescue, it is included in a dataset and reachable through one or several resources. It is located at an URL, it can be either core data, sample or documentation and must be downloadable.

Columns

Name Description PK Nullable
id Auto-incremented identifier of the asset.
url URL where the asset can be found and downloaded.
kind_id ID of the asset kind. Use this column to join with the asset_kinds on asset_kinds.id.
created_at Date and time when the asset was created in the table.
updated_at Date and time when the asset was updated for the last time in the table.
size Size of the asset in octets.
mtime Date and time when the asset was updated for the last time at the URL.

dataset_rankings

Description

Rankings of datasets. The detail of theses rankings can be found in dataset_ranks.

Columns

Name Description PK Nullable
id Auto-incremented identifier of the dataset's ranking.
name Name of the dataset's ranking.
comment Comment about the dataset's ranking.
ranking_date Date and time when the dataset's ranking was made.
type Type of the dataset's ranking.
created_at Date and time when the dataset's ranking was created in the table.
updated_at Date and time when the dataset's ranking was updated for the last time in the table.

dataset_ranks

Description

Dataset ranking details. Name of associated rankings can be found in dataset_rankings table.

Columns

Name Description PK Nullable
id Auto-incremented identifier of the dataset's rank.
dataset_id ID of the dataset that was ranked. Use this column to join with the datasets on datasets.id.
ranking_id ID of the dataset's ranking that was made. Use this column to join with the dataset_rankings on dataset_rankings.id.
rank Rank of the dataset.
event_count ADD DOCS FOR COLUMN.
created_at Date and time when the dataset's rank was created in the table.
updated_at Date and time when the dataset's rank was updated for the last time in the table.

datasets

Description

A dataset is a parcel of data - for example, it could be the crime statistics for a region, the spending figures for a government department, or temperature readings from various weather stations. A dataset is owned by an organization and contains two things: information or metadata about the data like the title, the publisher or the date and a number of resources that contain the data like files or links to other web pages.

Columns

Name Description PK Nullable
id Auto-incremented identifier of the dataset.
dg_id ID of the dataset within data.gov.
dg_name Name of the dataset in data.gov.
dg_title Title of the dataset in data.gov.
dg_notes Notes that describe the dataset in data.gov.
dg_metadata_created Date and time when the dataset's metadata was created in data.gov.
dg_metadata_modified Date and time when the dataset's metadata was updated for the last time in data.gov.
access_direct_dl_count ADD DOCS FOR COLUMN.
access_total_count ADD DOCS FOR COLUMN.
organization_id ID of the organization. Use this column to join with the organizations on organizations.id.
created_at Date and time when the dataset was created in the table.
updated_at Date and time when the dataset was updated for the last time in the table.

datasets_json

Description

Raw data of datasets extracted from the CKAN API. It contains all the data related to the organization that owns the dataset and related to the dataset's resources. The tables organizations, datasets and resources are built from this table.

Columns

Name Description PK Nullable
id Auto-incremented identifier of the dataset.
dataset_id It is a duplicate of id. This column should be removed.
content Dataset's data structured in a JSON format, dataset's metadata, resources' and organization's data can be found here.
created_at Date and time when the dataset was created in the table.
updated_at Date and time when the dataset was updated for the last time in the table.

organizations

Description

In the context of rescuing data from data.gov, an organization can be a federal agency like the NOAA (National Oceanic and Atmospheric Administration, Department of Commerce), a state like the state of California or a city like the city of Austin. An organization owns datasets.

Columns

Name Description PK Nullable
id Auto-incremented identifier of the organization.
dg_id ID of the organization within data.gov.
dg_name Code of the organization in data.gov.
dg_title Full name of the organization in data.gov.
dg_created Date and time when the organization was created in data.gov.
created_at Date and time when the organization was created in the table.
updated_at Date and time when the organization was updated for the last time in the table.

resources

Description

A resource is contained in a dataset and holds its data itself. It can be of various formats such as CSV or Excel spreadsheet, XML file, PDF document, image file, linked data in RDF format or even a page that is elsewhere on the web. When a resource is a file to be rescued, it is also an asset.

Columns

Name Description PK Nullable
id Auto-incremented identifier of the resource.
dg_id ID of the resource within data.gov.
dg_name Name of the resource in data.gov.
dg_description Description of the resource in data.gov.
dg_resource_locator_function ADD DOCS FOR COLUMN.
dg_resource_locator_protocol ADD DOCS FOR COLUMN.
dg_mimetype ADD DOCS FOR COLUMN.
dg_state ADD DOCS FOR COLUMN.
dg_created Date and time when the resource was created in data.gov.
dg_metadata_modified Date and time when the resource's metadata was modified for the last time in data.gov.
dg_url URL provided in data.gov where the resource can be found.
resource_type Type of the resource that is the file extension if the resource is a file or a custom type if it is a web page.
dataset_id ID of the dataset in which the resource is included. Use this column to join with the datasets on datasets.id.
created_at Date and time when the resource was created in the table.
updated_at Date and time when the resource was updated for the last time in the table.