README.md
+12 -5 (12 additions & 5 deletions)
@@ -7,9 +7,9 @@ This Python project processes Government of Canada service-related data, merging

 ### Key Features
 - **Data ingestion**: Downloads and processes service inventory and service standard performance data.
-- **Dataset Merging**: Combines service inventory data from 2018-2023 historical datasets and 2024+ datasets from the Open Government Portal.
-- **Quality Assurance**: Identifies and flags inconsistencies in datasets.
-- **Output Generation**: Produces structured CSVs that reflect the latest information on the Open Government Portal.
+- **Dataset merging**: Combines service inventory data from 2018-2023 historical datasets and 2024+ datasets from the Open Government Portal.
+- **Quality assurance**: Identifies and flags inconsistencies in datasets.
+- **Output generation**: Produces structured CSVs that reflect the latest information on the Open Government Portal.

 Service inventory and service standard performance data are collected as a requirement under the [Policy on Service and Digital](https://www.tbs-sct.canada.ca/pol/doc-eng.aspx?id=32603).
-**Content**: A manually updated list of every organization, department, and agency with their associated names mapped to a single numeric ID (`org_var.csv`). Long-form program names from the 2018-2023 service inventory mapped to program IDs from Departmental Plans and Results Reports (`serv_prog.csv`).
 - **Update Frequency**: Ad-hoc

@@ -138,6 +140,7 @@ For a more detailed description of each file and field, please consult [README_i
 - `context.md` - Context on this dataset for use with LLM.
 - `database.dbml` - **Draft** schema defining a database model.
 - `tidy-script` - Bash script producing file paths for deleting inputs, outputs, caches, etc.
+- `README_indicators.md` - Detailed information about datasets produced by script


 ### Python script files (src/)
@@ -160,6 +163,10 @@ For a more detailed description of each file and field, please consult [README_i
 - `generate_reference.py`: script for generating field names and types for all output files, see ref/ directory
 - `reference_fields.csv`: Table of all tables, fields, and datatypes for use with test script

+### Github workflows (.github/workflows)
+
+- `generate-files.yml`: Github actions script that produces releases on a given schedule or on an ad-hoc basis.
+
 ---

 ### Release Schedule
@@ -179,7 +186,7 @@ For a more detailed description of each file and field, please consult [README_i

 ---
 ## Directory structure for project
-*Given that files produced by the script are made available in releases, all transitory input and output files are no longer tracked with git, or included in the repo. Releases have a flat structure, so the directory structure below is only relevant if you clone the repo and run the script.*
+*Given that files produced by the script are made available in releases, all transitory input and output files are no longer tracked with git, or included in the repo. The exception is the input snapshots, which are a part of the repo. Releases have a flat structure, so the directory structure below is only relevant if you clone the repo and run the script.*
README_indicators.md
+2 -2 (2 additions & 2 deletions)
@@ -474,7 +474,7 @@ Unique list of service IDs with latest reporting year and department. Generated
 - `service_scope_ext_or_ent`: Calculated field that indicates whether the service is external or internal enterprise to assist in quick filtering of relevant services. Refers to reported value from latest fiscal year

 #### `si_all.csv`
-Full service inventory merging 2018–2023 datasets with the 2024 dataset. All `service_scope` included. See list of fields for `si.csv`. Generated by `src/merge.py/merge_si`.
+Full service inventory merging 2018–2023 datasets with the 2024 dataset. All `service_scope` included, not just `EXTERN` and `ENTERPRISE`. See list of fields for `si.csv`. Generated by `src/merge.py/merge_si`.

 #### `ss_all.csv`
-Full service standard dataset merging 2018–2023 datasets with the 2024 dataset. All `service_scope` included. See list of fields for `ss.csv`. Generated by `src/merge.py/merge_ss`.
+Full service standard dataset merging 2018–2023 datasets with the 2024 dataset. All `service_scope` included, not just `EXTERN` and `ENTERPRISE`. See list of fields for `ss.csv`. Generated by `src/merge.py/merge_ss`.
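
Reviewer note: the wording added to `si_all.csv` and `ss_all.csv` implies that the non-`_all` outputs keep only rows scoped `EXTERN` or `ENTERPRISE`. A minimal Python sketch of that distinction follows. It is an illustration only, not the logic in `src/merge.py`: the sample rows, the `INTERN` value, the helper name `filter_relevant_scope`, and the assumption that `service_scope` is a comma-separated string are all hypothetical.

```python
# Hypothetical illustration: the "_all" table keeps every row, while the
# filtered table keeps only rows whose scope includes EXTERN or ENTERPRISE.
RELEVANT_SCOPES = {"EXTERN", "ENTERPRISE"}


def filter_relevant_scope(rows):
    """Return rows whose service_scope lists at least one relevant scope.

    Assumes service_scope is a comma-separated string (an assumption,
    not confirmed by the README).
    """
    kept = []
    for row in rows:
        scopes = {s.strip() for s in row["service_scope"].split(",")}
        if scopes & RELEVANT_SCOPES:
            kept.append(row)
    return kept


# Made-up sample standing in for si_all.csv (all scopes included).
si_all = [
    {"service_id": "1001", "service_scope": "EXTERN"},
    {"service_id": "1002", "service_scope": "INTERN"},
    {"service_id": "1003", "service_scope": "EXTERN,INTERN"},
    {"service_id": "1004", "service_scope": "ENTERPRISE"},
]

# Made-up stand-in for si.csv (relevant scopes only).
si = filter_relevant_scope(si_all)
print([row["service_id"] for row in si])
```

If the real field uses a different delimiter or different scope codes, only the `split(",")` call and `RELEVANT_SCOPES` would need to change.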