- **Content**: A manually updated list of every organization, department, and agency with their associated names mapped to a single numeric ID (`org_var.csv`); long-form program names from the 2018-2023 service inventory mapped to program IDs from Departmental Plans and Results Reports (`serv_prog.csv`); and the service ID number registry, which links each service ID number to its assigned department or agency (`sid_registry.csv`).
- **Update Frequency**: Ad-hoc
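As a sketch of how such a lookup table can be consumed, the name variants can be loaded into a plain dictionary for ID resolution. The column names below (`org_name`, `org_id`) and the sample values are assumptions for illustration, not the file's documented schema:

```python
import csv
import io

# Hypothetical rows in the org_var.csv layout: name variants for one
# organization mapped to a single numeric ID (columns and values assumed).
sample = "org_name;org_id\nTreasury Board of Canada Secretariat;123\nTBS;123\n"

# Build a name -> ID lookup from the semicolon-delimited table.
reader = csv.DictReader(io.StringIO(sample), delimiter=";")
org_lookup = {row["org_name"]: row["org_id"] for row in reader}

print(org_lookup["TBS"])  # -> 123
```

Reading a real `org_var.csv` would work the same way, with `open(path, newline="")` in place of the in-memory sample.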
---
- All CSV files use a semicolon (`;`) as a delimiter between columns.
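Because the delimiter is a semicolon rather than a comma, parsers must be told so explicitly. A minimal Python sketch, with an invented header and row for illustration:

```python
import csv
import io

# Invented sample in the same semicolon-delimited layout as the releases.
sample = "service_id;service_name\n1446;Example service\n"

rows = list(csv.reader(io.StringIO(sample), delimiter=";"))
print(rows)  # -> [['service_id', 'service_name'], ['1446', 'Example service']]
```

With pandas, the equivalent is `pd.read_csv(path, sep=";")`.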
### Timestamps
- All CSV files produced by the script include a timestamp on the last row of the file. This timestamp also flags whether the underlying data comes from a snapshot.
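Since that last row is metadata rather than data, consumers generally need to split it off before analysis. A sketch under the assumption that the timestamp occupies the first cell of the final row (the exact format is invented here):

```python
import csv
import io

# Invented contents: one header, one data row, then the timestamp row
# appended as the final line (exact timestamp format assumed).
raw = "org_id;org_name\n123;Example Department\n2024-06-01 00:00:00;\n"

rows = list(csv.reader(io.StringIO(raw), delimiter=";"))
header, data, timestamp_row = rows[0], rows[1:-1], rows[-1]
print(timestamp_row[0])  # -> 2024-06-01 00:00:00
```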
### Accessing files remotely
- To access the files in the latest release, point your tool to the following URL: `https://github.com/gcperformance/service-data/releases/latest/download/XXX.csv`, replacing `XXX.csv` with the file you want to access, for example `si.csv`.
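For example, the release URL can be assembled and fetched from Python. The helper name below is ours, and the download line is commented out because it reaches over the network:

```python
from urllib.request import urlretrieve

BASE = "https://github.com/gcperformance/service-data/releases/latest/download"

def release_url(filename: str) -> str:
    """Build the download URL for a file in the latest release."""
    return f"{BASE}/{filename}"

print(release_url("si.csv"))
# urlretrieve(release_url("si.csv"), "si.csv")  # uncomment to download
```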
---
## Other files

- `main.py` - Orchestrates the processing pipeline.
- `requirements.txt` - Lists Python dependencies.
- `context.md` - Context on this dataset for use with LLMs.
- `database.dbml` - **Draft** schema defining a database model.
- `tidy-script` - Bash script producing file paths for deleting inputs, outputs, caches, etc.
- `README_indicators.md` - Detailed information about the datasets produced by the script.

### Python script files (src/)

- `clean.py`: functions to clean and set up data
- `comp.py`: functions to compare files against one another, for example for snapshots
- `utils.py`: miscellaneous utility functions; produces some files for the `outputs/utils/` directory

### Tests (tests/)

- `README.md`: placeholder readme documentation for tests
- `conftest.py`: configuration file for pytest
- `test_merge.py`: testing script for `merge.py`
- `test_outputs.py`: testing script for output files
- `generate_reference.py`: script for generating field names and types for all output files; see the `ref/` directory
- `reference_fields.csv`: table of all tables, fields, and datatypes for use with the test script

### Jupyter Notebooks (notebooks/)
- These notebooks are for testing and developing new features.
### Github workflows (.github/workflows)
- `generate-files.yml`: GitHub Actions script that produces releases on a given schedule or on an ad-hoc basis.