Commit c82ca06

more updates to readme
1 parent 3992ed1 commit c82ca06

2 files changed: +55 -10 lines changed

README.md

Lines changed: 8 additions & 10 deletions
@@ -47,7 +47,7 @@ python main.py # Runs full processing pipeline
 ## Datasets Consulted
 
 ### [GC Service Inventory and Service Performance](https://open.canada.ca/data/en/dataset/3ac0d080-6149-499a-8b06-7ce5f00ec56c)
-- **Files**: `si_2018.csv`, `si_2024.csv`, `ss_2018.csv`, `ss_2024.csv`
+- **Files**: `si_2018.csv`, `si_2024.csv`, `ss_2018.csv`, `ss_2024.csv`, `service_data_dict.json`
 - **Content**: Government of Canada service inventory, associated standards, and performance, along with relevant data dictionaries.
 - **Update Frequency**: Annually, typically in January
 
@@ -68,7 +68,7 @@ python main.py # Runs full processing pipeline
 
 ### [Utilities developed for GC Service Inventory data analysis](https://github.com/gc-performance/utilities)
 - **Files**: `org_var.csv`, `serv_prog.csv`, `sid_registry.csv`
-- **Content**: A manually updated list of every organization, department, and agency with their associated names mapped to a single numeric ID (`org_var.csv`). Long-form program names from the 2018-2023 service inventory mapped to program IDs from Departmental Plans and Results Reports (`serv_prog.csv`).
+- **Content**: A manually updated list of every organization, department, and agency with their associated names mapped to a single numeric ID (`org_var.csv`). Long-form program names from the 2018-2023 service inventory mapped to program IDs from Departmental Plans and Results Reports (`serv_prog.csv`), and the service ID number registry which links the service ID number to its assigned department or agency (`sid_registry.csv`).
 - **Update Frequency**: Ad-hoc
 
 ---
@@ -80,7 +80,7 @@ The [Policy on Service and Digital](https://www.tbs-sct.canada.ca/pol/doc-eng.as
 - All CSV files use a semicolon (`;`) as a delimiter between columns.
 
 ### Timestamps
-- All CSV files produced by the script include a timestamp on the last row of the file.
+- All CSV files produced by the script include a timestamp on the last row of the file. This timestamp also flags if the underlying data comes from a snapshot.
 
 ### Accessing files remotely
 - To access the files in the latest release, point your tool to the following url: `https://github.com/gcperformance/service-data/releases/latest/download/XXX.csv`, replacing xxx.csv with the file you want to access, for example `si.csv`
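
Reviewer note: the three README conventions in this hunk (semicolon delimiter, timestamp on the last row, latest-release URL) interact when a consumer loads a file, so a minimal sketch may help. It is not taken from the repository. The helper names `release_url` and `parse_output_csv`, the sample column names, and the timestamp text are all hypothetical; the only details drawn from the README are the `;` delimiter, the timestamp sitting on the last row, and the URL pattern.

```python
import csv
import io

# URL pattern as quoted in the README; only the file name varies.
RELEASE_URL = "https://github.com/gcperformance/service-data/releases/latest/download/{name}"


def release_url(name):
    """Build the latest-release download URL for one output file, e.g. 'si.csv'."""
    return RELEASE_URL.format(name=name)


def parse_output_csv(text, delimiter=";"):
    """Split semicolon-delimited CSV text into (header, data_rows, timestamp_row).

    Assumption: the final non-empty row is the timestamp row, not data.
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    if len(rows) < 2:
        raise ValueError("expected at least a header row and a timestamp row")
    header, *data_rows, timestamp_row = rows
    return header, data_rows, timestamp_row


# Hypothetical sample; real column names and the timestamp format may differ.
sample = "fiscal_yr;service_id\n2023-2024;1001\nTimestamp:2025-01-15_00:00:00\n"
header, data_rows, timestamp_row = parse_output_csv(sample)
```

To use it against a real release, the text could be fetched with `urllib.request.urlopen(release_url("si.csv")).read().decode("utf-8")` and passed to `parse_output_csv`. Separating the last row before any type conversion keeps the timestamp from being read as a malformed data record.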
@@ -135,14 +135,10 @@ For a more detailed description of each file and field, please consult [README_i
 
 ---
 ## Other files
-- `main.py` - Orchestrates the processing pipeline.
+- `main.py` - Python file that orchestrates the processing pipeline.
 - `requirements.txt` - Lists python dependencies.
-- `context.md` - Context on this dataset for use with LLM.
-- `database.dbml` - **Draft** schema defining a database model.
-- `tidy-script` - Bash script producing file paths for deleting inputs, outputs, caches, etc.
 - `README_indicators.md` - Detailed information about datasets produced by script
 
-
 ### Python script files (src/)
 - `clean.py`: functions to clean and set up data
 - `comp.py`: functions compare files against one another, for example for snapshots
@@ -155,18 +151,20 @@ For a more detailed description of each file and field, please consult [README_i
 - `utils.py`: misc utility functions, produces some files for `outputs/utils/` directory
 
 ### Tests (tests/)
-
 - `README.md`: placeholder readme documentation for tests
 - `conftest.py`: configuration file for pytest
 - `test_merge.py`: testing script for merge.py
 - `test_outputs.py`: testing script for output files
 - `generate_reference.py`: script for generating field names and types for all output files, see ref/ directory
 - `reference_fields.csv`: Table of all tables, fields, and datatypes for use with test script
 
-### Github workflows (.github/workflows)
+### Jupyter Notebooks (notebooks/)
+- These notebooks are for testing and developing new features
 
+### Github workflows (.github/workflows)
 - `generate-files.yml`: Github actions script that produces releases on a given schedule or on an ad-hoc basis.
 
+
 ---
 
 ### Release Schedule
