|
| 1 | +# Meeting minutes of the project team |
| 2 | + |
| 3 | +This document summarizes the meetings of the workstream |
| 4 | + |
| 5 | +## 2024-07-24 - kick-off |
| 6 | + |
| 7 | +### Agenda |
| 8 | + |
| 9 | +- Introductions |
| 10 | +- Discussions on problem statement and possible solutions |
| 11 | +- Ideas on how to go forward |
| 12 | +- Wrap up and immediate next steps |
| 13 | + |
| 14 | +### Notes |
| 15 | + |
| 16 | +- Group met with context that reproducibility is poor in price statistics research and we should improve it. The group then decided how to tackle this problem and how to properly scope the objectives to make things achievable. Main challenges currently faced in the discipline included: |
| 17 | + - Reserach is becoming increasingly empirical, hence we need processes to work with data and code to make the research more easily reproducible |
| 18 | + - There are no agreed upon benchmark datasets *per se* in the discipline to test methods on |
| 19 | + - The discipline will not own most of the datasets as most research is done on either confidential internal datasets owned by NSOs or on proprietary (purchased datasets). The open datasets that exist are not organized or cohesively documented/available. |
| 20 | + - The reason **why** reproducibility is important is not as inherent/widely communicated as it could be, hence any attempts to solve the technical and processes aspects needs to include this aspect in the communication. |
| 21 | +- The group touched on goals to solve these aspects and on processes that can be set up to incentivize reproducibility, including by lowering the complexity and creating an easy 'on ramp' to making projects more reproducible (including by cataloging open datasets and showcasing how code can be made reproducible), coordinating with the two bi-annual confernece to recommend reproducibility be part of the paper submission process, embedding metadata standards into the data availbile to make it easier and more standardized when various benchmark datasets are used to evaluate a specific method, etc. |
| 22 | +- The data catalogue for open datasets was seen as a major deliverable to the discipline, and one that needed to be phased. In other words we can set up a 'proof of concept' or interim catalogue in a simple way to demonstrate the use case, and later transition to a fuller and more comprehensive catalogue with more resources and infrastructure, potentially hosted by the UN Global Platform. Showcasing the interim solution and broadening the adoption to beyond just price statistics would help make this business case. |
| 23 | +- The outcome of the discussion resolved to target two main deliverables: developing the proof of concept data catalogue, and writing guidance (for instance in the form of a white paper) on how to make projects more reproducible. This target scope was later summarized [through our project charter](https://github.com/UN-Task-Team-for-Scanner-Data/reproducibility-project/blob/main/project-charter.md). |
| 24 | + |
| 25 | +## 2025-01-28 |
| 26 | + |
| 27 | +### Agenda |
| 28 | + |
| 29 | +- Intro and discussion on timelines and scope. Confirm cadence of meetings |
| 30 | +- Quick discussion on how to track PM activity. We could use something like GitHub, the UN wiki, or some other option |
| 31 | +- Discussion on Objective A. We have two options ([.Stat suite](https://de-demo.siscc.org/vis?lc=en&df%5bds%5d=staging%3ASIS-CC-stable&df%5bid%5d=DSD_SCANNER_DATA%40DF_SEASONAL_PRODUCTS&df%5bag%5d=LU1&df%5bvs%5d=1.0&av=true&dq=..M.&lom=LASTNPERIODS&lo=36&to%5bTIME_PERIOD%5d=false&ly%5bcl%5d=TIME_PERIOD&ly%5brs%5d=MEASURE%2CUNIT&ly%5brw%5d=PRODUCT&vw=ov "https://de-demo.siscc.org/vis?lc=en&df%5bds%5d=staging%3ASIS-CC-stable&df%5bid%5d=DSD_SCANNER_DATA%40DF_SEASONAL_PRODUCTS&df%5bag%5d=LU1&df%5bvs%5d=1.0&av=true&dq=..M.&lom=LASTNPERIODS&lo=36&to%5bTIME_PERIOD%5d=false&ly%5bcl%5d=TIME_PERIOD&ly%5brs%5d=MEASURE%2CUNIT&ly%5brw%5d=PRODUCT&vw=ov") and [data contract catalogue](https://sergegoussev.github.io/data_contract_experiments/ "https://sergegoussev.github.io/data_contract_experiments/")) for interim data catalogue. |
| 32 | +- Discussion on Objective B. Can we adopt any best practices from [the Turning Way](https://book.the-turing-way.org/ "https://book.the-turing-way.org/") and their [template repo](https://github.com/the-turing-way/reproducible-project-template "https://github.com/the-turing-way/reproducible-project-template") for price statistics? |
| 33 | +- Roundtable |
| 34 | + |
| 35 | +### Notes |
| 36 | + |
| 37 | +- Group discussed objectives for the 2025 CPI Expert Group meeting. It was decided to focus on an interim data catalogue and provide interim guidance at the conference, with the fuller guidance to be developed over the next year. |
| 38 | +- GitHub projects was agreed upon as the structure for PM activity |
| 39 | +- The draft data catalogue idea was given the green light to flush out further as our likely implementation of the interim version. Metadata strandards should be implemented but it would be hard to use a platform where we don't own the dataset (such as dominik's data). |
| 40 | +- Guidelines for how to develop reproducible research through git was seen as a good way, with the idea that the guidance we would produce would (a) provide the target state to aim for and (b) summarize maturity levels ([similar to RAP maturity levels](https://nhsdigital.github.io/rap-community-of-practice/introduction_to_RAP/levels_of_RAP/)) that showcase how researchers can start easily and progress over time. |
| 41 | + |
| 42 | +## 2025-02-11 |
| 43 | + |
| 44 | +### Agenda |
| 45 | + |
| 46 | +- Discussion on the [proposed project charter for the project](https://github.com/UN-Task-Team-for-Scanner-Data/reproducibility-project/blob/main/project-charter.md) (since merged into main) |
| 47 | +- Update on the 2 repositories for the project ([project repo](https://github.com/UN-Task-Team-for-Scanner-Data/reproducibility-project/blob/outline-project-structure-and-purpose/project-charter.md) and [interim data catalogue repo](https://github.com/UN-Task-Team-for-Scanner-Data/price-stats-data-catalogue)) and [the mocked up project management approach](https://github.com/orgs/UN-Task-Team-for-Scanner-Data/projects/1/views/1) |
| 48 | +- Roundtable |
| 49 | + |
| 50 | +### Notes |
| 51 | + |
| 52 | +- Team discussed scope for 2025 CPI EG presentation. The [project plan outlined two milestones](https://github.com/orgs/UN-Task-Team-for-Scanner-Data/projects/1/views/1), one on data catalogue and one on guidance will be the target. The project structure and use of the GitHub project was summarized. |
| 53 | +- The use of the two repos was discussed. The [reproducibility-project repo](https://github.com/UN-Task-Team-for-Scanner-Data/reproducibility-project) will host the guidance we develop in whatever format we decide (ideas shared including using quarto to [write reproducible papers](https://github.com/UN-Task-Team-for-Scanner-Data/reproducibility-project), or a [quarto static site](https://quarto.org/docs/websites/) if we will more tend towards creating guidance). [price-stats-data-catalogue](https://github.com/UN-Task-Team-for-Scanner-Data/price-stats-data-catalogue) will host the interim open data catalogue. |
| 54 | +- Scope of the data contract was firmed up - we could aim to catalogue input datasets that are used to create some experimental indices and version the output datasets (that may be price indices or other artifacts) as part of the repository on github (such as by saving them in data folder and formatting them in [tidy data format](https://nhsdigital.github.io/rap-community-of-practice/implementing_RAP/workflow/tidy-data/)). |
| 55 | +- Use of tools like Zenodo was discussed and will be investigated to mint DOIs - [ticket #3](https://github.com/UN-Task-Team-for-Scanner-Data/reproducibility-project/issues/3) |
0 commit comments