Create an initial dashboard that provides a predtimechart-based forecast visualization component for the hub. The thinking is that this will allow us to get something practical into the hands of hubverse users relatively quickly.
Considerations:
- Client side only, i.e., nothing server-side. This will greatly simplify new hub onboarding.
- Others: @todo
A Python application to create predtimechart JSON files is available from this repository and can be installed in a fresh Python environment via pip:
$ pip install git+https://github.com/hubverse-org/hub-dashboard-predtimechart
The application can then be run from the command line:
$ hub_predtimechart --help
The major parts of this project are:
- Forecast visualization component: To visualize forecast data, we will generalize the predtimechart JavaScript component to work with Hubverse hubs. Details:
  - specific component changes: @todo
- Visualization data files: This project will configure predtimechart to load its data from `.json` files that will be generated from hub forecast files, an approach similar to how viz.covid19forecasthub.org works (GitHub repo). This requires us to write a program (we will use Python) to generate those `.json` files, like the R files here. The `.json` files will be stored in the AWS S3 bucket for each hub, akin to how hubverse-transform saves its generated `.parquet` files to S3. Our initial constraints:
  - `output_type`: To start we will only support hubs that contain `quantile` forecasts (please see Output types in the docs).
  - intervals: @todo
- Predtimechart configuration: Predtimechart is configured via a JavaScript options object that specifies settings like `available_as_ofs`, `task_ids`, `models`, etc. Our current thinking is that this object will be generated from hub configuration files.
  - generation details (reference_date -> as_of/selected date, horizon, target_date: x axis, task id vars -> dropdowns, ...): @todo
- Server/Dashboard: We will write a simple dashboard page providing a link to the forecast visualization (predtimechart) page. Our initial thought is to implement this via a straightforward S3 static website, i.e., a self-contained `index.html` file, perhaps with some JavaScript to access basic hubverse admin information to orient the viewer, such as hub name, tasks summary, etc. Two comparable sites are https://respicast.ecdc.europa.eu/ (especially) and https://covid19forecasthub.org/ . See [Dashboard architecture] below for details.
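As a rough illustration of the "Visualization data files" step, the sketch below pivots long-format quantile rows for one model into a single column-oriented dict that could be written out as a `.json` file. The function name `forecast_json_for_model`, the input row fields, and the output layout (a `target_end_date` list plus one `q<level>` list per quantile level) are assumptions made for this example, not a description of the actual application.

```python
import json
from collections import defaultdict

# The five quantile levels listed in the limitations of this document.
QUANTILE_LEVELS = (0.025, 0.25, 0.5, 0.75, 0.975)


def forecast_json_for_model(rows, quantile_levels=QUANTILE_LEVELS):
    """Pivot long-format quantile rows into one column-oriented dict.

    `rows` is an iterable of dicts with (assumed) keys `target_date`,
    `output_type`, `output_type_id`, and `value`.
    """
    by_date = defaultdict(dict)  # target_date -> {quantile level -> value}
    for row in rows:
        if row["output_type"] != "quantile":
            continue  # only quantile forecasts are supported initially
        level = float(row["output_type_id"])
        by_date[row["target_date"]][level] = row["value"]

    # Keep only target dates that have every required quantile level.
    dates = sorted(
        date for date, levels in by_date.items()
        if all(q in levels for q in quantile_levels)
    )
    result = {"target_end_date": dates}
    for q in quantile_levels:
        result[f"q{q}"] = [by_date[date][q] for date in dates]
    return result


# Made-up rows for one model and two target dates (ISO dates sort as strings).
example_rows = [
    {"target_date": date, "output_type": "quantile",
     "output_type_id": str(q), "value": base + 10 * i}
    for date, base in (("2022-11-05", 100), ("2022-10-29", 80))
    for i, q in enumerate(QUANTILE_LEVELS)
]
print(json.dumps(forecast_json_for_model(example_rows), indent=2))
```

Dropping target dates that lack a required level is one possible policy; failing loudly instead may be preferable, given that hub files are assumed to be validated.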
Initially the visualization will have these limitations:
- Only one round block in `tasks.json > rounds` can be plotted.
- Only one `model_tasks` group within that round block can be plotted, and only `model_tasks` groups with `quantile` output_types will be considered.
- The following quantile levels (`output_type_ids`) are present in the data: 0.025, 0.25, 0.5, 0.75, 0.975
- The hub has `reference_date`|`origin_date` and `target_date`|`target_end_date` task IDs in `tasks.json > rounds > model_tasks > task_ids`.
- Model metadata must contain a boolean `designated_model` field.
- The `target_metadata` list in the specified `model_tasks` object within the specified `rounds` object must contain exactly one object, which must have a single key in the `target_keys` object.
- Only forecast data will be plotted, not target data.
- We assume all hub files have been validated.
- For the `task_ids` entry in predtimechart config option generation, we use `value` for both `value` and `text`, rather than asking the user to provide a mapping from `value` to `text`. A solution is to require that mapping in `predtimechart-config.yml`.
- The `initial_as_of` and `current_date` config fields are the last of `hub_config.fetch_reference_dates`.
- The `initial_task_ids` config field is the first `task_ids` value.
- Target data generation: The app `generate_target_json_files.py` is limited to hubs that store their target data as a .csv file in the `target-data` subdirectory. That file is specified via the `target_data_file_name` field in the hub's `predtimechart-config.yml` file. We expect the file has these columns: `date`, `value`, and `location`.
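Several of the limitations above can be checked mechanically against a parsed `tasks.json`. The following is a minimal sketch under stated assumptions: the helper `limitation_problems` is hypothetical, and it assumes that quantile levels are listed under `output_type > quantile > output_type_id` as `required` and/or `optional` lists, which may differ between hubverse schema versions.

```python
REQUIRED_LEVELS = {0.025, 0.25, 0.5, 0.75, 0.975}


def limitation_problems(tasks, rounds_idx=0, model_tasks_idx=0):
    """Return human-readable problems for one model_tasks group.

    `tasks` is the parsed content of a hub's tasks.json. An empty list
    means that none of the checked limitations is violated.
    """
    problems = []
    model_task = tasks["rounds"][rounds_idx]["model_tasks"][model_tasks_idx]

    # The hub needs a reference-date-like and a target-date-like task ID.
    task_ids = set(model_task.get("task_ids", {}))
    if not task_ids & {"reference_date", "origin_date"}:
        problems.append("no reference_date or origin_date task ID")
    if not task_ids & {"target_date", "target_end_date"}:
        problems.append("no target_date or target_end_date task ID")

    # Only quantile output types are considered (assumed schema layout).
    quantile = model_task.get("output_type", {}).get("quantile")
    if quantile is None:
        problems.append("no quantile output_type")
    else:
        ids = quantile.get("output_type_id", {})
        levels = set((ids.get("required") or []) + (ids.get("optional") or []))
        missing = sorted(REQUIRED_LEVELS - levels)
        if missing:
            problems.append(f"missing quantile levels: {missing}")

    # Exactly one target_metadata object with a single target_keys key.
    target_metadata = model_task.get("target_metadata", [])
    if len(target_metadata) != 1:
        problems.append("target_metadata must contain exactly one object")
    elif len(target_metadata[0].get("target_keys") or {}) != 1:
        problems.append("target_keys must have a single key")
    return problems


# Made-up, heavily trimmed tasks.json content for illustration only.
example_tasks = {"rounds": [{"model_tasks": [{
    "task_ids": {"reference_date": {}, "target_end_date": {}, "location": {}},
    "output_type": {"quantile": {"output_type_id": {
        "required": [0.025, 0.25, 0.5, 0.75, 0.975]}}},
    "target_metadata": [{"target_keys": {"target": "wk inc flu hosp"}}],
}]}]}
print(limitation_problems(example_tasks))
```

Returning a list of problems rather than raising on the first one lets a hub admin see every unmet limitation in one pass.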
Some visualization-related information must be configured for each hub, including:
- which interval levels to show. Initially: None, 50%, 95%
- which round block in `tasks.json` to use
- reference_date column name
- target_date column name
- name of boolean field for model inclusion. Initially we will assume it is `designated_model`
- names of hub models - to be listed first
- `initial_checked_models` (a predtimechart option)
- others: @todo
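The configured interval levels relate directly to the quantile levels that must be present in the data: a central interval of p% is bounded by the quantiles (100 - p)/200 and (100 + p)/200. The sketch below works this out for the initial 50% and 95% intervals. The function names are hypothetical, and including the median (0.5) as the plotted point forecast is an assumption based on the quantile levels listed earlier.

```python
def quantile_pair(interval_pct):
    """Return the (lower, upper) quantile levels bounding a central interval.

    `interval_pct` is the interval width in percent, e.g. 50 or 95.
    """
    if not 0 < interval_pct < 100:
        raise ValueError(f"interval must be between 0 and 100: {interval_pct}")
    # Dividing integers avoids round-off such as (1 - 0.95) / 2 != 0.025.
    return (100 - interval_pct) / 200, (100 + interval_pct) / 200


def required_quantiles(interval_pcts):
    """Return the sorted quantile levels needed for the given intervals."""
    levels = {0.5}  # median, assumed to be drawn as the point forecast
    for pct in interval_pcts:
        levels.update(quantile_pair(pct))
    return sorted(levels)


print(quantile_pair(50))             # (0.25, 0.75)
print(quantile_pair(95))             # (0.025, 0.975)
print(required_quantiles([50, 95]))  # [0.025, 0.25, 0.5, 0.75, 0.975]
```

Supporting an additional interval level would therefore require the corresponding pair of quantile levels in every plotted model's forecasts.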
Our initial thinking is an approach where we provide a fixed layout (e.g., a menubar at top and a content area in the middle, such as found at https://respicast.ecdc.europa.eu/ ) that allows limited customization specified by convention via markdown files (some with specific names) placed in directories with specific names. Details:
- Configurable content is specified via markdown files located in a directory named `hub-website` (say) in the root hub directory.
- The site layout is a single column (100% width) with two rows: a menubar/header at the top, and a content area taking up the rest of the vertical space.
- The menubar contains these items (from left to right): Home (brand image/text), "Forecasts", "Evaluations", "Background", "Community", "Get in touch".
- The content area depends on the selected menu item:
  - Home: Content is loaded from `hub-website/home.md`.
  - "Forecasts": Content is the predtimechart visualization.
  - "Evaluations": @todo
  - "Background", "Community", "Get in touch": @todo loaded from specific files under `hub-website` such as `background.md`, etc.
We plan to primarily use https://github.com/hubverse-org/example-complex-forecast-hub for development unit tests.
- How/when will file generation be triggered? This applies to both `.json` visualization files and the predtimechart configuration object. For example, an admin UI, GitHub Action on schedule, round close, etc.
- Is this a good time to remove predtimechart's user ensemble, if desired?
- Is this an opportunity to set up some kind of general purpose notification service for interested parties (e.g., hub admins) that informs them when, say, the viz is configured or updated, viz data files are updated, etc.?
- Dashboard: Do we want to allow users to add menu items that link to pages with content loaded from .md files? For example, should we support a `hub-website/menus` directory where users can put files that become menu items, with the menu name taken from the file name (capitalized, say) and the content generated from the file?
- Generation/scheduling: We will need a flag to indicate whether we want to regenerate forecast json files for all past weeks, or only for the present week.
- Where is the source data coming from - GitHub vs. S3?
- Which model output formats will we support? The hub docs mention CSV and parquet, but note that others (e.g., zipped files) might be supported.
- Regarding naming the .json files, should we be influenced by Arrow's partitioning scheme, where it names intermediate directories according to filtering?
- We might need separate apps to update config options vs. visualization data (json files) for the case where the user has changed `predtimechart-config.yml` independent of a round closing.
- Should we filter out `hub_config.horizon_col_name == 0`?
- Should `forecast_data_for_model_df()`'s `quantile_levels` be stored in a config file somewhere?
Use the following to create a local dev setup using pyenv and pipenv, which we assume are already installed.
$ cd <this repo>
$ pyenv versions # you should see this repo's .python-version set
$ pipenv --python $(pyenv which python)
$ pipenv install pip-tools # for `pip-compile`
$ pipenv run pip-compile --extra=dev --output-file=requirements/requirements.txt pyproject.toml
$ cd <this repo>
$ pipenv install -r requirements/requirements.txt -e .
$ cd <this repo>
$ pipenv run python -m pytest
$ pipenv run python src/hub_predtimechart/app.py