| 1 | +# ARGOS Synthetic Hotel Optimization Datasets — Data Descriptor |
| 2 | + |
| 3 | +## Background & Summary |
| 4 | + |
| 5 | +The ARGOS (Adaptive Recursive Gradient Optimization System) framework integrates |
| 6 | +Lexicographic Constraint Optimization (LCO) with a Componentwise Approximated |
| 7 | +Gradient (CAG) filter to enable stable, lexicographically safe optimization in |
| 8 | +hierarchical decision-making problems. To support reproducibility and to |
| 9 | +facilitate independent evaluation of ARGOS, we release a suite of fully |
| 10 | +synthetic datasets that emulate hotel and multi-unit management scenarios under |
| 11 | +varied operating conditions. |
| 12 | + |
| 13 | +The datasets capture key dimensions of hotel operations, including occupancy, |
| 14 | +staffing levels, staff fatigue, and revenue per available room (RevPAR), as well |
| 15 | +as higher-level constructs such as scenario volatility, resource constraints, and |
| 16 | +multi-property traffic patterns. No real hotel operational data or personal data |
| 17 | +are used: all records are synthesized from a controlled stochastic simulator. |
| 18 | + |
| 19 | +These datasets are intended as a reproducible testbed for: |
| 20 | + |
| 21 | +- Lexicographic optimization under strict Tier-1 “feasibility” constraints, |
| 22 | +- Evaluation of componentwise gradient filtering (CAG), |
| 23 | +- Benchmarking of ARGOS against baseline optimization methods, and |
| 24 | +- Future extensions to QUBO/quantum-hybrid hotel management formulations. |
| 25 | + |
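To illustrate the lexicographic ordering mentioned in the list above, here is a minimal sketch that compares candidates by a tuple of tier costs, highest priority first. The candidate names, the cost values, and the helper `lexicographic_best` are hypothetical and are not taken from the ARGOS codebase.

```python
from typing import Mapping, Sequence


def lexicographic_best(candidates: Mapping[str, Sequence[float]]) -> str:
    """Return the candidate whose cost tuple is smallest in lexicographic order.

    Each tuple lists costs from highest to lowest priority; lower is better.
    A lower-priority cost only matters when all higher-priority costs tie.
    """
    return min(candidates, key=lambda name: tuple(candidates[name]))


# Hypothetical candidates: (Tier-1 violation, lower-priority cost).
candidates = {
    "a": (0.0, 5.0),
    "b": (0.0, 3.0),
    "c": (1.0, 0.5),  # cheapest on the second cost, but violates Tier 1
}
best = lexicographic_best(candidates)
```

Candidate `c` has the lowest second-tier cost but loses because the first tier is compared first. This is plain tuple comparison, so it does not reflect how ARGOS itself handles tolerances or gradient filtering.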
| 26 | +## Methods |
| 27 | + |
| 28 | +### Simulation Framework |
| 29 | + |
| 30 | +All datasets are generated using a stylized hotel environment, resembling a |
| 31 | +constrained Markov decision process (CMDP), implemented within the ARGOS |
| 31 | +codebase. The environment describes a single hotel |
| 32 | +(or a collection of hotels) through a state vector including normalized |
| 33 | +occupancy, staff level, staff fatigue index, and pricing/revenue variables. |
| 34 | + |
| 35 | +At each simulated time step, the environment evolves according to: |
| 36 | + |
| 37 | +- deterministic dynamics capturing baseline demand and staffing trends, |
| 38 | +- stochastic noise terms representing unmodeled variability, |
| 39 | +- scenario-specific modifications (e.g., increased volatility or reduced staff). |
| 40 | + |
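The three ingredients listed above can be sketched as a single update function. This is only an illustration under stated assumptions: the mean-reversion form, the baseline targets (0.7 occupancy, 0.8 staffing), the rate constants, and the names `HotelState`, `step`, `simulate`, `noise_scale`, and `staff_factor` are all hypothetical and are not the dynamics used by the ARGOS simulator.

```python
import random
from dataclasses import dataclass


@dataclass
class HotelState:
    occupancy: float
    staff_level: float
    fatigue: float


def clip01(x: float) -> float:
    """Clamp a value to the unit interval."""
    return min(1.0, max(0.0, x))


def step(state, rng, noise_scale=0.02, staff_factor=1.0):
    # Deterministic part: placeholder mean reversion toward assumed baselines.
    occupancy = state.occupancy + 0.1 * (0.7 - state.occupancy)
    staff_level = state.staff_level + 0.1 * (0.8 * staff_factor - state.staff_level)
    fatigue = state.fatigue + 0.05 * (state.occupancy - state.staff_level)
    # Stochastic part: Gaussian noise; noise_scale is the scenario volatility knob.
    occupancy += rng.gauss(0.0, noise_scale)
    fatigue += rng.gauss(0.0, noise_scale)
    return HotelState(clip01(occupancy), clip01(staff_level), clip01(fatigue))


def simulate(days, seed, **scenario):
    """Roll the environment forward from an assumed initial state."""
    rng = random.Random(seed)  # fixed seed makes the trajectory repeatable
    state = HotelState(occupancy=0.6, staff_level=0.8, fatigue=0.3)
    trajectory = []
    for _ in range(days):
        state = step(state, rng, **scenario)
        trajectory.append(state)
    return trajectory


baseline = simulate(365, seed=0)
volatile = simulate(180, seed=0, noise_scale=0.1)
shortage = simulate(180, seed=0, staff_factor=0.5)
```

In this sketch a scenario is just a set of keyword overrides, so the same `step` function serves the baseline, high-volatility, and staff-shortage cases.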
| 41 | +ARGOS and baseline controllers produce candidate actions (price adjustments, |
| 42 | +staffing decisions, or control signals), which are mapped to state transitions. |
| 43 | +For the released datasets, the underlying policy is fixed and the primary focus |
| 44 | +is on the resulting trajectories rather than policy optimization itself. |
| 45 | + |
| 46 | +### Scenarios |
| 47 | + |
| 48 | +We provide several distinct scenario families: |
| 49 | + |
| 50 | +- **Long-horizon baseline:** 365-day single-hotel operation under moderate |
| 51 | + noise, used to study stability and convergence. |
| 52 | + |
| 53 | +- **High-volatility scenario:** 180-day simulation with amplified noise on |
| 54 | + occupancy, fatigue, and RevPAR to test robustness under non-stationary, |
| 55 | + high-variance conditions. |
| 56 | + |
| 57 | +- **Staff-shortage scenario:** 180-day simulation where staff levels are |
| 58 | + systematically reduced and fatigue is typically elevated, stressing Tier-1 |
| 59 | + feasibility (minimum staff) and Tier-3 “staff well-being” priorities. |
| 60 | + |
| 61 | +- **Multi-unit traffic:** 100-day simulation of booking traffic across multiple |
| 62 | + hotel units, approximating heterogeneous demand across properties. |
| 63 | + |
| 64 | +- **Hyperparameter sweep:** summary statistics across varying step sizes and |
| 65 | + CAG weighting coefficients, illustrating the sensitivity of performance and |
| 66 | + Tier-1 violations to hyperparameter choices. |
| 67 | + |
| 68 | +- **QUBO example:** a small random QUBO matrix for demonstration of |
| 69 | + binary-optimization interfaces; no direct trajectory data is associated. |
| 70 | + |
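For readers unfamiliar with the QUBO form, the sketch below evaluates the energy x^T Q x for a binary vector x and finds the minimizer by exhaustive search. The 3×3 matrix is made up for illustration and is not the released matrix; the function names are hypothetical. Exhaustive search is practical only for small matrices because the number of assignments doubles with each variable.

```python
from itertools import product


def qubo_energy(Q, x):
    """Return x^T Q x for a square matrix Q and a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force_minimum(Q):
    """Try every binary assignment and return one with the lowest energy."""
    n = len(Q)
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))


# Made-up upper-triangular example: diagonal rewards, off-diagonal penalties.
Q = [
    [-1, 2, 0],
    [0, -1, 2],
    [0, 0, -1],
]
x_best = brute_force_minimum(Q)
e_best = qubo_energy(Q, x_best)
```

Here the penalties on adjacent pairs make the assignment that selects the first and third variables the cheapest.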
| 71 | +## Data Records |
| 72 | + |
| 73 | +All datasets are provided as CSV files in the `data/` directory of the ARGOS |
| 74 | +repository (and mirrored in the Zenodo deposition). The main files are: |
| 75 | + |
| 76 | +1. `synthetic_long_horizon.csv` — 365 daily records of a single-hotel |
| 77 | + environment with columns: `day`, `occupancy`, `fatigue`, `staff_level`, |
| 78 | + `revpar`. |
| 79 | + |
| 80 | +2. `scenario_high_volatility.csv` — 180 daily records under increased |
| 81 | + volatility with columns: `day`, `occupancy`, `fatigue`, `revpar`. |
| 82 | + |
| 83 | +3. `scenario_staff_shortage.csv` — 180 daily records under staff-shortage |
| 84 | + stress with columns: `day`, `occupancy`, `fatigue`, `staff_level`, `revpar`. |
| 85 | + |
| 86 | +4. `hyperparam_sweep_results.csv` — summary statistics for combinations of |
| 87 | + step size `alpha` and `cag_weight`, with columns: `alpha`, `cag_weight`, |
| 88 | + `avg_revpar`, `violations_tier1`, `fatigue_mean`. |
| 89 | + |
| 90 | +5. `qubo_example_matrix.csv` — an 8×8 QUBO coefficient matrix, stored in |
| 91 | + wide form with each row representing one dimension of the binary quadratic |
| 92 | + form. |
| 93 | + |
| 94 | +6. `multiunit_traffic_sim.csv` — 100-day multi-unit booking traffic simulation |
| 95 | + with columns: `day`, `hotel_0_traffic`–`hotel_4_traffic`. |
| 96 | + |
| 97 | +Each file is accompanied by a data dictionary describing the meaning, type, |
| 98 | +and range of each column. |
| 99 | + |
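The column lists above can be turned into a quick header check after download. The sketch below assumes each trajectory file starts with a header row and that `hotel_0_traffic`–`hotel_4_traffic` denotes five columns; the helper name `missing_columns` is hypothetical. The QUBO matrix is omitted because no column names are given for it.

```python
import csv
import io

# Column names as listed in the Data Records section.
EXPECTED_COLUMNS = {
    "synthetic_long_horizon.csv": ["day", "occupancy", "fatigue", "staff_level", "revpar"],
    "scenario_high_volatility.csv": ["day", "occupancy", "fatigue", "revpar"],
    "scenario_staff_shortage.csv": ["day", "occupancy", "fatigue", "staff_level", "revpar"],
    "hyperparam_sweep_results.csv": [
        "alpha", "cag_weight", "avg_revpar", "violations_tier1", "fatigue_mean",
    ],
    "multiunit_traffic_sim.csv": ["day"] + [f"hotel_{i}_traffic" for i in range(5)],
}


def missing_columns(filename, handle):
    """Return the expected columns absent from the CSV header read from handle."""
    header = next(csv.reader(handle), [])
    return [name for name in EXPECTED_COLUMNS[filename] if name not in header]


# Demonstration on an in-memory header line instead of the real file.
sample = io.StringIO("day,occupancy,fatigue,revpar\n")
missing = missing_columns("scenario_high_volatility.csv", sample)
```

With the files on disk, the same function can be called on a handle from `open("data/scenario_high_volatility.csv", newline="")`.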
| 100 | +## Technical Validation |
| 101 | + |
| 102 | +Because the data are synthetic, validation focuses on internal consistency and |
| 103 | +plausibility rather than on comparison with an external ground truth. |
| 104 | + |
| 105 | +- **Internal consistency:** the simulator enforces reasonable bounds on |
| 106 | + occupancy (0–1), staff levels (0–1), and fatigue indices (0–1). RevPAR values |
| 107 | + follow plausible distributions for mid-tier hotels but do not reproduce any |
| 108 | + specific operator’s financials. |
| 109 | + |
| 110 | +- **Scenario behavior:** high-volatility scenarios show visibly increased |
| 111 | + variance in occupancy and revenue; staff-shortage scenarios show lower average |
| 112 | + staff levels and generally higher fatigue. These patterns were checked |
| 113 | + against time-series plots and summary statistics. |
| 114 | + |
| 115 | +- **Hyperparameter sensitivity:** the hyperparameter sweep dataset is generated |
| 116 | + using repeated runs with fixed seeds, ensuring stable comparisons between |
| 117 | + configurations while still reflecting stochastic variability within runs. |
| 118 | + |
| 119 | +- **Reproducibility:** the Python scripts and notebooks used to generate these |
| 120 | + datasets are included in the ARGOS repository, enabling full regeneration |
| 121 | + under controlled seeds. |
| 122 | + |
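The consistency checks described above can be repeated by users on their own copies. The following sketch assumes rows are read as dictionaries of strings (for example via `csv.DictReader`); the function names and the sample rows are hypothetical, and the rows are made up rather than drawn from the released files.

```python
from statistics import pvariance


def out_of_bounds(rows, columns, low=0.0, high=1.0):
    """Return (row_index, column, value) for every value outside [low, high]."""
    violations = []
    for index, row in enumerate(rows):
        for column in columns:
            value = float(row[column])
            if not low <= value <= high:
                violations.append((index, column, value))
    return violations


def variance_ratio(stressed, baseline):
    """Ratio of population variances; values above 1 mean the first series varies more."""
    return pvariance(stressed) / pvariance(baseline)


# Made-up rows for illustration only.
rows = [
    {"occupancy": "0.50", "fatigue": "0.30"},
    {"occupancy": "0.75", "fatigue": "1.20"},
]
violations = out_of_bounds(rows, ["occupancy", "fatigue"])
```

An empty result from `out_of_bounds` on the occupancy, staff-level, and fatigue columns would agree with the stated 0–1 bounds, and `variance_ratio` gives a simple numeric counterpart to the visual comparison of the high-volatility and baseline series.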
| 123 | +## Usage Notes |
| 124 | + |
| 125 | +The datasets are designed for: |
| 126 | + |
| 127 | +- Reproducing the experiments reported in the ARGOS paper, |
| 128 | +- Extending the analysis with additional baselines or ablations, |
| 129 | +- Serving as controlled environments for studying lexicographic / hierarchical |
| 130 | + optimization techniques. |
| 131 | + |
| 132 | +Users should note that the datasets: |
| 133 | + |
| 134 | +- Do not represent any specific real hotel or chain, |
| 135 | +- Should not be used for financial forecasting or business decisions, |
| 136 | +- May be adapted or extended by modifying the ARGOS simulation code and |
| 137 | + regenerating trajectories with different parameter choices. |
| 138 | + |
| 139 | +When using these datasets in publications or derived work, please cite the ARGOS |
| 140 | +paper and (optionally) the Zenodo dataset DOI. |