This repo provides a minimal, educational prototype inspired by the paper:
X. Yang, S. He, K. G. Shin, M. Tabatabaie, and J. Dai,
"Cross-Modality and Equity-Aware Graph Pooling Fusion: A Bike Mobility Prediction Study,"
IEEE Transactions on Big Data (TBD), vol. 11, no. 1, 2025.
This project is not an official implementation of the paper.
It is a learning-oriented tribute to the methodological ideas proposed in GRAPE,
especially its themes of cross-modality graph fusion and equity-aware learning objectives.
- Two graphs (target=bike, auxiliary=taxi) on synthetic data
- Simple GCN encoders per modality
- Lightweight convolution-style fusion over modality × time
- Temporal modeling with a GRU
- Loss = MSE + λ_res · resource_fairness + λ_per · performance_fairness
- Reports MAE/RMSE and fairness gaps
- See grape_minimal/PipelineTest.ipynb
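The combined objective above can be sketched in a few lines. This is a minimal, illustrative formulation, not the paper's exact definitions: here `resource_fairness` is taken as the gap in mean predicted demand across equity groups and `performance_fairness` as the gap in per-group MAE, which are plausible readings of those terms but assumptions on my part.

```python
# Illustrative sketch of Loss = MSE + λ_res·resource_fairness + λ_per·performance_fairness.
# The exact gap definitions below are assumptions, not the paper's formulas.
import numpy as np

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

def resource_fairness(pred, groups):
    """Gap in mean predicted demand between region groups
    (a proxy for how resources would end up allocated)."""
    means = [pred[groups == g].mean() for g in np.unique(groups)]
    return float(max(means) - min(means))

def performance_fairness(pred, target, groups):
    """Gap in per-group MAE: the model should not be much less
    accurate for one group of regions than another."""
    maes = [np.abs(pred[groups == g] - target[groups == g]).mean()
            for g in np.unique(groups)]
    return float(max(maes) - min(maes))

def grape_style_loss(pred, target, groups, lam_res=0.1, lam_per=0.1):
    return (mse(pred, target)
            + lam_res * resource_fairness(pred, groups)
            + lam_per * performance_fairness(pred, target, groups))

# Toy usage: 6 regions in two equity groups (0 and 1).
pred = np.array([3.0, 2.5, 3.2, 1.0, 1.2, 0.8])
target = np.array([3.1, 2.4, 3.0, 1.3, 1.0, 1.1])
groups = np.array([0, 0, 0, 1, 1, 1])
print(grape_style_loss(pred, target, groups))  # → 0.25
```

In the actual prototype these terms are computed on torch tensors inside the training loop; the numpy version here just makes the arithmetic of the trade-off explicit.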
This prototype intentionally focuses on the conceptual structure.
The following components from the full GRAPE paper are not implemented:
- Real NYC/Chicago datasets and preprocessing pipelines
- Hierarchical differentiable pooling architecture
- Full spatial-temporal fusion design
- Equity-aware hyper-parameter tuning and ablations
- Full experimental reproduction or performance benchmarking
This repo is built solely for academic self-study and to demonstrate my understanding of
the core ideas presented in the GRAPE framework.
All credit for the original design, problem formulation, and contributions
belongs to the authors of the paper cited above.
The grape_transfer module explores how the core ideas of GRAPE can be adapted
to multimodal sensing in Virtual Reality (VR) systems.
In my current work on VR-based neural engineering equipment, the headset provides rich multimodal streams such as eye tracking (gaze points and fixation durations), hand tracking (gesture trajectories and finger joints), and head/pose motion. These modalities form heterogeneous but behaviorally correlated signals: eye–hand coordination is well documented in both neuroscience and human–computer interaction, since users typically look at the region they are about to reach for or point at.
This motivates a GRAPE-style formulation in which each modality is represented as a graph (e.g., a gaze trajectory graph, a hand-joint kinematic graph, a head motion graph), and a cross-modality fusion module extracts the shared latent spatio-temporal patterns. One potential application is next-step gaze prediction: by learning how future gaze can be inferred from current hand movement and head pose, a VR system may pre-emptively allocate rendering resources to the most likely upcoming region of attention (e.g., foveated or prioritized rendering), saving computation and reducing latency.
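The transfer idea can be illustrated with a deliberately tiny sketch: fuse hand and head streams and fit a linear map to the next gaze point. This is not a GRAPE model (no graph encoders, no GRU, fixed fusion weights); all data and weights below are synthetic assumptions meant only to show the prediction setup.

```python
# Toy sketch: predict next-step gaze from fused hand + head features.
# A full GRAPE-style model would replace the fixed-weight fusion with
# per-modality graph encoders and a temporal module.
import numpy as np

rng = np.random.default_rng(0)
T = 50

# Synthetic streams: gaze anticipates the hand by one step, mimicking
# the eye-hand coordination described above.
hand = np.cumsum(rng.normal(0, 0.05, size=(T, 2)), axis=0)    # hand (x, y)
head = hand * 0.5 + rng.normal(0, 0.02, size=(T, 2))          # head-pose proxy
gaze = np.roll(hand, -1, axis=0) + rng.normal(0, 0.01, size=(T, 2))

# Fixed illustrative fusion weights, then a least-squares linear map
# from fused features at time t to gaze at time t (= hand at t+1).
fused = 0.7 * hand + 0.3 * head
X, y = fused[:-2], gaze[:-2]          # drop the wrapped last rows from np.roll
W, *_ = np.linalg.lstsq(X, y, rcond=None)
mae = np.abs(X @ W - y).mean()
print(f"next-step gaze MAE: {mae:.3f}")
```

Even this crude linear baseline recovers most of the gaze trajectory because the synthetic gaze is constructed to lead the hand; the interesting open question is how much a graph-based fusion improves on it with real headset data.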
```bash
python grape_minimal.py
```