SensorLLM

Aligning Large Language Models with Motion Sensors for Human Activity Recognition

EMNLP 2025 Main Conference

Zechen Li¹ Shohreh Deldari¹ Linyao Chen² Hao Xue¹ Flora D. Salim¹

¹ University of New South Wales, Sydney
² University of Tokyo

🌟 Overview

SensorLLM is a two-stage framework that aligns sensor time series with human-intuitive text. This alignment lets LLMs interpret complex numerical data and achieve state-of-the-art (SOTA) human activity recognition (HAR) performance across varying sensor types, sensor counts, and sequence lengths.

🔑 Key Features

Aligns sensor time-series with human-intuitive, annotation-free textual trend descriptions and summaries via a QA-based framework.
Sensor–Language Alignment Stage operates on single-channel, variable-length segments for fine-grained trend-text alignment.
Task-Aware Tuning Stage handles multi-channel, multi-sensor data for downstream human activity recognition (HAR).
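
To make the idea of a "trend description" concrete, here is a minimal sketch of how a single-channel segment could be turned into text. It is an illustration only, not the repository's implementation: the function names (`describe_trends`, `trends_to_text`), the three trend labels, and the sentence wording are all assumptions.

```python
from typing import List, Tuple

Run = Tuple[float, float, str]  # (start_seconds, end_seconds, trend_label)


def _label(delta: float, tol: float) -> str:
    """Classify the change between two consecutive samples."""
    if delta > tol:
        return "increasing"
    if delta < -tol:
        return "decreasing"
    return "steady"


def describe_trends(values: List[float], sample_rate_hz: float, tol: float = 1e-6) -> List[Run]:
    """Group consecutive samples of one channel into runs that share a trend."""
    runs: List[Run] = []
    if len(values) < 2:
        return runs
    start = 0
    run_label = _label(values[1] - values[0], tol)
    for i in range(2, len(values)):
        current = _label(values[i] - values[i - 1], tol)
        if current != run_label:
            # The run ends at the sample where the direction changes.
            runs.append((start / sample_rate_hz, (i - 1) / sample_rate_hz, run_label))
            start = i - 1
            run_label = current
    runs.append((start / sample_rate_hz, (len(values) - 1) / sample_rate_hz, run_label))
    return runs


def trends_to_text(runs: List[Run]) -> str:
    """Render the runs as plain sentences (hypothetical wording)."""
    return " ".join(f"From {s:.2f}s to {e:.2f}s the signal is {lab}." for s, e, lab in runs)


example_runs = describe_trends([0.0, 1.0, 2.0, 2.0, 1.0], sample_rate_hz=1.0)
example_text = trends_to_text(example_runs)
```

Because the text is derived from the signal itself, no human annotation is required, which is the property the first feature above refers to.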

📂 Datasets

The current implementation supports five HAR datasets: USC-HAD, UCI-HAR, MHealth, Capture-24, and PAMAP2.

To apply SensorLLM to other datasets, please refer to the code and configuration examples provided for the supported datasets. In particular, you may need to modify the corresponding entries in ts_backbone.yaml and adapt the data loading logic in the ./sensorllm/data folder to match your dataset’s format.

🚀 Getting started

Currently supported pretrained models:

Time-series models: Chronos

Language models: LLaMA

Other pretrained models can be used with minor modifications to the SensorLLM framework.

Sensor–Language QA Pairs Generation

We provide two example notebooks to generate QA pairs for aligning sensor time-series data with human-intuitive text:

mhealth_stage1.ipynb: Generates QA pairs for Stage 1 by aligning single-channel sensor segments with trend-based natural language descriptions.
mhealth_stage2.ipynb: Generates statistical summary text for Stage 2, which performs HAR classification on multi-channel sensor data.

You can also customize or extend the QA templates in these notebooks to generate more diverse types of sensor–language QA pairs for your own use cases.
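
If you want to prototype your own templates outside the notebooks, the sketch below shows one possible shape. It is not taken from the notebooks: the dictionary keys (`"Q"`, `"A"`), the helper names, and the chosen statistics (mean, standard deviation, min, max) are assumptions made for illustration.

```python
import statistics
from typing import Dict, List


def make_qa_pair(channel: str, answer: str) -> Dict[str, str]:
    """Build one Stage 1 style QA pair from a hypothetical question template."""
    question = f"Describe the trend of the {channel} readings in this segment."
    return {"Q": question, "A": answer}


def summarize_channel(name: str, values: List[float]) -> str:
    """One line of per-channel statistics for a Stage 2 style prompt."""
    return (
        f"{name}: mean={statistics.fmean(values):.2f}, "
        f"std={statistics.pstdev(values):.2f}, "
        f"min={min(values):.2f}, max={max(values):.2f}"
    )


def build_stage2_prompt(window: Dict[str, List[float]], question: str) -> str:
    """Concatenate the statistics of every channel, then append the question."""
    lines = [summarize_channel(name, values) for name, values in window.items()]
    return "\n".join(lines + [question])


qa = make_qa_pair("x-axis accelerometer", "The signal rises and then falls.")
prompt = build_stage2_prompt({"acc_x": [1.0, 2.0, 3.0]}, "What activity is this?")
```

Adding a new question type then amounts to writing another template function that returns the same dictionary shape.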

Sensor–Language Alignment

To align sensor time-series data with text, run the following command:

torchrun --nproc_per_node=[NUM_GPUS] sensorllm/train/train_mem.py   \
--model_name_or_path [LLM_PATH] \
--pt_encoder_backbone_ckpt [TS_EMBEDDER_PATH]   \
--tokenize_method 'StanNormalizeUniformBins'    \
--dataset [DATASET_NAME] \
--data_path [TS_TRAIN_PATH]   \
--eval_data_path [TS_EVAL_PATH]   \
--qa_path [QA_TRAIN_PATH]   \
--eval_qa_path [QA_EVAL_PATH]   \
--output_dir [OUTPUT_PATH]    \
--model_max_length [MAX_LEN]    \
--num_train_epochs [EPOCH]    \
--per_device_train_batch_size [TRAIN_BATCH]    \
--per_device_eval_batch_size [EVAL_BATCH]    \
--evaluation_strategy "steps"    \
--save_strategy "steps"    \
--save_steps [SAVE_STEPS]    \
--eval_steps [EVAL_STEPS]    \
--learning_rate 2e-3   \
--weight_decay 0.0   \
--warmup_ratio 0.03   \
--lr_scheduler_type "cosine"   \
--logging_steps 1   \
--gradient_checkpointing True   \
--save_total_limit 1    \
--bf16 True    \
--fix_llm True   \
--fix_ts_encoder True   \
--model_type CasualLM   \
--load_best_model_at_end True
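
The command above passes `--tokenize_method 'StanNormalizeUniformBins'` without explaining it. Judging by the name alone, it plausibly standardizes each series and then quantizes the result into uniformly spaced bins. The sketch below illustrates that general technique under this assumption; the function names, bin range, and bin count are hypothetical and may not match the repository's code.

```python
import statistics
from typing import List


def standardize(values: List[float]) -> List[float]:
    """Z-score a series; a constant series maps to all zeros."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        std = 1.0  # avoid division by zero for flat signals
    return [(v - mean) / std for v in values]


def uniform_bin_ids(z: List[float], low: float = -4.0, high: float = 4.0, num_bins: int = 8) -> List[int]:
    """Map standardized values to integer ids of equal-width bins over [low, high]."""
    width = (high - low) / num_bins
    ids = []
    for x in z:
        clipped = min(max(x, low), high)  # out-of-range values go to the edge bins
        ids.append(min(int((clipped - low) / width), num_bins - 1))
    return ids


token_ids = uniform_bin_ids(standardize([1.0, 2.0, 3.0]), low=-2.0, high=2.0, num_bins=4)
flat_ids = uniform_bin_ids(standardize([5.0, 5.0, 5.0]), low=-2.0, high=2.0, num_bins=4)
```

The resulting integer ids are the kind of discrete input a pretrained time-series encoder can embed.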

Evaluation or Inference

To perform evaluation or inference for the Sensor–Language Alignment stage, run the following command:

python sensorllm/eval/eval.py   \
--model_name_or_path [STAGE1_MODEL_PATH]  \
--pt_encoder_backbone_ckpt [TS_EMBEDDER_PATH]   \
--torch_dtype bfloat16   \
--tokenize_method 'StanNormalizeUniformBins'    \
--dataset [DATASET_NAME] \
--data_path [TS_DATASET_PATH]   \
--qa_path [QA_DATASET_PATH]  \
--output_file_name [OUTPUT_FILE_NAME]   \
--model_max_length [MAX_LEN]   \
--shuffle False

Task-Aware Tuning

To train the model on a HAR task, starting from the Stage 1 checkpoint, run the following command:

torchrun --nproc_per_node=[NUM_GPUS] sensorllm/train/train_mem.py   \
--model_name_or_path [STAGE1_MODEL_PATH] \
--pt_encoder_backbone_ckpt [TS_EMBEDDER_PATH]   \
--model_type "SequenceClassification" \
--num_labels [ACTIVITY_NUM]  \
--use_weighted_loss True  \
--tokenize_method 'StanNormalizeUniformBins'    \
--dataset [DATASET_NAME] \
--data_path [TS_TRAIN_PATH]   \
--eval_data_path [TS_EVAL_PATH]   \
--qa_path [QA_TRAIN_PATH]   \
--eval_qa_path [QA_EVAL_PATH]   \
--output_dir [OUTPUT_PATH]    \
--model_max_length [MAX_LEN]    \
--num_train_epochs [EPOCH]    \
--per_device_train_batch_size [TRAIN_BATCH]    \
--per_device_eval_batch_size [EVAL_BATCH]    \
--evaluation_strategy "steps"    \
--save_strategy "steps"    \
--save_steps [SAVE_STEPS]    \
--eval_steps [EVAL_STEPS]    \
--save_total_limit 1    \
--load_best_model_at_end True    \
--learning_rate 2e-3    \
--weight_decay 0.0    \
--warmup_ratio 0.03    \
--lr_scheduler_type "cosine"    \
--logging_steps 1    \
--bf16 True      \
--fix_llm True  \
--fix_cls_head False  \
--fix_ts_encoder True    \
--gradient_checkpointing True    \
--metric_for_best_model  "f1_macro" \
--preprocess_type "smry+Q" \
--greater_is_better True  \
--stage_2 True  \
--shuffle True

See ./sensorllm/data/utils.py for all available preprocess_type options. Edit that file to change them or add your own.
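
The Task-Aware Tuning command sets `--use_weighted_loss True` and selects checkpoints by `f1_macro`. As a rough guide to what those two settings usually mean, the sketch below computes inverse-frequency class weights (the common "balanced" heuristic) and a macro-averaged F1 score. How the repository actually derives its weights is not documented here, so treat the weighting formula and both function names as assumptions.

```python
from collections import Counter
from typing import List


def inverse_frequency_weights(labels: List[int], num_labels: int) -> List[float]:
    """Weight each class by total / (num_labels * class_count); rare classes weigh more."""
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_labels * counts[c]) if counts[c] else 0.0
        for c in range(num_labels)
    ]


def macro_f1(y_true: List[int], y_pred: List[int], num_labels: int) -> float:
    """Unweighted mean of per-class F1, so every activity counts equally."""
    scores = []
    for c in range(num_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_labels


weights = inverse_frequency_weights([0, 0, 0, 1], num_labels=2)
score = macro_f1([0, 0, 1, 1], [0, 1, 1, 1], num_labels=2)
```

Macro averaging suits HAR datasets with imbalanced activity classes, since a model that ignores rare activities is penalized even when overall accuracy is high.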

🌍 Citation

If you find this repository useful for your research, please cite our paper:

@inproceedings{li-etal-2025-sensorllm,
    title = "{S}ensor{LLM}: Aligning Large Language Models with Motion Sensors for Human Activity Recognition",
    author = "Li, Zechen  and
      Deldari, Shohreh  and
      Chen, Linyao  and
      Xue, Hao  and
      Salim, Flora D.",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    year = "2025",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.19/",
    pages = "354--379",
}

📄 License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
The source code is released under the MIT License.

Please refer to the official repositories of any baseline methods included in this project for their respective license terms.

📩 Contact

If you have any questions or suggestions, feel free to contact Zechen at zechen.li(at)unsw(dot)edu(dot)au.
