
Jivnesh/CAPE

Official code for the paper "CAPE: Context-Aware Personality Evaluation Framework for Large Language Models", accepted at EMNLP 2025 (Findings). If you use this code, please cite our paper.

Requirements

The key software dependencies are:

  • Python 3.9.21
  • anthropic 0.49.0
  • google-generativeai 0.8.3
  • openai 1.60.2
  • scikit-learn 1.6.1

We assume that you have conda installed. Use the following commands to replicate our environment from openai.yml; the full dependency details are listed in that file. Make sure you export your OpenAI API key to the environment with export OPENAI_API_KEY=YOUR_KEY.

conda env create -f openai.yml
conda activate openai
cd codes

Execute all scripts from the ./codes directory.

How to apply the CAPE framework?

Using the following script, you can apply CAPE to any supported LLM following the guidelines below. Make sure you have an API key for the respective LLM provider.

python run_CAPE.py --model_name gpt-3.5-turbo  --Experiment_name _stability_  --run_id 1 --test_temperature 0.0 --Activate_full_context true --shuffle_questions 1 --instruction_setting 1 --option_ordering_setting 1 --option_wording_setting 1 --response_sensitivity_setting 1 --item_paraphrasing_setting 1
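The example above targets gpt-3.5-turbo. If you evaluate models from other providers, export their keys as well. A sketch assuming each SDK's standard environment variable names; check the code for the exact names it reads:

```shell
# Standard environment variables read by each provider SDK
# (assumed; verify against the code if authentication fails).
export OPENAI_API_KEY=YOUR_KEY       # openai (GPT models)
export ANTHROPIC_API_KEY=YOUR_KEY    # anthropic (Claude models)
export GOOGLE_API_KEY=YOUR_KEY       # google-generativeai (Gemini models)
```

The Llama models are served by third-party providers, which use their own key variables.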

Using the following flags, you can generate results for all the combinations reported in Table 1. To activate our CAPE framework, use --Activate_full_context true. For details on the other flags, refer to the code or Appendix D of the paper.

model_name : gpt-3.5-turbo gpt-4-turbo gemini-1.5-flash claude-3-5-haiku-20241022 llama-3.1-8b-instant meta-llama/Llama-3.3-70B-Instruct-Turbo-Free meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Experiment_name : _stability_ _temperature_ _option_wording_ _option_ordering_ _item_paraphrase_
run_id : 1 2 3
test_temperature : 0.5 1.0 1.5
Activate_full_context : true false
shuffle_questions : 1 2 3
instruction_setting : 1 2 3
option_ordering_setting : 1 2 3
option_wording_setting : 1 2 3
item_paraphrasing_setting : 1 2 3
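As a convenience, the flag combinations above can be swept with a shell loop. A dry-run sketch for the stability experiment on gpt-3.5-turbo, with CAPE on and off (remove the leading echo to actually run the commands):

```shell
# Print the full stability sweep: run_id 1-3, CAPE activated and not.
# Drop the leading `echo` to execute each command for real.
for ctx in true false; do
  for run_id in 1 2 3; do
    echo python run_CAPE.py --model_name gpt-3.5-turbo \
      --Experiment_name _stability_ --run_id "$run_id" \
      --test_temperature 0.0 --Activate_full_context "$ctx" \
      --shuffle_questions 1 --instruction_setting 1 \
      --option_ordering_setting 1 --option_wording_setting 1 \
      --response_sensitivity_setting 1 --item_paraphrasing_setting 1
  done
done
```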

How to reproduce the results in Table 1?

We have stored all assessment reports in the Results folder. The following script uses these stored reports to evaluate the results. Please note that if you choose to regenerate the reports with run_CAPE.py, you may not get the exact numbers reported in Table 1 due to factors such as the stochastic nature of LLMs and the exact model version; however, you should observe the same trends as reported in Table 1.

bash CAPE_results.sh

How to produce the analysis reported in Section 6?

Refer to the CAPE_Analysis_Results.ipynb notebook.

How to apply CAPE to RPAs?

You need to create a new conda environment from the chatharushi.yml file. Our implementation uses the Chatharushi library. Activate this new environment to assess all 32 characters' personas with and without our framework. Each RPA is run 3 times to evaluate consistency.

conda env create -f chatharushi.yml
conda activate chatharushi
bash RPA_tests.sh

How to get results for RPAs?

The personality assessment tests of the RPAs are stored in the Results/Character-BFI folder. The following script produces the overall scores reported in Table 2 for the respective GPT version. Please note that the evaluation script takes some time, since the GPR needs time to fit the function.

python RPA_Results.py --With_CAPE 0 --GPT "3.5"
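Since the assessment is reported with and without CAPE, both settings can be evaluated in one pass. A dry-run sketch for GPT-3.5 (pass a different --GPT value for other versions; remove the leading echo to actually run the evaluations):

```shell
# Print the evaluation command with CAPE off (0) and on (1) for GPT-3.5.
# Drop the leading `echo` to run the evaluations for real.
for cape in 0 1; do
  echo python RPA_Results.py --With_CAPE "$cape" --GPT "3.5"
done
```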

Citations

@misc{sandhan2025capecontextawarepersonalityevaluation,
      title={CAPE: Context-Aware Personality Evaluation Framework for Large Language Models}, 
      author={Jivnesh Sandhan and Fei Cheng and Tushar Sandhan and Yugo Murawaki},
      year={2025},
      eprint={2508.20385},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.20385},
      }

License

This project is licensed under the terms of the Apache license 2.0.

Acknowledgements

This work was supported by the “R&D Hub Aimed at Ensuring Transparency and Reliability of Generative AI Models” project of the Ministry of Education, Culture, Sports, Science and Technology.
