Applying Psychometrics to Large Language Model Simulated Populations: Recreating the HEXACO Personality Inventory Experiment with Generative Agents
Generative agents powered by Large Language Models demonstrate human-like characteristics through sophisticated natural language interactions. Their ability to assume roles and personalities based on predefined character biographies has positioned them as cost-effective substitutes for human participants in social science research. This paper explores the validity of such persona-based agents in representing human populations: we recreate the HEXACO personality inventory experiment by surveying 310 GPT-4-powered agents, conducting factor analysis on their responses, and comparing the results to the original findings presented by Ashton, Lee, & Goldberg (2004). Our results show that 1) a coherent and reliable personality structure was recoverable from the agents' responses, demonstrating partial alignment with the HEXACO framework; 2) the derived personality dimensions were consistent and reliable within GPT-4 when coupled with a sufficiently curated population; and 3) cross-model analysis revealed variability in personality profiling, suggesting model-specific biases and limitations. We discuss the practical considerations and challenges encountered during the experiment. This study contributes to the ongoing discourse on the potential benefits and limitations of using generative agents in social science research, and provides useful guidance on designing consistent and representative agent personas to maximise coverage and representation of human personality traits.
Public repository for data and code associated with our paper: arXiv, by Sarah Mercer.
Lexical Analysis:
The following notebooks contain code to perform PCA and present the resulting factors. They also report Cronbach's alphas and Jaccard coefficients (against the original HEXACO findings), together with semantic similarity scores for the terms within each factor and between the factors and the HEXACO dimensions.
- PopCensus 5-Factor Solution
- PopCensus 6-Factor Solution
- PopCensus 10-Factor Solution (including Fig. 4 and Fig. 5).
- PopProfessional 8-Factor Solution (including Fig. 12).
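Cronbach's alpha and the Jaccard coefficient are standard formulas. The sketch below shows one way to compute them with pandas, assuming a DataFrame with one row per agent and one column per adjective assigned to a factor. The helper names (cronbach_alpha, jaccard) are hypothetical, and the notebooks' own implementations may differ, for example in how missing values are handled or how adjectives are assigned to factors.

```python
import pandas as pd


def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for one factor (hypothetical helper).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score)),
    where k is the number of items (adjectives) in the factor.
    """
    items = items.dropna()  # listwise deletion: one possible choice
    k = items.shape[1]
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return float((k / (k - 1)) * (1 - item_variances.sum() / total_variance))


def jaccard(a, b) -> float:
    """Jaccard coefficient |A & B| / |A | B| between two sets of adjectives."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)
```

For instance, jaccard could compare the adjectives loading on a recovered factor with those reported for the corresponding HEXACO dimension; {"a", "b", "c"} and {"b", "c", "d"} share two of four distinct terms, giving 0.5.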
HEXACO-PI-R 100:
PopCensus:
- data/pop_census.json - contains all the character biographies for PopCensus. Use data_support.fix_name() to translate each agent's 'Full Name' into the index used in the results and ipsatised_results files.
- data/popc_responses_file*.csv - batches of agent responses (under 50MB each).
- data/popc_results.csv - contains scores from the HEXACO lexical analysis survey. Columns are adjectives; rows are indexed by agent name (via fix_name()). *
- data/popc_ipsatised_results.csv - contains ipsatised scores from the survey. Columns are adjectives; rows are indexed by agent name (via fix_name()). *
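Ipsatisation removes each respondent's overall response level before factor analysis. The sketch below shows a common form (centring each agent's ratings on their own mean, optionally dividing by their own standard deviation), assuming a results table shaped like popc_results.csv with agents as rows and adjectives as columns. The function name ipsatise is hypothetical, and whether data/data_prep.ipynb also rescales by the standard deviation is an assumption not confirmed here.

```python
import pandas as pd


def ipsatise(results: pd.DataFrame, scale: bool = False) -> pd.DataFrame:
    """Ipsatise survey scores within each respondent (row). Hypothetical helper.

    Each agent's mean rating across all adjectives is subtracted from each of
    their ratings, so a tendency to rate every adjective high (or low) no
    longer shifts that agent's scores. With scale=True the centred scores are
    also divided by the agent's own standard deviation.
    """
    centred = results.sub(results.mean(axis=1), axis=0)
    if scale:
        # An agent who gives every adjective the same rating has std 0,
        # which yields NaN rows here.
        centred = centred.div(results.std(axis=1, ddof=1), axis=0)
    return centred
```

After centring, every agent's row sums to zero, which is a quick sanity check when comparing a recomputed table against popc_ipsatised_results.csv.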
PopProfessional:
- data/pop_professional.json - contains all the character biographies for PopProfessional. Use data_support.fix_name() to translate each agent's 'Full Name' into the index used in the results and ipsatised_results files.
- data/popp_results.csv - contains scores from the HEXACO lexical analysis survey for PopProfessional. Columns are adjectives; rows are indexed by agent name (via fix_name()). *
- data/popp_ipsatised_results.csv - contains ipsatised scores from the survey for PopProfessional. Columns are adjectives; rows are indexed by agent name (via fix_name()). *
(*) These files can be regenerated from the agent responses using data/data_prep.ipynb.
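Because the PopCensus responses are split into several CSV batches, regenerating the results starts by stacking them. The sketch below assumes every batch shares the same columns; the function name load_response_batches is hypothetical and the column layout of the batch files is not described here.

```python
import glob

import pandas as pd


def load_response_batches(pattern: str = "data/popc_responses_file*.csv") -> pd.DataFrame:
    """Read every batch file matching `pattern` and stack them into one DataFrame.

    Paths are sorted so the result has a stable row order. Note that this is
    a lexicographic sort (file10 sorts before file2).
    """
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    return pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)
```

Reading the batches lazily inside pd.concat avoids holding a separate list of DataFrames alongside the combined result.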
Lexical Analysis Support:
- data/adjectives.json - a list of all the adjectives from the survey.
HEXACO-PI-R 100:
- data/hexaco-pi-r-responses-gpt4.csv - responses from the HEXACO-PI-R 100 survey (PopCensus), generated with GPT-4.
- data/hexaco-pi-r-responses-llama.csv - responses from the HEXACO-PI-R 100 survey (PopCensus), generated with Llama-3.2 (3Bn, via Ollama).
  Note: Llama refused to answer question 98 ('It wouldn't bother me to harm someone I didn't like.') 85 times (response=='[content-filtered]'). This question belongs to the Altruism facet, which is not included in the six main dimensions.
- data/hexaco-pi-r-responses-sonnet.csv - responses from the HEXACO-PI-R 100 survey (PopCensus), generated with Claude Sonnet 3.7.
- data/hexaco-pi-r-responses-phi4.csv - responses from the HEXACO-PI-R 100 survey (PopCensus), generated with Microsoft's Phi4 (14Bn, via Ollama).
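Since some Llama responses are the string '[content-filtered]' rather than a rating, they need to be treated as missing before scale scores are computed. The sketch below assumes the responses have already been arranged with one row per agent and one column per item number (the actual CSV layout may differ), and uses a deliberately tiny, hypothetical scoring key: the real item-to-scale assignments and reverse-keyed items must come from the official HEXACO-PI-R scoring documentation.

```python
import numpy as np
import pandas as pd

# HYPOTHETICAL scoring key for illustration only. The real HEXACO-PI-R 100
# key (which items form each scale, which are reverse-keyed) must be taken
# from the official scoring documentation.
EXAMPLE_KEY = {
    "Honesty-Humility": {"items": [1, 2, 3], "reversed": [2]},
    "Altruism": {"items": [97, 98], "reversed": [98]},
}


def score_scales(responses: pd.DataFrame, key: dict) -> pd.DataFrame:
    """Average 1-5 item responses into scale scores, one row per agent.

    Refusals recorded as '[content-filtered]' become NaN and are skipped,
    so each scale score is the mean of the items the agent did answer.
    """
    numeric = responses.replace("[content-filtered]", np.nan).apply(pd.to_numeric)
    scores = {}
    for scale, spec in key.items():
        items = numeric[spec["items"]].copy()
        for col in spec["reversed"]:
            items[col] = 6 - items[col]  # reverse-key on a 1-5 response scale
        scores[scale] = items.mean(axis=1, skipna=True)
    return pd.DataFrame(scores)
```

Skipping missing items keeps refusing agents in the analysis; an alternative design is to drop any agent with a refusal on the scale in question, which changes the sample rather than the per-agent score.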
- Figure 1, Figure 11 - population broken down by SOC2020 occupation codes and plotted against census data (England & Wales 2020).
- Figure 2 - scree plot of unrotated eigenvalues (PopCensus).
- Figure 3 - heatmap of Cronbach's alphas for each factor in all solutions.
- Figure 6 - heatmap of correlations between agents' HEXACO scores (derived from the lexical analysis using the loadings of PopCensus's 10-factor solution) and their HEXACO-PI-R 100 results. Also Table 1: correlation scores for all models (Llama, GPT-4, Sonnet and Phi4).
- Figure 13 - heatmap of correlations between PopCensus's 10 factors and PopProfessional's 8 factors.
- Figure 14 - plot of agent consistency vs biography length (PopCensus).
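Two of these figures rest on simple matrix computations: a scree plot shows the eigenvalues of the item correlation matrix, and the correlation heatmaps compare every column of one table with every column of another. The sketch below is a minimal numpy/pandas version under those assumptions; the function names are hypothetical, and the repository performs its PCA in R, so its numbers come from that pipeline rather than from this code.

```python
import numpy as np
import pandas as pd


def unrotated_eigenvalues(ratings: pd.DataFrame) -> np.ndarray:
    """Eigenvalues of the item correlation matrix, largest first (scree plot data)."""
    corr = ratings.corr().to_numpy()
    return np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order


def cross_correlation(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation of every column of `a` with every column of `b`.

    Rows are aligned on the index (e.g. agent names, or adjectives when
    comparing loadings), keeping only rows present in both tables.
    Result: one row per column of `a`, one column per column of `b`.
    """
    a, b = a.align(b, join="inner", axis=0)
    return pd.DataFrame({col: a.corrwith(b[col]) for col in b.columns})
```

The eigenvalues of a correlation matrix always sum to the number of items, which is a useful check on the scree data. For a Figure 13 style comparison of two solutions fitted on different populations, the shared index would presumably be the adjectives (rows of the loading matrices) rather than the agents.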
Python packages pandas and gensim are required to run these notebooks.
PCA is conducted in R. Create a _private.py file in the support subdirectory that contains the following definition:
r_binary_folder = '[your path to]/bin/Rscript'
Ensure the 'psych' and 'readr' packages are installed in your R environment ('readr' is installed as part of the tidyverse):
install.packages("tidyverse")
install.packages("psych")
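Before running the notebooks, it can help to confirm from Python that the configured Rscript binary runs and has the required packages. The sketch below is a hypothetical check, not the repository's own R wrapper; it relies only on base R (installed.packages, setdiff, quit) and Python's subprocess module.

```python
import subprocess


def r_package_check_command(rscript: str, packages=("psych", "readr")) -> list:
    """Build an Rscript command that exits with status 1 if any package is missing."""
    quoted = ", ".join(f'"{p}"' for p in packages)
    expr = (
        f"missing <- setdiff(c({quoted}), rownames(installed.packages())); "
        "if (length(missing) > 0) { print(missing); quit(status = 1) }"
    )
    return [rscript, "-e", expr]


def r_packages_installed(rscript: str, packages=("psych", "readr")) -> bool:
    """Return True when Rscript runs and every listed package is installed."""
    try:
        result = subprocess.run(
            r_package_check_command(rscript, packages),
            capture_output=True,
            text=True,
        )
    except OSError:  # e.g. the Rscript path is wrong or not executable
        return False
    return result.returncode == 0
```

For example, calling r_packages_installed with the r_binary_folder value defined in support/_private.py should return True once the install.packages steps above have been run (assuming the path points at the Rscript executable itself, as in the example definition).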
This project expects to find the FastText model (cc.en.300.vec) in a subfolder called 'model_data'.
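The semantic similarity scores rely on these word vectors. The sketch below shows one plausible scoring scheme: mean pairwise cosine similarity among a factor's adjectives, and mean similarity of those adjectives to a dimension label. It works with any word-to-vector mapping, including a gensim KeyedVectors object loaded from the .vec file. The function names are hypothetical, and the notebooks' exact scoring (for example, how multi-word labels such as Honesty-Humility are embedded) may differ.

```python
import numpy as np


def cosine(u, v) -> float:
    """Cosine similarity between two vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def mean_pairwise_similarity(words, vectors) -> float:
    """Average cosine similarity over all pairs of `words` that have vectors.

    `vectors` is any mapping supporting `word in vectors` and `vectors[word]`.
    """
    known = [w for w in words if w in vectors]
    sims = [cosine(vectors[a], vectors[b]) for i, a in enumerate(known) for b in known[i + 1:]]
    if not sims:
        raise ValueError("need at least two words with vectors")
    return float(np.mean(sims))


def similarity_to_label(words, label, vectors) -> float:
    """Average cosine similarity between each known word and a single label word."""
    sims = [cosine(vectors[w], vectors[label]) for w in words if w in vectors]
    if not sims:
        raise ValueError("none of the words have vectors")
    return float(np.mean(sims))


def load_fasttext(path: str = "model_data/cc.en.300.vec"):
    """Load the fastText .vec file (text word2vec format) with gensim."""
    from gensim.models import KeyedVectors  # imported lazily; gensim is a project requirement

    return KeyedVectors.load_word2vec_format(path, binary=False)
```

The full cc.en.300.vec file is large; gensim's load_word2vec_format accepts a limit argument to load only the first N vectors when experimenting.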
Data and Code released under MIT License. Please cite our research if you use either in your work.
Copyright (c) 2023-2025 The Alan Turing Institute.