Code for the paper "LLMs as Proxy Survey Participants With RAG" by Elias Torjani, Airidas Brikas, and Daniel Hardt (our BSc thesis).
Check out our abstract-length paper on it, "Market research via persona-induced Large Language Models", or see our poster below for a TL;DR.

- Export your chat messages from Facebook, Instagram, and/or WhatsApp (instructions below)
- Take the surveys yourself. Your answers serve as the target responses for the LLMs that proxy you in the same surveys.
- Clone this repository
- Download Ollama and your models of choice to run inference locally. We discourage using any cloud provider, to reduce the risk of leaking sensitive information.
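Once Ollama is running, its models can be queried over a local REST API (by default on port 11434), so your messages never leave your machine. The sketch below uses only the Python standard library to send one non-streaming request to Ollama's `/api/generate` endpoint. The helper names `build_payload` and `ask_local_model`, the prompt wording, and the model name are hypothetical illustrations, not the prompt or code used in the paper.

```python
import json
import urllib.request

# Default endpoint of a locally running Ollama server (assumes the default host and port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, persona_context: str, question: str) -> dict:
    """Build a non-streaming request body for Ollama's /api/generate endpoint.

    The system prompt below is a hypothetical example, not the prompt used in the paper.
    """
    system = (
        "Answer the survey question as the person who wrote the following messages.\n"
        f"Messages:\n{persona_context}"
    )
    return {"model": model, "system": system, "prompt": question, "stream": False}


def ask_local_model(payload: dict, url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """Send the request to the local server and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, Ollama replies with one JSON object; the text is in "response".
    return body["response"]
```

For example, after `ollama pull llama3`, calling `ask_local_model(build_payload("llama3", retrieved_messages, question))` returns the model's answer as a string, where `retrieved_messages` stands in for whatever chat excerpts your retrieval step selects. Setting `"stream": False` trades token-by-token output for a single response that is simpler to parse.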
How to get chat messages from Facebook, Instagram, and/or WhatsApp
This is a fairly manual process, and Meta can take about a week to prepare your data.
- Facebook and Instagram --> Account settings --> Download your information --> Download or transfer information --> pick your account(s), including Instagram --> Specific types of information --> choose "Messages" (select "All time" and the JSON format)
- WhatsApp --> Settings --> Chats --> Export chat --> pick each 1-on-1 chat to export, choosing "Without media"
- Optional: use Beeper's API to continuously export new messages, but be aware that our experiment is a snapshot in time.
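The exports arrive in different shapes: Meta gives one JSON file per conversation (e.g. `message_1.json`) with a `messages` list, while WhatsApp gives a plain-text `.txt` transcript. The sketch below normalizes both into one hypothetical `ChatMessage` record before any retrieval step. It assumes the Meta field names `sender_name`, `timestamp_ms`, and `content`, and two common WhatsApp line layouts (Android-style `12/31/20, 9:15 PM - Alice: Hi` and iOS-style `[31/12/2020, 21:15:03] Alice: Hi`). WhatsApp's date format varies with platform and locale, so the regex may need adjusting for your export. It also reverses a commonly reported quirk where Meta's JSON stores UTF-8 text as Latin-1 escapes (so `é` shows up as `Ã©`).

```python
import json
import re
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ChatMessage:
    source: str     # "meta" or "whatsapp"
    sender: str
    timestamp: str  # raw text; WhatsApp formats depend on the phone's locale
    text: str


def fix_meta_encoding(text: str) -> str:
    """Undo mojibake in Meta exports (UTF-8 bytes stored as Latin-1 code points)."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # already proper Unicode, leave unchanged


def parse_meta_thread(data: dict) -> list[ChatMessage]:
    """Convert one parsed message_N.json (Facebook/Instagram) into ChatMessage records."""
    messages = []
    for m in data.get("messages", []):
        if "content" not in m:  # skip photos, stickers, calls, etc.
            continue
        messages.append(ChatMessage(
            source="meta",
            sender=fix_meta_encoding(m["sender_name"]),
            timestamp=str(m["timestamp_ms"]),
            text=fix_meta_encoding(m["content"]),
        ))
    return messages


def load_meta_thread(path: Path) -> list[ChatMessage]:
    """Read a message_N.json file from disk and parse it."""
    return parse_meta_thread(json.loads(path.read_text(encoding="utf-8")))


# Timestamp prefix in Android ("12/31/20, 9:15 PM - ") or iOS ("[31/12/2020, 21:15:03] ") style.
WHATSAPP_HEADER = re.compile(
    r"^\[?(?P<date>\d{1,4}[./-]\d{1,2}[./-]\d{1,4}),?\s"
    r"(?P<time>\d{1,2}:\d{2}(?::\d{2})?(?:\s?[AaPp]\.?[Mm]\.?)?)\]?"
    r"(?:\s-)?\s(?P<rest>.*)$"
)


def parse_whatsapp_export(text: str) -> list[ChatMessage]:
    """Parse a WhatsApp "Export chat" transcript into ChatMessage records."""
    messages: list[ChatMessage] = []
    for raw in text.splitlines():
        line = raw.lstrip("\u200e")  # iOS exports may prefix lines with a left-to-right mark
        header = WHATSAPP_HEADER.match(line)
        if header is None:
            # A line without a timestamp continues the previous multi-line message.
            if messages:
                messages[-1].text += "\n" + line
            continue
        sender, sep, body = header["rest"].partition(": ")
        if not sep:
            continue  # system notice (e.g. encryption banner) with no sender
        stamp = f"{header['date']} {header['time']}"
        messages.append(ChatMessage("whatsapp", sender, stamp, body))
    return messages
```

Keeping the timestamp as raw text avoids guessing the locale; convert it to a `datetime` once you know your export's exact format.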
Note
This repository is a fork of our original repository, where our original commit history is preserved. We created it to make our experiments easier to reproduce, so feel free to use it on your own data.