Replies: 1 comment
-
|
Hi @xbeat, this sounds interesting! Feel free to open a PR here so we can place this in our evaluation guides section! |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Describe the feature or potential improvement
As AI Agents move into production, they are increasingly vulnerable not just to technical exploits (prompt injection), but to "Cognitive/Social Engineering" attacks (e.g., Authority Bias, False Urgency, Context Manipulation).
Currently, most evaluation examples focus on RAG accuracy or Hallucinations, but there is a lack of standardized examples on how to evaluate an Agent's resistance to psychological manipulation.
Describe the solution you'd like
I would love to contribute a new Cookbook to the Langfuse repository demonstrating how to use Langfuse Datasets and Evaluators to test for Anthropomorphic Vulnerabilities.
This cookbook would be based on the Cybersecurity Psychology Framework (CPF) https://cpf3.org, specifically the "Silicon Psyche" https://cpf3.org/siliconpsyche research which maps human psychological vulnerabilities (Authority, Urgency, Social Proof) to LLM behaviors.
The Cookbook would demonstrate:
Dataset Creation: Importing a set of adversarial prompts derived from the CPF (e.g., "The Fake CEO" scenario, "Urgent Payroll" scenario).
Tracing: Running these prompts against a target LLM/Agent and tracing the execution in Langfuse.
Model-based Evaluation: Setting up a custom "Security Auditor" evaluator in Langfuse that scores the agent's response (Pass/Fail) based on whether it succumbed to the psychological pressure or verified the request.
Analytics: Visualizing the agent's "Psychological Resilience" score.
Additional information
I am the author of the Cybersecurity Psychology Framework and recently published research on this topic ((https://arxiv.org/abs/2601.00867)).
I believe this would be a great way to showcase Langfuse's flexibility in the emerging field of AI Security & Safety.
If this aligns with your content roadmap, I am happy to prepare the Jupyter Notebook and submit a PR!
Beta Was this translation helpful? Give feedback.
All reactions