GitHub - AlignmentResearch/deception-evasion-honesty

This repository hosts the code for the paper "Preference Learning with Lie Detectors can Induce Honesty or Evasion".

run.sh gives an example of the setup and a basic experimental run. To change the run configuration, set flags such as DO_DPO to true or false. The codebase has been tested on the pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel Docker image.
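
As a rough illustration of how a true/false flag can switch a stage on or off in a shell script, here is a minimal sketch. Only the flag name DO_DPO comes from this README; the `run_stage` helper, the default value, and the stage name are hypothetical and are not taken from run.sh, so check run.sh itself for the actual flags and how they are read.

```shell
#!/bin/sh
# Hypothetical sketch: DO_DPO is the only name taken from the README.
# The default value and the run_stage helper are assumptions for illustration.
DO_DPO="${DO_DPO:-true}"

# run_stage FLAG_VALUE STAGE_NAME
# Reports whether a stage would run, based on a true/false flag.
run_stage() {
  if [ "$1" = "true" ]; then
    echo "running $2"
  else
    echo "skipping $2"
  fi
}

run_stage "$DO_DPO" "dpo"
```

With a script structured like this, the stage could be disabled for a single invocation by setting the variable in the environment, for example `DO_DPO=false sh script.sh`, where `script.sh` stands in for whatever file holds the sketch.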

Folders and files

configs
data
solid_deception
.gitignore
Dockerfile
README.md
pyproject.toml
run.sh
