AlignmentResearch/deception-evasion-honesty

This repository hosts the code for the paper "Preference Learning with Lie Detectors can Induce Honesty or Evasion".

run.sh gives an example of the setup and a basic experimental run. To change the run configuration, set flags such as DO_DPO to true or false. The codebase has been tested on the pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel Docker image.
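
One possible way to launch run.sh inside the tested image is sketched below. This is not taken from the repository: the mount point /workspace, the EXECUTE switch, and the assumption that the repository is checked out in the current directory are all illustrative, and --gpus all requires the NVIDIA Container Toolkit on the host. How flags such as DO_DPO are read is not shown here; check run.sh itself before changing them.

```shell
#!/usr/bin/env sh
set -eu

# Image named above as the tested environment.
IMAGE="pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel"

# Assumption: the repository is checked out in the current directory.
REPO_DIR="$(pwd)"

# EXECUTE is a hypothetical switch for this sketch, not a repository flag.
# By default the command is only printed; set EXECUTE=1 to launch it.
if [ "${EXECUTE:-0}" = "1" ]; then
  docker run --rm --gpus all \
    -v "${REPO_DIR}:/workspace" -w /workspace \
    "${IMAGE}" bash run.sh
else
  echo "docker run --rm --gpus all -v ${REPO_DIR}:/workspace -w /workspace ${IMAGE} bash run.sh"
fi
```

Printing the command first makes it easy to confirm the image tag and mount path before starting a GPU job.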
