
Key progress goals for turning the auto-interventions side of the work into a paper #25

@QuintinPope

Description

  • Decide how best to present case studies in the paper

  • Specifically: which metrics should we report? How do we validate that those metrics are meaningful? How do we summarize many different validated hypotheses? Which hypotheses go in the appendix?

  • How should we measure hypothesis diversity? (Implement some method and add it as a metric.)

  • Try cross-cluster testing for validated hypotheses (i.e., take the hypothesis generated for cluster i and test it on the texts from cluster j)

  • Decide which additional data to use (jailbreaking prompts, I guess)

  • Add those additional data to our experimental configuration options

  • Run any additional case studies

  • Revisit fine-tuning fact recall? (low priority)

  • Continuously add the above to the paper as we do them
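
As a starting point for the diversity-metric and cross-cluster items above, here is a minimal sketch. Everything in it is an assumption for illustration, not something already in our codebase: the function names are hypothetical, diversity is taken to be the mean pairwise Jaccard distance over word sets (an embedding-based distance would be an obvious alternative), and `keyword_match_rate` is only a toy stand-in for whatever validation score we actually use.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def jaccard_distance(a: str, b: str) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B| over lowercase word sets."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    if not union:
        return 0.0  # two empty strings are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


def hypothesis_diversity(hypotheses: Sequence[str]) -> float:
    """Mean pairwise Jaccard distance (hypothetical diversity metric).

    0.0 means every pair uses the same word set; 1.0 means no pair shares a word.
    """
    pairs = list(combinations(hypotheses, 2))
    if not pairs:
        return 0.0  # fewer than two hypotheses: no pairs to compare
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


def keyword_match_rate(hypothesis: str, texts: Sequence[str]) -> float:
    """Toy stand-in scorer: fraction of texts containing the hypothesis as a word."""
    if not texts:
        return 0.0
    hits = sum(hypothesis.lower() in text.lower().split() for text in texts)
    return hits / len(texts)


def cross_cluster_scores(
    hypotheses: Sequence[str],
    cluster_texts: Sequence[Sequence[str]],
    score_fn: Callable[[str, Sequence[str]], float],
) -> List[List[float]]:
    """scores[i][j] = score of the hypothesis for cluster i on the texts of cluster j."""
    return [[score_fn(h, texts) for texts in cluster_texts] for h in hypotheses]


if __name__ == "__main__":
    clusters = [["the cat sat", "a cat"], ["dog barks", "no pets"]]
    print(hypothesis_diversity(["a b", "a c"]))
    print(cross_cluster_scores(["cat", "dog"], clusters, keyword_match_rate))
```

In the cross-cluster matrix, a hypothesis that is specific to its own cluster should score high on the diagonal and low off it; high off-diagonal scores would suggest the hypothesis describes the whole dataset rather than one cluster.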
