Key progress goals for turning the auto-interventions side of the work into a paper #25

- [ ] Decide how best to present case studies in the paper
- [ ] Specifically: which metrics should we report? How do we validate that those metrics are meaningful? How do we summarize many different validated hypotheses? Which hypotheses go in the appendix?
- [ ] How do we measure hypothesis diversity? (Implement a method and add it as a metric.)
- [ ] Try cross-cluster testing of validated hypotheses (i.e., take a hypothesis generated for cluster i and test it on the texts from cluster j)
- [ ] Decide which additional data to use (probably jailbreaking prompts)
- [ ] Add the additional data to our experimental configuration options
- [ ] Run any additional case studies
- [ ] Revisit fine-tuning for fact recall? (low priority)

- [ ] Add each of the above to the paper as we complete it
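One possible starting point for the hypothesis-diversity metric is a minimal sketch, assuming hypotheses are plain strings: score diversity as the mean pairwise Jaccard distance between their token sets. This is a crude lexical proxy chosen for illustration, not a method the project has settled on. An embedding-based distance would capture paraphrases better. The function names (`jaccard_distance`, `hypothesis_diversity`) are hypothetical.

```python
from itertools import combinations


def _tokens(text: str) -> set[str]:
    # Lowercase and split on whitespace; a deliberately crude tokenizer.
    return set(text.lower().split())


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over token sets; 0.0 if both are empty."""
    ta, tb = _tokens(a), _tokens(b)
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)


def hypothesis_diversity(hypotheses: list[str]) -> float:
    """Hypothetical metric: mean pairwise Jaccard distance.

    0.0 means all hypotheses share identical token sets; 1.0 means no
    pair shares any token. Returns 0.0 for fewer than two hypotheses.
    """
    pairs = list(combinations(hypotheses, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)
```

For example, two identical hypotheses score 0.0, and `["a b", "a c"]` scores 2/3 (one shared token out of three). Because this is lexical, two hypotheses that say the same thing in different words would count as diverse, which is the main reason to validate any such metric before reporting it.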
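The cross-cluster test (hypothesis from cluster i, texts from cluster j) can be organized as a score matrix. Here is a minimal sketch, assuming a hypothetical `score_fn(hypothesis, texts) -> float` that stands in for whatever validation step the pipeline actually runs. A hypothesis that is specific to its own cluster should score higher on the diagonal than off it, and `specificity_gap` (also hypothetical) summarizes this as the diagonal mean minus the off-diagonal mean.

```python
from typing import Callable, Sequence

# Hypothetical signature: (hypothesis, texts) -> validation score.
ScoreFn = Callable[[str, Sequence[str]], float]


def cross_cluster_matrix(
    hypotheses: Sequence[str],
    cluster_texts: Sequence[Sequence[str]],
    score_fn: ScoreFn,
) -> list[list[float]]:
    """Entry [i][j] scores the hypothesis from cluster i on cluster j's texts."""
    if len(hypotheses) != len(cluster_texts):
        raise ValueError("need exactly one hypothesis per cluster")
    return [[score_fn(h, texts) for texts in cluster_texts] for h in hypotheses]


def specificity_gap(matrix: list[list[float]]) -> float:
    """Mean of the diagonal minus mean of the off-diagonal entries."""
    n = len(matrix)
    if n < 2:
        raise ValueError("need at least two clusters")
    diag = [matrix[i][i] for i in range(n)]
    off = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(diag) / n - sum(off) / len(off)
```

As a toy usage example, with a keyword-match `score_fn` such as `lambda h, ts: sum(h in t for t in ts) / len(ts)`, the hypotheses `["cat", "dog"]` and clusters `[["cat sat", "a cat"], ["dog ran", "big dog"]]` give the identity matrix and a gap of 1.0. A gap near zero would suggest the hypotheses describe something shared across clusters rather than cluster-specific behavior.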
