
Commit f576a0e

linking patchscopes
1 parent 8c15dad commit f576a0e

1 file changed: +2 -2 lines changed

personas/index.html

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@
 <meta name="description" content="Who's asking?">
 <meta property="og:title" content="Who's asking?"/>
 <meta property="og:description" content="Who's asking? User personas and the mechanics of latent misalignment"/>
-<meta property="og:url" content="https://pair-code.github.io/interpretability/patchscopes/"/>
+<meta property="og:url" content="https://pair-code.github.io/interpretability/personas/"/>
 <!-- Path to banner image, should be in the path listed below. Optimal dimenssions are 1200X630-->
 <meta property="og:image" content="static/image/method.png" />
 <meta property="og:image:width" content="1200"/>
@@ -181,7 +181,7 @@ <h2 class="subtitle is-size-3-tablet has-text-weight-bold has-text-centered has-
 <h3 class="subtitle is-size-4-tablet has-text-left pr-4 pl-4 pt-3 pb-3">
 <p>
 From a mechanistic perspective, we find that safeguards are layer-specific, and that decoding directly from earlier layers may bypass safeguards and recover misaligned content that would otherwise not have been generated. <br>
-We then use Patchscopes to analyze why certain user personas disable safeguards and find that they enable the model to form more charitable interpretations of otherwise dangerous queries.
+We then use <a href="https://pair-code.github.io/interpretability/patchscopes/" target="_blank">Patchscopes</a> to analyze why certain user personas disable safeguards and find that they enable the model to form more charitable interpretations of otherwise dangerous queries.
 </p>
 <p style="text-align:center;">
 <br>
