Commit 6b32791

typo
1 parent ba065d1 commit 6b32791

1 file changed (+2 -2 lines changed)

content/posts/2025-09-15-playing-guess-who-with-an-llm.md

Lines changed: 2 additions & 2 deletions
@@ -77,7 +77,7 @@ Either way, using a mix of proprietary hosting, [OpenRouter](https://openrouter.
 - Sonoma Dusk Alpha
 - Sonoma Sky Alpha
 
-It may sound like a lot of work, but as you'll see in a minute, for many of them it didn't take long to form an opinion about their skill.
+It may sound like a lot of work, but as you'll see in a minute, for many of them it didn't take me long to form an opinion about their skill.
 
 # The prompts
 
@@ -148,7 +148,7 @@ How can a flagship model like _Claude Opus 4.1_ fail this way? I kept trying sev
 
 # A systematic review
 
-At this point I felt the duty to document this problem across all the models that had enough capabilities (vision + tool calling) to play this game. If I ever want an LLM personal assistant to handle my private data and to act on my behalf, I'd better make sure that they understand they can't give my passwords to the first kind thief that asks them.
+At this point I felt the duty to document this problem across all the models that had enough capabilities (vision + tool calling) to play this game. If I ever want an LLM personal assistant to handle my private data and to act on my behalf, I'd better make sure they understand that they can't just hand out my credentials to the first kind thief that asks them.
 
 Here is a systematic review of the results, ordered roughly from worst to best. However, keep in mind that this is all based on a very small test sample, and although most models consistently fail the same way every time, there were some with a far more erratic behavior, looking very smart at times and incredibly dumb the next.
 