---
title: Some notes and observations from using LLMs
date: 2025-08-25
page.meta.tags: programming, practices, llm, machine-learning, "ai"
page.meta.categories: programming
---

I've been using some form of LLM code assist since GitHub Copilot entered beta.
Over the years that's included various GPT, Gemini, and Claude versions, along with
trying different editor- and CLI-based assistants. How I integrate and use these
tools continues to change across personal and professional projects,
but I thought it might be time to write down a few thoughts.

- This one should be obvious, but in case it's not: you need to try different
  models with different projects and teams to find which is the most effective
  in that context. Right now I've found Claude Sonnet 4 to be the model
  that produces the code I prefer to maintain, but it helps to have evaluation
  criteria that you use to check the results of each new model release, and for
  testing models across different vendors (Anthropic, OpenAI, Google, etc.).
- Fundamentally these tools can change how you work if you let them, but:
  - It's particularly important when using these tools to consider how different
    "modes" (agent, agent + MCP, ask, etc.) trade off agency, and how that impacts
    you and your team.
  - For many of us code is only part of the project, and sometimes not even the
    hardest part, but consider that the LLM is more than happy to generate pages
    of code. That can be a burden to maintain, and I will wager that when an
    agent runs off and implements a full feature, you don't know that code the
    way you would if you had used the LLM for suggestions and snippets, or
    implemented it yourself. There is a cost there, short and long term.
  - There is a larger team and [organization](https://news.ycombinator.com/item?id=44972151)
    conversation going on right now that you need to engage in. These tools can
    boost and harm morale, and strong teams are foundational to success, short
    and long term.
- I go back and forth on how I use agents depending on the structure of my
  week. Some weeks I'm spending a lot of time pairing, meeting, or doing tasks
  that don't provide hours at a time to be heads-down focused on the code in a
  flow state. When that happens I'm happy to have tools like LLM agents that I
  can prompt and check back on, but "with great power comes great responsibility".
  Don't check in that code without doing a thorough review on your end first
  (including a very detailed review of any generated tests). Your team will
  thank you for it, and you won't find people (like myself) leaving frustrated
  review comments about nonsensical generated code.
- LLMs are very verbose; tame this with your prompts if you expect to thoroughly
  review and integrate their output.
- Be careful when allowing LLMs to generate tests. I've heard so many people
  talk about how great it is that they can use these tools to write tests for
  code that doesn't have any, or to have them write tests alongside the new
  feature they are implementing. That's not wrong, LLMs can generate tests, but
  just like building maintainable systems is a skill, writing good tests is a
  skill. I would wager that the code training datasets are not curated down to
  projects such as [AOSA](https://aosabook.org/en/index.html), and we would
  probably be disappointed to see where most projects in those datasets sit on
  a quality curve. I can personally share that time and again LLMs will
  generate tests that erroneously pass: they will remove assertions, write
  assertions that never fail, or write tests that never call the function or
  DUT. Yes, you can have an LLM generate good tests, but it takes work. Be
  cautious: our tests are one of our strongest signals in a project, and
  failing to keep their integrity strong will only lead to bugs and a code base
  that nobody wants to be responsible for.
- If using these tools in a team setting, engage with them as a team. Share
  prompts and agree on a set of minimal instructions. Make sure everybody
  understands the context and default prompts. Set guidelines and expectations
  for how the team uses the tools and which quality markers are expected to be
  present and maintained.
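
As a sketch of what "engage with them as a team" can look like in practice,
here is a hypothetical minimal shared-instructions file. The filename
convention and the specific rules are illustrative only; each assistant
(Copilot, Claude Code, etc.) has its own convention for where such
instructions live:

```markdown
<!-- Hypothetical shared instructions file; adapt to your tool's convention. -->
# Assistant instructions for this repo

- Prefer small, focused diffs; do not implement whole features unprompted.
- Match the existing code style and test conventions in this repository.
- Every generated test must call the code under test and contain assertions
  that can actually fail.
- Keep explanations brief; output code, not essays.
```

Checking a file like this into the repository keeps everyone's default
prompts aligned and makes the team's quality expectations explicit.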
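
The test failure modes above (removed assertions, assertions that never fail,
tests that never call the function under test) can be sketched with a small,
hypothetical example. The function and test names here are illustrative, not
from any real generated output:

```python
# Hypothetical function under test.
def clamp(value: int, low: int, high: int) -> int:
    """Constrain value to the inclusive range [low, high]."""
    return max(low, min(high, value))

# Anti-pattern I describe above: the assertion can never fail,
# and clamp() is never actually called, so this test proves nothing.
def test_clamp_useless():
    expected = 5
    assert expected == 5  # always true, regardless of clamp's behavior

# A meaningful test: it calls the function under test and asserts on
# real behavior, including both boundary cases.
def test_clamp_meaningful():
    assert clamp(5, 0, 10) == 5    # in range: unchanged
    assert clamp(-3, 0, 10) == 0   # below range: clamped to low
    assert clamp(42, 0, 10) == 10  # above range: clamped to high
```

The useless variant passes even if `clamp` is broken or deleted; the
meaningful one fails the moment the behavior regresses, which is the whole
signal a test exists to provide.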
There are hundreds of blog posts out there about LLMs. If I had to sum up my
thoughts: the latest LLM generation is impressive and can transform how we
work, but these tools can also change the level of agency that individuals
exhibit. Use the tools, but don't turn off your brain, and
[recognize](https://www.researchgate.net/profile/Tamera-Schneider-2/publication/334344580_The_Measurement_of_the_Propensity_to_Trust_Automation/links/5e501f76a6fdcc2f8f552ba8/The-Measurement-of-the-Propensity-to-Trust-Automation.pdf)
our bias to [trust](https://www.tandfonline.com/doi/epdf/10.1080/10447318.2024.2307691?needAccess=true)
these tools.
