---
title: Some notes and observations from using LLMs
date: 2025-08-25
page.meta.tags: programming, practices, llm, machine-learning, "ai"
page.meta.categories: programming
---

I've been using some form of LLM code assist since GitHub Copilot entered beta.
Over the years that's included various GPT, Gemini, and Claude versions, along
with trying different editor- and CLI-based assistants. How I integrate and use
these tools continues to change across personal and professional projects, but
I thought it might be time to write down a few thoughts.

- This one should be obvious, but in case it's not: you need to try different
  models with different projects and teams to find which is the most effective
  in that context. Right now I have found Claude Sonnet 4 to be the model that
  produces the code I most prefer to maintain, but it helps to have evaluation
  criteria that you use to check the results of each new model release, and for
  testing models across different vendors (Anthropic, OpenAI, Google, etc.).
- Fundamentally these tools can change how you work if you let them, but:
  - It's particularly important when using these tools to consider how
    different "modes" (agent, agent + MCP, ask, etc.) trade off agency, and
    how that trade-off impacts you and your team.
  - For many of us code is only part of the project, and sometimes not even
    the hardest part, yet the LLM is more than happy to generate pages of
    code. That code can be a burden to maintain, and I will wager that when
    an agent runs off and implements a full feature, you don't know that
    feature the way you would if you had used the LLM for suggestions and
    snippets, or implemented it yourself. There is a cost there, short and
    long term.
  - There is a larger team and [organization](https://news.ycombinator.com/item?id=44972151)
    conversation going on right now that you need to engage in. These tools
    can boost and harm morale, and strong teams are foundational to success,
    short and long term.
- I go back and forth on how I use agents depending on the structure of my
  week. Some weeks I spend a lot of time pairing, meeting, or doing tasks that
  don't provide hours at a time to be heads-down on the code in a flow state.
  When that happens I'm happy to have tools like LLM agents that I can prompt
  and check back on, but "with great power comes great responsibility". Don't
  check in that code without doing a thorough review on your end first
  (including a very detailed review of any generated tests). Your team will
  thank you for it, and you won't find people (like myself) leaving frustrated
  review comments about nonsensical generated code.
  - LLMs are very verbose; tame this with your prompts if you expect to
    thoroughly review and integrate their output.
- Be careful when allowing LLMs to generate tests. I've heard so many people
  talk about how great it is that they can use these tools to write tests for
  code that doesn't have any, or to write tests alongside the new feature they
  are implementing. That's not wrong; LLMs can generate tests, but just as
  building maintainable systems is a skill, writing good tests is a skill. I
  would wager that the code training datasets are not curated down to projects
  such as [AOSA](https://aosabook.org/en/index.html). We would probably be
  disappointed to see where most projects in those datasets sit on a quality
  curve. I can personally share that time and again LLMs will generate tests
  that erroneously pass. They will remove assertions, write assertions that
  can never fail, or write tests that never call the function or DUT. Yes, you
  can have an LLM generate good tests, but it takes work, and be cautious: our
  tests are one of our strongest signals in a project. Failing to keep their
  integrity strong will only lead to bugs, and will likely create a code base
  that nobody wants to be responsible for.
- If you use these tools in a team setting, engage with them as a team. Share
  prompts, and agree on a set of minimal instructions. Make sure everybody
  understands the context and default prompts. Set guidelines and expectations
  for how the team uses the tools and what quality markers are expected to be
  present and maintained.
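
As a starting point for that shared set of minimal instructions, a short file
checked into the repo root might look like the sketch below. The file name and
every rule here are hypothetical; adapt them to whatever instructions file your
assistant actually reads (e.g. `AGENTS.md` or `CLAUDE.md`) and to your team's
own quality markers.

```markdown
# Team assistant instructions

- Prefer small, focused diffs; do not refactor unrelated code.
- Keep output terse: no summaries of changes unless asked.
- Never weaken or delete an existing test assertion.
- New tests must call the code under test and assert on its behavior.
- Follow the project's existing style and lint configuration.
```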
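
To make the test failure modes concrete, here is a small sketch. The `clamp`
helper and both test names are made up for illustration; the first test is the
shape of "erroneously passing" test I keep seeing, the second is what a
reviewed version looks like.

```python
def clamp(value, low, high):
    """Constrain value to the inclusive range [low, high]."""
    return max(low, min(value, high))

# The kind of test an LLM will happily generate: it runs and passes,
# but its assertions can never fail, so it proves nothing about behavior.
def test_clamp_vacuous():
    result = clamp(5, 0, 10)
    assert result is not None        # always true for this function
    assert isinstance(result, int)   # checks a type, not the clamping

# A reviewed test: the interesting boundaries are actually asserted.
def test_clamp_reviewed():
    assert clamp(5, 0, 10) == 5      # in range: unchanged
    assert clamp(-3, 0, 10) == 0     # below range: clamped to low
    assert clamp(42, 0, 10) == 10    # above range: clamped to high

test_clamp_vacuous()
test_clamp_reviewed()
```

Both tests pass, but only the second one would catch a broken `clamp`. That
gap is exactly what a detailed review of generated tests is for.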

There are hundreds of blog posts out there about LLMs. If I had to sum up my
thoughts: the latest generation of LLMs is impressive and can transform how we
work, but these tools can also change the level of agency that individuals
exhibit. Use the tools, but don't turn off your brain, and [recognize](https://www.researchgate.net/profile/Tamera-Schneider-2/publication/334344580_The_Measurement_of_the_Propensity_to_Trust_Automation/links/5e501f76a6fdcc2f8f552ba8/The-Measurement-of-the-Propensity-to-Trust-Automation.pdf)
our bias to [trust](https://www.tandfonline.com/doi/epdf/10.1080/10447318.2024.2307691?needAccess=true)
these tools.