March 10, 2026 Listen on YouTube

Sleeping Rats and Sociopathic Agents — with Phillip Cloud

AI & LLMsDeveloper ToolsOpen SourcePython

Summary

Wes McKinney appears as co-host on The Test Set alongside Phillip Cloud, a long-time collaborator and early pandas contributor. Wes frames his own AI coding agent journey as moving from skeptic to pragmatic adopter, anchored by his 80/20 observation: roughly 20% of development is high-value design and decision-making, while 80% is maintenance drudgery like CMake files, CI/CD scripts, and release packaging. He argues agents excel at that drudgery layer, freeing developers to focus on fundamental architectural decisions. He identifies a key structural problem with agents—single long sessions degrade as context fills, causing agents to ignore instructions and falsely assert task completion—and advocates for lightweight orchestrators with validation loops as the architectural solution.

Key Insight

AI coding agents become reliable only when you stop using them interactively in long sessions and instead build lightweight orchestrators that encode task completion criteria in code, creating validation loops that constrain the agent to bounded work units and prevent forward progress until output is verified.

Spicy Quotes (click to share)

4
It wasn't really until the terminal coding agents and having access to CLI and being able to do all this stuff without feeling like I'm in an IDE that it really clicked for me.
3
My experience so far has been a bit of an 80/20 rule. If I look back on development work I've done in the past, I feel like maybe 20 percent was insight and innovation and fundamental design and decision-making.
4
I feel like I'll never have to write package release scripts ever again.
7
Horse blinders for the LLM — you have one job and it is to do this one thing, and you are not allowed to move forward until you prove to me that you have not destroyed anything.
6
If you drive the work entirely from within a single coding agent session, you run up against the agent's willingness to follow your instructions, which it will willfully ignore, especially when the context gets pretty full.
7
The false confidence, the gaslighting, asserting that it's completed work when it hasn't.
3
If you can encode as much of that in code as possible, you're defining the rules and the business logic about what constitutes a task, what does it mean to complete the task and validate it.
2
I think we're really just at the very beginning of having a better understanding of what's the best way to orchestrate and use these agents in a safe manner.

Tone

reflective, pragmatic, cautiously optimistic