
There's a phrase I keep coming back to: working through the agent, not with it.
This distinction matters more than it first appears. And the pace of change in this space makes it genuinely difficult to know which developments are signal and which are noise.
This article covers multiple angles: practical guidance for individual practitioners, strategic framing for technical leaders, capability building for organisations, and a philosophical inversion that might change how you think about product design. Take what's useful for where you sit.
The evolution happened fast
AI-assisted coding moved through distinct phases in a remarkably short time:
Phase 1: Completions. Your IDE suggests the next line. This was the first "wow" moment. Useful, but fundamentally a faster autocomplete.
Phase 2: Agentic IDEs. Tools like Cursor added a "chat" layer. You talk to an agent, it talks to a model, changes appear in your files. More powerful, but still you directing traffic.
Phase 3: Fully agentic CLI tools. Aider arrived first, then Claude Code changed the dynamics entirely. The insight was obvious in hindsight: code is the universal interface to the digital world, and LLMs are text-native. Command lines are text-native. The most natural environment for these models isn't a graphical IDE with buttons and panels. It's a terminal.
This is where the leading edge is now. People getting transformative results work through agents like Claude Code, Codex CLI, or Gemini CLI. The agent does the engineering. The human does the steering, the critical thinking, the review.
The important caveat: even these CLI tools are scaffolding. I wrote previously about not building flying cars when the engine was just invented. The same principle applies here. CLI agents are the minimal frame that lets us feel the wind. They're not the final architecture. The infrastructure isn't built yet. We're still in Bertha Benz territory, buying fuel from pharmacies and using hatpins to unclog fuel lines.
What matters isn't picking the "right" tool and defending it. What matters is recognising what's becoming obsolete and pivoting before you've invested too heavily.
The economics shaped the landscape
Token access matters more than features.
Anthropic and OpenAI each offer a $200-a-month subscription. Users report spending what would have been $10,000 in API costs, covered by that flat fee. That's a serious return on investment if you know how to use it.
GitHub Copilot serves everyone for $20 a month. The economics are different. The inference provided is more constrained. As a result, Copilot users developed techniques to get results within their token allowance.
Spec-driven development was one of those techniques.
The promise of specs
The pitch was compelling: stop "vibe coding" (writing code without a plan), start with specifications. Define what you want in markdown. Let the AI implement it. The spec becomes the source of truth.
For people struggling with AI coding, this seemed like the answer. If the model kept going off the rails, maybe the problem was insufficient upfront thinking. Write better specs, get better results.
People addressed this problem in different ways. Some developed their own approaches intuitively. Some wrote about their methods. Some built tools like spec-kit with structured workflows and constitution files. I'm not taking a shot at any specific tool here: they were all reasonable responses to real problems people were experiencing. The pattern was widespread.
For some, these approaches genuinely worked. Domain experts who knew what they were doing, who tested their spec files, who iterated until things actually functioned: they got good results. Spend a few days crafting the perfect spec, validate it works, then others can build from that checkpoint.
Vibe specifying
Here's what happened for everyone else.
Spec-driven development allowed people to outsource their thinking even more. The sycophantic AI model would iterate on markdown files until it produced outputs that looked right. Beautiful architecture documents. Comprehensive requirement specs. Detailed implementation plans. Pages and pages of markdown that had never been tested against reality.
These untested specs got shared. Juniors picked them up, sold on the promise of good results. They couldn't get anywhere. Why? Because the specs were hallucinated. Not the code: the specs themselves.
People used AI to create specs without understanding the domain. They'd go back and forth with ChatGPT, refining the language, adding sections, making it read professionally. Then they'd commit it to a repository and move on, never actually building anything from it.
The result was predictable. Context windows polluted with 90% markdown documentation. No room left for actual engineering. The model drowning in untested specifications while trying to write code.
This wasn't spec-driven development. It was vibe specifying. Waterfall with a chatbot.
Who struggled, who succeeded
Two groups consistently had the worst experiences with AI coding tools.
First: people new to programming. They lacked the fundamental knowledge to steer the model, to catch when it went off the rails, to recognise the difference between code that looked right and code that was right.
Second: people who refused to understand how the technology actually works. They anthropomorphised the models, treated them like junior developers who just needed clearer instructions, got resentful when results didn't match expectations.
The people who succeeded shared common traits. They spent time with the tools without preconceptions. They developed intuition for what worked. They had domain expertise to validate outputs. They treated AI as a capability to understand, not magic to invoke.
The new models changed the equation
The models released in late 2025 are genuinely impressive. Opus 4.5, GPT-5.2: these are the AI models we were promised years ago. People have had time to work with them properly, and the difference is undeniable.
They're so capable that they prompt you for clarification. Give a simple prompt like "build this feature" and the model asks intelligent questions about edge cases, architecture preferences, integration points. The clarification loop that spec-driven development tried to front-load now happens naturally during implementation.
In this environment, heavy spec files provide less benefit than they used to. The model has enough knowledge, enough reasoning capability, that constraining it with rigid specifications can limit its path to success.
The workarounds built for yesterday's limitations may actually reduce the performance of newer models. That's always the risk with abstraction layers: they lock in assumptions that the underlying technology has already moved past.
Don't build agents, build skills
The major AI labs recently converged on a standard called Agent Skills. Anthropic's engineers have been talking about this publicly: the idea that one universal agent powered by a library of skills beats building multiple specialised agents.
I've been experimenting with this approach for months now, and the framing that captures it best for me is this: who do you want doing your taxes? A 300 IQ genius who's never seen a tax form, or someone with domain expertise who knows the current tax code? Current AI models are brilliant generalists. They lack domain expertise. You don't want them figuring out your specific workflows from first principles every time. You want consistent execution from something that already knows the domain.
That's what skills provide. It's worth understanding how they work, because they represent a different philosophy than spec-driven development.
A skill is simply a folder containing:
- A SKILL.md file with instructions written in natural language
- Optional helper scripts the agent can execute
- Supporting files (templates, examples, reference data)
The format uses markdown, not complex schemas. The content explains how to accomplish something, not just what endpoints exist. Think of it as an onboarding handbook for an AI, not a technical specification.
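As a minimal sketch, here's roughly what such a folder might look like, scaffolded with a short Python script. The skill name, frontmatter fields, and helper files are illustrative assumptions rather than an official schema; check your agent runtime's documentation for the exact format it expects.

```python
# Minimal sketch: scaffold a hypothetical "quarterly-report" skill folder.
# All names and frontmatter fields here are illustrative, not an official schema.
from pathlib import Path

skill = Path("skills/quarterly-report")
(skill / "scripts").mkdir(parents=True, exist_ok=True)
(skill / "templates").mkdir(exist_ok=True)

# SKILL.md is the onboarding handbook: plain-language instructions plus
# pointers to the helper scripts and templates the agent can use.
(skill / "SKILL.md").write_text("""\
---
name: quarterly-report
description: Assemble the quarterly revenue report from the finance exports.
---

# Quarterly report

1. Run scripts/fetch_exports.py to pull the latest CSV exports.
2. Check the totals against the baseline figures in templates/baseline.md.
3. Fill in templates/report.md and flag any variance above 5 percent.
""")

# An optional helper script the agent can execute (stubbed here).
(skill / "scripts" / "fetch_exports.py").write_text(
    "print('TODO: pull CSV exports from the finance system')\n"
)
```

The point is the shape, not the specifics: a folder an agent can read, follow, and execute from, rather than a schema to satisfy.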
Key differences from specs:
Progressive loading. Unlike specs that get consumed whole, skills load context incrementally. The agent sees only metadata initially; full instructions load only when needed. This keeps context windows efficient (there's a short sketch of this pattern after these points).
Executable context. Skills can include scripts the agent actually runs, not just documentation it reads. This blurs the line between "here's how to do it" and "here's code that does it."
Objective-based. You're not just defining "how to think about this", you're providing tools the model can use to achieve outcomes. The model can modify scripts on the fly if it encounters bugs.
Portability. A skill written for Claude Code also works with GitHub Copilot and OpenAI Codex. Write once, use across different AI tools.
Forces testing. The days of iterating with ChatGPT to create perfect-looking specs without testing them are over. A skill either works or it doesn't. You invoke it, watch the agent execute, and see whether it produces the outcome you specified. If something fails, you fix the instructions or the helper scripts and run it again. The feedback loop is immediate and concrete. You can't fake it. And skills build skills: Anthropic, OpenAI, and others provide skill-builder skills that guide agents through creating and validating new skills. You can build your own skill-builders too. The whole process is recursive.
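To make the progressive loading point concrete, here's a rough sketch of how a harness might index skills: only the lightweight frontmatter from each SKILL.md is read up front, and a skill's full instructions enter the context window once the agent actually selects it for a task. The folder layout and frontmatter fields are the same illustrative assumptions as above, not an official schema.

```python
# Sketch of progressive loading: index lightweight skill metadata up front,
# load a skill's full instructions into context only when it's selected.
# Frontmatter parsing here is deliberately naive and illustrative.
from pathlib import Path

def read_frontmatter(skill_md: Path) -> dict:
    """Parse simple key: value pairs between the leading '---' fences."""
    meta, in_block = {}, False
    for line in skill_md.read_text().splitlines():
        if line.strip() == "---":
            if in_block:
                break
            in_block = True
            continue
        if in_block and ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta

def index_skills(root: Path) -> dict:
    """What the agent sees initially: just name -> description, a few tokens each."""
    index = {}
    for skill_md in root.glob("*/SKILL.md"):
        meta = read_frontmatter(skill_md)
        index[meta.get("name", skill_md.parent.name)] = {
            "description": meta.get("description", ""),
            "path": skill_md,
        }
    return index

def load_skill(index: dict, name: str) -> str:
    """Only now does the full SKILL.md body enter the context window."""
    return index[name]["path"].read_text()

if __name__ == "__main__":
    skills = index_skills(Path("skills"))
    print(f"{len(skills)} skills indexed:", ", ".join(skills))
    # Full instructions are loaded lazily, once a skill is chosen for the task.
    if "quarterly-report" in skills:
        print(load_skill(skills, "quarterly-report"))
```

However a given harness actually implements it, the shape is the same: a cheap index first, full instructions on demand.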
The major labs are using skills in their own products. When you ask Claude or ChatGPT to do something complex like creating a Word document, they automatically load and execute skills to produce the artifact in a pre-defined, tested way. Anthropic published Agent Skills as an open standard for cross-platform portability in late 2025, and thousands of skills have already been deployed across domains from document processing to scientific research to enterprise workflows.
If you're already using MCP (Model Context Protocol), skills complement rather than replace it. MCP provides the connection to the outside world. Skills provide the expertise to use those connections effectively.
There's a useful analogy to the layers of computing. Models are like processors: massive investment to build, limited use on their own. Agent runtimes like Claude Code or Codex CLI are like operating systems: they orchestrate resources around the model. Skills are like applications: where people encode domain expertise. A few companies build processors and operating systems. But millions of people build applications. Skills open up that layer for everyone.
This framing isn't just Anthropic's view. Philipp Schmid at Google DeepMind has been writing about similar ideas: the model as CPU, the harness as operating system, the agent as application. When you see convergence across major labs, it's usually a signal worth paying attention to.
Agents as customers
Here's an inversion worth sitting with: if you work through agents, then agents are your primary customers.
Think about what that means for product design. The question isn't just "will a human find this useful?" It's "will an agent succeed with this?" Your documentation, your APIs, your file formats, your error messages: they're all being consumed by agents now. Design for agent success, and human success follows.
This reframes how you think about skills, tools, and products. A skill isn't just instructions for an AI. It's a product with a customer. That customer has needs, constraints, and failure modes. It benefits from good onboarding. It struggles with ambiguity. It performs better when the interface is clean.
Just like with human customers, your assumptions about what agents need will be wrong. They'll fail in ways you didn't anticipate. They'll succeed at things you thought would be hard. The practice of continuously updating your model of the customer applies here too.
This isn't a metaphor. It's increasingly literal. Agents are making purchasing decisions, booking services, negotiating on behalf of humans. If your product can't be successfully used by an agent, you're excluding a growing segment of how work gets done.
The shift from "building for humans who use AI" to "building for agents who serve humans" is subtle but significant. The first treats AI as a tool. The second treats AI as a customer. Both are true. But the second framing opens up different design questions.
Skills as transferable memory
There's something interesting happening with skills that goes beyond just packaging instructions.
Anything an agent writes down in a skill can be used efficiently by a future version of itself. When someone joins your team and starts using Claude for the first time, it already knows what your team cares about. The institutional knowledge compounds. Skills aren't just human-created instructions. They're a step toward agent memory that persists across sessions and transfers across people.
This also means skills are evolving toward software complexity. The early skills were simple folders with a markdown file. Now some take weeks or months to build. They have testing, versioning, dependencies. They're being maintained like software because they are software.
Skills might also become obsolete
Here's the uncomfortable next step in this logic: skills themselves are probably temporary scaffolding.
When enough tested, working skills exist in model training sets, the models will absorb that domain expertise. The emergent properties of LLMs mean that if you have enough high-quality skills for domains X, Y, and Z, models start generalising to domains A and B that weren't explicitly covered.
Eventually, models may become capable enough to dynamically write skills on the fly, or simply have enough intelligence in narrow domains that they produce good outcomes from simple prompts without needing packaged procedural knowledge. Philipp Schmid puts it well: "Capabilities that required complex, hand-coded pipelines in 2024 are now handled by a single context-window prompt in 2026."
The teams building seriously with this technology already know this. Vercel removed 80% of their tool logic and got faster results. Manus has refactored their agent architecture five times. LangChain has rewritten theirs three times. These aren't failures. They're what it looks like to stay close to a rapidly improving capability. Build light, build to delete. The harness is scaffolding, not permanent architecture.
This doesn't mean skills are a waste of time. Quite the opposite. Building skills now contributes to the training data that makes future models better. Your tested, working artifacts become the examples that teach the next generation. But don't get too attached. The point isn't that skills are the final answer. The point is that skills are the current best path forward, until they're not.
Who skills are for
Something interesting is happening with who builds skills.
In finance, recruiting, accounting, legal: people who aren't traditionally technical are building high-value skills. The barrier to entry has dropped. You don't need to understand OpenAPI schemas or write Python. You write markdown describing a process you already know, add some helper scripts, and test until it works. Domain experts can encode their expertise directly.
This looks like democratisation. And it is, for certain kinds of work. But it creates a split.
For non-technical domain experts, skills lower the barrier. Someone who deeply understands tax compliance or legal review can now package that knowledge for an agent without becoming a developer.
For developers, the ceiling goes up. Writing code is increasingly handled by AI. But reading, reviewing, understanding: that's what matters now. You can still produce bad skills. If you don't understand how the generated code works, you're a liability.
Eventually these systems deploy to air-gapped environments. No model inference available. Just you and the code. You need to understand it.
This isn't AI making programming easier across the board. It's AI lowering barriers for domain experts to encode their knowledge, while raising expectations for developers to understand what they're deploying. Both are true. Different groups feel different effects.
Building AI-native workflow as a capability
Here's what I think companies should actually focus on: developing the ability to work through agents as an organisational capability. Not buying tools. Not running pilots. Building the muscle.
This means everyone needs to look honestly at their skills matrix.
If you've been avoiding Python or TypeScript because you're "not a developer", that calculation has changed. You don't need to become a software engineer. But you do need to read code, understand what a skill is doing, review what an agent produces. The people who can't do this will increasingly depend on those who can. That's a vulnerable position.
If you're a developer who's been coasting on syntax knowledge, that calculation has also changed. Writing code is the part AI handles well. Understanding systems, recognising when generated code is subtly wrong, knowing which architectural patterns actually work: that's the part that matters now.
For non-technical people building skills, the learning comes through building. You'll discover what the models can't do. You'll find the edge cases where instructions fail. You'll develop intuition for which kinds of tasks work well and which don't. This experiential knowledge is valuable. But you only get it by actually building things, testing them, and encountering the failures.
The uncomfortable truth is that everyone needs to move. Developers need to move toward systems thinking and review. Non-developers need to move toward basic code literacy. Both groups need to develop intuition for what AI can and can't reliably do. The only way to build that intuition is to work through agents, not just read about them.
Companies that treat this as a tool procurement problem will keep falling behind. The ones that treat it as a capability development problem have a chance.
The challenge for organisations
The landscape moves faster than procurement cycles. By the time a tool gets evaluated, piloted, approved, and rolled out, the leading edge has often moved somewhere else.
This isn't anyone's fault. It's genuinely difficult to make good bets when the underlying technology improves every few months. The abstractions you invest in today may lock in assumptions that tomorrow's models have already moved past.
The organisations adapting fastest share some patterns:
- They stay close to the models themselves, avoiding thick abstraction layers
- They watch what the AI labs are investing in, and follow those signals
- They recognise that today's best practice might be tomorrow's technical debt
- They build the capacity to pivot quickly rather than committing deeply to any single approach
The conversation about "AI agents" in enterprise contexts often focuses on tools and platforms. But the more important capability might be organisational: learning to recognise what's becoming obsolete before it's obvious, and having the flexibility to let go.
The shift underneath
We're moving into AI-native ways of working. Development happens through the agent. Not alongside it. Not with its assistance. Through it.
This requires a different set of skills. Steering, not typing. Reviewing, not writing. Critical thinking about outputs, not prompt engineering as an end in itself.
There's a difference between influencing direction based on what you've read and influencing direction based on what you've built. Both involve talking in meetings. Both involve shaping strategy. But the quality of the input is different.
Someone who's only read about AI brings abstractions: "we need an AI strategy", "agents are the future", "let's explore use cases". Someone who's also spent time building brings specifics: "this model hallucinates less on structured output", "that approach fails when the context exceeds 50k tokens", "skills work better than specs for X but not for Y".
The best technical leaders do both. They're in the meetings, influencing direction, shaping how the organisation thinks about this technology. And they're also building, experimenting, encountering the failures firsthand. The building is what makes the talking credible. It's what lets you know when the abstractions you're sharing match reality and when they don't.
This isn't about choosing between strategy and execution. It's about grounding strategy in direct experience. Even a few hours experimenting with skills, seeing what the models actually struggle with, changes the quality of every conversation you have afterward.
If your organisation's AI direction is being shaped primarily by people who read about the technology rather than people who use it, the gap between expectation and reality will keep widening. The fix is to make building part of the job, not something you delegate entirely.
Building your own skills, testing them, understanding how they work: that's how you develop the intuition to know what's actually possible. Not reading thought leadership. Not attending conferences. Building. Testing. Failing. Learning.
And being willing to let go of what worked last quarter if something better emerged this quarter.
That's the uncomfortable part. The tools that served you well aren't wrong. They're just no longer the best path forward. Recognising that transition point, early and often, is the actual skill to develop.
The cultural change problem
Dex Horthy from HumanLayer has been talking about this challenge: the hard part isn't the technology. It's adapting your team, your workflow, and your entire development process to work in a world where most of your code is shipped by AI.
He identifies a rift growing in teams: senior engineers don't adopt AI because it doesn't make them that much faster. Junior and mid-level engineers use it heavily because it fills skill gaps. But it also produces some slop. Then the senior engineers hate it more every week because they're cleaning up the mess that was shipped by Cursor the week before.
This isn't AI's fault. It's not the mid-level engineer's fault. It's an organisational alignment problem.
The fix requires cultural change, and cultural change has to come from the top if it's going to work. Technical leaders need to pick a tool, get some reps, understand the workflow changes firsthand, and then guide teams to adapt. Without that, you end up with fragmented adoption: some people abandoning AI, others over-relying on it, and senior engineers perpetually cleaning up.
If you can't figure out how to adapt your SDLC to this new reality, you're going to struggle. The technology is ready. The question is whether organisations can change fast enough to use it well.