Engineering Sustainable Speed

How a team keeps agent speed useful after the first fast build.

So far the process has treated speed defensively. Previous sections showed teams how to keep debt from creeping in by protecting team comprehension, planning before building, and reviewing continuously. This chapter is the offensive half: the work that makes the system and the agents better at the next task, so the speed from the first fast build compounds instead of eroding.

Watch it play out across two teams that adopt agents the same quarter. Both see the same early result: work that took a week now takes a day or two. Six months later, one team still moves at that pace and can still reason about what it has built. The other is slower than when it started, buried in code nobody fully understands, re-explaining the same corrections every session, waiting on a person every time the agent needs to check its own work. Both bought the same ability. Only one kept it.

What separated them is where they spent the time the early speed-ups saved. The first team put some of it back into the system that produces the speed: layered tests and review agents that catch what no person could at full volume, a harness that lets an agent finish a task and verify its own work, an architecture kept solid enough that every session builds in the same direction, and a habit of turning each recurring mistake into a check the system runs from then on. The second team spent all of it on new features and let the rest erode.

That reinvestment is Compounding Work, and it's the highest-leverage work on a project. It's the mirror image of technical debt. Debt you don't pay down accrues interest, and the balance grows; vibe coding is the credit-card version, fast now and more expensive every month you don't pay it off. Agents raise that interest rate, and they raise it on the earning side too: work that makes the system and the agents better at the next task pays back on every future delegation. Because an agent now does most of that work itself, the investment is cheap to make and the returns show up sooner.

Work like this used to be filed under developer experience (DevEx), where the cost could be hard to justify. Agents change that math, and open new opportunities that didn't exist before, which is why it gets a chapter of its own. Three investments make up that work, in the order this chapter takes them: the quality checks an agent runs to catch its own mistakes, the harness that gives it the context and tools to keep working on its own, and the design decisions only a person can make about the shape of the system. A final section names the signs that one of them has fallen behind.

Quality pyramid

Every layer of quality in this process is a feedback loop, a signal that the code isn't doing what it should yet. The pyramid orders those signals by cost. The base covers every line and runs for free on every change. Each layer up is narrower and more expensive, until the apex, where human judgment does what no check can. The aim is to push as much as possible down into the cheap, automatic layers, so attention goes only where a person is actually needed. The shape predates agents. What changed is how far up agents can reach. Agents now run the middle and upper layers that used to need a person, including system-level checks, code review, and even specialized design critique.

Checks the agent runs while building

The bottom layers are the feedback an agent reads while it works. The agent should practice test-driven development and add regression tests as it goes, so building the system and building the checks that measure it happen together. With a solid suite underneath, the agent can run much longer on its own, iterating until the task is done without breaking something else along the way.

Checks that used to be too expensive

Agents change the cost-benefit of system-level checks. A check that was too expensive to build when only a human reviewer would use it is cheap when an agent runs it hundreds of times. Agents lack the holistic judgment a person brings to review, so the team closes that gap with new automated checks. A visual regression test flags unintended UI changes against a stored baseline. A schema check verifies that an API contract holds end to end. A migration test runs the rollback. Each one used to be a judgment call someone made by eye, or skipped. Built as a check, it runs on every change and never gets tired.
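As one illustration of a check that used to be skipped, here is a minimal sketch of a migration test that runs the rollback. It assumes an in-memory SQLite database and a hypothetical migration (the `invites` and `invite_reminders` tables, and the `upgrade` and `downgrade` functions, are invented for the example). The check passes only if the migration changes the schema and the rollback restores it exactly.

```python
import sqlite3


def schema_snapshot(conn: sqlite3.Connection) -> list:
    """Return every table and index definition, in a stable order."""
    return conn.execute(
        "SELECT type, name, sql FROM sqlite_master ORDER BY type, name"
    ).fetchall()


# Hypothetical migration: add a table that records invite reminders.
def upgrade(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE invite_reminders ("
        "id INTEGER PRIMARY KEY, invite_id INTEGER NOT NULL, sent_at TEXT)"
    )
    conn.execute(
        "CREATE INDEX idx_reminders_invite ON invite_reminders (invite_id)"
    )


# Hypothetical rollback: undo everything upgrade() created.
def downgrade(conn: sqlite3.Connection) -> None:
    conn.execute("DROP INDEX idx_reminders_invite")
    conn.execute("DROP TABLE invite_reminders")


def migration_round_trips() -> bool:
    """Apply the migration, roll it back, and compare schemas."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE invites (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
        )
        before = schema_snapshot(conn)
        upgrade(conn)
        changed = schema_snapshot(conn) != before
        downgrade(conn)
        restored = schema_snapshot(conn) == before
        return changed and restored
    finally:
        conn.close()


if __name__ == "__main__":
    assert migration_round_trips(), "rollback did not restore the schema"
    print("migration round trip OK")
```

A check like this compares schemas only. A fuller version would likely also load representative data before the upgrade and confirm it survives the round trip.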

Review that used to need a person

Passing checks are only part of the story. Code review is still a bottleneck, and here, too, agents help. Code review agents catch bugs and, with tuning, enforce architecture rules and style conventions. They give the code a first pass that improves it before a human reviewer sees it, and they often find problems a person would miss.

Senior judgment can't be fully automated. But on any project a senior engineer checks the same concerns again and again, including non-functional requirements like performance and security, and the design decisions that have to fit the rest of the system. A specialized review agent runs a team-crafted prompt for one of those concerns, on demand or in CI. Any pattern an engineer keeps having to check can become an agent that checks it first. What's left at the apex is the work no agent can stand in for, the lateral thinking, taste, and judgment that come from context an agent doesn't have.

Harness engineering

A coding model is general-purpose, and every conversation it has starts from a blank slate, with only the prompt and the files it reads along the way to go on. Give the model a task and it does something reasonable, but without the team's context it sometimes does the wrong reasonable thing, and someone catches it later. Harness engineering shapes that environment, so the agent comes in with more than it can infer on its own. It works in two ways.

Guidance: what the agent knows about the system and how the team works.
Tools: what the agent can do, beyond the capabilities it starts with.

Guidance

When people wrote all the code, teams ran on tacit, tribal knowledge, the kind picked up in conversation and over years on a project. An agent doesn't pick any of that up. The knowledge the agent needs has to be written down and put in front of it while it's working.

The Knowledge Base holds design and product intent.
Code comments carry the rationale behind a specific implementation, so an agent working there understands why, not only what.
Agent Skills package step-by-step instructions for specific tasks.
AGENTS.md gives every coding agent the same top-level guidance for working in the system.

This is where most of the improvement happens. Each time an agent gets something wrong, needs a concept re-explained, or repeats a manual task, the team writes the answer into one of these places, so the next agent starts with it. Sometimes that's a procedure or a rule. Sometimes it's making something the team already wrote easier to find at the right moment.

This pays off well beyond code. Because Definition is usually the bottleneck on a project, a harness that speeds up definition work can be worth more than another gain in implementation speed.

Tools

An agent isn't limited to the capabilities it ships with. The team can build it new ones, tools that run a precise, repeatable task or take an action the agent otherwise couldn't. The quality pyramid was one case of this, since a test is a tool the agent runs to check its own work. The same idea reaches much further.

Increasing autonomy: An agent gets stuck when a task needs an action it can't take. Out of the box it usually can't see the UI it just built, so a person has to notice the visual bug and point it out. Give the agent the Figma file, a visual component harness to render the component in different states, and browser automation, and the agent can check its own work against the design and fix it before anyone looks.

Increasing reliability: Agents follow instructions better every month, but they're fuzzy by nature. When a task has exactly one right way to run, code beats an agent at it, producing the same result every time, faster and cheaper. Wrap a task like deployment or schema validation in a tool, tell the agent when to reach for it, and that step stops being a place where things go wrong.
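A minimal sketch of the schema-validation case, written as a deterministic tool an agent could call. The contract here (the `email` and `role` fields, the allowed roles, and the `run_tool` entry point) is an assumption made for illustration, not something the process prescribes. The point is that the same input always produces the same verdict.

```python
import json
import re

# Hypothetical contract for an invite request; field names are assumptions.
REQUIRED_FIELDS = {"email": str, "role": str}
ALLOWED_ROLES = {"admin", "member"}
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_invite(payload: dict) -> list:
    """Return a sorted list of problems; an empty list means valid."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(f"wrong type for {field}")
    email = payload.get("email")
    if isinstance(email, str) and not EMAIL_PATTERN.match(email):
        problems.append("email is not well formed")
    role = payload.get("role")
    if isinstance(role, str) and role not in ALLOWED_ROLES:
        problems.append(f"unknown role: {role}")
    return sorted(problems)


def run_tool(raw_json: str) -> str:
    """Tool entry point: JSON object in, JSON verdict out."""
    problems = validate_invite(json.loads(raw_json))
    return json.dumps({"ok": not problems, "problems": problems})


if __name__ == "__main__":
    print(run_tool('{"email": "pat@example.test", "role": "owner"}'))
```

Returning a structured verdict instead of free text gives the agent something it can act on directly: a list of named problems to fix, or a clear pass.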

Increasing versatility: Coding agents are built for code, but a project needs more than code. Take a production incident. Normally a person pulls the logs, gathers the telemetry, and hands it all to the agent. Give the agent its own way to search logs and watch service health, and it can work the incident itself instead of waiting to be fed.

Running example · Compounding Work: Email Verification Harness

The friction. This one surfaced at Review of the Invite Members effort. Building and reviewing Invite Members meant confirming, over and over, that the emails actually went out. The staging "Sent emails" view the team added in Planning is fine for a person, but an agent debugging an email problem has to click through that screen or dig through the staging server to check a single message. It's slow, and it pulls the agent out of the loop every time the agent needs to verify its own work.

The investment. So the team creates Compounding Work: Email Verification Harness, a skill and a tool that read the staging mailbox directly. Now the agent can ask its own questions without opening a screen. Did the invite to alex@ go out? What link did it carry? Did a reminder fire? The agent can debug an email problem end to end instead of stopping to ask a person to look.

Why it compounds. The skill outlives Invite Members. Every later email feature, from off-boarding notices to billing receipts, inherits the same way to check itself, so verifying email stops being a step that needs a person and becomes something a session finishes on its own. One friction, turned once into a capability every future delegation can use.
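A minimal sketch of what the tool half of such a harness might look like. Everything here is hypothetical: the `SentEmail` shape, the in-memory `STAGING_MAILBOX` stand-in, the sample address, and the function names. A real version would read from wherever staging actually stores sent mail.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SentEmail:
    to: str
    subject: str
    body: str


# Stand-in for the staging mailbox; sample data is invented for the sketch.
STAGING_MAILBOX = [
    SentEmail(
        to="pat@example.test",
        subject="You're invited to join the workspace",
        body="Accept here: https://staging.example.test/invites/abc123",
    ),
    SentEmail(
        to="pat@example.test",
        subject="Reminder: your invite is waiting",
        body="Still interested? https://staging.example.test/invites/abc123",
    ),
]

LINK_PATTERN = re.compile(r"https?://\S+")


def find_emails(mailbox, to, subject_contains=""):
    """Return messages sent to `to` whose subject contains the given text."""
    needle = subject_contains.lower()
    return [m for m in mailbox if m.to == to and needle in m.subject.lower()]


def links_in(message):
    """Return every URL found in the message body."""
    return LINK_PATTERN.findall(message.body)


if __name__ == "__main__":
    invites = find_emails(STAGING_MAILBOX, "pat@example.test", "invited")
    print("invite sent:", bool(invites))
    print("links:", links_in(invites[0]) if invites else [])
    reminders = find_emails(STAGING_MAILBOX, "pat@example.test", "reminder")
    print("reminder fired:", bool(reminders))
```

The three calls in the usage block map onto the three questions in the example: did the invite go out, what link did it carry, and did a reminder fire.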

Orchestrating parallel work

The harness pays off one more way. Everything up to now improves a single delegation: one agent on one task, with a person reviewing the result. A rich enough harness raises the ceiling on that. A single agent tops out at what it can hold in one context and how far it can build in one continuous run without losing the plot. When the harness gives agents the context and tools to work on their own, and the checks let them confirm their own results, a developer can point a team of agents at a build no single agent could accomplish, like a refactor that cuts across the system or a feature that spans the whole stack. Each agent owns a piece of the same plan and runs the same checks, the pieces recombine into one change, and the developer reviews the assembled result. The developer's job shifts up a level of abstraction, from prompting agents to building workflows that organize and prompt teams of agents.

When the conditions are right, this is higher-leverage than any single speed-up in this chapter, because it multiplies the work in flight rather than making one stream of it faster. Those conditions are demanding. The Compounding Work this chapter has already described is a necessary baseline. Orchestration works best when the goal is engineering-heavy and the checks are objective: migrate an API, run a broad refactor, update a component family. It is a poor fit when the work depends on unsettled product rules or domain calls. Where those conditions hold, a week of sequential delegation can collapse into a day. Where they don't, pushing orchestration past its limits just multiplies work that never fits together, and keeping a person closer to the work is faster.

For a given build, a developer finds the edge of what orchestration can handle by working up to it. They run a small piece of the build first, watch where the agents go wrong, and strengthen the harness against what broke, whether that's a missing check or an instruction the agents kept ignoring. Once a run comes back clean, they widen the scope. Some early passes exist mainly to be thrown away: build a version, study where it falls short, and rebuild with the plan and the harness tightened. Each pass either widens what the agents can take on or shows the harness still isn't ready.

So orchestration sits at the top of the same hill the rest of the chapter climbs: the better the checks, the harness, and the design underneath, the more of a build a developer can run at once. None of that relaxes their ownership of the result. A larger build still has to be understood and validated by the person who directed it, the same obligation as a single delegation, now spread across more code. That makes comprehension the real ceiling on how far this scales: a developer can launch many agents but can only merge a build they can still stand behind and explain.
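The fan-out this section describes can be sketched in a few lines. This is an outline under stated assumptions, not a real orchestrator: `run_agent` and `run_checks` are stubs standing in for an actual agent session and the team's actual checks, and the plan is a plain list of piece names.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    piece: str
    output: str
    checks_passed: bool


def run_agent(piece: str) -> str:
    """Stub for one agent's work on its piece of the plan (hypothetical)."""
    return f"patch for {piece}"


def run_checks(output: str) -> bool:
    """Stub for the shared checks every piece must pass (hypothetical)."""
    return output.startswith("patch for ")


def build_piece(piece: str) -> Result:
    """Run one agent on one piece, then run the same checks on its output."""
    output = run_agent(piece)
    return Result(piece, output, run_checks(output))


def orchestrate(plan, max_agents=4):
    """Fan the plan out, then return (combined outputs, failed pieces)."""
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        results = list(pool.map(build_piece, plan))  # keeps plan order
    failed = [r.piece for r in results if not r.checks_passed]
    # Recombine only when every piece passed; otherwise hand back failures.
    combined = [] if failed else [r.output for r in results]
    return combined, failed


if __name__ == "__main__":
    combined, failed = orchestrate(["api", "ui", "docs"])
    print("combined:", combined)
    print("failed:", failed)
```

The sketch refuses to recombine when any piece fails its checks, which mirrors the point above: the developer reviews one assembled result, and only one they can stand behind.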

Knowing when to slow down

The other levers in this chapter are things the team builds. This one is work the team has to keep choosing to do. Good design is a bet a person makes against their own future effort, such as a simpler structure, a data model where the invalid states can't arise, or abstractions that compose so the next problem is easier than the last. Each one costs effort now to save more later. People make these bets because they're the ones who would otherwise pay, and the incentive is their own finite time and attention. An agent has none of that. Work costs the agent nothing, so it has no reason to invest ahead. It solves the problem in front of it and moves on. Point it at enough problems and you get a system that works and is costly to change.

This is the oldest form of Compounding Work, and the one an agent won't produce on its own. It's the design quality of the system itself, which every later change either rides on or fights. Protecting that quality is part of protecting speed, because a system that's expensive to change is slow no matter how fast the agent works. That makes it the team's call, and leadership's in particular, to notice when throughput mode is making the system worse and to spend time on the design instead. The signs are familiar. The same workarounds pile up, every change touches more than it should, and nobody can hold the current shape of the system in their head. The clearest case is new ground with no pattern to follow yet, like a new project or a refactor that changes how the system is layered.

Spending time on the design means stepping out of scaled delegation and putting human capacity back on the hard parts. The team gathers context, weighs structural options, settles the constraints, and writes down enough direction for agents to follow. Sometimes that's one developer spending part of an iteration on architecture instead of features. Sometimes it's most of the team. It's the same work Definition does, turned inward at the system. Architectural uncertainty is definition work.
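One of the design bets named in this section, a data model where invalid states can't arise, is easier to see in code. The sketch below is hypothetical (the invite types and field names are invented for illustration). Instead of one record with an `accepted` flag and an optional timestamp, which allows "accepted with no time" and "pending with a time", each state gets its own type that carries exactly the fields that state has.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Union


@dataclass(frozen=True)
class PendingInvite:
    email: str
    sent_at: datetime


@dataclass(frozen=True)
class AcceptedInvite:
    email: str
    sent_at: datetime
    accepted_at: datetime  # required: an accepted invite always has a time


Invite = Union[PendingInvite, AcceptedInvite]


def accept(invite: PendingInvite, now: datetime) -> AcceptedInvite:
    """Turn a pending invite into an accepted one."""
    if now < invite.sent_at:
        raise ValueError("an invite cannot be accepted before it was sent")
    return AcceptedInvite(invite.email, invite.sent_at, accepted_at=now)


def describe(invite: Invite) -> str:
    """Each branch can rely on the fields its state guarantees."""
    if isinstance(invite, AcceptedInvite):
        return f"{invite.email} accepted at {invite.accepted_at.isoformat()}"
    return f"{invite.email} is still pending"


if __name__ == "__main__":
    sent = datetime(2024, 1, 1, tzinfo=timezone.utc)
    pending = PendingInvite("pat@example.test", sent)
    print(describe(pending))
    print(describe(accept(pending, datetime(2024, 1, 2, tzinfo=timezone.utc))))
```

Python does not enforce the annotations at runtime, so the guarantee comes from the constructors (an `AcceptedInvite` cannot be built without `accepted_at`) plus whatever static type checking the team runs. The structure costs a little more up front and removes a class of checks from every later change.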

What that work looks like

Seed the structure deliberately. Before delegating at scale, the lead developer or the team makes explicit choices about layering, module boundaries, interface design, and the key abstractions. Write the foundational code by hand to force those decisions. A few well-considered examples set the pattern agents follow for the rest of the project.

Mob program around the key decisions. Getting the whole team writing and reading the same code together is one of the fastest ways to build shared architectural intuition.

Spike, iterate, then commit. An agent can spike a structural approach quickly. Treat that spike as a draft to argue with. Reshape it, and make sure the team understands and agrees with the structure before building on it.

Once the structure is sound and the team can answer "does this match how we've agreed to build this system?", scaled delegation takes over again and the planning checks keep it on track.

Addressing development failure modes

Development fails in recognizable patterns once agent output gets ahead of quality signals, planning, architecture, or team comprehension. The signals below are ordered by priority: earlier signals are more fundamental, and fixing them often resolves the later ones. Work the list top-down. If quality gates are weak (the first signal), don't spend time on volume outpacing comprehension (the seventh) until that's resolved.
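The top-down rule can be written as a small triage helper. This is only a sketch: the signal names are taken from the headings in this section, while the function name and the idea of passing in a set of observed signals are assumptions made for illustration.

```python
from typing import Optional

# Signal names follow the headings in this section, in priority order.
SIGNALS = [
    "Quality gates are weak",
    "Planning is rushed or skipped",
    "Architectural direction is weak",
    "Same agent mistakes keep recurring",
    "Developers can't explain what was built",
    "Specs aren't staying ahead",
    "Volume outpaces comprehension",
]


def first_signal_to_fix(observed: set) -> Optional[str]:
    """Return the highest-priority observed signal, or None if none apply."""
    unknown = observed - set(SIGNALS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    for signal in SIGNALS:  # earliest in the list wins
        if signal in observed:
            return signal
    return None


if __name__ == "__main__":
    seen = {"Volume outpaces comprehension", "Planning is rushed or skipped"}
    print(first_signal_to_fix(seen))
```

With both signals observed, the helper points at planning first, since the later signal is often a downstream symptom of the earlier one.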

Quality gates are weak · Fundamental

Everything in this process assumes reliable feedback. Types pass, tests are green, review agents flag real issues, and validation catches regressions. When any of the quality layers is flaky, missing, or slow, agent delegation volume turns into risk the team cannot see. Nothing else in this list matters until the feedback is reliable.

What you're seeing

CI is flaky or too slow to use. Tests have obvious gaps, where the spec describes a behavior but no check covers it. Review agents run inconsistently, or their output is noisy enough to ignore. Bugs reach production that an earlier quality layer should have caught. The team starts routing around the checks instead of fixing them.

What to do

Treat improving the feedback infrastructure as a higher priority than completing new feature work. For each missed issue, ask which layer should have caught it and strengthen that layer. The rest of the process leans on TDD during implementation, regression tests on bug fixes, and reliable review agents. Fix one quality layer at a time until the team trusts them all.

Planning is rushed or skipped · Fundamental

Analysis during planning is the main opportunity for human judgment before implementation begins. When planning gets compressed, with missing efforts, missing implementation plans, or no team review, the agent works within weak constraints. Code still arrives quickly, but review becomes the place where the team rediscovers intent, untangles avoidable choices, and makes decisions that should have happened before delegation.

What you're seeing

Developers start tasks without efforts or implementation plans. Specs enter planning with unresolved questions that get pushed into execution, and you might hear "we'll figure it out when we get there." The team rushes to start building. Mid-iteration, work stalls waiting for decisions that should have been made up front.

What to do

Reduce iteration scope before reducing planning. If planning is skipped to fit more work in, the team is optimizing for output volume over its own process. Reinstate Analyze & plan as a non-negotiable step. If planning capacity is genuinely insufficient, cut scope or slow the iteration while preserving the analysis.

Architectural direction is weak · Foundation

Agents replicate patterns. Without shared architectural direction, every developer and every session makes reasonable but inconsistent choices about layering, module boundaries, interface shape, and where the architecture is heading. The codebase works but it fights itself, and the problem grows the more the team delegates.

What you're seeing

Similar problems get solved differently across efforts. Reviewers disagree about what "fits the system." Agents reproduce whichever pattern they encountered last. Duplicate abstractions accumulate. Tests pass but structure drifts. Refactor cost rises faster than feature velocity falls.

What to do

Establish architectural direction before the team delegates to agents heavily. The practices for that are in Knowing when to slow down. Capture that direction where agents will follow it, in code, in documentation, and in examples. Until the alignment is real, keep delegation narrower and increase human review around structural decisions.

Same agent mistakes keep recurring · System improvement

Agents don't remember previous sessions. If the team keeps re-prompting the same corrections ("don't do X," "the convention here is Y," "remember that Z is required"), that knowledge exists only in people's heads. Every future session starts without that context until someone teaches the system what the team already knows.

What you're seeing

Review agents flag the same class of issue over multiple iterations. Developers paste the same corrections into conversations with agents. The codebase accumulates workarounds tied to the same few patterns the agent keeps getting wrong. The team starts saying "the agent keeps missing this."

What to do

Treat every recurring mistake as a signal to improve the system. This is the harness loop from Harness engineering, applied as a fix. Add the missing context to agent instructions, seed the correct pattern with an example, or build a review-agent check for the issue class. Agents help build these improvements, so the investment is small. If the same correction has happened twice, bake it into the system before it happens a third time.
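A minimal sketch of baking one recurring correction into a check. The rule itself is hypothetical (suppose the team keeps telling agents to use the project logger instead of `print()`), and `find_print_calls` is an invented name. The check parses the source with Python's `ast` module, so it flags real calls and ignores the word "print" inside strings or comments.

```python
import ast


def find_print_calls(source: str, filename: str = "<source>") -> list:
    """Report each bare print() call as 'file:line: message', in line order."""
    tree = ast.parse(source, filename=filename)
    lines = sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "print"
    )
    return [
        f"{filename}:{line}: use the project logger, not print()"
        for line in lines
    ]


if __name__ == "__main__":
    sample = (
        "import logging\n"
        "print('sending invite')\n"
        "logging.info('print is fine inside a string')\n"
    )
    for violation in find_print_calls(sample, "invites.py"):
        print(violation)
```

Run in CI or as a tool the agent calls before finishing a task, a check like this replaces a correction the team would otherwise keep typing into conversations.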

Developers can't explain what was built · Review

The Explain step exists to prevent this. When it's skipped or too light, the demo becomes an agent walkthrough instead of an owner walkthrough. The developer misses why something was implemented a certain way. Review only catches what the team understands.

What you're seeing

Demos are hand-wavy. Stakeholder questions get "I'm not sure, that's what the agent did" as an answer. Design choices look arbitrary, and the developer can describe what the code does but not why. When probing uncovers something unexpected, the developer struggles to tell whether it was intentional or drift.

What to do

Rebuild the Explain step. Before a demo, have the agent walk through what changed, the design choices, the deviations from the plan, and the unexpected side effects, so the developer understands what was built. Pair on Explain if needed, and don't merge until the developer can answer "why this?" for the key decisions.

Specs aren't staying ahead · Upstream interface

This signal shows up in implementation, but the failure is somewhere upstream. Planning runs into specs with gaps. Developers ask questions mid-iteration that specs should answer. Implementation stalls waiting on direction.

What you're seeing

Specs arrive at planning with visible gaps, including missing acceptance criteria, TBD sections, and unresolved questions in the spec body itself. The pattern repeats iteration over iteration. Developers ask questions specs should answer and stall waiting on definition-side people for responses. The bottleneck is spec readiness.

What to do

Raise it with the people doing definition work. Phase 1: Definition explains the upstream pipeline; see What happens in this phase. Hold half-formed specs back instead of pushing them into implementation to fill the iteration. If definition is underfed, reduce appetite until the pipeline catches up.

Volume outpaces comprehension · Systemic

Agents make it possible to ship more than any team can reasonably hold in its head. When volume outpaces comprehension, nobody knows the current state of the system, regressions span iterations, and the team loses the ability to reason about what it built. This is usually a downstream symptom of earlier problems.

What you're seeing

New features pile up and nobody remembers why they were built. Regressions emerge across iterations, traced to interactions nobody understood. Review skims because there is too much to look at properly. Onboarding someone new gets harder every iteration. The codebase grows faster than the team's mental model of it.

What to do

Slow the cadence, and run some iterations with less feature work. Reduce appetite and invest in Compounding Work, such as patterns, examples, documentation, and review agents, that makes the system easier to reason about. This signal rarely has a single cause, so check the earlier signals first. If they all check out, the team has more capacity than the domain can absorb, and should invest in future capacity rather than output.