# Agentic Engineering for Teams

> How software teams pair collaboration with AI for sustainable speed.

by Drew Colthorp

A working guide to how Atomic Object's teams build software with AI agents — the twice-weekly cadence, Definition and Implementation running in parallel, and the human checkpoints that keep speed sustainable.

![Diagram showing Product Backlog moving through Definition, Planning, Review, and Implementation to Shipped Software.](./assets/images/external/agentic-dev-final.svg)

## Contents

### Front matter

- [Foreword](./foreword.md)
- [Acknowledgements](./acknowledgements.md)

### Start here

- [Introduction](./introduction.md)
  - [The process must evolve, not be reinvented](./introduction.md#the-process-must-evolve-not-be-reinvented)
  - [What that development process has to do](./introduction.md#what-that-development-process-has-to-do)

### The Process

- [Overview](./overview.md)
  - [The team's weekly rhythm](./overview.md#the-teams-weekly-rhythm)
  - [A feature's journey](./overview.md#a-features-journey)
  - [A worked example](./overview.md#a-worked-example)
  - [Who does the work](./overview.md#who-does-the-work)
  - [Partnering with agents](./overview.md#partnering-with-agents)
- [Phase 1: Definition](./definition.md)
  - [What happens in this phase](./definition.md#what-happens-in-this-phase)
  - [Who picks up the work](./definition.md#who-picks-up-the-work)
  - [Example: Defining the Seats & Membership Backlog Item](./definition.md#example-feature-definition)
- [Phase 2: Planning](./planning.md)
  - [What happens in this phase](./planning.md#what-happens-in-this-phase)
  - [Running the ritual](./planning.md#planning)
  - [Example: Planning the Invite Members and Email Delivery efforts](./planning.md#example)
- [Phase 3: Implementation](./implementation.md)
  - [What happens in this phase](./implementation.md#complex-task-workflow)
  - [Example: Building the Invite Members effort](./implementation.md#example)
- [Phase 4: Review](./review.md)
  - [Running the ritual](./review.md#running-the-ritual)
  - [Example: Reviewing the Invite Members effort](./review.md#example)

### Finding Success

- [Engineering Sustainable Speed](./engineering-sustainable-speed.md)
  - [Quality pyramid](./engineering-sustainable-speed.md#quality-pyramid)
  - [Harness engineering](./engineering-sustainable-speed.md#harness-engineering)
  - [Knowing when to slow down](./engineering-sustainable-speed.md#knowing-when-to-slow-down)
  - [Addressing development failure modes](./engineering-sustainable-speed.md#addressing-development-failure-modes)
- [Managing Project Outcomes](./managing-project-outcomes.md)
  - [Making the right trade-offs](./managing-project-outcomes.md#making-the-right-trade-offs)
  - [Managing scope](./managing-project-outcomes.md#managing-scope)
  - [Example: Managing scope](./managing-project-outcomes.md#example)
  - [Addressing project failure modes](./managing-project-outcomes.md#knowing-when-its-breaking)

---

# Foreword

---

Dear reader,

Most of the conversation about AI in software development focuses on individual coders. A solo developer with the right setup is dramatically more productive than they were two years ago. But serious software gets built by teams, and almost nobody is talking about how teams should work now.

When writing code was the slow and expensive part of software development, stories could be written at a low level of fidelity, and reviews could wait. Most of the detailed requirements were finalized by the person writing the code. Agents have transformed writing code from the slow part to the fast part. Now a lack of upfront definition turns into a week’s worth of bad code, implemented in one afternoon.

As leaders of a custom software consultancy whose teams have shipped more than 300 products over the past 25 years, we see the need for team working patterns to change. Everyone is chasing speed right now. The counterintuitive truth is that for teams, the way to sustain speed is to invest more time in collaboration. If you structure the work so people stay in sync, long-term speed is the byproduct.

Agentic Engineering for Teams is how we structure team collaboration. It preserves what agile got right: short feedback loops, working software as the measure of progress, and teams that adapt to what they learn. It adds the requirements-definition rigor that agent-assisted development demands. We don't name specific tools, because tools change. We describe the collaboration structure that makes teams effective when using agentic development tools.

We're publishing this instead of holding it close for two reasons:

1. Most software teams are about to hit the same AI productivity wall we did, and we want to help them move beyond it.
2. This will get sharper when people we've never met read it and suggest enhancements. [Email us](mailto:contact@atomicobject.com) with your feedback.

If you lead a software team, remember this: the AI productivity everyone is chasing is best measured by how well your team works together.

—

Mike and Shawn

Atomic Object Co-CEOs


Next, the [Acknowledgements](./acknowledgements.md) recognize the people who shaped the process.

---

# Acknowledgements

---

This process and book are largely Drew Colthorp's work, but we owe their current form to many folks. Particular thanks to Alecia Frederick, Bryan Elkus, Elaine Ezekiel, Jake Silas, John Fisher, Kealy Williams, Meg Kretz, Michael Marsiglia, Nick Hazekamp, Nick Keuning, Rob Bell, and Sarah Brockett for direct contributions to the content, and to the many other Atoms who tried pieces of the process on real work and told us what they thought.


Next, the [Introduction](./introduction.md) starts the main body of Agentic Engineering for Teams.

---

# Introduction

A software-development process built for agents and teams

---

Here are four realities of modern software development:

1. Agents can write code well.
2. Real software is still built by teams.
3. Teams that don't use agents are getting out-built by teams that do.
4. Letting each developer pair with agents however they like cannot scale.

Any useful development process has to be designed with these realities in mind. The rest of this introduction takes each one in turn. What follows is Atomic's project-tested software-development process, built around them.

### Agents can write code well.

Agents complete much of the coding now, and the teams that use AI tools well are shipping software products faster than the teams that aren't. That's a change in tools, but there’s a more meaningful change software teams need to grapple with: the development process has new bottlenecks.

When writing code was the expensive part, it was also the constraint. Healthy teams organized around making that part go faster. Once agents take over the bulk of the coding, the bottleneck shifts to deciding what to build and to verifying what the agents produce.

The first of those constraints is less familiar to seasoned developers because it used to be handled automatically. Every project carries ambiguity, and it gets resolved one of two ways: deliberately and upstream, by someone who writes the answer down, or ad hoc and downstream, by whoever hits the gap mid-build. Pre-agentic delivery got the second path almost for free. A developer hit an unclear point, asked across the room, got an answer, and kept going. That the answer lived in one person's head was fine, because the same person did the building.

That second path stops working when agents do the coding. An agent told to build will build. The gap gets settled inside one developer's session, either as a call the agent makes on its own or as a quick answer the developer gives it. Nothing records that answer, and meanwhile the rest of the team and its agents keep building in parallel. Resolved that way, the answer never leaves the session. Written down upstream, it reaches everyone whose work depends on it.

|  | Pre-agentic | Agentic |
| --- | --- | --- |
| Building | Expensive, so teams invest heavily before committing | Cheaper, so teams validate smaller decisions in working software |
| Knowledge | Critical context lives in people's heads | Context is written down where agents can reach it |
| Instructions | A rough description works; developers fill the gaps | Instructions must be precise; agents do exactly what they're told |
| Slow decisions | Hidden behind long build times | Surface right away as idle time |

### Real software is still built by teams.

The tools changed, but the importance of teamwork didn't. Real software is built well when a team shares an understanding of the architecture, the product direction, the customer context, and the codebase. Agents are best suited to executing an appropriately scoped spec. Agents can plant a tree, but teams still tend to the forest.

That's why a team retains accountability for the product as a whole. Teams hold the taste to decide which of an agent's plausible guesses is right, and they coordinate the parts of a product that have to fit together as the work spreads across people and agents working in parallel. Agents simply speed up the pace of a team doing that work effectively.

### Teams that don't use agents are getting out-built by teams that do.

A team that won't delegate to agents is going to fall behind. The speed-up agents create is real, and once a competitor starts benefiting from that speed, the advantage for the agentic team shows up in how much more product reaches their users and how much faster it adjusts.

For teams still resisting agents, breaking through that resistance must be a top priority now.

### Letting each developer pair with agents however they like cannot scale.

Most teams' first foray into agentic development is to let each developer pair with agents however they like. Inside one task, this works well. Across a codebase, it falls apart.

Each developer's agents pull from their own context and make their own guesses, with no shared view to reconcile them. As that work converges, the seams between independently built pieces multiply, and the architecture drifts.

Code review can't keep up with the volume. Past a point, neither the team nor its agents can work in the codebase confidently, and the early speed-up sputters to a halt.

That pattern is vibe coding at team scale: every agent deciding for itself. It's not the way.

---

## The process must evolve, not be reinvented

The agile fundamentals still hold, and several of them matter more than they did before.

Agents amplify good design and engineering. Strong product foundations let them reach further; weak ones are cheaper to fix than they used to be, but they still cap how far agents go.

Automated testing used to be a quality investment. With agents in the loop, it's the thing that lets them work safely at all. Without it, regressions compound faster than reviewers can catch them.

Developer tooling matters more too. Agents work better when the project has clear setup instructions, fast tests, reliable local environments, and repeatable commands for common tasks. Those investments used to save developer time; now they also give agents a safer, faster path through the work.

Architecture matters more for the same reason. Clear boundaries, fast checks, and well-factored modules let agents work on smaller pieces without breaking the whole system. A tightly coupled system pushes more decisions back onto people, because every change requires someone to understand and coordinate more of the whole system.

The disciplined version of all this—leaning hard on agents to ship quality software without losing the craft—is what we're calling agentic engineering.

---

## What that development process has to do

A useful process has to do two things at once: absorb the new constraints that agents bring and protect the team coordination that makes software work in the first place. In practice, three things shift to make that possible.

Definition runs in parallel with Implementation to keep the next iteration fed with clear specs. Direction has to be written down, kept current, and ready before each build cycle starts, because the developer who used to carry it in their head isn't the one doing the building anymore.

The development iteration cadence gets shorter. A team leaning on agents produces more in a week than it can keep coherent over a multi-week sprint, so the iteration has to shrink to keep up. We recommend two iterations a week, with Planning and Review as the human checkpoints that hold the line. Shorter cycles sound more intense, and the delegation itself is more intense. The rest of the time agents save goes to protecting and expanding the human space around that work. A good process has to do both at once; teams must **slow down some activities to speed up the whole process**.

Direction has to be more concrete. Developers used to absorb leftover ambiguity at the keyboard. Agents don't, so the questions that used to get resolved mid-build now have to be settled before handoff to an agent.

Figure: Traditional vs. agentic delivery of the same 4-week project. Traditional: RDP, then Sprint 1 (stories and planning, then development) and Sprint 2 (development), then ship. Agentic: RDP, then Definition and Implementation running in parallel, 2× per week, then ship. Same project, same timeline; Definition grows from a thin planning sliver to a full parallel track.

---

## Investing in sustainable speed

All of these process changes cost time and attention. They’re worth it because this work is what keeps a team fast across the life of the project rather than only its opening weeks.

The speed-up itself has to be earned, and its size varies by project. Codebase health, domain ambiguity, the speed of product decisions, and the maturity of the team's agent harness (the instructions and tools its agents work inside) together set a ceiling on how much work the team can safely hand to agents. That ceiling sits in a different place on every project. Finding the ceiling and raising it is part of the team's job. The same checks, specs, tooling, and review agents that keep delivery durable are what lift it over time.

A pre-agentic team delivers at a steady pace, bounded by how fast people can write code. An agentic team with no process delivers in an early burst and gets bogged down: drifting architecture, review that can't keep up, a codebase neither the team nor its agents can move through confidently. An agentic team with this process delivers faster than the pre-agentic team and keeps delivering because the definition, the cadence, and the handoffs are what hold the speed-up instead of letting it collapse.

So the extra work is what converts a short-lived burst into a durable pace, and the agentic coding speed-up, once earned, pays for that time investment with room to spare.

### For the business

- Tighter feedback loops.
- A richer product.
- The reach to take on work that wasn't viable before.

### For the team

- More time for judgment as cycles shorten.
- Room for pairing and alignment.
- Time for the checks, specs, tooling, review agents, and DX work that wasn't justifiable before.


Next, [Overview](./overview.md) maps the cadence, where the work lives, who does it, and what stays a human call.

---

# Overview

How teams define, build, review, and manage software when agents make code generation much faster.

---

![Agentic development cycle diagram: Product Backlog feeds Definition and Implementation, Planning and Review connect the work, and completed work moves toward shipped software.](./assets/images/external/agentic-dev-final.svg)

The diagram shows Product Backlog feeding a Backlog Item into Definition. Definition includes information gathering, synthesis, spec authoring, and evaluation and compounding. Definition overlaps with Implementation, where the team shapes and critiques, builds and hardens, explains and validates, then reconciles and delivers. Planning and Review sit between the two tracks, completed work leaves the cycle, and each cycle ships working software.

During each iteration, the team balances multiple pieces of work, each at a different stage. That's why the Definition and Implementation Phases run in parallel. At any moment, the team is defining some features while implementing others. Definition tries to always stay one step ahead, readying what will be built in the next Implementation Phase. Twice a week, the whole team comes together to start and close the work: that shared loop is the iteration, opened by Planning and closed by Review.

One note on phase order in the summaries below: the team's work on a given iteration opens with Planning. The chapters that follow are numbered differently, starting with Definition, then Planning, Implementation, and Review. That's because the chapter order traces how a single feature moves from raw intent to shipped software, not how a calendar week runs for the team. The two orders describe the same cycle from different angles; the feature's journey later in this Overview shows how they line up.

#### Planning

_Shared team phase._ The whole team selects ready work, scopes the iteration, assigns capacity, and names the risks that need human attention before agents start producing code.

#### Definition

_Parallel work phase._ A portion of the team turns raw material such as stakeholder input, research, design exploration, domain rules, technical spikes, and review findings into agent-ready direction the team can build against in future iterations.

#### Implementation

_Parallel work phase._ A portion of the team turns work defined in earlier iterations, and committed during this iteration's Planning, into working software. Developers direct agents, shape complex work, preserve context, run checks, and decide what is safe to merge.

#### Review

_Shared team phase._ Review brings the team back to the product and code built during Implementation. The team captures and explains choices, probes risks, and tests the integrated product, then turns findings into fixes and follow-up work.

Of the four phases, Definition is the one that's new in the agentic era: agents can't read minds, so deciding what to build has to move out of people's heads and onto the page, where an agent can act on it.

That's the whole process. The rest of this chapter describes the same process from a few angles: the rhythm the team runs on, how a feature moves through the process, who does the work, how agents fit in, and what the team never hands off.

---

## The team's weekly rhythm

The iteration cadence is deliberately short. Two iterations a week keeps the development loop tight enough that implementation doesn't outpace comprehension, and imprecise direction surfaces in days, not weeks.

Each cell below shows what the _team_ is doing that half-day, across all the features in flight, not the path of any single feature. When a slot shows Definition + Implementation, the team is splitting up to define some features while implementing others.

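|  | Mon | Tue | Wed | Thu | Fri |
| --- | --- | --- | --- | --- | --- |
| AM | Planning (Iteration 1) | Definition + Implementation | Review (Iteration 1) | Definition + Implementation | Definition + Implementation |
| PM | Definition + Implementation | Definition + Implementation | Planning (Iteration 2) | Definition + Implementation | Review (Iteration 2) |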

**Weekly iteration 1:** Monday morning Planning through Wednesday morning Review. **Weekly iteration 2:** Wednesday afternoon Planning through Friday afternoon Review.

The twice-weekly iteration is for team comprehension. The team plans the next handoff, does the work, and reviews what changed while the context is fresh.

Four ceremonies a week can sound heavy. Most of those half-day blocks are working sessions: reviewing specs, choosing the next handoff, preparing a demo, or probing an edge case. The whole team is together only for the parts that need shared judgment. That time replaces alignment that used to happen while developers wrote code side by side. Once agents take on more implementation, less shared understanding is produced in the act of building. Planning and Review make that understanding explicit before the next handoff starts.

---

## A feature's journey

Before the first iteration, the team needs an initial product shape: enough understanding of users, constraints, and first capabilities for the team and product owner (the person or group accountable for product direction) to sequence the work. At Atomic Object, that early shape comes from a [Research, Design, and Planning (RDP)](https://spin.atomicobject.com/rdp-phase-benefits-deliverables/) engagement, where teams run discovery and framing work. That starting shape becomes the Product Backlog, the high-level list of capabilities worth making buildable next. The Product Backlog captures where the product is headed. The Work Board, Knowledge Base, and specs carry the day-to-day handoffs. The backlog keeps changing after kickoff: discovery keeps running inside the Definition Phase, and new Backlog Items join the list as the product direction develops.

Three other work surfaces keep those Product Backlog Items moving through the iteration:

- **Work Board.** The closest analog to a traditional agile backlog. It holds near-term tasks across the Definition and Implementation Phases, each small enough to finish in an iteration (e.g., a stakeholder session, a transcript to synthesize, a spec revision, a small code change).
- **Knowledge Base.** Holds the decisions, domain rules, examples, and constraints that Definition and Implementation feed and draw on, kept in the repo where agents can reach it directly.
- **Specification Repository.** Holds the product, design, and technical specs that describe what to build in enough detail to plan an Implementation and have an agent execute it. Specs stay current: when Implementation surfaces constraints or contradictions in a spec, the team updates it to match the actual system behavior.
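If it helps to see how these surfaces point at one another, here is a minimal Python sketch, assuming the surfaces were modeled as data. The process doesn't prescribe any schema or tooling; every name here (`Phase`, `KnowledgeEntry`, `SpecRef`, `Task`, `BacklogItem`, `open_tasks`) and all sample content are hypothetical. Specs reference shared Knowledge Base entries by topic, and each Work Board task records which phase it belongs to.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    DEFINITION = "definition"
    IMPLEMENTATION = "implementation"


@dataclass
class KnowledgeEntry:
    # A durable decision, domain rule, example, or constraint, stored in the
    # repo and shared by every Backlog Item rather than owned by one.
    topic: str
    decision: str


@dataclass
class SpecRef:
    # A product, design, or technical spec in the Specification Repository.
    title: str
    kind: str  # "product", "design", or "technical" (hypothetical labels)
    relies_on: list[str] = field(default_factory=list)  # Knowledge Base topics


@dataclass
class Task:
    # A Work Board task small enough to finish within one iteration.
    description: str
    phase: Phase
    done: bool = False


@dataclass
class BacklogItem:
    # A coarse Product Backlog capability that fans out into specs and tasks.
    name: str
    specs: list[SpecRef] = field(default_factory=list)
    tasks: list[Task] = field(default_factory=list)

    def open_tasks(self, phase: Phase) -> list[Task]:
        """Return the unfinished Work Board tasks for one phase."""
        return [t for t in self.tasks if t.phase is phase and not t.done]


# Usage with placeholder content.
knowledge_base = {"example-rule": KnowledgeEntry("example-rule", "A decided domain rule.")}
item = BacklogItem("Example capability")
item.specs.append(SpecRef("Example feature spec", "product", relies_on=["example-rule"]))
item.tasks += [
    Task("Run a stakeholder session", Phase.DEFINITION),
    Task("Synthesize the transcript", Phase.DEFINITION),
    Task("Revise the spec", Phase.DEFINITION, done=True),
    Task("Make a small code change", Phase.IMPLEMENTATION),
]
```

Keeping the Knowledge Base as one shared map, instead of nesting it under each Backlog Item, mirrors the text above: both phases feed and draw on the same decisions.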

### A worked example

Those work surfaces are easier to see in motion. Here we introduce an example that recurs throughout this guide. Anvil is a B2B collaborative workspace product from a software company we'll call Acme. Anvil's users sign up as an organization, invite their teammates into shared docs and projects, and pay for each member who can edit. Today, every member of an organization can edit everything; Anvil has no read-only access yet.

The Backlog Item Seats & Membership sits on Anvil's Product Backlog: the ability for an organization's admins to control who belongs, who holds a paid editor seat, and how many seats the organization buys. It is a substantial capability that cannot be implemented in a single iteration of work. Before the team can invite view-only members, enforce a seat limit, or cleanly off-board someone, Anvil needs read-only access first: the split between an editor, who holds a paid seat and can change content, and a viewer, who can read without consuming a paid seat.

The diagram below is a mid-iteration snapshot of Seats & Membership. The item has fanned out into Knowledge Base decisions, specs, and tasks on the Work Board. Some of these are ready to build against, some are being built, and some are still being written. The same Backlog Item exists at several resolutions at once.

Active iteration example

![Example of Backlog Item: Seats & Membership moving through an active iteration: Product Backlog, Work Board, Knowledge Base, and Specification Repository connected by iteration tasks and updates.](./assets/images/external/work-surfaces-active-cycle.svg)

1. **It starts as a Backlog Item.** Seats & Membership is one coarse capability on the Product Backlog, the kind of thing the product owner and team sequence against onboarding, billing, reporting, and admin work. That level of detail is right for prioritizing and wrong for building. An agent handed Seats & Membership would invent a seat model, a role scheme, and an invite flow that nobody has defined yet, so the capability needs a smaller, explicit target first.
2. **Definition makes it buildable.** In [Phase 1: Definition](./definition.md), the team gathers signal at the level the capability needs, reconciles durable decisions into the Knowledge Base, and turns the next buildable target into the feature spec Invite Members, something agents can execute against without inventing product or architecture decisions. A stakeholder session might clarify that viewer invites are free while editor invites reserve a paid seat. A designer may prototype the invite dialog, Pending row, empty state, and seat-limit behavior. The tech lead may define the email-delivery interface and flag read-only access as a foundation. Those inputs become several specs, starting with Invite Members and the technical spec Email Delivery that Planning can take up next.
3. **Planning commits it to an iteration.** [Phase 2: Planning](./planning.md) takes Invite Members and Email Delivery side by side. The team assigns Invite Members as one developer's priority path and Email Delivery to another, and names the risks that need human attention, like two admins inviting into the last available seat or an email that never sends, before an agent writes any code.
4. **Implementation builds it.** In [Phase 3: Implementation](./implementation.md), a developer directs agents through the invite flow: shaping the schema, reserving a seat when an editor invite is sent, and wiring a pretend email sender that stands in for real delivery, so the team can test email features without sending real messages. Implementation then hands Review the code changes, the tests and runtime checks behind them, the known risks, and a broader explanation of the changes.
5. **Review checks what landed.** [Phase 4: Review](./review.md) brings the team back together at the end of the iteration to demo inviting someone as an editor consuming a seat and inviting someone as a viewer staying free. The team probes the edge cases: what if the email never arrives, two admins invite into the last seat at once, or an invite is revoked after someone clicks it? Some findings are quick fixes; others become spec revisions, Knowledge Base updates, or work routed back through Definition and Planning for a later iteration.
6. **Each pass feeds the next.** The Review Phase activities feed the surfaces that guide the following iteration, so the next pass at Seats & Membership, such as the spec Remove Members, starts with more context than the invite flow had. When Review exposes a recurring drag, such as manually confirming invite emails in the staging environment, the team turns that friction into Compounding Work: an investment that makes later agent delegation faster and safer, in this case Email Verification Harness, a harness skill and tool that lets agents read the staging mailbox directly.
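The invite flow from steps 4 and 5 can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not Anvil's actual design: `EmailSender`, `FakeEmailSender`, `Organization`, `invite`, and `SeatLimitReached` are invented names, and an in-memory lock stands in for whatever atomic check a real system would use. It shows an editor invite reserving a paid seat while a viewer invite stays free, a pretend sender that records messages instead of delivering them, and a guard against two admins claiming the last seat at once.

```python
import threading
from dataclasses import dataclass, field
from typing import Protocol


class EmailSender(Protocol):
    """Anything that can deliver an email; real delivery plugs in here."""

    def send(self, to: str, subject: str, body: str) -> None: ...


@dataclass
class FakeEmailSender:
    """Pretend sender: records messages instead of delivering them."""

    outbox: list[tuple[str, str]] = field(default_factory=list)

    def send(self, to: str, subject: str, body: str) -> None:
        self.outbox.append((to, subject))


class SeatLimitReached(Exception):
    """Raised when an editor invite would exceed the paid seat count."""


@dataclass
class Organization:
    paid_seats: int
    reserved_seats: int = 0
    pending: dict[str, str] = field(default_factory=dict)  # email -> role
    _lock: threading.Lock = field(default_factory=threading.Lock, init=False, repr=False)

    def invite(self, email: str, role: str, sender: EmailSender) -> None:
        if role not in ("editor", "viewer"):
            raise ValueError(f"unknown role: {role}")
        # Check and reserve under one lock so two concurrent editor invites
        # cannot both claim the last paid seat. A real system would more
        # likely rely on a database transaction or constraint.
        with self._lock:
            if email in self.pending:
                raise ValueError(f"already invited: {email}")
            # Only editor invites reserve a paid seat; viewer invites are free.
            if role == "editor":
                if self.reserved_seats >= self.paid_seats:
                    raise SeatLimitReached(email)
                self.reserved_seats += 1
            self.pending[email] = role
        sender.send(email, "You're invited", f"Join as a {role}.")


org = Organization(paid_seats=1)
mail = FakeEmailSender()
org.invite("ana@example.com", "editor", mail)  # takes the only paid seat
org.invite("ben@example.com", "viewer", mail)  # free: no seat consumed
try:
    org.invite("cy@example.com", "editor", mail)
except SeatLimitReached:
    print("seat limit reached")
print(org.reserved_seats, len(mail.outbox))  # prints: 1 2
```

Because `invite` depends only on the `EmailSender` interface, the fake can stand in for real delivery in tests, and Review's edge cases (a lost email, a revoked invite) become new tests against the same seam.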

---

## Who does the work

Roles still anchor the work. The product manager (at Atomic Object, the delivery lead) owns stakeholder flow and business stewardship, the designer leads research and design work, the tech lead drives the technical specs and architectural direction, and developers carry Implementation. On a team of six, a steady state might be the product manager, the designer, and one developer working in Definition while three developers build.

The boundary around those assignments is now fluid. Once agents are writing more of the code, developers can finish implementation work faster than a part-time product manager and designer can keep specs ready. When that happens, the team pulls a developer into Definition, or reduces the implementation load until the next specs are ready. The flow reverses too; on a project with its checks and tooling dialed in, a product manager or designer can take a small fix through Implementation themselves. Product and business judgment move with the people, since the same priority, budget, stakeholder, and acceptance tradeoffs shape both kinds of work. How the Definition side splits up is covered in [Phase 1: Definition](./definition.md).

---

## Partnering with agents

The Definition and Implementation Phases both rely on agents in two distinct modes. _Agent delegation_ is handing off well-defined work and reviewing the output. _Agent partnership_ is working with an agent to inspect, critique, explain, pressure-test, or explore a direction before the team commits to it.

For example, in the Definition Phase, the team might delegate transcript cleanup, meeting-note drafting, or formatting a spec from clear source material. It may use agent partnership when reconciling contradictions, pressure-testing acceptance criteria, or deciding what a stakeholder comment implies for a product rule. In Implementation, the team might delegate bounded work, tests, and mechanical hardening. Then they might partner with an agent when shaping a plan, explaining a code change, validating the integrated feature, or probing architectural risk.

Agents multiply the output, but people must still own the discernment. The team and product owner decide what the product should do, what counts as acceptable, how scope, spend, timeline, stakeholder appetite, and reversibility trade off, and whether the change fits the larger product and codebase. Agents can run checks and flag risks. The team decides whether the work is ready to merge, release, or put back into the loop.


Next, [Phase 1: Definition](./definition.md) explains how raw signal becomes Knowledge Base updates, ready specs, and buildable work.

---

# Phase 1: Definition

The upstream work that keeps the Implementation Phase moving.

---

The Definition Phase turns broad product intent into targets precise enough that an AI coding agent can build them without guessing. Its output is a spec: a description of the behavior to build, what counts as done, the constraints to preserve, and the questions or areas of discretion still open. Implementation works from those specs.
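The four parts of a spec named above can be captured in a simple template. The sketch below assumes a spec is kept as a markdown file in the repo; the `SpecDraft` class, its field names, the section headings, and `to_markdown` are hypothetical, since the process doesn't prescribe a format. The sample content is adapted from the Invite Members example in the Overview and is illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class SpecDraft:
    title: str
    behavior: list[str]      # the behavior to build
    done_when: list[str]     # what counts as done
    constraints: list[str]   # what must be preserved
    open_questions: list[str] = field(default_factory=list)  # open questions or areas of discretion

    def to_markdown(self) -> str:
        """Render the spec as a markdown file an agent can read from the repo."""
        sections = [
            ("Behavior", self.behavior),
            ("Done when", self.done_when),
            ("Constraints", self.constraints),
            ("Open questions and discretion", self.open_questions),
        ]
        lines = [f"# {self.title}"]
        for heading, items in sections:
            lines.append(f"\n## {heading}")
            # An empty section says "None" explicitly instead of disappearing.
            lines.extend(f"- {item}" for item in items or ["None"])
        return "\n".join(lines)


spec = SpecDraft(
    title="Invite Members",
    behavior=["An admin can invite a person by email as an editor or a viewer."],
    done_when=["An editor invite reserves a paid seat; a viewer invite does not."],
    constraints=["Invites never exceed the organization's paid seat count."],
    open_questions=["Left to the agent's discretion: wording of the invite email."],
)
print(spec.to_markdown())
```

Rendering empty sections as "None" keeps the spec honest about what was considered: an agent can tell "no open questions" apart from "nobody filled this in."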

This set of work matters more now than it used to because building has gotten cheap and fast while Definition hasn't. A team used to fill the gaps in a spec with the shared context that never gets written down. An agent doesn't have that context. When Implementation and Definition fall out of step, the agent hits an ambiguous spec and fills the gap with a plausible guess that isn't what the team wanted. The resulting drift only surfaces an iteration later, as shipped work that misses the team’s intent.

The Definition Phase can begin as soon as the product has a rough shape: a working sense of who the users are, what constraints exist, and what the product first needs to do. That's enough to start asking which part to make buildable next, long before every product decision is settled. At Atomic Object, that early shape comes from a [Research, Design, and Planning (RDP)](https://spin.atomicobject.com/rdp-phase-benefits-deliverables/) engagement where teams run discovery and framing work; agile teams may know similar work as a Sprint 0 or a discovery phase. The engagement is distinct and up front, and a project may run another at a major turn, such as a new release or feature set. Its output becomes the Product Backlog, where an item sits at the right level of specificity for sequencing but is still too broad to build. Definition starts there, by narrowing it.

Definition narrows Backlog Items through a repeating pipeline: gather information, synthesize it into the Knowledge Base, and write or revise the specs that agents execute against during the Implementation Phase. Definition stays at least an iteration ahead of the build so that the agents don’t get blocked waiting for new specs.

That is the common path: a Backlog Item enters Definition and comes out as one or more specs. Definition also has to keep the Product Backlog supplied. Product direction keeps changing after kickoff; new information creates new Backlog Items, splits old ones apart, or changes what should be made buildable next. A discovery workshop, product backbone, journey map, technical investigation, or stakeholder decision session can all be Definition work when the output is a clearer Product Backlog. The rest of this chapter focuses on the spec pipeline, but the same pattern applies one step earlier: gather signal, synthesize it into the Knowledge Base, and write down the next product choices clearly enough for the team to sequence.

How much of this pipeline a feature needs varies, and the smallest work skips it entirely: a bug fix or a small, well-understood change can go straight to the Work Board as a task someone hands to an agent. At the other end is work where the wrong call is expensive to unwind: a contract, data model, authorization rule, or UX pattern that other work will depend on. That work cycles through gathering and synthesis before its spec is ready. Choosing how much certainty to buy before building is its own discipline, covered as Definition Strategies in [Managing Project Outcomes](./managing-project-outcomes.md). This chapter lays out the full pipeline; most work runs a lighter version of it.

Most of the Definition Phase is judgment work: which inputs to trust, how much to validate before building versus after, and when to gather more signal instead of betting on what you already have. That judgment is what keeps the spec accurate enough to hand to an agent.

---

## What happens in this phase

In the overall process, the team runs two iterations a week, each following the same pattern: (1) Planning, (2) Definition and Implementation work, then (3) Review. Everyone participates in the Planning and Review Phases as a full team. Definition-Phase work happens between those ceremonies, running concurrently with the Implementation Phase of the work defined in a previous iteration. At any given moment, the team is balancing Definition work across the current, next, and following iterations:

**Current iteration**

Answering questions, closing gaps surfaced during Implementation, reconciling findings at review.

**Next iteration**

Getting specs to "ready" so the next Planning Phase has a full load of committed work.

**Following iteration**

Gathering information and doing early synthesis for work two or more iterations out, keeping the Definition Pipeline and the Product Backlog full.

If any of the three horizons goes without attention, the system breaks. For example, if the team over-focuses on current-iteration support, the next Planning Phase goes underfed. If it focuses too much on the future, current Implementation-Phase questions go unanswered. Use the rhythm to keep all three horizons visible, whatever the exact meeting schedule becomes.

To pursue those horizons in a balanced way, each turn of the Definition pipeline runs the three activities below. Most Backlog Items take several turns before the specs are ready.

Diagram: Information gathering (save raw context to the repo) → handoff → Synthesis (update the Knowledge Base) → handoff → Spec authoring (create or revise specs) → Ready for implementation? If yes, plan it into an iteration; if no, loop back for more context.

**Information gathering:** Identify what context is needed, such as output from stakeholder sessions, design exploration, domain research, prototyping, and product evaluation, and source it. Save this context in a shared location the whole team can access.

**Synthesis:** Distill the raw material into durable decisions, facts, domain rules, and constraints, and record them in the Knowledge Base, the durable context store Definition and Implementation both draw from. Agents draft and cross-reference against what's already there while people make the calls that need judgment, like which source wins when two conflict, or whether a stakeholder’s aside is a real requirement or just thinking out loud.

**Specification authoring:** Create or revise specs to reflect what synthesis revealed. This is where work fans out: one synthesis pass can produce multiple independent spec tasks, each tracked and owned separately.

Each pass ends with the same question: Are the specs ready to implement? If they are, they go into an iteration. If not, the work loops back for more information gathering. The worked example at the end of this chapter shows how one team made that call.

The three activities also create natural handoff points. Work can move from one teammate to another, or one person can move through all three in a single pass. For example, a product manager captures a transcript in a stakeholder session on Tuesday. A tech lead synthesizes it into Knowledge Base updates on Wednesday. The designer updates the interaction spec on Thursday.

At any given moment, the team is making a judgment call: should it gather more information or process what it already has? Sometimes, the answer is driven by demand: Implementation needs specs for the next iteration, so the team synthesizes what it has and gets specs into shape. Other times, the answer is driven by supply: a stakeholder session is coming up, and the team prepares to gather information regardless of immediate downstream demand.

**Information gathering activities**

| Activity | Purpose |
| --- | --- |
| Stakeholder sessions | Close known gaps, get reactions, unblock decisions. |
| Design exploration | Visual mockups, interaction models, design system work. |
| Ideation and expansion | Expand sparse input into testable direction. |
| Technical spikes and specs | Developer-driven investigation and specification of architecture, data models, and non-functional requirements. |
| Domain research | Map domain vocabulary, workflows, rules, and edge cases the product must preserve. |
| Product discovery and shaping | Workshops, product backbone and journey work, and milestone framing that replenish and reshape the Product Backlog. |
| Prototyping | Working software to test assumptions, routine now that building is cheap. |
| Product evaluation | Hands-on use of the built product to surface gaps and divergence from intent. |
| Friction surfacing | Identifying where tooling or automation could amplify team performance. |

The mix of Definition work shifts based on what the team needs. In a typical week, expect a rough allocation like this:

| Activity | Rough allocation |
| --- | --- |
| Information gathering | ~30% |
| Synthesis and spec authoring | ~30% |
| Running shared ceremonies | ~20% |
| Evaluating | ~20% |

Time allocations are approximate and vary by project, work type, and point in the project.

Two habits keep the rhythm on track. The team should synthesize stakeholder input while the context is still fresh, and they should review active specs often enough to catch stale, contradictory, or missing coverage before the Planning Phase. The point is to keep ready work flowing without letting current-iteration support consume every Definition hour.

---

## Who picks up the work

The Definition pipeline is stable; the role assignments are not. The person closest to the uncertainty takes the next pass, and the work can move between people as the question changes.

A tech lead doing a technical investigation might gather information, synthesize it, and update a spec in a single pass. A product manager on a technically complex project might focus on stakeholder information gathering and hand off synthesis to someone closer to the technical domain. A designer might carry a visual improvement from product evaluation through UX spec updates, then guide an agent through the CSS and interaction changes.

Some responsibilities do stay with specific project roles.

The product manager takes point on business stewardship and information gathering from stakeholders. The designer leads information gathering through user research and design activities: product backbones, journey mapping, personas, UX design. The tech lead specifies data models, API contracts, and architectural decisions, and they get ahead of the structure upcoming work will need. When a feature will lean on a new subsystem or a key new module, that architecture-focused spec is written in the Definition Phase, ahead of the iteration that builds against it. The timing is a deliberate trade-off, since technical design that steers the system's structure is easy to get wrong under the build pressure of an iteration, where the pull is toward whatever unblocks the current feature.

Spec authoring lands with whoever owns the Definition work for that area, often the product manager or designer, sometimes a senior dev, often a pair. These are starting points. The type of work dictates which role picks up the next piece.

Those assignments show up in the working system. For example, a stakeholder session about seat limits might become a synthesis task to update seat-management context in the Knowledge Base, then split into spec-authoring tasks such as "write the feature spec Invite Members" and "draft the architecture spec Read-only Access." The Work Board makes the handoffs visible, and the Knowledge Base keeps the decisions available when questions arise during Implementation.

**When the Knowledge Base is stale or disorganized, the system degrades subtly.** Every stakeholder meeting, design decision, and review finding should be moved into the system quickly so that the team can use it confidently in the next iteration. The Knowledge Base needs to be structured so that agents and team members can find what's relevant when it matters, which means keeping less-relevant context out of it. When this problem shows up, work to improve synthesis workflows, spec health, decision tracking, or retrieval paths.

Whoever writes it, a ready spec pulls several perspectives into one place: product behavior, experience details, technical constraints, and the operating context around them. Some specs hold all four directly; others link out to a mockup, an API contract, a testing policy, or a deployment note. Every spec must clear the same bar the rest of Definition aims at: an agent can build from it without inventing a product, design, or architecture decision nobody made.

Pulling those perspectives together is the point, because each covers the others' blind spots. Product intent without technical constraints leaves gaps agents fill with guesses. Technical direction without product intent produces correct-looking software pointed at the wrong outcome.

---

## Example: Defining the Seats & Membership Backlog Item

Acme's Anvil platform has the Backlog Item Seats & Membership on its Product Backlog, one that's too vague for Implementation. The team knows organizations need to invite teammates, control who holds a paid editor seat, and remove access when someone leaves. The team does not yet know how seat reservation should work, how the invite flow chooses viewer or editor, or how read-only access fits a product where every member can edit today. One pass through the Definition pipeline turns the next buildable target into the feature spec Invite Members, something the team can plan and build next iteration, and surfaces the foundational work around it.

### Gather information

The product manager runs a 40-minute stakeholder session with the product owner and other team members to clarify how organizations should use Anvil as they grow: who belongs in an org, who can edit, how seats are paid for, and what outside collaborators should be able to do.

The product owner describes the main flow. An admin enters an email, the invitee gets a link, signs up or logs in, and lands in the organization. The designer asks whether every invitee should be able to edit, or whether this is where Anvil needs read-only access. The tech lead follows with the seat question: if an admin invites someone as an editor, does that consume a paid seat immediately?

In a handful of exchanges, Invite Members now touches membership, access levels, seat limits, and email delivery. It depends on an editor/viewer distinction Anvil does not have yet, a seat counter the product does not enforce yet, and an automated email path the current system has not built.

Right after the session, the team members who were in the room take 10 minutes to make sense of the session and record what the product owner actually confirmed, what was just thinking out loud, and what the exchange means for the work. The session ends with two records: a transcript, and a short note on what decisions were captured. That second record is the context synthesis works from.

Gathering information happens outside of team meetings as well. The designer sketches the invite dialog in Figma, which turns the main flow into something concrete and also surfaces questions the conversation skipped: what fills the dialog before anyone has been invited, and what the admin sees at the seat limit. Those are added to an open-questions list.

### Synthesize

Synthesis is a separate, solo task. A team member works with an agent to fold the two records into the Knowledge Base, referencing the invite-dialog sketch as well so the interface decisions inform the entries, not just the transcript. The agent drafts entries and cross-references what's already there. The person makes the calls the agent can't: which source wins when two conflict, whether a stakeholder aside is a real rule or thinking out loud, where each decision belongs, and how much detail to keep.

Two facts dominate what lands: (1) Anvil is global-edit today, so inviting someone as a viewer has nothing to grant until read-only access exists, and (2) invite delivery also needs an automated email path, and the product does not have one yet.

#### Decision recorded

Editor invites reserve a seat when sent. Viewer invites are free and do not count against the purchased-seat limit.

#### Domain model revised

The invite model gets an `asEditor` flag, and used seats count active editors plus pending editor invites.
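
A minimal Python sketch of that counting rule follows. The `Member` and `Invite` classes, their `is_editor` and `pending` fields, and the `used_seats` helper are hypothetical; `as_editor` stands in for the spec's `asEditor` flag. The sketch assumes an invite stops being pending once it is accepted or revoked, so an accepted editor is counted once, as a member.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    email: str
    is_editor: bool  # hypothetical field: True for editors, False for viewers


@dataclass(frozen=True)
class Invite:
    email: str
    as_editor: bool  # mirrors the spec's `asEditor` flag
    pending: bool = True  # assumed False once accepted or revoked


def used_seats(members: list[Member], invites: list[Invite]) -> int:
    """Used seats = active editors + pending editor invites; viewers are free."""
    active_editors = sum(1 for m in members if m.is_editor)
    pending_editor_invites = sum(1 for i in invites if i.pending and i.as_editor)
    return active_editors + pending_editor_invites


members = [Member("ada@example.com", is_editor=True),
           Member("ben@example.com", is_editor=False)]
invites = [Invite("cy@example.com", as_editor=True),                # reserves a seat
           Invite("di@example.com", as_editor=False),               # viewer: free
           Invite("ed@example.com", as_editor=True, pending=False)]  # revoked
print(used_seats(members, invites))  # 2
```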

#### Open questions logged

Two questions go onto the next agenda: what the admin sees at the seat limit, and how the resend and revoke flows work. The rollout rule for existing all-editor orgs gets settled while the architecture spec Read-only Access is written.

Synthesis also surfaces two specification tasks on the Work Board: (1) architecture spec: Read-only Access, the foundation; and (2) technical spec: Email Delivery, the interface the invite flow will call.

### Author specifications

Synthesis fans out into four specifications: two foundation specs that Invite Members depends on, and two feature specs the team can plan against.

#### Architecture spec: Read-only Access

Defines editor and viewer behavior across the product, plus the rollout rule: existing members stay editors, even if that leaves the org over its purchased-seat count until an admin adds seats or demotes people.

#### Technical spec: Email Delivery

Defines a sender interface that works with any email provider, background delivery with retries, and a pretend sender the team can build and test against before production email wiring lands.
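
As a rough illustration of what such a spec might describe, here is a provider-neutral sender interface, a pretend sender, and a retry helper in Python. Every name (`Email`, `EmailSender`, `PretendSender`, `TransientSendError`, `deliver_with_retries`) is hypothetical, and exponential backoff is an assumed retry policy; the spec only calls for retries. In a real system a background job worker, not the request that sent the invite, would call the retry helper.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass(frozen=True)
class Email:
    to: str
    subject: str
    body: str


class EmailSender(Protocol):
    """Provider-neutral interface; a real provider adapter would implement send()."""

    def send(self, email: Email) -> None: ...


class TransientSendError(Exception):
    """A failure worth retrying, such as a provider timeout."""


@dataclass
class PretendSender:
    """Records emails instead of sending them, for tests, demos, and staging."""

    sent: list[Email] = field(default_factory=list)

    def send(self, email: Email) -> None:
        self.sent.append(email)


def deliver_with_retries(
    sender: EmailSender,
    email: Email,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try up to `attempts` times with exponential backoff; True on success."""
    for attempt in range(attempts):
        try:
            sender.send(email)
            return True
        except TransientSendError:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, then 1s, then 2s, ...
    return False


pretend = PretendSender()
invite_email = Email("new@example.com", "You're invited to Anvil", "Accept: <link>")
deliver_with_retries(pretend, invite_email)
print(len(pretend.sent))  # 1
```

Injecting `sleep` keeps tests fast. A production adapter would raise `TransientSendError` only for failures worth retrying and let permanent ones, such as a malformed address, surface immediately.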

#### Feature spec: Invite Members

Admin enters an email, chooses viewer or editor, sends an invite, and sees a Pending row in the invite table with Resend and Revoke. Editor invites reserve a seat; viewer invites do not.

#### Feature spec: Seat Accounting

Adds purchased-versus-used seat accounting, the editor/viewer access column, and the rule that seat grants are blocked when no seats are available.
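
The blocking rule interacts with the rollout rule from Read-only Access: an org can start out over its purchased count. This in-memory Python sketch, with a hypothetical `SeatLedger` class and `NoSeatsAvailable` error, shows one way to keep both rules: available seats clamp at zero, new editor grants are refused, and existing editors are left alone. A real implementation would enforce the check in the database rather than in process memory.

```python
from dataclasses import dataclass


class NoSeatsAvailable(Exception):
    """Raised when an editor seat is requested but none are free."""


@dataclass
class SeatLedger:
    purchased: int
    used: int  # active editors plus pending editor invites

    @property
    def available(self) -> int:
        # After the rollout an org can sit over its limit (existing members
        # stay editors), so clamp at zero instead of reporting negative seats.
        return max(0, self.purchased - self.used)

    def grant_editor_seat(self) -> None:
        """Reserve a seat for a new editor or editor invite, or refuse."""
        if self.available == 0:
            raise NoSeatsAvailable(f"{self.used} of {self.purchased} seats in use")
        self.used += 1

    def release_editor_seat(self) -> None:
        """Free a seat when an editor is demoted or an editor invite is revoked."""
        self.used = max(0, self.used - 1)


# An org that was over its limit at rollout: 7 editors, 5 purchased seats.
ledger = SeatLedger(purchased=5, used=7)
print(ledger.available)  # 0
```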

### Ready for implementation?

The team separates what is buildable now from what needs more clarity to begin.

**Buildable now:** Invite Members, covering the email field, viewer/editor toggle, invite table's Pending row, seat counter, and the send action wired to the email sender interface. Email Delivery is buildable too: a background queue, retry handling, and a pretend sender that records emails instead of sending them.

**Foundational dependency:** Granting viewer access only works once Read-only Access lands. The invite UI can build against a stub while that foundation is implemented in parallel.

**Deferred behind the pretend sender:** Real provider wiring for email delivery. The pretend sender lets the team build, demo, and test the invite flow now; production wiring can land behind the same interface later.

In the Planning Phase, the team can scope two connected efforts for the iteration: Invite Members as one developer's priority path, and Email Delivery as another's. The interface is explicit enough for both to move together. A developer asks whether pending editor invites should appear in the seat counter; the designer checks the spec, records "yes," and the work moves forward with the decision written down.

1 stakeholder session → 3 Knowledge Base updates → 4 specifications split by type → 2 specifications ready for Planning

The output is a set of repo artifacts used in the Planning and Implementation Phases: Knowledge Base decisions, explicit open questions, revised specs, and architecture tasks that make the dependencies visible. The remaining uncertainty is tracked, but it no longer hides implicitly inside the feature.


Next, [Phase 2: Planning](./planning.md) turns ready Definition work into scoped, assigned Implementation work for the iteration.

---

Source: /agentic-engineering/planning.md

# Phase 2: Planning

How a team hands ready Definition work to Implementation and turns it into an iteration plan.

---

The Planning Phase is where work completed during a previous Definition Phase becomes assigned work to be completed during the next Implementation Phase. The Definition Phase’s output is synthesized context, Knowledge Base updates, and specs. But producing that output doesn't guarantee the team is ready for Implementation yet. A spec can seem finished and still hide a contradiction, a missing decision, or a dependency nobody scoped. Planning is where the team pressure-tests the specs together, settles what's still open, and commits only the work it actually understands well enough to delegate to agents.

Planning’s main new artifact is the effort: an agent-ready execution brief for a piece of assigned work, linked to its specs and kept current as a running log of the plan, checkpoints, decisions, deviations, and follow-ups. A good effort is what lets a developer hand substantial work to an agent while maintaining a record the team can review. A vague one gets paid for later, in review and rework.

---

## What happens in this phase

| Planning step | Activity | Tasks | Who executes |
| --- | --- | --- | --- |
| 1 | Review specs | Review the specs handed off from Definition work. Pressure-test them together to surface missing context, ambiguity, blockers, and architectural fit. | Team |
| 2 | Scope & assign | Lock in assignments and ownership. Set one clear priority path per developer, with smaller tasks assigned around it. | Team |
| 3 | Analyze & plan | Create efforts and Implementation plans for key work. Use agents to explore options and stage the work into checkpoints. | Solo or collaborative |
| 4 | Surface team concerns | Identify where plans affect other work: dependencies, shared patterns, validation needs, release sequencing, approval needs, and costly-to-unwind decisions. | Team |

---

## Running the ritual

### 01. Review specs

The first step is to review the in-scope specs as a team, so everyone understands the work being tackled and can confirm the specs are ready to build before committing to them.

##### Participants

Full team

##### Goal

Align on what the work means and agree what is ready to build.

##### Steps

- Review the specs handed off from Definition and confirm they work together.
- Surface missing context, ambiguity, blockers, and architectural fit.
- Agree what is ready to build. Route substantial gaps back to Definition instead of turning planning into live spec-writing.
- Check the structural work the queued specs imply. Most of it should already be settled by Definition’s architecture-focused specs, so planning verifies that direction rather than designing new structure. Build what can be decided and executed inside the iteration, and break out anything too big to resolve or too cross-cutting to fit a single feature's delivery, routing it back to Definition.

##### Key questions

- Is this spec clear enough to build from?
- Are there unresolved questions that would stall Implementation?
- Can the structural decisions here be made and executed this iteration, or do they need to break out as their own additional Definition work?

##### Output

A set of specs the team agrees are ready to build, with gaps routed back to Definition.

### 02. Scope & assign

The team locks in who's doing what for the iteration. Smaller tasks keep the developer productive while longer agent runs are in flight.

##### Participants

Full team

##### Goal

Each developer leaves with one clear priority path and supporting work around it.

##### Steps

- Lock in scope and assignments for the iteration.
- Mix complex work with smaller, clear tasks for each developer.
- Schedule the Compounding Work the team has prioritized: stronger specs, better examples, agent instructions, seeded patterns, validation automations. Feature work crowds it out otherwise.
- Right-size tasks to fit the iteration. Work taken on at the start should be completable by the end, with mergeable progress along the way.
- If a developer has two priority paths, both should be bounded enough to manage deliberately.

##### Key questions

- Can each developer articulate what would merge right now? If not, the task is probably too large.
- Is there Compounding Work in the mix, or is the iteration all feature work?
- Can everything taken on be completed by the end of the iteration?

##### Output

Clear assignments with one priority path per developer, smaller tasks around it, and Compounding Work explicitly scheduled.

### 03. Analyze & plan

Analysis is the main opportunity for human judgment before implementation begins. The developer decides the approach, risks, and strategy before delegating substantial work to agents. Those decisions become the agent's working constraints. When they are vague, the team pays for the ambiguity in review and rework. Without a written plan, the team ends up reconstructing the work from chat history.

##### Participants

Solo or collaborative, depending on complexity

##### Goal

Each developer has a written plan that gives agents usable constraints and gives the team a review record.

##### Steps

- For each piece of assigned work that needs planning, create or update one effort: the working document for scope, the Implementation plan, execution notes, deviations, and closure evidence.
- If the approach is unclear, use the agent to inspect relevant code, propose alternatives, and narrow to a direction. If it's already forming, start with a rough plan and pressure-test it with the agent.
- Write the Implementation plan: the proposed approach and how the work breaks down, especially for complex, risky, or high-stakes work.
- Walk through the plan and look for where it could break: structure, architecture, non-functional requirements, validation, approvals, release strategy.
- For complex, risky, or high-stakes work, pull in a pair before team review to challenge the plan. Break the work into stages with human review points, and review key design elements before broader Implementation begins.

##### Key questions

- Are the agent's constraints specific enough, or will the team pay for ambiguity in review?
- Where could this plan break?
- Does this work need a pair before team review?

##### Output

A written effort with an Implementation plan that agents can execute against and the team can review.

### 04. Surface team concerns

Before Implementation begins, the team checks where plans affect one another. Not every plan needs full group review. Focus attention on the ones where one developer's work changes another's path.

##### Participants

Full team

##### Goal

Find cross-work impacts and agree on what must be coordinated before execution begins.

##### Steps

- Walk through key plans and surface dependencies, shared patterns, architecture or infrastructure choices, approval needs, validation needs, and release sequencing.
- Identify where one developer's plan changes another's path.
- Agree on what must be coordinated before execution and assign follow-ups.

##### Key questions

- Does any plan depend on or conflict with another?
- Are there shared patterns or architecture choices that need agreement?
- Who owns each coordination follow-up?

##### Output

Alignment on what must be coordinated before execution begins, with clear owners for each follow-up.

---

## Example: Planning the Invite Members and Email Delivery efforts

Definition hands Planning two ready specs from the Backlog Item Seats & Membership: the feature spec Invite Members, and the technical spec Email Delivery that the tech lead wrote ahead of the iteration. Here's how one session turns them into assigned, agent-ready efforts.

Running example · Invite Members and Email Delivery efforts

**Review specs (whole team).** Reading the two specs side by side, the team hits a contradiction. The Invite Members feature spec assumes the email goes out the instant the admin sends it, with the screen confirming right away; the Email Delivery technical spec puts delivery a moment later, in the background. They settle it in the room: the invite table's Pending row will show the invite moving through sending, then sent or failed, with a resend. Everything else in both specs is answerable on the page, so nothing routes back to Definition and both are cleared to build.
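
One way to pin down that agreed behavior is as a small state machine. This Python sketch is illustrative: the `DeliveryStatus` names and the `transition` helper are hypothetical, and it assumes Resend is offered from both the sent and the failed states.

```python
from enum import Enum


class DeliveryStatus(Enum):
    SENDING = "sending"
    SENT = "sent"
    FAILED = "failed"


# Allowed moves for the invite table's Pending row.
# Assumption: Resend (back to SENDING) is offered from both SENT and FAILED.
ALLOWED = {
    DeliveryStatus.SENDING: {DeliveryStatus.SENT, DeliveryStatus.FAILED},
    DeliveryStatus.SENT: {DeliveryStatus.SENDING},
    DeliveryStatus.FAILED: {DeliveryStatus.SENDING},
}


def transition(current: DeliveryStatus, new: DeliveryStatus) -> DeliveryStatus:
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move invite from {current.value} to {new.value}")
    return new


# A failed send, then a successful resend.
status = DeliveryStatus.SENDING
status = transition(status, DeliveryStatus.FAILED)
status = transition(status, DeliveryStatus.SENDING)  # admin clicks Resend
status = transition(status, DeliveryStatus.SENT)
print(status.value)  # sent
```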

**Scope & assign (whole team).** The Invite Members effort fits in one iteration, so it goes to one developer as their priority path; the Email Delivery effort goes to another. The admin "Sent emails" view is small enough to ride along with Email Delivery as Compounding Work instead of its own assignment. Both developers also pick up a lighter task or two to stay productive while long agent runs are in flight.

**Analyze & plan (each developer, with an agent).** Each developer opens an effort and drafts the plan with an agent. The Invite Members effort lays the build out as checkpoints to review before moving on: schema, seat reservation, modal, permissions, then the invite table's Pending row. It also names the one risk worth deciding up front: two admins sending editor invites for the last open seat at the same moment, handled by checking and claiming the seat in one step, so only one of the two can win. The Email Delivery effort covers three things: the sender interface, background sending, and a pretend sender that records messages instead of actually sending them. Planning that pretend sender raises a testing question: how does anyone confirm an email really went out? The answer becomes the staging "Sent emails" view: every message is kept and displayed on an admin screen, so anyone can verify an invite was sent.
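
A minimal sketch of "checking and claiming the seat in one step," assuming a hypothetical SQLite schema that stores a `used_seats` counter on the organization. The domain model computes used seats from editors and pending editor invites; a stored counter is one assumed way to make the claim atomic. The conditional `UPDATE` checks capacity and claims the seat in a single statement, and the invite row is inserted in the same transaction.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: the org stores a used-seat counter so the check
    # and the claim can happen in one statement.
    conn.executescript("""
        CREATE TABLE orgs (
            id INTEGER PRIMARY KEY,
            purchased_seats INTEGER NOT NULL,
            used_seats INTEGER NOT NULL
        );
        CREATE TABLE invites (
            id INTEGER PRIMARY KEY,
            org_id INTEGER NOT NULL REFERENCES orgs(id),
            email TEXT NOT NULL,
            as_editor INTEGER NOT NULL,
            status TEXT NOT NULL
        );
    """)


def send_editor_invite(conn: sqlite3.Connection, org_id: int, email: str) -> bool:
    """Claim a seat and record the invite atomically; False if no seat is free."""
    with conn:  # one transaction: commit on success, roll back on error
        claimed = conn.execute(
            "UPDATE orgs SET used_seats = used_seats + 1 "
            "WHERE id = ? AND used_seats < purchased_seats",
            (org_id,),
        ).rowcount == 1
        if not claimed:
            return False
        conn.execute(
            "INSERT INTO invites (org_id, email, as_editor, status) "
            "VALUES (?, ?, 1, 'pending')",
            (org_id, email),
        )
    return True


conn = sqlite3.connect(":memory:")
create_schema(conn)
with conn:
    conn.execute("INSERT INTO orgs VALUES (1, 3, 2)")  # one open seat left
print(send_editor_invite(conn, 1, "first@example.com"))   # True
print(send_editor_invite(conn, 1, "second@example.com"))  # False
```

The demo runs the two requests one after the other for simplicity. The point of the design is that the capacity check lives in the `WHERE` clause, so the database, not application code reading a count and then writing, decides which request gets the last seat. Viewer invites would skip the `UPDATE` entirely, since they are free.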

**Surface team concerns (whole team).** Back together, the team looks for places the two efforts collide. There's one: Invite Members calls the email module, so neither developer can define that interface alone. They agree on its exact shape on the spot, and both efforts build to it. That settles the order of work too: the module can grow behind the interface while Invite Members builds against it and demos on the pretend sender, so neither developer is blocked on the other.

**Output.** Two agent-ready efforts, an agreed module interface, the sending-to-sent behavior for the invite table's Pending row pinned down, and a staging "Sent emails" view scheduled. Invite Members and Email Delivery can be built at the same time.


Next, [Phase 3: Implementation](./implementation.md) shows how developers execute that iteration work with agents while preserving judgment and quality.

---

Source: /agentic-engineering/implementation.md

# Phase 3: Implementation

How a team delegates to agents, executes, and validates code work.

---

A developer and an agent can stand up a working feature in an afternoon. That's the easy part. The hard part is production software a team understands, trusts, and keeps building on. The Implementation Phase is how a team turns ready specs into exactly that, as fast as agents now allow.

Even when an agent writes most of the code, developers still own the result. They keep that ownership by overseeing the plan that goes in, the validation that comes out, and the final call on whether the work is safe to merge. Two habits make this possible. The developer shapes the work before handing it off, and the developer stays close to the product as the code comes together. Together those habits let the team keep ownership while agents do most of the typing.

Some implementation work is simple enough for a short brief: fix a bug, make a small refactor, or adjust existing behavior. The agent can work while the project's normal checks catch regressions: tests, review agents, and developer review. Other work needs shaping first. If the task depends on a product, design, data, or architecture decision the team has not already made, an agent should not be the one to make it. Skipping that step moves the decision into an agent session with less context than the team has. This chapter focuses on complex tasks, where the developer has to shape the work before handing it off.

---

## What happens in this phase

Because writing code is no longer the bottleneck, a complex task is usually broader than a single development task would have been before agents. Each one moves the system toward an updated specification. The default workflow below gives that work a shape. It establishes the foundations early, creates clear points to merge along the way, and ends by explaining, validating, and reconciling what got built, so the team's understanding keeps pace with how fast the agents generate code.

The workflow below is a starting point, not a fixed procedure. Teams should adapt it as they learn. The core pattern holds regardless. Make the important decisions early, then scale Implementation against them.

| Stage | Focus | What happens | Who executes |
| --- | --- | --- | --- |
| Shape | Establish foundations | Key design decisions: schemas, APIs, module boundaries | Solo |
| Critique | Pressure-test the design | Walk through trade-offs with a critique prompt or a pair before scaling up | Solo or collaborative |
| Build | Build the next bounded part | Implement against the plan; pause for human judgment | Solo |
| Harden | Check and commit | Run review agents, get tests green, address gaps, commit what is ready | Solo |
| Explain | Walk through what changed | Agent walks through the work; the pair builds the shared picture the stress-test will need | Solo or collaborative |
| Validate | Stress-test the whole | Exploratory testing, lateral review, NFR check, catch what earlier passes missed | Solo or collaborative |
| Close | Reconcile and deliver | Reconcile with spec, back-port intentional deviations, capture follow-ups, merge | Solo |

The stages run Shaping → Building → Validating. Build and Harden repeat while building, and findings during validation reopen Build.

The workflow moves between the two ways of working with agents from the [Overview](./overview.md). In agent delegation, the developer hands off well-defined work and then checks the result. In agent partnership, the developer works alongside the agent to inspect, critique, and pressure-test a direction. **Shape**, **Build**, and **Harden** run mostly on delegation. **Critique**, **Explain**, and **Validate** run on partnership.

The first two stages set up everything that follows. **Shape** begins from the [effort](./planning.md), the execution brief written during Planning, where the approach, risks, and plan are already on the page. From that brief the developer drafts the foundational choices the rest of the work depends on, such as schemas, interfaces, and module boundaries. **Critique** then pressure-tests those choices with a pair or a review agent before the team scales Implementation against them.

**Build** and **Harden** run as a loop. The developer builds a bounded part, checks it, reads what failed, fixes the gaps, and stops at the next review point. Over time, teams can automate more of that loop so agents keep going without a developer typing every prompt. That only works when the agent has reliable tests and review checks to read, limits that prevent it from spinning forever, and a clear handoff when a human decision is needed.
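
Those three requirements can be sketched as a bounded loop. In this hypothetical Python sketch, `run_checks` and `ask_agent_to_fix` are stand-ins for whatever runs the team's tests and review checks and whatever prompts the agent; no specific agent tool or API is implied. The loop stops on green, stops when the same failures come back twice in a row (a sign the agent is spinning), and otherwise hands off to a human after a fixed number of fix attempts.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CheckResult:
    passed: bool
    failures: list[str] = field(default_factory=list)


@dataclass
class LoopOutcome:
    status: str  # "ready_for_review" or "needs_human"
    fix_attempts: int
    failures: list[str]


def run_until_green(
    run_checks: Callable[[], CheckResult],
    ask_agent_to_fix: Callable[[list[str]], None],
    max_fix_attempts: int = 5,
) -> LoopOutcome:
    """Alternate checks and agent fixes until green, stuck, or out of attempts."""
    previous = None
    for fixes in range(max_fix_attempts + 1):
        result = run_checks()
        if result.passed:
            return LoopOutcome("ready_for_review", fixes, [])
        if result.failures == previous:
            # Same failures twice in a row: the agent is spinning, so stop.
            return LoopOutcome("needs_human", fixes, result.failures)
        if fixes == max_fix_attempts:
            break  # attempt budget spent
        previous = result.failures
        ask_agent_to_fix(result.failures)
    return LoopOutcome("needs_human", max_fix_attempts, result.failures)


results = iter([CheckResult(False, ["test_invite_reserves_seat failed"]),
                CheckResult(True)])
outcome = run_until_green(lambda: next(results), lambda failures: None)
print(outcome.status, outcome.fix_attempts)  # ready_for_review 1
```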

One common variation changes how much **Harden** carries. Some teams ship each finished chunk as a stacked pull request, a chain of small PRs where each one depends on the last. In that mode **Harden** does more. The developer still leans on agents for tests, review, cleanup, and commit prep, but the developer also shapes each checkpoint into a reviewable increment, one with clear scope, passing checks, a useful description, and an explicit note about what later PRs will depend on.

Once building wraps, the work shifts back to partnership. **Explain** comes first. The agent walks through what changed, including its design choices, where it deviated from the plan, and any side effects it noticed, so the developer holds a clear picture before stress-testing begins. **Validate** comes next and exercises the feature as a whole. The developer runs exploratory testing across the full surface, thinks through what building in pieces might have missed, and checks the result against the non-functional requirements and testing strategy set during Planning. Anything **Validate** turns up reopens **Build** rather than landing on a follow-up list. The start and end of a complex task each demand more human attention than the middle does.

A second developer earns a place at **Critique**, **Explain**, and **Validate**. Pull one in when a decision will shape everything after it, such as a schema or API boundary, security-sensitive logic, an unfamiliar integration, a silent failure mode, or any work the lead developer can't yet explain clearly. **Build** and **Harden** stay one-developer work, because the agent context shifts too fast there to share.

### The developer as scheduler

Once the developer starts delegating, they are tracking several things at once: what the feature should become, what the agent is doing right now, what CI and review are reporting, and which decisions still need a human. Carrying that load, the developer has to protect enough focus to keep making good calls. In practice that means keeping one priority path easy to reload, clearing the next blocker as it appears, and taking on side work only when it won't make the main task hard to get back into. Use worktrees or secondary checkouts when running work in parallel, and keep work-in-progress low.

*Scheduler diagram:*

- **Current priorities** (top first):
  1. Checkout flow (complex): review current work; start next part.
  2. Complete and commit (simple): merge reviewed work.
  3. Reporting dashboard (complex): start shaping the first part when room opens.
  4. Add another signup field (simple): update form, validation, storage, and tests.
- **Developer loop, keep context moving:** pick a priority, wrap or start it, then tee up the next agent run. The agent works autonomously until review is needed. Too much WIP? If yes, monitor and guide active runs. If no, shift to the next priority.
- **Core disciplines:** rotate from clear handoff points; complex tasks get priority attention; keep WIP low.
- **When agents struggle or repeat mistakes:** diagnose root causes, then improve the system (tests, agent instructions, skills, refactoring).

As a rule, keep one complex task as the priority path you carry to completion. Pulling in a second one can make sense when the first has long agent runs that leave you waiting. Simpler tasks usually make better company, because they produce progress without competing for much attention. Finishing, committing, and merging already-reviewed work counts as priority work too, so don't let it sit while you start something new. Priorities also move as friction, follow-ups, or newly discovered work surface mid-execution.
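These rules of thumb can be written down as a small selection heuristic. The following Python sketch is illustrative only. The `Task` fields, the WIP limit of three, and the assumption that tasks arrive sorted by priority are all hypothetical, and a real developer weighs far more than four flags.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    is_complex: bool
    started: bool = False
    waiting_on_agent: bool = False   # an agent run is in flight
    ready_to_merge: bool = False     # reviewed; only completing and merging remain


def next_task(tasks: list[Task], wip_limit: int = 3) -> Optional[Task]:
    """Return the task to give attention to next, or None to keep monitoring.

    `tasks` is assumed to be sorted by priority, highest first.
    """
    # Finishing reviewed work counts as priority work: do it before starting anything.
    for t in tasks:
        if t.ready_to_merge:
            return t
    active = [t for t in tasks if t.started]
    # Work the highest-priority active task that isn't blocked on an agent run.
    for t in active:
        if not t.waiting_on_agent:
            return t
    # Every active task is waiting on an agent. At the WIP limit, monitor and guide.
    if len(active) >= wip_limit:
        return None
    queued = [t for t in tasks if not t.started]
    if any(t.is_complex for t in active):
        # A complex task already holds the priority path; simple work makes better company.
        simple = [t for t in queued if not t.is_complex]
        if simple:
            return simple[0]
    # Otherwise take the next task in priority order. This can be a second complex
    # task when the first is stuck waiting on long agent runs.
    return queued[0] if queued else None
```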

### Closing a complex task

**Close** reconciles the finished work with the spec, back-ports any intentional deviations, captures follow-ups, and merges. A complex task is done when all of the following hold.

- The spec and Implementation agree.
- The task is validated with automated tests. Where tests alone aren't enough, the developer has verified the behavior by running the software.
- Important findings are fixed or captured as follow-up work.
- The developer can explain the design, the checks, the risks, and the merge and release judgment.

---

## Example: Building the Invite Members effort

The Invite Members effort arrives from Planning with its plan, its checkpoints, and one risk already flagged. Two admins might invite into the last open seat at the same moment. The plan says what to build. It doesn't settle how the underlying structures should be shaped. One developer directs agents through that shaping, holding onto the decisions that would be expensive to undo if an agent guessed wrong.

Running example · Invite Members effort

**Shape (delegation).** The developer starts by loading context the plan and specs don't carry, namely the architecture the developer is leaning toward and the non-functional requirements the design has to survive. The seat count has to stay correct as people are invited, promoted, and removed. The limit has to hold even when two admins act at the same moment. Email has to stay behind the sender interface, so real delivery can drop in later without touching the invite code. With those constraints stated, the developer has an agent lay out options for the core structures, covering how to represent invites, seats, and memberships, and how to track the seat count against the limit. The agent weighs the trade-offs and produces a first pass, an initial data model with the scaffolding around it. This is delegated work. A rich brief goes in, a first cut comes out, and nobody expects it to be right yet.
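A Shape brief can name the sender constraint explicitly as a seam. This minimal Python sketch uses hypothetical names (`EmailSender`, `PretendSender`, `send_invite`). The invite code depends only on an interface, and a recording stand-in satisfies that interface until real delivery drops in.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class OutgoingEmail:
    to: str
    subject: str
    body: str


class EmailSender(Protocol):
    """The only thing invite code knows about email delivery."""

    def send(self, email: OutgoingEmail) -> None: ...


class PretendSender:
    """Stand-in used until real delivery lands: records instead of sending."""

    def __init__(self) -> None:
        self.sent: list[OutgoingEmail] = []

    def send(self, email: OutgoingEmail) -> None:
        self.sent.append(email)


def send_invite(sender: EmailSender, invitee: str, invite_link: str) -> None:
    # Depends on the interface, never on a provider, so swapping in real
    # delivery later doesn't touch this function.
    sender.send(OutgoingEmail(
        to=invitee,
        subject="You've been invited",
        body=f"Accept your invite: {invite_link}",
    ))
```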

**Critique (partnership).** A delegated first pass rarely hits the goal on the first try, and Critique is where the developer and the agent work it out together. The developer runs the agent hard against its own design, asking where the design misses the requirements from Shape and where it breaks under load. The agent's first data model tracked the seat count as a stored number that each new grant increments. Pushed to stress that choice, the agent finds the hole itself. The stored number drifts when two grants land at once, so at the limit two admins could each invite into the last seat and both succeed, pushing the organization past its paid count. Seeing that concrete failure is what prompts the developer to act. The developer makes a fix the spec never spelled out. Instead of storing the count, the system now derives it from the records, counting active editors plus pending editor invites, and each grant becomes a single all-or-nothing transaction, so the second simultaneous attempt fails cleanly. The data model changes here, before any UI is built on top of it.
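The corrected model can be sketched with Python's built-in `sqlite3`. The schema, table names, and status values here are hypothetical. The sketch illustrates only the two ideas from Critique. First, the seat count is derived from the records (active editors plus pending editor invites). Second, each grant runs as one all-or-nothing transaction. `BEGIN IMMEDIATE` takes SQLite's write lock up front, so a second simultaneous grant waits, then sees the first invite in its count and fails cleanly. Other databases need their own locking strategy, such as row locks or serializable isolation.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS memberships (
    org_id INTEGER NOT NULL,
    email  TEXT NOT NULL,
    role   TEXT NOT NULL,   -- 'editor' uses a seat, 'viewer' is free
    status TEXT NOT NULL    -- 'active' or 'removed'
);
CREATE TABLE IF NOT EXISTS invites (
    org_id INTEGER NOT NULL,
    email  TEXT NOT NULL,
    role   TEXT NOT NULL,
    status TEXT NOT NULL    -- 'pending', 'accepted', 'revoked', 'expired'
);
"""


class SeatLimitReached(Exception):
    """Raised when a grant would push the organization past its paid seats."""


def connect(path: str) -> sqlite3.Connection:
    # isolation_level=None turns off implicit transactions so the BEGIN/COMMIT
    # below are exactly what runs; timeout makes a blocked writer wait.
    return sqlite3.connect(path, isolation_level=None, timeout=5.0)


def init_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def seats_used(conn: sqlite3.Connection, org_id: int) -> int:
    # Derived from the records every time; there is no stored counter to drift.
    row = conn.execute(
        """
        SELECT
          (SELECT COUNT(*) FROM memberships
            WHERE org_id = ? AND role = 'editor' AND status = 'active')
        + (SELECT COUNT(*) FROM invites
            WHERE org_id = ? AND role = 'editor' AND status = 'pending')
        """,
        (org_id, org_id),
    ).fetchone()
    return row[0]


def invite_editor(conn: sqlite3.Connection, org_id: int, email: str, seat_limit: int) -> None:
    # BEGIN IMMEDIATE takes the write lock before reading the count, so a
    # concurrent grant blocks here and later counts this invite.
    conn.execute("BEGIN IMMEDIATE")
    try:
        if seats_used(conn, org_id) >= seat_limit:
            raise SeatLimitReached(f"org {org_id} has no open editor seats")
        conn.execute(
            "INSERT INTO invites (org_id, email, role, status) "
            "VALUES (?, ?, 'editor', 'pending')",
            (org_id, email),
        )
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")  # all-or-nothing: no partial grant survives
        raise
```

Changing an invite's status away from `pending` (for example, revoking it) frees the seat, because there's no counter to decrement.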

**Build and harden (delegation, looped).** With the data model corrected, the developer delegates the build one chunk at a time against the pretend sender, starting with the invite dialog. The agent works from the Figma design and writes its tests first, so every build is checked against those tests from the start. Hardening asks for more than green tests, though. The agent renders the dialog, compares it against the design, and iterates on the visual gaps it spots. Where the agent misses one, the developer points at the specific flaw until it's fixed, and each fix becomes a new regression test so the flaw can't return. Once the dialog holds up against both the design and the tests, the developer commits it and moves through seat reservation, the admin-only checks, and the invite table's Pending row the same way.

**Explain (partnership).** Before stress-testing, the developer has the agent evaluate the finished change cold, with fresh context, and walk through it. Together they review the architecture the agent settled on, how the pieces fit, and the Implementation details that matter most. The developer asks questions wherever the picture is thin and pushes past what changed to why it changed, such as why the seat count is derived rather than stored and why each grant is a single transaction. The goal is a developer who understands the design well enough to stand behind it.

**Validate (partnership).** Validate is Critique again, this time aimed at the whole delivered feature. The developer has the agent check the change for what tends to drift as a build grows, including the team's style guide, the architecture rules, and the non-functional requirements the feature was meant to hold. The two of them brainstorm where it could still go wrong. Then the developer uses the feature the way a real admin would, inviting a viewer and watching no seat move, inviting an editor and watching one disappear, hitting the limit, resending, revoking. The developer is feeling for whether the feature hangs together and makes sense to the person who would actually use it. Whatever that surfaces, such as a confusing empty state or a seat that doesn't come back on revoke until the page reloads, feeds one more short build-and-validate pass to tighten the feature before merge.

**Output.** A merged increment whose seat count can't drift and whose limit holds under the race condition, fully tested and reviewed, handed over by a developer who can explain the design and the risks it survived rather than just point at passing checks.


Next, [Review](./review.md) explains how the team turns Implementation output back into shared judgment and process learning.

---

Source: /agentic-engineering/review.md

# Phase 4: Review

How the team reviews Implementation output and routes learning back into the process.

---

The Review Phase is the team's primary human quality check on what the Implementation Phase produced. Agents generate code faster than a team can absorb it, so the team's understanding falls behind what's been built, and that gap doesn't announce itself. Reviewing agent output every two or three days, rather than letting a week's worth pile up, keeps the team's experienced judgment caught up.

Review also feeds learning back into the process. Some findings become immediate fixes. Others become input for a future Definition Phase, such as a follow-up question, a spec refinement, or a piece of Compounding Work. Which one a finding turns into depends on product-owner priorities, the project timeline, and the product direction.

---

## Running the ritual

| Step | Activity | Tasks | Who executes |
| --- | --- | --- | --- |
| ↻ per item | Demo & explain it | For each completed effort or other substantial piece of work, the developer explains what changed, how it is structured, how it was verified, and what frictions slowed delivery. | Team |
| ↻ per item | Probe it together | Ask questions, surface risks and implications, explore alternatives. Repeat until the team has reviewed every completed effort or substantial piece of work that needs team attention. | Team |
| After all items | Exploratory testing | Step back and exercise the iteration's combined output hands-on: end-to-end flows, integration between newly landed pieces, regressions, UX friction. Surface findings for remediation. | Solo or pair |
| After all items | Remediate or scope follow-ups | Break into solo or pair work. Fix low-hanging issues immediately; turn larger issues into follow-up questions, spec refinements, Compounding Work, or tasks for a future iteration. | Solo or pair |

The demo-and-probe pair repeats for each completed effort or substantial piece of work that needs team attention. Once the team has worked through everything, the team breaks off for hands-on exploratory testing of the combined output, then comes back together to fix issues, capture follow-up questions, write spec refinements, or scope the next iteration.

---

## Example: Reviewing the Invite Members effort

The Invite Members effort merged mid-iteration. At the iteration's Review Phase, the team looks at it together for the first time as a finished feature, alongside the rest of the Implementation output.

Running example · Invite Members effort

**Demo & explain it (whole team).** The developer who built Invite Members walks the team through it. Inviting someone as an editor consumes a seat, inviting someone as a viewer stays free, and the invite table's Pending row moves from sending to sent. The developer explains the decision that shaped the feature (deriving the seat count rather than storing it) and names the friction that slowed the work down. The slow part was confirming by hand, over and over, that invite emails actually went out in staging.

**Probe it together (whole team).** The team pushes on the parts the demo glided over. What does an invitee see if the admin revokes the invite after the invitee has already clicked it? What happens to a reserved seat if an editor invite simply expires? Someone asks whether two admins inviting into the last seat at the same moment is really handled, and the developer walks through the all-or-nothing seat check that closes it. Any question the team can't resolve on the spot gets written down to settle later.

**Exploratory testing (solo or pair).** Away from the demo, someone exercises the Invite Members flow against the rest of the iteration's output by hand, inviting an email that already belongs to a member, letting an invite sit until it expires, and inviting up to the seat limit and one past it. The limit holds. Two smaller problems surface, though. The message a user sees at the limit is vague about what to do next, and a revoked invite still shows a stale Pending row in the invite table until the page refreshes.

**Remediate or scope follow-ups (solo or pair).** Each finding routes to where it belongs. The vague at-limit message is a quick fix, so the team applies it and re-demos the same day. The stale Pending row needs more than a same-day fix, so it lands on the Work Board for the next iteration. The deeper question the probing raised, whether an expiring invite should auto-release its reserved seat, isn't this iteration's call, so it becomes a spec question routed back to Definition. The email-checking friction is a different kind of finding altogether. A one-off bug gets fixed, but a recurring drag like this gets tooled away instead, so the team logs it as Compounding Work for the harness.

**Output.** The whole team understands what landed. The small fixes are made or scheduled, one open question is back in Definition's queue, and one recurring friction is queued as Compounding Work for the harness.


Next, [Engineering Sustainable Speed](./engineering-sustainable-speed.md) explains how the team turns recurring frictions like the email-checking drag into harness investments that keep agents fast.

---

Source: /agentic-engineering/engineering-sustainable-speed.md

# Engineering Sustainable Speed

How a team keeps agent speed useful after the first fast build.

---

So far the process has played defense on speed. Earlier sections showed teams how to keep debt from creeping in by protecting team comprehension, planning before building, and reviewing continuously. This chapter is the offensive half: the work that makes the system and the agents better at the next task, so the speed from the first fast build compounds instead of eroding.

Watch it play out across two teams that adopt agents the same quarter. Both see the same early result: work that took a week now takes a day or two. Six months later, one team still moves at that pace and can still reason about what it has built. The other is slower than when it started, buried in code nobody fully understands, re-explaining the same corrections every session, waiting on a person every time the agent needs to check its own work. Both bought the same ability. Only one kept it.

What separated them is where they spent the time the early speed-ups saved. The first team put some of it back into the system that produces the speed: layered tests and review agents that catch what no person could at full volume, a harness that lets an agent finish a task and verify its own work, an architecture kept solid enough that every session builds in the same direction, and a habit of turning each recurring mistake into a check the system runs from then on. The second team spent all of it on new features and let the rest erode.

That reinvestment is Compounding Work, and it's the highest-leverage work on a project. It's the mirror image of technical debt. Shortcuts you don't pay down accrue interest, and the balance grows; vibe coding is the credit-card version, fast now and more expensive every month you don't pay it off. Agents raise that interest rate, and they raise it on the earning side too: work that makes the system and the agents better at the next task pays back on every future delegation. Because an agent now does most of that work itself, it's cheap to make and the returns show up sooner.

Work like this used to be filed under developer experience (DevEx), where the cost could be hard to justify. Agents change that math and open opportunities that didn't exist before, which is why it gets a chapter of its own. Three investments make up that work, in the order this chapter takes them: the quality checks an agent runs to catch its own mistakes, the harness that gives it the context and tools to keep working on its own, and the design decisions only a person can make about the shape of the system. A final section names the signs that one of them has fallen behind.

---

## Quality pyramid

Every layer of quality in this process is a feedback loop, a signal that the code isn't doing what it should yet. The pyramid orders those signals by cost. The base covers every line and runs for free on every change. Each layer up is narrower and more expensive, until the apex, where human judgment does what no check can. The aim is to push as much as possible down into the cheap, automatic layers, so attention goes only where a person is actually needed. The shape predates agents. What changed is how far up agents can reach. Agents now run the middle and upper layers that used to need a person, including system-level checks, code review, and even specialized design critique.

*Quality pyramid, from apex (narrowest, most expensive) to base:*

- **Human Judgment:** targeted review, pairing, team review
- **Specialized Review Agents:** security, conventions, design decisions
- **Code Review Agents:** general review, spec compliance, architecture
- **System & Integration Tests:** acceptance criteria, end-to-end flows
- **Unit Tests:** implementation logic, invariants, edge cases
- **Types, Linters & Formatters:** covers every line
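A pre-merge script can follow the pyramid's ordering by running the cheapest layers first and stopping at the first failure. That way, expensive checks only run on changes that already pass the cheap ones. Here is a minimal Python sketch. The `Layer` type, the cost numbers, and the `command` helper are hypothetical, and the human-judgment apex stays outside any script.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from typing import Callable


@dataclass
class Layer:
    name: str
    relative_cost: int           # team-assigned estimate; lower runs first
    run: Callable[[], bool]      # True when the layer passes


def command(*argv: str) -> Callable[[], bool]:
    """Wrap a command line as a layer check: exit status 0 means pass."""
    return lambda: subprocess.run(list(argv), check=False).returncode == 0


def run_pyramid(layers: list[Layer]) -> tuple[bool, list[str]]:
    """Run layers cheapest-first and stop at the first failure.

    Returns whether everything passed and which layers actually ran.
    """
    ran: list[str] = []
    for layer in sorted(layers, key=lambda item: item.relative_cost):
        ran.append(layer.name)
        if not layer.run():
            return False, ran
    return True, ran
```

In a real project each layer might wrap a command, such as `Layer("types", 1, command("mypy", "."))`, where the tool choice is an assumption about the project's stack.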

### Checks the agent runs while building

The bottom layers are the feedback an agent reads while it works. The agent should practice test-driven development and add regression tests as it goes, so building the system and building the checks that measure it happen together. With a solid suite underneath, the agent can run much longer on its own, iterating until the task is done without breaking something else along the way.

### Checks that used to be too expensive

Agents change the cost-benefit of system-level checks. A check that wasn't worth building when a human reviewer could eyeball the result pays for itself once an agent runs it hundreds of times. Agents lack the holistic judgment a person brings to review, so the team closes that gap with new automated checks. A visual regression test flags unintended UI changes against a stored baseline. A schema check proves an API contract holds end to end. A migration test runs the rollback. Each one used to be a judgment call someone made by eye, or skipped. Built as a check, it runs on every change and never gets tired.
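A schema check can be as small as comparing a response payload against a declared contract. This Python sketch assumes a hypothetical invite endpoint and field list. A real check would fetch the payload from a running test server, and many teams would reach for an existing schema validator instead of hand-rolling one.

```python
from __future__ import annotations

from typing import Any

# Hypothetical contract for an invite API response; field names are illustrative.
INVITE_CONTRACT: dict[str, type] = {
    "id": str,
    "email": str,
    "role": str,
    "status": str,
}


def contract_violations(payload: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """Return every way the payload breaks the contract (empty list = holds)."""
    problems: list[str] = []
    for name, expected in contract.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            actual = type(payload[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    # Extra fields also count: a contract check should notice silent additions.
    for extra in sorted(payload.keys() - contract.keys()):
        problems.append(f"unexpected field: {extra}")
    return problems
```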

### Review that used to need a person

Passing checks are only part of the story. Code review is still a bottleneck, and here, too, agents help. Code review agents catch bugs and, with tuning, enforce architecture rules and style conventions. They give the code a first pass that improves it before a human reviewer sees it, and they often find problems a person would miss.

Senior judgment can't be fully automated. But on any project a senior engineer checks the same concerns again and again, including non-functional requirements like performance and security, and the design decisions that have to fit the rest of the system. A specialized review agent runs a team-crafted prompt for one of those concerns, on demand or in CI. Any pattern an engineer keeps having to check can become an agent that checks it first. What's left at the apex is the work no agent can stand in for, the lateral thinking, taste, and judgment that come from context an agent doesn't have.

---

## Harness engineering

A coding model is general-purpose, and every conversation it has starts from a blank slate, with only the prompt and the files it reads along the way to go on. Give the model a task and it does something reasonable, but without the team's context it sometimes does the wrong reasonable thing, and someone catches it later. Harness engineering shapes that environment, so the agent comes in with more than it can infer on its own. It works in two ways.

- **Guidance:** what the agent knows about the system and how the team works.
- **Tools:** what the agent can do, beyond the capabilities it starts with.

### Guidance

When people wrote all the code, teams ran on tacit, tribal knowledge, the kind picked up in conversation and over years on a project. An agent doesn't pick any of that up. The knowledge the agent needs has to be written down and put in front of it while it's working.

- The Knowledge Base holds design and product intent.
- Code comments carry the rationale behind a specific implementation, so an agent working there understands why, not only what.
- Agent Skills package step-by-step instructions for specific tasks.
- AGENTS.md gives every coding agent the same top-level guidance for working in the system.

This is where most of the improvement happens. Each time an agent gets something wrong, needs a concept re-explained, or repeats a manual task, the team writes the answer into one of these places, so the next agent starts with it. Sometimes that's a procedure or a rule. Sometimes it's making something the team already wrote easier to find at the right moment.

This pays off well beyond code. Because Definition is usually the bottleneck on a project, a harness that speeds up definition work can be worth more than another gain in implementation speed.

### Tools

An agent isn't limited to the capabilities it ships with. The team can build it new ones, tools that run a precise, repeatable task or take an action the agent otherwise couldn't. The quality pyramid was one case of this, since a test is a tool the agent runs to check its own work. The same idea reaches much further.

**Increasing autonomy:** An agent gets stuck when a task needs an action it can't take. Out of the box it usually can't see the UI it just built, so a person has to notice the visual bug and point it out. Give the agent the Figma file, a [visual component harness](https://ladle.dev/) to render the component in different states, and browser automation, and the agent can check its own work against the design and fix it before anyone looks.

**Increasing reliability:** Agents follow instructions better every month, but they're fuzzy by nature. When a task has exactly one right way to run, code beats an agent at it, producing the same result every time, faster and cheaper. Wrap a task like deployment or schema validation in a tool, tell the agent when to reach for it, and that step stops being a place where things go wrong.

**Increasing versatility:** Coding agents are built for code, but a project needs more than code. Take a production incident. Normally a person pulls the logs, gathers the telemetry, and hands it all to the agent. Give the agent its own way to search logs and watch service health, and it can work the incident itself instead of waiting to be fed.

Running example · Compounding Work: Email Verification Harness

**The friction.** This one surfaced at Review of the Invite Members effort. Building and reviewing Invite Members meant confirming, over and over, that the emails actually went out. The staging "Sent emails" view the team added in Planning is fine for a person, but an agent debugging an email problem has to click through that screen or dig through the staging server to check a single message. It's slow, and it pulls the agent out of the loop every time the agent needs to verify its own work.

**The investment.** So the team creates Compounding Work: Email Verification Harness, a skill and a tool that read the staging mailbox directly. Now the agent can ask its own questions without opening a screen. Did the invite to alex@ go out? What link did it carry? Did a reminder fire? The agent can debug an email problem end to end instead of stopping to ask a person to look.
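Here is a sketch of what such a tool might look like. It assumes, hypothetically, that the staging server writes outgoing mail to an mbox file the agent can read. It uses only Python's standard `mailbox`, `email`, and `re` modules. The function name and the crude link pattern are illustrative, not the team's actual harness.

```python
from __future__ import annotations

import mailbox
import re
from dataclasses import dataclass
from email.message import Message

# Deliberately simple: grabs http(s) URLs up to whitespace, quotes, or angle brackets.
LINK_RE = re.compile(r"""https?://[^\s"'<>]+""")


@dataclass
class SentEmail:
    to: str
    subject: str
    links: list[str]


def _text_body(msg: Message) -> str:
    # Collect every text/* part, decoded from its transfer encoding.
    parts = msg.walk() if msg.is_multipart() else [msg]
    chunks = []
    for part in parts:
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True)
            if payload is not None:
                charset = part.get_content_charset() or "utf-8"
                chunks.append(payload.decode(charset, errors="replace"))
    return "\n".join(chunks)


def find_emails(mbox_path: str, to: str, subject_contains: str = "") -> list[SentEmail]:
    """Answer 'did it go out, and what link did it carry?' without a UI."""
    box = mailbox.mbox(mbox_path, create=False)
    found = []
    try:
        for msg in box:
            if to.lower() not in (msg.get("To") or "").lower():
                continue
            subject = msg.get("Subject") or ""
            if subject_contains.lower() not in subject.lower():
                continue
            found.append(SentEmail(msg["To"], subject, LINK_RE.findall(_text_body(msg))))
    finally:
        box.close()
    return found
```

An agent could then check an invite with a call like `find_emails(path, "invitee@example.com", "invite")` (a hypothetical address) instead of clicking through the staging screen.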

**Why it compounds.** The skill outlives Invite Members. Every later email feature, from off-boarding notices to billing receipts, inherits the same way to check itself, so verifying email stops being a step that needs a person and becomes something a session finishes on its own. One friction, turned once into a capability every future delegation can use.

### Orchestrating parallel work

The harness pays off one more way. Everything up to now improves a single delegation: one agent on one task, with a person reviewing the result. A rich enough harness raises the ceiling on that. A single agent tops out at what it can hold in one context and how far it can build in one continuous run without losing the plot. When the harness gives agents the context and tools to work on their own, and the checks let them confirm their own results, a developer can point a team of agents at a build no single agent could accomplish, like a refactor that cuts across the system or a feature that spans the whole stack. Each agent owns a piece of the same plan and runs the same checks, the pieces recombine into one change, and the developer reviews the assembled result. The developer's job shifts up a level of abstraction, from prompting agents to building workflows that organize and prompt teams of agents.
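The fan-out-and-recombine shape fits in a few lines of Python. `run_agent` and `run_checks` are hypothetical hooks, standing for one agent session per piece and the shared check suite. The sketch returns nothing to merge unless every piece passed, which leaves review of the assembled result to the developer.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class PieceResult:
    piece: str
    change: str          # e.g. a branch name or patch the agent produced
    checks_passed: bool


def orchestrate(
    pieces: list[str],
    run_agent: Callable[[str], str],
    run_checks: Callable[[str], bool],
    max_workers: int = 4,
) -> tuple[list[str], list[str]]:
    """Fan pieces of one plan out to agents and run the same checks on each.

    Returns (changes ready to recombine, pieces that failed their checks).
    """
    def work(piece: str) -> PieceResult:
        change = run_agent(piece)
        return PieceResult(piece, change, run_checks(change))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(work, pieces))   # map preserves input order
    failed = [r.piece for r in results if not r.checks_passed]
    if failed:
        return [], failed   # nothing recombines until every piece passes
    return [r.change for r in results], []
```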

When the conditions are right, this is higher-leverage than any single speed-up in this chapter, because it multiplies the work in flight rather than making one stream of it faster. Those conditions are demanding. The Compounding Work this chapter has already described is a necessary baseline. Orchestration works best when the goal is engineering-heavy and the checks are objective: migrate an API, run a broad refactor, update a component family. It is a poor fit when the work depends on unsettled product rules or domain calls. Where those conditions hold, a week of sequential delegation can collapse into a day. Where they don't, pushing orchestration past its limits just multiplies work that never fits together, and keeping a person closer to the work is faster.

For a given build, a developer finds the edge of what orchestration can handle by working up to it. They run a small piece of the build first, watch where the agents go wrong, and strengthen the harness against what broke, whether that's a missing check or an instruction the agents kept ignoring. Once a run comes back clean, they widen the scope. Some early passes exist mainly to be thrown away: build a version, study where it falls short, and rebuild with the plan and the harness tightened. Each pass either widens what the agents can take on or shows the harness still isn't ready.

So orchestration sits at the top of the same hill the rest of the chapter climbs: the better the checks, the harness, and the design underneath, the more of a build a developer can run at once. None of that relaxes their ownership of the result. A larger build still has to be understood and validated by the person who directed it, the same obligation as a single delegation, now spread across more code. That makes comprehension the real ceiling on how far this scales: a developer can launch many agents but can only merge a build they can still stand behind and explain.

---

## Knowing when to slow down

The other levers in this chapter are things the team builds. This one is work the team has to keep choosing to do. Good design is a bet a person makes against their own future effort, such as a simpler structure, a data model where the invalid states can't arise, or abstractions that compose so the next problem is easier than the last. Each one costs effort now to save more later. People make these bets because they're the ones who would otherwise pay, and the incentive is their own finite time and attention. An agent has none of that. Work costs the agent nothing, so it has no reason to invest ahead. It solves the problem in front of it and moves on. Point it at enough problems and you get a system that works and is costly to change.

This is the oldest form of Compounding Work, and the one an agent won't produce on its own. It's the design quality of the system itself, which every later change either rides on or fights. Protecting that quality is part of protecting speed, because a system that's expensive to change is slow no matter how fast the agent works. That makes it the team's call, and leadership's in particular, to notice when throughput mode is making the system worse and to spend time on the design instead. The signs are familiar. The same workarounds pile up, every change touches more than it should, and nobody can hold the current shape of the system in their head. The clearest case is new ground with no pattern to follow yet, like a new project or a refactor that changes how the system is layered.

Spending time on the design means stepping out of scaled delegation and putting human capacity back on the hard parts. The team gathers context, weighs structural options, settles the constraints, and writes down enough direction for agents to follow. Sometimes that's one developer spending part of an iteration on architecture instead of features. Sometimes it's most of the team. It's the same work Definition does, turned inward at the system. Architectural uncertainty is definition work.

### What that work looks like

**Seed the structure deliberately.** Before delegating at scale, the lead developer or the team makes explicit choices about layering, module boundaries, interface design, and the key abstractions. Write the foundational code by hand to force those decisions. A few well-considered examples set the pattern agents follow for the rest of the project.

**Mob program around the key decisions.** Getting the whole team writing and reading the same code together is one of the fastest ways to build shared architectural intuition.

**Spike, iterate, then commit.** An agent can spike a structural approach quickly. Treat that spike as a draft to argue with. Reshape it, and make sure the team understands and agrees with the structure before building on it.

Once the structure is sound and the team can answer "does this match how we've agreed to build this system?", scaled delegation takes over again and the planning checks keep it on track.

---

## Addressing development failure modes

Development fails in recognizable patterns once agent output gets ahead of quality signals, planning, architecture, or team comprehension. The signals below are ordered by priority, because earlier signals affect the later ones. Work the list top-down. If quality checks are weak (signal 1), don't spend time on signal 7 until that's resolved. Earlier signals are more fundamental, and fixing them often resolves the later ones.

### 01. Quality gates are weak (Fundamental)

Everything in this process assumes reliable feedback. Types pass, tests are green, review agents flag real issues, and validation catches regressions. When any of the quality layers is flaky, missing, or slow, agent delegation volume turns into risk the team cannot see. Nothing else in this list matters until the feedback is reliable.

#### What you're seeing

CI is flaky or too slow to use. Tests have obvious gaps, where the spec describes a behavior but no check covers it. Review agents run inconsistently, or their output is noisy enough to ignore. Bugs reach production that an earlier quality layer should have caught. The team starts routing around the checks instead of fixing them.

#### What to do

Treat improving the feedback infrastructure as a higher priority than completing new feature work. For each missed issue, ask which layer should have caught it and strengthen that layer. The rest of the process leans on TDD during implementation, regression tests on bug fixes, and reliable review agents. Fix one quality layer at a time until the team trusts them all.

### 02. Planning is rushed or skipped (Fundamental)

Analysis during planning is the main opportunity for human judgment before implementation begins. When planning gets compressed, with missing efforts, missing implementation plans, or no team review, the agent works within weak constraints. Code still arrives quickly, but review becomes the place where the team rediscovers intent, untangles avoidable choices, and makes decisions that should have happened before delegation.

#### What you're seeing

Developers start tasks without efforts or implementation plans. Specs enter planning with unresolved questions that get pushed into execution, and you might hear "we'll figure it out when we get there." The team rushes to start building. Mid-iteration, work stalls waiting for decisions that should have been made up front.

#### What to do

Reduce iteration scope before reducing planning. If planning is skipped to fit more work in, the team is optimizing for output volume over its own process. Reinstate Analyze & plan as a non-negotiable step. If planning capacity is genuinely insufficient, cut scope or slow the iteration while preserving the analysis.

### 03. Architectural direction is weak (Foundation)

Agents replicate patterns. Without shared architectural direction, every developer and every session makes reasonable but inconsistent choices about layering, module boundaries, interface shape, and where the architecture is heading. The codebase works but it fights itself, and the problem grows the more the team delegates.

#### What you're seeing

Similar problems get solved differently across efforts. Reviewers disagree about what "fits the system." Agents reproduce whichever pattern they encountered last. Duplicate abstractions accumulate. Tests pass but structure drifts. Refactor cost rises faster than feature velocity falls.

#### What to do

Establish architectural direction before the team delegates to agents heavily. The practices for that are in [Knowing when to slow down](#knowing-when-to-slow-down). Capture that direction where agents will follow it, in code, in documentation, and in examples. Until the alignment is real, keep delegation narrower and increase human review around structural decisions.

### 04. Same agent mistakes keep recurring (System improvement)

Agents don't remember previous sessions. If the team keeps re-prompting the same corrections ("don't do X," "the convention here is Y," "remember that Z is required"), that knowledge exists only in people's heads. Every future session starts without that context until someone teaches the system what the team already knows.

#### What you're seeing

Review agents flag the same class of issue over multiple iterations. Developers paste the same corrections into conversations with agents. The codebase accumulates workarounds tied to the same few patterns the agent keeps getting wrong. The team starts saying "the agent keeps missing this."

#### What to do

Treat every recurring mistake as a signal to improve the system. This is the harness loop from [Harness engineering](#harness-engineering), applied as a fix. Add the missing context to agent instructions, seed the correct pattern with an example, or build a review-agent check for the issue class. Agents help build these improvements, so the investment is small. If the same correction has happened twice, bake it into the system before it happens a third time.
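
The two-strikes rule needs very little machinery to track. This is a hypothetical sketch, assuming each correction is tagged with an issue class such as `"missing input validation"`; `CorrectionLog` and `BAKE_IN_THRESHOLD` are illustrative names, and where the counts actually live (a notes file, a tracker label) is up to the team.

```python
from __future__ import annotations

from collections import Counter

# Rule of thumb from the text: a correction seen twice gets baked into the
# system (instructions, examples, or a review-agent check) before a third.
BAKE_IN_THRESHOLD = 2


class CorrectionLog:
    """Counts repeated agent corrections by issue class (illustrative only)."""

    def __init__(self) -> None:
        self._counts: Counter[str] = Counter()

    def record(self, issue_class: str) -> bool:
        """Record one correction; return True once it should be baked in."""
        self._counts[issue_class] += 1
        return self._counts[issue_class] >= BAKE_IN_THRESHOLD

    def due_for_bake_in(self) -> list[str]:
        """Issue classes corrected at least BAKE_IN_THRESHOLD times, sorted."""
        return sorted(
            issue for issue, count in self._counts.items()
            if count >= BAKE_IN_THRESHOLD
        )
```

`record("missing input validation")` returns `False` the first time and `True` the second, which is the cue to add the missing context, example, or review-agent check before a third session repeats the mistake.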

### 05. Developers can't explain what was built (Review)

The Explain step exists to prevent this. When it's skipped or too light, the demo becomes an agent walkthrough instead of an owner walkthrough, and the developer doesn't know why something was implemented the way it was. Review only catches what the team understands.

#### What you're seeing

Demos are hand-wavy. Stakeholder questions get "I'm not sure, that's what the agent did" as an answer. Design choices look arbitrary, and the developer can describe what the code does but not why. When probing uncovers something unexpected, the developer struggles to tell whether it was intentional or drift.

#### What to do

Rebuild the Explain step. Before a demo, have the agent walk through what changed, the design choices, the deviations from the plan, and the unexpected side effects, so the developer understands what was built. Pair on Explain if needed, and don't merge until the developer can answer "why this?" for the key decisions.

### 06. Specs aren't staying ahead (Upstream interface)

This signal shows up in implementation, but the failure is somewhere upstream. Planning runs into specs with gaps. Developers ask questions mid-iteration that specs should answer. Implementation stalls waiting on direction.

#### What you're seeing

Specs arrive at planning with visible gaps, including missing acceptance criteria, TBD sections, and unresolved questions in the spec body itself. The pattern repeats iteration over iteration. Developers ask questions specs should answer and stall waiting on definition-side people for responses. The bottleneck is spec readiness.

#### What to do

Raise it with the people doing definition work. Phase 1: Definition explains the upstream pipeline; see [What happens in this phase](./definition.md#what-happens-in-this-phase). Hold half-formed specs back instead of pushing them into implementation to fill the iteration. If definition is underfed, reduce appetite until the pipeline catches up.

### 07. Volume outpaces comprehension (Systemic)

Agents make it possible to ship more than any team can reasonably hold in its head. When volume outpaces comprehension, nobody knows the current state of the system, regressions span iterations, and the team loses the ability to reason about what it built. This is usually a downstream symptom of earlier problems.

#### What you're seeing

New features pile up and nobody remembers why they were built. Regressions emerge across iterations, traced to interactions nobody understood. Review skims because there is too much to look at properly. Onboarding someone new gets harder every iteration. The codebase grows faster than the team's mental model of it.

#### What to do

Slow the cadence, and run some iterations that need less feature work. Reduce appetite and invest in Compounding Work, such as patterns, examples, documentation, and review agents, that makes the system easier to reason about. This signal rarely has a single cause, so check the earlier signals first. If they all check out, the team has more capacity than the domain absorbs, and should invest in future capacity rather than output.


Next, [Managing Project Outcomes](./managing-project-outcomes.md) explains how the team budgets, projects, and replans when definition cost and velocity move.

---

Source: /agentic-engineering/managing-project-outcomes.md

# Managing Project Outcomes

---

A team is 6 weeks into a project. The product owner asks the familiar question: "Are we on track?" Before agents wrote most of the code, there were three answers: "Yes"; "No, and here's what we need to cut"; and "No, and here's how much more we need to spend."

When implementation was the cost center, scope and spend were the only two levers.

Agents add a third. Once writing code is the cheapest phase of the work, the variable that decides whether you're on track is how much certainty you buy before implementation. This section is about managing that investment.

That the cheapest phase moved is easy to see. An agent can build a working version of an idea to react to faster than the team can wireframe one. A single spec can cover what used to take several developer-sized stories. What's less obvious is that the other costs of building software did not vanish. Planning, validation, architectural shaping, and keeping the team's mental model current with what shipped all still cost real time. Some of that time now has to be engineered back in deliberately.

Because of this, specs carry more weight than they used to. They have to spell out details an experienced developer would once have picked up naturally during implementation, like the micro-decisions made at the keyboard or the follow-up questions asked across the room. And the work that needs the tightest specs also tends to need the most architectural shaping and the hardest validation, since being wrong is more expensive there.

Two questions follow, and the rest of this chapter is organized around them. First, for each piece of work, how much certainty should the team build in before implementation? Those choices are Definition Strategies. Second, how does the team manage scope across the Product Backlog now that definition cost is larger and implementation cost smaller? That's the scope work: budgeting definition cost against a moving velocity is how the team decides what it can actually deliver. The two answers together also give the team somewhere to turn when projection diverges from plan. Beyond cutting scope or spending more, they can adjust the definition investment on whatever work remains.

---

## Making the right trade-offs

Every piece of work in the Product Backlog asks the team to make a trade-off: they can either buy more certainty before implementation, or they can move faster and learn from the result. There are five ways to make that call, five Definition Strategies, running from heavy upfront specification to short build-and-react loops.

The spectrum runs from more definition work to less: Research-First Shaping (investigate first, then specify; promote findings), High-Precision Spec (full specification before build; primary signal required), Guided Increment (the default strategy: build a structured pass to create better signal), Ship and Refine (minimal shaping, ship, learn from use), and Show-and-React (cheap bursts to force reactions that create signal).

Three dimensions help teams decide which fits a given piece of work best:

**Target precision**

How narrow is the acceptable outcome space?

**Information availability**

Can the team get the signal it needs before building?

**Reversibility**

How costly is it to change direction after building, beyond code rewrite cost?

The dimensions don't always line up on a single piece of work. Let the most constraining one drive the call: narrow acceptable-outcome targets with hard-to-reverse stakes pull toward the heavy end of the spectrum, while wide targets with easy reversibility push toward shipping and learning.

The information you do have also varies in its trustworthiness. Decisions from the product owner, user research, and production data are firm enough to spec against directly. Design explorations, technical spikes, and the team's own judgment calls move work forward but are bets, and the weaker the signal, the more a build-and-validate loop beats heavy specification.

### Definition Strategy chooser

Rate the work on each dimension, then look for the likeliest fit among the strategies below: precision (wide, medium, or narrow), information availability (low, medium, or high), and reversibility (easy or hard).

#### Research-First Shaping

**Best fit:** Narrow target + missing knowledge

The bullseye is small but the team can't aim yet. Research-heavy: legacy review, data analysis, user interviews, domain expert sessions. Narrow spikes validate findings before broader implementation.

#### High-Precision Spec

**Best fit:** Narrow target + high info + hard to reverse

The team has the information; the work is making the spec precise and confirmed. Heavy on design, detailed specification, and stakeholder review iterations before building. Still correct when the outcome space is narrow and hard to reverse; just no longer the default.

#### Guided Increment

**Best fit:** Enough information to move forward with room to revise, or no other strategy clearly fits and the risk is tolerable

When the team has enough information to move things forward, build it. Circling back used to be costly; with agents handling the revise pass, a batch of feedback items in the next iteration becomes a focused update. Define guardrails and constraints; expect to adapt, refine, and enrich.

#### Ship and Refine

**Best fit:** Wide target + easy to reverse

Any reasonable implementation will satisfy. Minimal definition work: capture key requirements, don't over-specify. Build, ship, refine from real feedback.

#### Show-and-React

**Best fit:** Decision flow is blocked and the work is safe to unwind

Rapid, divergent prototyping to force reactions and surface direction. The builds are throwaway; what they generate is direction for a follow-up iteration with another Definition Strategy. Bound the experiment and expect to discard the result.

Show-and-React sits outside the normal matrix. It is the fallback when decision flow is blocked badly enough that the usual factors stop being predictive; ordinary low-information work still belongs in Guided Increment or Ship and Refine.

The default strategy is Guided Increment when risk allows. Teams should challenge high-precision specification by habit: extra upfront definition has to earn its cost, because a short, structured build-and-react loop often produces better signal faster. High-Precision Spec stays correct when the outcome space is narrow, hard to reverse, or expensive to misunderstand; it's just no longer the thing you reach for first. Do just enough definition to build something safe to learn from, put it in front of the product owner or users, and revise from the reaction.
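
The best-fit rules on the strategy cards above can be read as a small decision function. This Python sketch assumes each dimension is rated on the chooser's scale; `choose_strategy` and the enum names are hypothetical, and like the chooser it only highlights the likeliest fit. It does not replace the team's judgment about whether risk allows the default.

```python
from __future__ import annotations

from enum import Enum


class Precision(Enum):
    WIDE = "wide"
    MEDIUM = "medium"
    NARROW = "narrow"


class Info(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class Reversibility(Enum):
    EASY = "easy"
    HARD = "hard"


def choose_strategy(
    precision: Precision,
    info: Info,
    reversibility: Reversibility,
    decision_flow_blocked: bool = False,
) -> str:
    """Map the three dimensions to the likeliest Definition Strategy."""
    # Show-and-React sits outside the normal matrix: only when decisions
    # are blocked and the work is safe to unwind.
    if decision_flow_blocked and reversibility is Reversibility.EASY:
        return "Show-and-React"
    if precision is Precision.NARROW and info is Info.LOW:
        return "Research-First Shaping"
    if (
        precision is Precision.NARROW
        and info is Info.HIGH
        and reversibility is Reversibility.HARD
    ):
        return "High-Precision Spec"
    if precision is Precision.WIDE and reversibility is Reversibility.EASY:
        return "Ship and Refine"
    # Everything the cards don't claim falls through to the default.
    return "Guided Increment"
```

Checking Show-and-React first reflects that it is the fallback for blocked decision flow rather than a cell in the matrix; any combination the cards don't name lands on Guided Increment.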

The strategy is chosen per Backlog Item, one coarse product area inside the Product Backlog, when definition work on that item begins. Often the choice is already implied by earlier stakeholder alignment or the business context the team brings in. And different items in the same iteration can run different strategies: one in a Guided Increment build, another being High-Precision spec'd for the next iteration, a third in Ship and Refine.

### What this looks like by project context

The team chooses a strategy for each piece of work based on that work, not on the kind of project it belongs to. Project context still matters: different contexts pull definition work toward different strategies, and concentrate it in different places. The table below illustrates that pull across a few common contexts, showing how the same per-item choice adapts as the project changes.

| Project context | Where definition work concentrates | Definition strategy tendency |
| --- | --- | --- |
| Consumer-facing app | Branding, visual design, interaction patterns, accessibility, UX flows | High-Precision for visual design and brand; Guided Increment for application functionality |
| Rewrite / modernization | Mining source code for behavior, agent-assisted workflow walkthroughs, documenting behavior as assertions for automated testing | Research-First early; Guided Increment as understanding builds |
| Short-timeline build | Ruthless scoping, fast priority confirmation, thin but directional specs | Ship and Refine or Show-and-React; learn by building |
| Generic technical domain | Success criteria and approach selection for well-understood problems (search, geolocation, caching, notifications) | Ship and Refine or Guided Increment; define criteria over detailed specification |
| Complex data migration | Source and destination data models, mapping rules, edge-case enumeration, validation criteria | Research-First dominates because mapping rules cannot be guessed |
| Technical / headless system | Event storming, event modeling, SLAs, API contracts, integration behavior | Research-First and High-Precision; contracts and data models are hard to reverse |
| Ongoing maintenance | Bug reports, monitoring data, security vulnerabilities, dependency updates, keeping specs current with the running system | High-Precision for high-stakes changes; Ship and Refine for routine fixes and updates |

Definition work exists on every project; it just looks different. If a team does not see itself in the process, the next move is to translate the model to its project context, constraints, and decision paths, rather than treating definition as optional just because the project context is different.

### How much can the team safely delegate?

Before using velocity as a forecast, ask how much of this project can safely move through agents. Can CI catch regressions? Is the architecture clear enough that agents extend existing patterns? Does the product owner make decisions quickly enough for specs to stay ready? Is the domain explicit, or does every feature require interpretation? Strong answers let more work move through agents. Weak answers mean a person stays closer to the work.

These conditions are what Compounding Work improves, which is why the expected shape of the speed-up depends on where a project starts. A greenfield project with clean tests and a fresh architecture can delegate heavily almost immediately. A legacy project usually has to front-load harness work, like coverage, reliable CI, architectural cleanup, and written context, which back-loads the speed-up rather than removing it. A team that expects the legacy curve plans for a slower first month and a faster last one, and reads modest early velocity as the cost of admission rather than a verdict on the approach.

---

## Managing scope

Scope management starts at the Product Backlog level, not the task level. The team sizes Backlog Items with the product owner using points on the Fibonacci scale. Agile readers can think of these items as roughly epic-sized; the important point is that they describe product areas large enough for prioritization and release planning. The total budget tells the team and product owner whether they will finish on time; the relative sizes show where the project's real cost sits.
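
One quick way to see where a project's real cost sits is to total the Backlog Item estimates and rank each item's share of the budget. In this sketch the point scale is an assumed common Fibonacci sequence, and `budget_summary` is a hypothetical helper, not part of any planning tool.

```python
from __future__ import annotations

# An assumed Fibonacci sizing scale; teams vary in where they cap it.
FIBONACCI_POINTS = (1, 2, 3, 5, 8, 13, 21, 34, 55)


def budget_summary(items: dict[str, int]) -> tuple[int, list[tuple[str, float]]]:
    """Return the total budget and each item's share of it, largest first."""
    for name, points in items.items():
        if points not in FIBONACCI_POINTS:
            raise ValueError(f"{name}: {points} is not on the Fibonacci scale")
    total = sum(items.values())
    shares = sorted(
        ((name, points / total) for name, points in items.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return total, shares
```

For example, `budget_summary({"Authorization System": 21, "Seats & Membership": 34, "Billing": 13})` gives a 68-point total with Seats & Membership holding half of it. The Seats & Membership size and the Billing item are made up for illustration.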

Sizing in points follows from the strategy choice rather than from agile habit. When the team picks a Definition Strategy for a piece of work, it is already making a rough cost estimate: how many rounds of stakeholder feedback the work will take, how deep the definition has to go before an agent can build safely, and how much of the build the team can hand off versus hold close. A High-Precision Spec on a hard-to-reverse contract implies more of all three than a Ship and Refine on a throwaway screen. The points record that judgment, not anyone's typing speed.

Each item typically starts as a single large estimate: a block of time the team expects to invest in that area. As definition work surfaces detail, the item usually breaks into smaller pieces the team can sequence across iterations and interleave with other priorities. Each delivered piece burns part of the item's budget, giving velocity signal well before the whole item closes.

That gives each Backlog Item a **micro-budget**. When one is pulled into an iteration, the points drawn cover the combined cost of definition and implementation work for that iteration's delivery. A 20-point item might burn 6 points of definition work before implementation even starts: the info gathering, synthesis, and spec authoring required to get it ready. That visibility is deliberate. Definition cost shapes what the team can deliver, and the budget has to reflect it.

The budget is an early warning. Its job is to show when the item is burning differently than planned, while there is still time to change the Definition Strategy. When an item burns faster or slower than the plan assumed, the gap is the cue to revisit the Definition Strategy on the work that's left, well before the points could project a finish date.

The Authorization System diagram shows a Backlog Item as a micro-budget: 21 points in total, with 7 remaining across Weeks 1 to 3. Each week's burn splits into definition work (info gathering, synthesis, spec authoring) and implementation work (planning, implementation, review). The planned burn is about 5 points per week; actual burn so far is 6, 6, and 2, faster than planned. That divergence is the signal to revisit the Definition Strategy on the remaining work.
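
The early-warning check can be sketched as a comparison of actual against planned burn over completed weeks. This is a minimal Python sketch under stated assumptions: `MicroBudget`, its fields, and the 15% `tolerance` are hypothetical, and each weekly figure counts definition and implementation points together, as the micro-budget does.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MicroBudget:
    """One Backlog Item's point budget (hypothetical model, not a tool)."""

    name: str
    total_points: float
    planned_per_week: float

    def status(self, weekly_burns: list[float], tolerance: float = 0.15) -> dict[str, object]:
        """Compare actual burn with planned burn over completed weeks.

        Each entry in weekly_burns is one completed week's definition plus
        implementation points. tolerance (assumed 15%) is how far the ratio
        of actual to planned burn may drift before it counts as divergence.
        """
        planned = self.planned_per_week * len(weekly_burns)
        if planned <= 0:
            raise ValueError("need a positive plan and at least one completed week")
        actual = sum(weekly_burns)
        ratio = actual / planned
        if ratio > 1 + tolerance:
            pace = "faster than planned"
        elif ratio < 1 - tolerance:
            pace = "slower than planned"
        else:
            pace = "on plan"
        return {
            "remaining": self.total_points - actual,
            "planned_to_date": planned,
            "actual_to_date": actual,
            "pace": pace,
            # Divergence in either direction is the cue to revisit the
            # Definition Strategy on the remaining work.
            "revisit_strategy": pace != "on plan",
        }
```

Using the Authorization System numbers (21 points, about 5 planned per week) and treating only the first two weeks as complete, `MicroBudget("Authorization System", 21, 5).status([6, 6])` reports 9 points remaining and `"faster than planned"`. The sketch leaves out the third week's 2 points on the assumption that the week is still in progress; counting a partial week would understate the pace.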

Velocity is how much the team is actually shipping, measured weekly or every two weeks. Iteration-to-iteration throughput is too noisy for projection; patterns emerge over a few weeks. Compared with the velocity the estimates require to hit the timeline, observed velocity tells the team whether the budgets will hold. The gap between the two is the signal that drives replanning.
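
Comparing observed velocity with the velocity the timeline requires is simple arithmetic once the noise is smoothed out. This sketch assumes a trailing three-week average (the window is an assumption, not a rule from this guide); `velocity_check` is a hypothetical helper.

```python
from __future__ import annotations

from statistics import mean


def velocity_check(
    weekly_points: list[float],
    remaining_points: float,
    weeks_left: float,
    window: int = 3,
) -> tuple[float, float]:
    """Return (observed, required) velocity in points per week.

    observed is a trailing average over the last `window` weeks, so one
    noisy iteration doesn't swing the projection; required is what the
    remaining budget needs per week to finish in the weeks left.
    """
    if not weekly_points:
        raise ValueError("need at least one week of history")
    if weeks_left <= 0:
        raise ValueError("weeks_left must be positive")
    observed = mean(weekly_points[-window:])
    required = remaining_points / weeks_left
    return observed, required
```

With weekly throughput of 12, 18, 9, and 15 points, 120 points remaining, and 8 weeks left, `velocity_check([12, 18, 9, 15], 120, 8)` gives an observed velocity of 14 against a required 15 points per week: roughly a point a week behind, which is the gap that drives replanning.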

When velocity is running behind, the team can adjust more than scope and spend. A Definition Strategy sizes a piece of work across Definition and Implementation: a heavier strategy buys more specification up front and usually pulls closer attention through the build too, with more architectural shaping and tighter validation. So flexing the strategy down on remaining Backlog Items lightens definition and implementation together, which is what makes it a real lever rather than a paperwork change. That third move is **strategy flexibility**.

**Throughput varies, and early on it varies most.** The first week is often deceptively fast: agents handle technical setup well, and the pace dips once the work turns to domain rules and product decisions. From there, velocity climbs as the Knowledge Base improves, agent instructions and checks get stronger, and product-owner decisions move faster. Early projections should carry wider ranges, because the first couple of weeks may show a setup spike or a temporary dip rather than the team's steady pace. Tighten the forecast once the team has a few real delivery cycles behind it.

That improvement is not free. Velocity builds because the team spends part of its capacity on Compounding Work: the investment that makes every later delegation faster and safer. Because it competes with features for the same budget, and features always feel more urgent, the team has to reserve capacity for it deliberately. A backlog that is all feature work moves fast for a few iterations, then slows as the system gets harder to reason about.
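
The advice to carry wider ranges early can be sketched as a best-case/worst-case finish estimate built from the fastest and slowest recent weeks. All of the following are hypothetical assumptions: the first week is skipped as a likely setup spike, only the last four weeks count, and `finish_range` is an illustrative name.

```python
from __future__ import annotations

import math


def finish_range(
    weekly_points: list[float],
    remaining_points: float,
    skip_weeks: int = 1,
    window: int = 4,
) -> tuple[int, int]:
    """Return (best_case, worst_case) whole weeks to burn remaining_points.

    skip_weeks drops the earliest weeks, which may reflect a setup spike
    rather than steady pace; window keeps only the most recent weeks, so
    the range narrows as uneven weeks age out and throughput settles.
    """
    history = weekly_points[skip_weeks:][-window:]
    if not history or min(history) <= 0:
        raise ValueError("need positive throughput after the skipped weeks")
    best_case = math.ceil(remaining_points / max(history))
    worst_case = math.ceil(remaining_points / min(history))
    return best_case, worst_case
```

Early on, `finish_range([20, 8, 14, 12, 13], 60)` returns `(5, 8)`: the history is uneven, so the forecast spans three weeks. A few weeks later, `finish_range([20, 8, 14, 12, 13, 12, 14], 36)` returns `(3, 3)` because the dip has aged out of the window and throughput has settled.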

---

## Example: Managing scope

Scope rarely holds still for a whole project. Here's the Backlog Item Seats & Membership absorbing a mid-stream change, with the team reaching for the third lever rather than only scope or spend.

Running example · Backlog Item: Seats & Membership

**The scope change.** Halfway through Seats & Membership, another team ships an integration between Anvil and Spark, Acme's AI insights product. The product owner wants a new "Pro" seat: an editor seat that also unlocks Spark. It wasn't in the original scope, but the Pro seat is an important business driver, and now it is crucial for an upcoming launch.

**Two parts, two strategies.** The new work splits into two specs that don't need the same strategy. The Pro Seat Entitlements spec uses High-Precision Spec because it covers the entitlement and cross-team contract: what a Pro seat unlocks, and how Anvil checks Spark access across the boundary. It's a narrow target, expensive to reverse once another team builds against it. The Pro Seat Admin spec uses Guided Increment for the admin screen that picks the Pro tier. That work is wide and easy to change later. One product change, two different amounts of certainty worth buying.

**The squeeze.** Using High-Precision Spec for Pro Seat Entitlements costs more in both Definition and Implementation: heavier definition work, and a build that draws closer attention because the contract is hard to reverse and another team depends on it, with more shaping and tighter validation than the iteration budgeted for. Velocity is already a little behind. The two familiar levers are on the table: cut a remaining Backlog Item, or add a person for a few iterations. Both are real, and both cost something.

**The third lever.** Instead, the team changes the strategy on work already in flight. The Remove Members spec had been using High-Precision Spec because the product owner wanted careful rules for reassigning a departing member's content. The team downgrades it to Guided Increment: ship a basic remove that frees the seat, watch the product owner use it, and refine the rules from the reaction. Downgrading it frees capacity in Definition and Implementation at once: a lighter spec, and a build the developer can mostly hand to an agent. That is the room Pro Seat Entitlements needs.

**The trade.** Changing the strategy changes the product. Pro Seat Entitlements ships on the Spark team's timeline. The careful Remove Members behavior the product owner pictured moves to a later iteration. The product owner is buying a different feature on a different timeline. That is the trade, and making it visible is the team's job.

**Output.** The team absorbs the Pro seat without cutting a Backlog Item or adding a developer. One in-flight spec is reshaped, the budget holds, and the product owner sees exactly what changed.

---

## Addressing project failure modes

When projection slips, the cause is often upstream of the estimate: Planning is underfed, specs are stale, decision flow is weak, or the team is using the wrong Definition Strategy for the remaining work. Use the signals below before changing scope, spend, or strategy selection. The first signal is usually the most visible, but its root cause is often one of the later ones.

### 01. Engineering doesn't have enough ready work (Fundamental)

Definition exists to prevent this. If engineering is consistently underloaded at Planning, with too few specs ready and too little confidence to commit, something upstream has broken down. Everything else in this list is secondary until this is resolved.

#### What you're seeing

Planning produces a partial load. Engineers finish committed work mid-iteration and pull in lower-priority or unspecified items. The team starts doing discovery work during execution because it has more build capacity than ready work to build.

#### What to do

This is a symptom, not a root cause. Diagnose the upstream failure: misallocated capacity (signal 3), synthesis stalling before commit (signal 4), or unowned product-owner decisions. Fix the cause, not the symptom.

### 02. Planning feels like discovery (Fundamental)

Specs presented at Planning aren't ready. The team discovers gaps that should have been resolved in the Definition pipeline. Planning becomes a discovery session instead of a commitment point.

#### What you're seeing

Engineers ask questions the spec should answer. Whoever brought the spec to Planning says, "I'll find out," or "we haven't decided that yet." Multiple specs get deferred to the next iteration. The team leaves Planning without a full load of committed work.

#### What to do

Hold the spec back from implementation rather than pushing it half-formed. Find where the Definition pipeline stalled.

### 03. All work is current-iteration reactive (Capacity)

No one is shaping next iteration's work. The board shows everything "in progress" or "blocked," nothing in "shaping" or "upcoming." The three-horizon balance has collapsed into one.

#### What you're seeing

Everyone wearing a definition hat spends all their time answering developer questions, handling gaps surfaced during implementation, and reconciling review findings. There's no time left for information gathering or synthesis aimed at future iterations. Next Planning will be underfed.

#### What to do

The team is misallocating capacity: too many people in implementation hats, too few in definition. Pull a developer into definition work to help with the bottleneck, shift the product manager's or designer's time toward next-iteration readying, or reduce engineering appetite until definition can get ahead again.

### 04. KB isn't building leverage (Knowledge infrastructure)

The Knowledge Base exists, but it isn't improving the team's ability to author specs or align agents. Key decisions either aren't being captured, or they're captured but never surface when they're relevant. The problem is structure and agent guidance, not volume.

#### What you're seeing

Agents lack the context to stay aligned with the product and technical constraints. The team re-explains the same material repeatedly. Important decisions exist somewhere, but they don't surface when needed. Specs are authored from scratch rather than building on accumulated understanding. The Knowledge Base is a filing cabinet rather than a working system.

#### What to do

Capture the right material: key decisions with rationale, domain constraints, and behavioral expectations, rather than raw session transcripts alone. Make it findable when it matters: agent guidance, linked references in specs, structured context that surfaces automatically. Invest in the tooling and structure that makes the KB earn its maintenance cost.

### 05. Developers bypass the specs (Definition-to-engineering interface)

The Knowledge Base has become write-only. People contribute to it but don't consult it. Developers find it faster to ask the product manager a question than to locate and read the relevant spec.

#### What you're seeing

The product manager fields constant questions from the dev team. Specs exist, but the developers don't trust them to be current, can't find the right one, or find them at the wrong level of detail. This hurts more when agents are the implementers, because they can't ask the product manager and just work from whatever they find written down.

#### What to do

Check signal 4 first. Bypass is usually a symptom of a weak or hard-to-find Knowledge Base rather than a habit problem. From there, fix the specs rather than chasing the habit. If specs aren't trusted, they're probably stale, so run a spec-health review and commit to keeping them current. If they're not findable, restructure. If they're at the wrong level, adjust. The goal is specs that are faster to read than to ask about.

### 06. Reactions keep reframing (Decision flow)

Demos or reviews consistently produce reactions like, "That's not what I meant," rather than refinements. The specs are capturing the team's interpretation, not the product owner's actual intent. This is normal early on, but if it's still happening mid-project, information gathering isn't extracting real commitments.

#### What you're seeing

Stakeholders say, "Looks good!" in sessions, but they react differently when they see the built demo. The team interprets vague approval as commitment, then discovers the gap at demo time.

#### What to do

Shift to a more confirmatory Definition Strategy. Extract specific decisions in sessions instead of general reactions. Use prototypes or mockups to force concrete feedback before building. Surface the pattern as a delivery risk.

### 07. Pre-decided calls stack up unreviewed (Decision flow)

The team is using pre-decide-and-review, a legitimate tactic, but the review half is dropping off. Unconfirmed decisions accumulate.

#### What you're seeing

Specs are full of team-generated decisions with documented rationale, but the product owner hasn't validated most of them. The team feels confident; the risk is invisible until a significant call turns out wrong.

#### What to do

This signal is about protecting the review cadence. Make the accumulated unconfirmed decisions visible, and establish a confirmation rhythm. Batch them into stakeholder sessions before the pile grows.

### 08. One strategy for everything (Strategy fit)

The team found a groove and stopped choosing between Definition Strategies. Every feature gets the same treatment regardless of uncertainty, precision, or how costly rework would be.

#### What you're seeing

The team doesn't discuss how much definition a feature needs because the answer is always the same. This works until it hits something hard to unwind, such as an integration contract, a data model decision, or a UX pattern that anchors stakeholder expectations.

#### What to do

Make choosing a Definition Strategy a deliberate step when definition work begins on a Backlog Item. Ask what is uncertain, how precise the target needs to be, and how expensive it would be to reverse course after building. Different features should produce different answers.

### 09. Velocity isn't stabilizing (Systemic)

The team has been running for several iterations, but throughput is either flat or erratic. Velocity should increase as the Knowledge Base matures, agent alignment improves, and the team finds its rhythm. If that's not happening, something structural is off.

#### What you're seeing

Work regularly takes more than one iteration end-to-end. The team doesn't feel faster than it did three weeks ago. Items that should be routine still require heavy definition investment. Knowledge Base and agent-alignment investments aren't paying off.

#### What to do

This is a systemic signal: work through the earlier ones first to localize the cause. Common culprits: work items that don't fit inside a single iteration, a Definition Strategy creating precision blockers (signal 8), or Knowledge Base and agent-alignment investments that haven't started paying off (signal 4). If work items don't fit, break them down. If earlier signals all check out, recalibrate how much specification each piece of work needs.