Managing Project Outcomes

A team is six weeks into a project. The product owner asks the familiar question: "Are we on track?" Before agents wrote most of the code, there were three answers: "Yes"; "No, and here's what we need to cut"; and "No, and here's how much more we need to spend."

When implementation was the cost center, scope and spend were the only two levers.

Agents add a third. Once writing code is the cheapest phase of the work, the variable that decides whether you're on track is how much certainty you buy before implementation. This section is about managing that investment.

It is easy to see that the cheapest phase has moved. An agent can build a working version of an idea for the team to react to faster than the team can wireframe one. A single spec can cover what used to take several developer-sized stories. What's less obvious is that the other costs of building software did not vanish. Planning, validation, architectural shaping, and keeping the team's mental model current with what shipped all still cost real time. Some of that time now has to be engineered back in deliberately.

Because of this, specs carry more of that weight than they used to. They have to spell out details an experienced developer would once have organically gleaned during implementation, like the micro-decisions made at the keyboard or the follow-up questions asked across the room. And the work that needs the tightest specs tends to also need the most architectural shaping and the hardest validation, since being wrong is more expensive there.

Two questions follow, and the rest of this chapter is organized around them. First, for each piece of work, how much certainty should the team build in before implementation? Those choices are Definition Strategies. Second, how does the team manage scope across the Product Backlog now that definition cost is larger and implementation cost smaller? That's the scope work: budgeting definition cost against a moving velocity is how the team decides what it can actually deliver. Together, the two answers also give the team somewhere to turn when projection diverges from plan. Beyond cutting scope or spending more, the team can adjust the definition investment on whatever work remains.

Making the right trade-offs

Every piece of work in the Product Backlog asks the team to make a trade-off: buy more certainty before implementation, or move faster and learn from the result. There are five ways to make that call, the five Definition Strategies, running from heavy upfront specification to short build-and-react loops.

Three dimensions help teams decide which fits a given piece of work best:

Target precision

How narrow is the acceptable outcome space?

Information availability

Can the team get the signal it needs before building?

Reversibility

How costly is it to change direction after building, beyond code rewrite cost?

The dimensions don't always line up on a single piece of work. Let the most constraining one drive the call: narrow acceptable-outcome targets with hard-to-reverse stakes pull toward the heavy end of the spectrum, while wide targets with easy reversibility push toward shipping and learning.

The information you do have also varies in its trustworthiness. Decisions from the product owner, user research, and production data are firm enough to spec against directly. Design explorations, technical spikes, and the team's own judgment calls move work forward but are bets, and the weaker the signal, the more a build-and-validate loop beats heavy specification.

Definition Strategy chooser

Research-First Shaping

Best fit: Narrow target + missing knowledge

The bullseye is small but the team can't aim yet. Research-heavy: legacy review, data analysis, user interviews, domain expert sessions. Narrow spikes validate findings before broader implementation.

High-Precision Spec

Best fit: Narrow target + high info + hard to reverse

The team has the information; the work is making the spec precise and confirmed. Heavy on design, detailed specification, and stakeholder review iterations before building. Still correct when the outcome space is narrow and hard to reverse; just no longer the default.

Guided Increment

Best fit: Enough information to move forward with room to revise, or no other strategy clearly fits and the risk is tolerable

When the team has enough information to move the work forward, build it. Circling back used to be costly; with agents handling the revise pass, a batch of feedback items in the next iteration becomes a focused update. Define guardrails and constraints; expect to adapt, refine, and enrich.

Ship and Refine

Best fit: Wide target + easy to reverse

Any reasonable implementation will satisfy. Minimal definition work: capture key requirements, don't over-specify. Build, ship, refine from real feedback.

Show-and-React

Best fit: Decision flow is blocked and the work is safe to unwind

Rapid, divergent prototyping to force reactions and surface direction. The builds are throwaway; what they generate is direction for a follow-up iteration with another Definition Strategy. Bound the experiment and expect to discard the result.

Show-and-React sits outside the normal matrix. It is the fallback when decision flow is blocked badly enough that the usual factors stop being predictive; ordinary low-information work still belongs in Guided Increment or Ship and Refine.
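The best-fit conditions above can be read as a small decision procedure. The sketch below is one hypothetical encoding, not a definitive rule set: the `WorkProfile` fields, the `decisions_blocked` flag, and the `likeliest_fit` function are names assumed for illustration, and each dimension is flattened to a yes/no answer that real work rarely allows.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    RESEARCH_FIRST = "Research-First Shaping"
    HIGH_PRECISION = "High-Precision Spec"
    GUIDED_INCREMENT = "Guided Increment"
    SHIP_AND_REFINE = "Ship and Refine"
    SHOW_AND_REACT = "Show-and-React"


@dataclass(frozen=True)
class WorkProfile:
    """Hypothetical yes/no reading of the three dimensions for one piece of work."""
    narrow_target: bool              # Target precision: is the acceptable outcome space narrow?
    info_available: bool             # Information availability: is the signal there before building?
    easy_to_reverse: bool            # Reversibility: is changing direction after building cheap?
    decisions_blocked: bool = False  # Assumed flag: decision flow is stuck


def likeliest_fit(work: WorkProfile) -> Strategy:
    # Show-and-React sits outside the matrix: blocked decisions plus safe-to-unwind work.
    if work.decisions_blocked and work.easy_to_reverse:
        return Strategy.SHOW_AND_REACT
    # Narrow target + missing knowledge.
    if work.narrow_target and not work.info_available:
        return Strategy.RESEARCH_FIRST
    # Narrow target + high info + hard to reverse.
    if work.narrow_target and work.info_available and not work.easy_to_reverse:
        return Strategy.HIGH_PRECISION
    # Wide target + easy to reverse.
    if not work.narrow_target and work.easy_to_reverse:
        return Strategy.SHIP_AND_REFINE
    # Nothing else clearly fits: fall back to Guided Increment.
    return Strategy.GUIDED_INCREMENT


contract = WorkProfile(narrow_target=True, info_available=True, easy_to_reverse=False)
print(likeliest_fit(contract).value)  # High-Precision Spec
```

The function only suggests a likeliest fit. The fallback branch also assumes the risk is tolerable, which remains a judgment for the team rather than something the code checks.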

The default strategy is Guided Increment when risk allows. Teams should make a habit of challenging high-precision specification: extra upfront definition has to earn its cost, because a short, structured build-and-react loop often produces better signal faster. High-Precision Spec stays correct when the outcome space is narrow, hard to reverse, or expensive to misunderstand; it's just no longer the thing you reach for first. Do just enough definition to build something safe to learn from, put it in front of the product owner or users, and revise from the reaction.

The strategy is chosen per Backlog Item, one coarse product area inside the Product Backlog, when definition work on that item begins. Often the choice is already implied by earlier stakeholder alignment or the business context the team brings in. And different items in the same iteration can run different strategies: one in a Guided Increment build, another being High-Precision spec'd for the next iteration, a third in Ship and Refine.

What this looks like by project context

The team chooses a strategy for each piece of work based on that work, not on the kind of project it belongs to. Project context still matters: different contexts pull definition work toward different strategies, and concentrate it in different places. The table below illustrates that pull across a few common contexts, showing how the same per-item choice adapts as the project changes.

Project context	Where definition work concentrates	Definition strategy tendency
Consumer-facing app	Branding, visual design, interaction patterns, accessibility, UX flows	High-Precision for visual design and brand; Guided Increment for application functionality
Rewrite / modernization	Mining source code for behavior, agent-assisted workflow walkthroughs, documenting behavior as assertions for automated testing	Research-First early; Guided Increment as understanding builds
Short-timeline build	Ruthless scoping, fast priority confirmation, thin but directional specs	Ship and Refine or Show-and-React; learn by building
Generic technical domain	Success criteria and approach selection for well-understood problems (search, geolocation, caching, notifications)	Ship and Refine or Guided Increment; define criteria over detailed specification
Complex data migration	Source and destination data models, mapping rules, edge-case enumeration, validation criteria	Research-First dominates because mapping rules cannot be guessed
Technical / headless system	Event storming, event modeling, SLAs, API contracts, integration behavior	Research-First and High-Precision; contracts and data models are hard to reverse
Ongoing maintenance	Bug reports, monitoring data, security vulnerabilities, dependency updates, keeping specs current with the running system	High-Precision for high-stakes changes; Ship and Refine for routine fixes and updates

Definition work exists on every project; it just looks different. If a team does not see itself in the process, the next move is to translate the model to its project context, constraints, and decision paths, rather than treating definition as optional just because the project context is different.

How much can the team safely delegate?

Before using velocity as a forecast, ask how much of this project can safely move through agents. Can CI catch regressions? Is the architecture clear enough that agents extend existing patterns? Does the product owner make decisions quickly enough for specs to stay ready? Is the domain explicit, or does every feature require interpretation? Strong answers let more work move through agents. Weak answers mean a person stays closer to the work.

These conditions are what Compounding Work improves, which is why the expected shape of the speed-up depends on where a project starts. A greenfield project with clean tests and a fresh architecture can delegate heavily almost immediately. A legacy project usually has to front-load harness work, like coverage, reliable CI, architectural cleanup, and written context, which back-loads the speed-up rather than removing it. A team that expects the legacy curve plans for a slower first month and a faster last one, and reads modest early velocity as the cost of admission rather than a verdict on the approach.

Managing scope

Scope management starts at the Product Backlog level, not the task level. The team sizes Backlog Items with the product owner using points on the Fibonacci scale. Agile readers can think of these items as roughly epic-sized; the important point is that they describe product areas large enough for prioritization and release planning. The total budget tells the team and product owner whether they will finish on time; the relative sizes show where the project's real cost sits.

Sizing in points follows from the strategy choice rather than from agile habit. When the team picks a Definition Strategy for a piece of work, it is already making a rough cost estimate: how many rounds of stakeholder feedback the work will take, how deep the definition has to go before an agent can build safely, and how much of the build the team can hand off versus hold close. A High-Precision Spec on a hard-to-reverse contract implies more of all three than a Ship and Refine on a throwaway screen. The points record that judgment, not anyone's typing speed.

Each item typically starts as a single large estimate: a block of time the team expects to invest in that area. As definition work surfaces detail, the item usually breaks into smaller pieces the team can sequence across iterations and interleave with other priorities. Each delivered piece burns part of the item's budget, giving velocity signal well before the whole item closes.

That gives each Backlog Item a micro-budget. When one is pulled into an iteration, the points drawn cover the combined cost of definition and implementation work for that iteration's delivery. A 20-point item might burn 6 points of definition work before implementation even starts: the info gathering, synthesis, and spec authoring required to get it ready. That visibility is deliberate. Definition cost shapes what the team can deliver, and the budget has to reflect it.
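To make the micro-budget arithmetic concrete, here is a minimal sketch that tracks definition and implementation burn separately against one item's budget. The `BacklogItemBudget` class and its field names are assumptions for illustration; only the 20-point item and the 6 points of definition work come from the text.

```python
from dataclasses import dataclass


@dataclass
class BacklogItemBudget:
    """Hypothetical per-item ledger: one budget, two kinds of burn."""
    name: str
    total_points: int
    definition_burned: int = 0
    implementation_burned: int = 0

    def burn(self, definition: int = 0, implementation: int = 0) -> None:
        # Both kinds of work draw from the same budget.
        self.definition_burned += definition
        self.implementation_burned += implementation

    @property
    def burned(self) -> int:
        return self.definition_burned + self.implementation_burned

    @property
    def remaining(self) -> int:
        return self.total_points - self.burned

    @property
    def definition_share(self) -> float:
        # Fraction of the burn so far that went to definition work.
        return self.definition_burned / self.burned if self.burned else 0.0


item = BacklogItemBudget("Example item", total_points=20)
item.burn(definition=6)       # info gathering, synthesis, spec authoring
print(item.remaining)         # 14
print(item.definition_share)  # 1.0, nothing has been implemented yet
```

Keeping the two burns in separate fields is the design choice that matters here: it keeps definition cost visible instead of letting it disappear into a single total.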

The budget is an early warning. Its job is to show when an item is burning differently than planned, while there is still time to act. When an item burns faster or slower than the plan assumed, that gap is the cue to revisit the Definition Strategy on the work that's left, well before the points could project a finish date.

Velocity is how much the team is actually shipping, measured weekly or bi-weekly. Iteration-to-iteration throughput is too noisy for projection; patterns emerge over a few weeks. Compared with the pace the estimates require to hit the timeline, observed velocity tells the team whether the budgets will hold. The gap between the two is the signal that drives replanning.
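The comparison described above can be sketched as a gap between two numbers: the pace the remaining budget requires and a smoothed observed pace. This is a sketch under stated assumptions, not a prescribed formula. The function names, the three-iteration averaging window, and the sample figures are all hypothetical.

```python
def required_velocity(remaining_points: float, remaining_iterations: int) -> float:
    """Points per iteration needed to finish the remaining budget on time."""
    if remaining_iterations <= 0:
        raise ValueError("remaining_iterations must be positive")
    return remaining_points / remaining_iterations


def observed_velocity(points_per_iteration: list[float], window: int = 3) -> float:
    """Average of the most recent iterations, to damp iteration-to-iteration noise."""
    if not points_per_iteration:
        raise ValueError("need at least one completed iteration")
    recent = points_per_iteration[-window:]
    return sum(recent) / len(recent)


def velocity_gap(history: list[float], remaining_points: float, remaining_iterations: int) -> float:
    """Positive means ahead of the required pace; negative means behind."""
    return observed_velocity(history) - required_velocity(remaining_points, remaining_iterations)


# Illustrative numbers only: five iterations delivered, 60 points left, 5 iterations left.
history = [9, 12, 10, 11, 12]
print(observed_velocity(history))    # 11.0
print(required_velocity(60, 5))      # 12.0
print(velocity_gap(history, 60, 5))  # -1.0
```

A negative gap like this one prompts replanning; on its own it does not say which lever to pull.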

When velocity is running behind, the team can adjust more than scope and spend. A Definition Strategy sizes a piece of work across Definition and Implementation: a heavier strategy buys more specification up front and usually pulls closer attention through the build too, with more architectural shaping and tighter validation. So flexing the strategy down on remaining Backlog Items lightens definition and implementation together, which is what makes it a real lever rather than a paperwork change. That third move is strategy flexibility.

Throughput varies, and early on it varies most. The first week is often deceptively fast: agents handle technical setup well, and the pace dips once the work turns to domain rules and product decisions. From there, velocity climbs as the Knowledge Base improves, agent instructions and checks get stronger, and product-owner decisions move faster. Early projections should carry wider ranges. Week two may show a setup spike or a temporary dip, not the team's steady pace. Tighten the forecast once the team has a few real delivery cycles behind it.
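One hypothetical way to express "wider ranges" is to forecast from the fastest and slowest iterations seen so far instead of a single average. The `finish_range` helper and the sample velocities below are assumptions for illustration; the text does not prescribe a forecasting method.

```python
import math


def finish_range(remaining_points: float, velocities: list[float]) -> tuple[int, int]:
    """Return (best_case, worst_case) iterations left, from the fastest and slowest observed iteration."""
    if not velocities or min(velocities) <= 0:
        raise ValueError("need at least one iteration with positive velocity")
    best_case = math.ceil(remaining_points / max(velocities))
    worst_case = math.ceil(remaining_points / min(velocities))
    return best_case, worst_case


# Illustrative numbers only: a fast setup week followed by a dip gives a wide range.
early = finish_range(84, [14, 6])
# A few steadier delivery cycles later, the range tightens.
later = finish_range(60, [10, 11, 12, 11])
print(early)  # (6, 14)
print(later)  # (5, 6)
```

The spread between the two ends is the point of the sketch: a large spread says the forecast is not yet trustworthy, whatever its midpoint.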

That improvement is not free. Velocity builds because the team spends part of its capacity on Compounding Work: the investment that makes every later delegation faster and safer. Because it competes with features for the same budget, and features always feel more urgent, the team has to reserve capacity for it deliberately. A backlog that is all feature work moves fast for a few iterations, then slows as the system gets harder to reason about.

Example: Managing scope

Scope rarely holds still for a whole project. Here's the Backlog Item Seats & Membership absorbing a mid-stream change, with the team reaching for the third lever rather than only scope or spend.

Running example · Backlog Item: Seats & Membership

The scope change. Halfway through Seats & Membership, another team ships an integration between Anvil and Spark, Acme's AI insights product. The product owner wants a new "Pro" seat: an editor seat that also unlocks Spark. It wasn't in the original scope, but the Pro seat is an important business driver, and now it is crucial for an upcoming launch.

Two parts, two strategies. The new work splits into two specs that don't need the same strategy. The Pro Seat Entitlements spec uses High-Precision Spec because it covers the entitlement and cross-team contract: what a Pro seat unlocks, and how Anvil checks Spark access across the boundary. It's a narrow target, expensive to reverse once another team builds against it. The Pro Seat Admin spec uses Guided Increment for the admin screen that picks the Pro tier. That work is wide and easy to change later. One product change, two different amounts of certainty worth buying.

The squeeze. Using High-Precision Spec for Pro Seat Entitlements costs capacity in both Definition and Implementation: heavier definition work, and a build that draws closer attention because the contract is hard to reverse and another team depends on it, with more shaping and tighter validation than the iteration budgeted for. Velocity is already a little behind. The two familiar levers are on the table: cut a remaining Backlog Item, or add a person for a few iterations. Both are real, and both cost something.

The third lever. Instead, the team changes the strategy on work already in flight. The Remove Members spec had been using High-Precision Spec because the product owner wanted careful rules for reassigning a departing member's content. The team downgrades it to Guided Increment: ship a basic remove that frees the seat, watch the product owner use it, and refine the rules from the reaction. Downgrading it frees capacity in Definition and Implementation at once: a lighter spec, and a build the developer can mostly hand to an agent. That is the room Pro Seat Entitlements needs.

The trade. Changing the strategy changes the product. Pro Seat Entitlements ships on the Spark team's timeline. The careful Remove Members behavior the product owner pictured moves to a later iteration. The product owner is buying a different feature on a different timeline. That is the trade, and making it visible is the team's job.

Output. The team absorbs the Pro seat without cutting a Backlog Item or adding a developer. One in-flight spec is reshaped, the budget holds, and the product owner sees exactly what changed.

Addressing project failure modes

When projection slips, the cause is often upstream of the estimate: Planning is underfed, specs are stale, decision flow is weak, or the team is using the wrong Definition Strategy for the remaining work. Use the signals below before changing scope, spend, or strategy selection. The first signal is usually the most visible, but its root cause is often one of the later ones.

Signal 1: Engineering doesn't have enough ready work (Fundamental)

Definition exists to prevent this. If engineering is consistently underloaded at Planning, with too few specs ready and too little confidence to commit, something upstream has broken down. Everything else in this list is secondary until this is resolved.

What you're seeing

Planning produces a partial load. Engineers finish committed work mid-iteration and pull in lower-priority or unspecified items. The team starts doing discovery work during execution because it has more build capacity than ready work to build.

What to do

This is a symptom, not a root cause. Diagnose the upstream failure: misallocated capacity (signal 3), synthesis stalling before commit (signal 4), or unowned product-owner decisions. Fix the cause, not the symptom.

Signal 2: Planning feels like discovery (Fundamental)

Specs presented at Planning aren't ready. The team discovers gaps that should have been resolved in the Definition pipeline. Planning becomes a discovery session instead of a commitment point.

What you're seeing

Engineers ask questions the spec should answer. Whoever brought the spec to Planning says, "I'll find out," or "we haven't decided that yet." Multiple specs get deferred to the next iteration. The team leaves Planning without a full load of committed work.

What to do

Hold the spec back from implementation rather than pushing it half-formed. Find where the Definition pipeline stalled.

Signal 3: All work is current-iteration reactive (Capacity)

No one is shaping next iteration's work. The board shows everything "in progress" or "blocked," nothing in "shaping" or "upcoming." The three-horizon balance has collapsed into one.

What you're seeing

Everyone wearing a definition hat spends all their time answering developer questions, handling gaps surfaced during implementation, and reconciling review findings. There's no time left for information gathering or synthesis aimed at future iterations. Next Planning will be underfed.

What to do

The team is misallocating capacity: too many people wearing implementation hats, too few wearing definition hats. Pull a developer into definition work to help with the bottleneck, shift the product manager's or designer's time toward next-iteration readying, or reduce engineering appetite until definition can get ahead again.

Signal 4: KB isn't building leverage (Knowledge infrastructure)

The Knowledge Base exists, but it isn't improving the team's ability to author specs or align agents. Key decisions either aren't being captured, or they're captured but never surface when they're relevant. The problem is structure and agent guidance, not volume.

What you're seeing

Agents lack the context to stay aligned with the product and technical constraints. The team re-explains the same material repeatedly. Important decisions exist somewhere, but they don't surface when needed. Specs are authored from scratch rather than building on accumulated understanding. The Knowledge Base is a filing cabinet rather than a working system.

What to do

Capture the right material: key decisions with rationale, domain constraints, and behavioral expectations, rather than raw session transcripts alone. Make it findable when it matters: agent guidance, linked references in specs, structured context that surfaces automatically. Invest in the tooling and structure that make the KB earn its maintenance cost.

Signal 5: Developers bypass the specs (Definition-to-engineering interface)

The Knowledge Base has become write-only. People contribute to it but don't consult it. Developers find it faster to ask the product manager a question than to locate and read the relevant spec.

What you're seeing

The product manager fields constant questions from the dev team. Specs exist, but the developers don't trust them to be current, can't find the right one, or find them at the wrong level of detail. This hurts more when agents are the implementers, because they can't ask the product manager and just work from whatever they find written down.

What to do

Check signal 4 first. Bypass is usually a symptom of a weak or hard-to-find Knowledge Base rather than a habit problem. From there, fix the specs rather than chasing the habit. If specs aren't trusted, they're probably stale, so run a spec-health review and commit to keeping them current. If they're not findable, restructure. If they're at the wrong level, adjust. The goal is specs that are faster to read than to ask about.

Signal 6: Reactions keep reframing (Decision flow)

Demos or reviews consistently produce reactions like, "That's not what I meant," rather than refinements. The specs are capturing the team's interpretation, not the product owner's actual intent. This is normal early on, but if it's still happening mid-project, information gathering isn't extracting real commitments.

What you're seeing

Stakeholders say, "Looks good!" in sessions, but they react differently when they see the built demo. The team interprets vague approval as commitment, then discovers the gap at demo time.

What to do

Shift to a more confirmatory Definition Strategy. Extract specific decisions in sessions instead of general reactions. Use prototypes or mockups to force concrete feedback before building. Surface the pattern as a delivery risk.

Signal 7: Pre-decided calls stack up unreviewed (Decision flow)

The team is using pre-decide-and-review, a legitimate tactic, but the review half is dropping off. Unconfirmed decisions accumulate.

What you're seeing

Specs are full of team-generated decisions with documented rationale, but the product owner hasn't validated most of them. The team feels confident; the risk is invisible until a significant call turns out wrong.

What to do

This signal is about protecting the review cadence. Make the accumulated unconfirmed decisions visible, and establish a confirmation rhythm. Batch them into stakeholder sessions before the pile grows.

Signal 8: One strategy for everything (Strategy fit)

The team found a groove and stopped choosing between Definition Strategies. Every feature gets the same treatment regardless of uncertainty, precision, or how costly rework would be.

What you're seeing

The team doesn't discuss how much definition a feature needs because the answer is always the same. This works until it hits something hard to unwind, such as an integration contract, a data model decision, or a UX pattern that anchors stakeholder expectations.

What to do

Make choosing a Definition Strategy a deliberate step when definition work begins on a Backlog Item. Ask what is uncertain, how precise the target needs to be, and how expensive it would be to reverse course after building. Different features should produce different answers.

Signal 9: Velocity isn't stabilizing (Systemic)

The team has been running for several iterations, but throughput is either flat or erratic. Velocity should increase as the Knowledge Base matures, agent alignment improves, and the team finds its rhythm. If that's not happening, something structural is off.

What you're seeing

Work regularly takes more than one iteration end-to-end. The team doesn't feel faster than it did three weeks ago. Items that should be routine still require heavy definition investment. Knowledge Base and agent-alignment investments aren't paying off.

What to do

This is a systemic signal: work through the earlier ones first to localize the cause. Common culprits: work items that don't fit inside a single iteration, a Definition Strategy creating precision blockers (signal 8), or Knowledge Base and agent-alignment investments that haven't started paying off (signal 4). If work items don't fit, break them down. If earlier signals all check out, recalibrate how much specification each piece of work needs.