Chapter 4 — Problem Frames and Observable Failure

Systems are built to prevent failures. If you can’t name the failure, you will design a system that optimizes the wrong thing—usually something socially convenient, like “alignment,” “visibility,” or “consistency.”

This chapter teaches the entry move of the System Design Lens:

  1. Locate the failure
  2. Make it observable
  3. Choose the correct problem frame
  4. Refuse to proceed without evidence

The Failure This Chapter Prevents

Observable failure: systems are designed from abstract motivations rather than concrete breakdowns.

Symptoms:

  • A new process is introduced with no baseline (“we need to improve”)
  • Different stakeholders mean different things by the same goal word
  • Teams debate methods instead of diagnosing failures
  • “Success” is defined after the fact
  • The system becomes permanent even after the original issue disappears

Root cause:

  • Problem statements are written to be agreeable, not accurate.

What a Problem Frame Is

A problem frame is the boundary you draw around “what kind of failure this is” so that:

  • you don’t solve the wrong problem well
  • you don’t apply the wrong causality model
  • you don’t pick artifacts that can’t represent reality

Problem frames are not labels. They determine:

  • what evidence counts
  • what decisions matter
  • what object of control is realistic

In this book, failures tend to cluster into five frames.

The Five Failure Locations

Strategy failures

What breaks:

  • priorities drift
  • investment choices don’t match intent
  • “important” work consistently loses to urgent work

Observable symptoms:

  • frequent re-prioritization with no learning
  • roadmaps that change without triggering a decision review
  • teams building things that leadership later calls “not the goal”

Decisions commonly failing:

  • priority, investment, scope

Discovery failures

What breaks:

  • learning is slow or untrusted
  • teams build based on assumptions that aren’t tested
  • users are understood through proxy opinions

Observable symptoms:

  • months of build work with surprise outcomes
  • repeated “we thought users wanted…” postmortems
  • research artifacts no one uses in decisions

Decisions commonly failing:

  • diagnosis, investment, scope

Delivery failures

What breaks:

  • flow, predictability, quality, throughput

Observable symptoms:

  • chronic missed dates
  • work items stuck in-progress
  • quality debt accumulating faster than it can be paid down
  • firefighting as the default mode

Decisions commonly failing:

  • sequencing, repair, scope

Cooperation failures

What breaks:

  • interfaces, ownership, coordination across boundaries

Observable symptoms:

  • cross-team friction dominates cycle time
  • unclear ownership of systems, APIs, or outcomes
  • escalation replaces collaboration
  • “we’re blocked” becomes a permanent status

Decisions commonly failing:

  • ownership, sequencing, repair

Evolution / scaling failures

What breaks:

  • the system stops working when context changes
  • growth increases coupling and coordination cost

Observable symptoms:

  • practices that worked for one team fail at 5–10 teams
  • architectural boundaries erode
  • governance expands to compensate for lack of clarity
  • the organization becomes slow to adapt

Decisions commonly failing:

  • ownership, investment, repair

Observable Failure vs Abstract Dissatisfaction

Abstract dissatisfaction is language like:

  • “We need alignment”
  • “We need better execution”
  • “We need to move faster”
  • “We need clarity”
  • “We need accountability”

These phrases are not failures. They are requests for safety.

An observable failure is something that:

  • is repeatedly happening
  • could be witnessed by an outsider
  • has a measurable cost (time, money, risk, customer impact)
  • can be stated without moral judgment

The Observable Failure Statement Format

Write it in 3–5 sentences:

  1. Situation: where/when it occurs
  2. Symptom: what repeatedly happens
  3. Consequence: what it costs
  4. Who is impacted: team, org, users
  5. Frequency: how often / how long

Example (delivery frame):

  • “Over the last 6 weeks, items labeled ‘small’ routinely take 2–3 weeks to ship. Work sits in review and QA with unclear handoffs. This causes planned releases to slip and forces weekend stabilization. Engineers are increasingly reluctant to pick up work outside their area because it amplifies cycle time.”

The point is not perfection. The point is inspectability.
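
If you want these statements to be comparable across teams, the five elements can be captured as a lightweight record with a basic completeness check. The sketch below is illustrative rather than part of the method: the class name, field names, and the is_inspectable check are assumptions about how you might encode it.

```python
from dataclasses import dataclass, fields

@dataclass
class ObservableFailureStatement:
    """One failure, one record: the five elements of the statement format."""
    situation: str    # where/when it occurs
    symptom: str      # what repeatedly happens
    consequence: str  # what it costs (time, money, risk, customer impact)
    impacted: str     # who is impacted: team, org, users
    frequency: str    # how often / how long

    def is_inspectable(self) -> bool:
        # The bar is inspectability, not perfection: every element must be present.
        return all(getattr(self, f.name).strip() for f in fields(self))

# The delivery-frame example above, encoded as a record (values are paraphrased).
statement = ObservableFailureStatement(
    situation="review and QA handoffs on items labeled 'small'",
    symptom="'small' items routinely take 2-3 weeks to ship",
    consequence="planned releases slip; weekend stabilization becomes routine",
    impacted="delivery team, release calendar, on-call engineers",
    frequency="ongoing for the last 6 weeks",
)
assert statement.is_inspectable()
```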

Why “Alignment” Is a Smell

“Alignment” is usually a symptom of one of these real failures:

  • priority is not explicit
  • scope boundaries are porous
  • ownership is unclear
  • sequencing dependencies are hidden
  • investment choices are not committed

“Alignment” becomes a goal when people don’t want to name the real decision, because naming it creates conflict.

In this book, “alignment” is acceptable only when you can complete:

“Alignment about ___ decision, using ___ artifact, under ___ constraint.”

If you can’t, drop the word.
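
The same test can be made mechanical: a claim of “alignment” is admissible only if all three blanks are filled. A minimal sketch under that assumption; the names below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AlignmentClaim:
    """'Alignment' is admissible only when all three blanks can be filled."""
    decision: str    # alignment about ___ decision
    artifact: str    # using ___ artifact
    constraint: str  # under ___ constraint

    def is_admissible(self) -> bool:
        return all(s.strip() for s in (self.decision, self.artifact, self.constraint))

claim = AlignmentClaim(decision="", artifact="roadmap", constraint="fixed headcount")
print("keep the word" if claim.is_admissible() else "drop the word")  # prints "drop the word"
```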

Problem Frames Determine Causality Assumptions

This matters because the wrong causality model creates the wrong system.

Examples:

  • Delivery bottlenecks often require constraints & flow thinking (queues, WIP, bottlenecks).
  • Strategy ambiguity often requires feedback loops (hypotheses, metrics, learning).
  • Cooperation failures often require socio-technical thinking (authority, incentives, boundary clarity).
  • Scaling failures often require evolutionary thinking (selection pressures, drift, local adaptation).

If you choose a linear plan for a feedback-dominant problem, you will get false confidence and real surprises.

Evidence: What Counts, What Doesn’t

Evidence that counts

  • cycle times, queue sizes, defect rates
  • incident timelines
  • decision logs (or absence of them)
  • recurring escalation paths
  • handoff points and wait states
  • repeated reversals (“we decided X, then we decided not-X”)

Evidence that does not count (by itself)

  • “people feel misaligned”
  • “leadership wants visibility”
  • “teams are frustrated”
  • “communication is bad”

These can be useful signals, but they are not failure definitions.

A Simple Frame Selection Tool

When you’re unsure which frame you’re in, ask:

  1. Are we failing to choose what matters? → Strategy
  2. Are we failing to learn what’s true? → Discovery
  3. Are we failing to deliver reliably? → Delivery
  4. Are we failing at cross-boundary coordination? → Cooperation
  5. Are we failing to adapt as we scale? → Evolution

Most real situations involve multiple frames, but you must choose the dominant one to avoid designing a system that “optimizes everything” and enforces nothing.
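
If it helps to force a single answer, the five questions can be run as an ordered heuristic: ask each in turn and take the first “yes” as the dominant frame. A minimal sketch of that idea; the ordering rule and the yes/no inputs are assumptions, not a replacement for judgment.

```python
# Ordered (question, frame) pairs: the first "yes" decides the dominant frame.
FRAME_QUESTIONS = [
    ("failing to choose what matters", "strategy"),
    ("failing to learn what's true", "discovery"),
    ("failing to deliver reliably", "delivery"),
    ("failing at cross-boundary coordination", "cooperation"),
    ("failing to adapt as we scale", "evolution"),
]

def dominant_frame(answers: dict) -> str:
    """Return the dominant frame from yes/no answers to the five questions."""
    for question, frame in FRAME_QUESTIONS:
        if answers.get(question, False):
            return frame
    return "no frame yet: observe before designing anything"

# Example: delivery and cooperation both apply; picking one dominant frame is forced.
print(dominant_frame({
    "failing to deliver reliably": True,
    "failing at cross-boundary coordination": True,
}))  # prints "delivery"
```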

Misuse Model: How This Chapter Gets Misapplied

Misuse 1: Treating “observable” as “quantitative only”

Some failures are observable through consistent narratives and incidents even before metrics exist.

Correction: use incident examples and timelines as evidence until metrics stabilize.

Misuse 2: Over-framing and analysis paralysis

People keep refining the failure statement instead of acting.

Correction: timebox diagnosis. Your goal is a usable frame, not a perfect one.

Misuse 3: Choosing the frame that avoids conflict

Teams pick “delivery” because it feels technical, when the real problem is strategy or ownership.

Correction: ask “Which decision are we refusing to make?” That usually reveals the real frame.

The Non-Negotiable Rule Introduced Here

You may not select, adopt, or design a system until you can produce:

  • one Observable Failure Statement
  • one dominant problem frame
  • one decision type that is failing repeatedly

If you can’t do that, you don’t need a system. You need observation.
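
One way to make the rule hard to skip is to encode it as an explicit gate: no system selection until all three artifacts exist. A minimal sketch under that assumption; the class, field names, and allowed values mirror the lists in this chapter but are otherwise illustrative.

```python
from dataclasses import dataclass
from typing import Optional

FRAMES = {"strategy", "discovery", "delivery", "cooperation", "evolution"}
DECISIONS = {"priority", "scope", "ownership", "sequencing",
             "investment", "diagnosis", "repair"}

@dataclass
class SystemDesignGate:
    """You may not select, adopt, or design a system until all three exist."""
    failure_statement: Optional[str] = None  # the Observable Failure Statement
    dominant_frame: Optional[str] = None     # one of FRAMES
    failing_decision: Optional[str] = None   # one of DECISIONS

    def may_proceed(self) -> bool:
        return bool(
            self.failure_statement and self.failure_statement.strip()
            and self.dominant_frame in FRAMES
            and self.failing_decision in DECISIONS
        )

gate = SystemDesignGate(dominant_frame="delivery", failing_decision="sequencing")
print("design the system" if gate.may_proceed() else "you need observation")  # prints "you need observation"
```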

Exit Condition for This Chapter

Before moving on, write:

  1. Your Observable Failure Statement (3–5 sentences)
  2. The dominant frame (strategy / discovery / delivery / cooperation / evolution)
  3. The primary decision type currently failing (priority / scope / ownership / sequencing / investment / diagnosis / repair)

You now have the minimum input required to do real system work.