Lesson 06

Proxy-Based & Group Estimation Methods

When you can't estimate directly, estimate through something else. When you can't estimate alone, estimate together.

The Proxy Principle

Sometimes you can't directly estimate the effort for a software feature. The requirements are vague, the technology is unfamiliar, or the scope is just too large to reason about as a single unit.

Proxy-based estimation sidesteps this problem by estimating through a stand-in measurement—a "proxy"—that correlates with actual effort. Instead of asking "How many hours will this take?", you ask "How big is this relative to other things we've built?"

"The key insight of proxy-based methods is that relative sizing is much easier than absolute sizing. Humans are poor at saying 'this will take 347 hours' but excellent at saying 'this is about twice as big as that.'"
— Steve McConnell, Software Estimation

McConnell outlines several proxy-based approaches, each suited to different project contexts. We'll cover the major ones: fuzzy logic sizing, standard components, story points, and T-shirt sizing. Then we'll move into group estimation methods that harness collective intelligence.

Fuzzy Logic Sizing

Fuzzy logic sizing classifies every feature in your backlog into size buckets: Very Small, Small, Medium, Large, and Very Large. Each bucket has a historical average cost (in lines of code, effort hours, or story points) based on your team's past work.

The Rules of Fuzzy Logic

  1. Factor-of-2 minimum: Each bucket must be at least twice the size of the one below it. If "Small" is 200 LOC, then "Medium" must be at least 400 LOC. This prevents ambiguity at the boundaries.
  2. 20+ features required: The method relies on the law of large numbers. Errors in individual classifications cancel out, but only if you have enough items. With fewer than 20 features, the statistical averaging doesn't work reliably.
  3. Use historical averages: The LOC (or effort) per bucket comes from your own team's history, not industry benchmarks. What counts as "Medium" at your organization may be "Large" somewhere else.

Why factor-of-2? Why not factor-of-1.5?

Research shows that humans can reliably distinguish between items that differ by a factor of 2 or more. Smaller differences lead to frequent misclassification, which defeats the purpose of bucketing. The factor-of-2 rule is a practical threshold that balances granularity with classification reliability.

For example, if you have buckets at 100, 200, 400, 800, and 1600 LOC, an estimator can confidently say "this is more like the 400 LOC features we've built, not the 200 or 800 LOC ones." But if the buckets were 100, 150, 225, 338, and 507 LOC, the distinctions become much murkier.
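To make the rule concrete, here is a minimal sketch that checks a set of bucket sizes against the factor-of-2 minimum. The function name `violates_factor_of_two` is made up for illustration; the two bucket lists are the ones from the example above.

```python
def violates_factor_of_two(bucket_sizes):
    """Return adjacent bucket pairs where the larger is less than 2x the smaller."""
    ordered = sorted(bucket_sizes)
    return [
        (small, large)
        for small, large in zip(ordered, ordered[1:])
        if large < 2 * small
    ]


doubling = [100, 200, 400, 800, 1600]
one_and_a_half = [100, 150, 225, 338, 507]

print(violates_factor_of_two(doubling))        # []
print(violates_factor_of_two(one_and_a_half))  # all four adjacent pairs are flagged
```

An empty result means every bucket is at least twice the size of the one below it.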

How It Works

  1. List all features
  2. Classify each into a size bucket
  3. Look up historical LOC per bucket
  4. Multiply & sum for total estimate

Interactive Exercise — Fuzzy Logic Classifier

You're estimating features for an e-commerce platform. Classify each feature into a size bucket. Historical averages for your team are:

VS = 50 LOC, S = 120 LOC, M = 300 LOC, L = 700 LOC, VL = 1500 LOC
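Once every feature has a bucket, the multiply-and-sum step is mechanical. The sketch below uses the historical averages listed above; the feature counts per bucket are a hypothetical classification invented for illustration, not the answer to the exercise.

```python
# Historical averages from this exercise (LOC per feature).
BUCKET_LOC = {"VS": 50, "S": 120, "M": 300, "L": 700, "VL": 1500}

# Hypothetical result of classifying a 22-feature backlog (assumption).
feature_counts = {"VS": 4, "S": 7, "M": 6, "L": 3, "VL": 2}


def fuzzy_logic_estimate(counts, loc_per_bucket, min_features=20):
    """Sum (features in bucket x historical LOC per bucket) over all buckets."""
    total_features = sum(counts.values())
    if total_features < min_features:
        # The lesson's rule: errors only average out with enough items.
        raise ValueError(f"need at least {min_features} features, got {total_features}")
    return sum(n * loc_per_bucket[bucket] for bucket, n in counts.items())


total_loc = fuzzy_logic_estimate(feature_counts, BUCKET_LOC)
print(total_loc)  # 7940
```

The guard on `min_features` simply encodes the 20+ rule; a backlog below that size raises an error instead of returning a number that looks more trustworthy than it is.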

Standard Components

Standard components take the proxy idea further by using known component types as the unit of estimation. Instead of abstract size buckets, you count concrete things your team builds regularly, such as reports, data entry screens, business rules, API endpoints, and database tables.

The power of this approach is its simplicity: "This project has 12 reports, 8 data entry screens, 30 business rules, 25 API endpoints, and 15 database tables. Based on historical data, that's 12(3) + 8(5) + 30(2) + 25(1.5) + 15(1) = 188.5 staff-days."
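The same arithmetic as a small sketch, using the counts and per-component staff-day averages quoted above. The dictionary names are illustrative only; in practice the averages would come from your own calibration data.

```python
# Component counts for the project described above.
component_counts = {
    "report": 12,
    "data entry screen": 8,
    "business rule": 30,
    "API endpoint": 25,
    "database table": 15,
}

# Historical staff-days per component, as quoted in the example.
days_per_component = {
    "report": 3,
    "data entry screen": 5,
    "business rule": 2,
    "API endpoint": 1.5,
    "database table": 1,
}

total_staff_days = sum(
    count * days_per_component[kind] for kind, count in component_counts.items()
)
print(total_staff_days)  # 188.5
```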

When does standard component estimation work best?

Standard component estimation excels when your team builds similar types of software repeatedly—business applications, CRUD systems, report-heavy analytics tools. It's less useful for novel or research-heavy work where component types aren't well established.

The critical prerequisite is calibration data. You need at least a few past projects' worth of actuals to know that a "report" averages 3 days at your organization with your stack. Without this history, you're just guessing at the multipliers.

Story Points & Velocity

Story points are the dominant proxy-based method in Agile and iterative development. Unlike LOC or hours, story points measure relative complexity—how big a feature is compared to a reference story the team has agreed upon.

The typical scale uses the Fibonacci sequence: 1, 2, 3, 5, 8, 13, 21. The increasing gaps at the top reflect an important truth: the bigger something is, the less precisely we can estimate it. There's no "14" because at that scale, the difference between 13 and 14 is noise.

How Velocity Turns Points into Dates

Story points become useful for forecasting through velocity—the number of points a team completes per iteration:

  1. Size your backlog in story points (e.g., 150 total points remaining)
  2. Measure team velocity (e.g., average 25 points per 2-week sprint over the last 4 sprints)
  3. Divide: 150 / 25 = 6 sprints = 12 weeks
  4. Apply a range: optimistic velocity (30pts) gives 10 weeks; pessimistic (20pts) gives 15 weeks
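The four steps above can be sketched in a few lines. The backlog size and the three velocities come from the example; the individual sprint results in `recent_sprints` are hypothetical numbers chosen to average 25.

```python
from statistics import mean

SPRINT_WEEKS = 2


def weeks_remaining(backlog_points, velocity, sprint_weeks=SPRINT_WEEKS):
    """Forecast calendar weeks: sprints needed times sprint length."""
    return backlog_points / velocity * sprint_weeks


# Hypothetical last four sprints (assumption), averaging 25 points.
recent_sprints = [24, 27, 23, 26]
average_velocity = mean(recent_sprints)

backlog = 150
forecast = {
    "optimistic": weeks_remaining(backlog, 30),
    "expected": weeks_remaining(backlog, average_velocity),
    "pessimistic": weeks_remaining(backlog, 20),
}
print(forecast)  # {'optimistic': 10.0, 'expected': 12.0, 'pessimistic': 15.0}
```

Note that the pessimistic case works out to 7.5 sprints. A team that only releases at sprint boundaries would round that up to 8 sprints, or 16 weeks.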

The beauty is that this self-corrects: as the team completes sprints, velocity updates automatically and the forecast adjusts. No one ever has to convert points to hours.

The "story points are not hours" trap

A common dysfunction is managers asking "but how many hours is a story point?" This defeats the entire purpose. Story points work precisely because they decouple relative sizing (which humans are good at) from absolute time estimation (which we're bad at). The velocity mechanism handles the conversion implicitly.

If a team says 1 point = 4 hours, they've just reinvented hourly estimation with extra steps—and lost all the benefits of relative sizing.

T-Shirt Sizing

T-shirt sizing (XS, S, M, L, XL) is the simplest form of proxy-based estimation. It's intentionally low-fidelity—perfect for early-stage estimation when requirements are fuzzy and you need a rough order of magnitude, not a precise plan.

The key advantage: speed. A team can T-shirt size 50 features in under an hour. The key limitation: you need to map the sizes to effort multipliers to get anything actionable, and those multipliers are only as good as your historical data.
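As a rough sketch of that mapping step, each size can be tied to a low and high effort figure and summed into a range. Every number below is a placeholder assumption, not calibration data from the lesson; substitute ranges drawn from your own history.

```python
# Hypothetical (low, high) staff-days per size; replace with your team's history.
EFFORT_RANGE = {
    "XS": (0.5, 1),
    "S": (1, 3),
    "M": (3, 8),
    "L": (8, 20),
    "XL": (20, 40),
}

# Hypothetical sizing results for seven features.
sized_features = ["XS", "S", "S", "M", "L", "XL", "M"]


def effort_range(sizes, ranges):
    """Return (low, high) total effort for a list of T-shirt sizes."""
    low = sum(ranges[size][0] for size in sizes)
    high = sum(ranges[size][1] for size in sizes)
    return low, high


print(effort_range(sized_features, EFFORT_RANGE))  # (36.5, 83)
```

Reporting a range rather than a single total keeps the output honest about how low-fidelity the input is.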

Interactive Exercise — T-Shirt Sizing

You're estimating a project management tool. Size each feature using T-shirt sizes. We'll map them to effort afterward.

Think in relative terms: an XS feature is trivial (config change, text update). An XL is a major new capability requiring multiple weeks of work.

Choosing a Proxy Method

Each proxy method has its sweet spot. Here's a quick comparison:

Method               | Best For                          | Min. Items             | Precision
Fuzzy Logic          | Large backlogs, waterfall/phased  | 20+                    | Medium
Standard Components  | Repetitive business apps          | 5+                     | Medium-High
Story Points         | Agile/iterative teams             | 3+ sprints of history  | Self-correcting
T-Shirt Sizing       | Early/rough estimation            | Any                    | Low

Group Estimation Methods

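All the proxy methods above can be done by a single estimator. But McConnell argues forcefully that group estimation consistently outperforms individual estimation. The methods in this section show why: other people catch the missing tasks, overlooked risks, and optimistic assumptions that a lone estimator does not see.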

Group Review: The Simplest Form

The most basic group method is a structured group review: one person creates an initial estimate, then presents it to the team for critique. The group identifies missing tasks, overlooked risks, and optimistic assumptions. The estimate is revised based on feedback.

Even this simple step—having others review your estimate—typically reduces estimation error. But the more sophisticated methods below do even better.

Wideband Delphi

Wideband Delphi is the gold standard for group estimation. It builds on the Delphi method developed at RAND Corporation and was adapted for software estimation by Barry Boehm. It's a structured process that extracts and converges expert opinions while minimizing social biases.

Research shows Wideband Delphi reduces estimation error by approximately 40% compared to individual expert estimation. That's an enormous improvement—one of the largest gains from any single estimation technique.

The Process

  1. Coordinator presents the feature/project to be estimated, with all available specs and context
  2. Each expert estimates independently—no discussion, no peeking. They write down their number in private
  3. Estimates are revealed simultaneously (often plotted on a wall chart)
  4. Outliers explain their reasoning—the highest and lowest estimators share what they know that others might not
  5. Group discusses risks, assumptions, and new information surfaced by the outliers
  6. Re-estimate independently, incorporating what was learned
  7. Repeat steps 3-6 until estimates converge (typically 2-3 rounds)
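The bookkeeping a coordinator does after each reveal can be sketched as follows. The lesson does not define "converge" numerically, so the rule used here (spread between highest and lowest within 25% of the median) is an assumption, as are the names and the person-week figures.

```python
from statistics import median


def summarize_round(estimates, tolerance=0.25):
    """Find the outliers who should explain, and test an assumed convergence rule."""
    lowest = min(estimates, key=estimates.get)
    highest = max(estimates, key=estimates.get)
    mid = median(estimates.values())
    spread = estimates[highest] - estimates[lowest]
    return {
        "lowest": lowest,
        "highest": highest,
        "median": mid,
        # Assumption: converged when the spread is within tolerance x median.
        "converged": spread <= tolerance * mid,
    }


# Hypothetical private estimates in person-weeks, revealed together.
round_1 = {"Ana": 4, "Ben": 6, "Chen": 12, "Dee": 5, "Eli": 6}
round_2 = {"Ana": 6, "Ben": 6, "Chen": 7, "Dee": 6, "Eli": 7}

print(summarize_round(round_1))  # Ana and Chen explain; not converged
print(summarize_round(round_2))  # spread of 1 against a median of 6; converged
```

The function deliberately takes a complete set of estimates at once, mirroring the simultaneous reveal: nobody's number is visible until everyone has committed.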

Why "Wideband"? What's the original Delphi?

The original Delphi method (1950s, RAND Corporation) was fully anonymous and asynchronous. Experts never met face-to-face. "Wideband" Delphi adds in-person discussion after the reveal—more communication bandwidth. This hybrid retains the independence of initial estimates while adding the richness of group discussion.

Why does independent estimation matter so much?

The independent estimation step combats anchoring bias. If the most senior person says "I think it's about 3 months" before anyone else speaks, everyone else's estimate will gravitate toward 3 months. By estimating independently first, each person's genuine assessment gets captured before social dynamics can corrupt it.

Studies show that even hearing a single number before estimating can shift subsequent estimates by 20-40% toward that anchor, even when the anchor is obviously irrelevant (like a random number from a spinner).

Interactive Exercise — Wideband Delphi Simulation

You're participating in a Wideband Delphi session to estimate a feature: "Add real-time collaborative editing to the document editor."

The team: You, plus 4 other engineers. The coordinator has shared the requirements. Estimate in person-weeks.

Planning Poker

Planning Poker is the Agile descendant of Wideband Delphi, popularized by Mike Cohn. It applies the same core principles—independent estimation, simultaneous reveal, outlier discussion, re-estimation—but adapts them for the sprint planning context.

How Planning Poker Works

  1. Each team member has cards with the Fibonacci values: 1, 2, 3, 5, 8, 13, 21
  2. The product owner describes a user story
  3. Everyone selects a card face-down
  4. All cards are flipped simultaneously—preventing anchoring
  5. If there's significant disagreement, the highest and lowest explain their reasoning
  6. The team re-votes until convergence
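A single reveal can be sketched like this. The lesson leaves "significant disagreement" undefined, so the threshold here (highest and lowest cards more than one deck position apart) is an assumption, and the voter names and cards are hypothetical.

```python
DECK = (1, 2, 3, 5, 8, 13, 21)


def reveal(votes):
    """Flip all cards at once and decide whether the outliers need to talk."""
    for name, card in votes.items():
        if card not in DECK:
            raise ValueError(f"{name} played {card}, which is not in the deck")
    low, high = min(votes.values()), max(votes.values())
    # Assumption: disagreement is "significant" if cards are >1 deck step apart.
    steps_apart = DECK.index(high) - DECK.index(low)
    return {
        "needs_discussion": steps_apart > 1,
        "lowest": sorted(n for n, c in votes.items() if c == low),
        "highest": sorted(n for n, c in votes.items() if c == high),
    }


first_vote = {"You": 5, "Ana": 3, "Ben": 13, "Chen": 5, "Dee": 8}
second_vote = {"You": 8, "Ana": 5, "Ben": 8, "Chen": 8, "Dee": 8}

print(reveal(first_vote))   # Ana (3) and Ben (13) explain their reasoning
print(reveal(second_vote))  # 5 and 8 are adjacent cards; no further discussion
```

In the second vote the team could take the majority card or the higher one; which tie-breaking convention to use is a team decision the sketch leaves open.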

Why Simultaneous Reveal?

This is the single most important mechanism in Planning Poker. If estimates are shared sequentially, anchoring bias dominates: the first number spoken heavily influences all subsequent estimates. Simultaneous reveal ensures each person's independent judgment is captured.

It also surfaces valuable disagreement. When one developer says "3" and another says "13", that's not a problem—it's information. They're seeing different things in the story, and the discussion that follows often reveals missing requirements or hidden risks.

Interactive Exercise — Planning Poker

Estimate this user story: "As a user, I want to export my dashboard data as a PDF report with charts and tables, styled to match our brand guidelines."

Your team: 4 other engineers plus you. Pick a card, reveal, discuss, re-vote.

Planning Poker vs. Wideband Delphi: what's the real difference?

Structurally, they're very similar. The main differences:

  • Scale: Planning Poker uses a predefined Fibonacci scale (relative points); Wideband Delphi can use any unit (hours, weeks, LOC)
  • Scope: Planning Poker estimates one story at a time; Delphi can estimate larger bodies of work
  • Speed: Planning Poker is optimized for speed (30 seconds to 5 minutes per story); Delphi is more deliberate
  • Context: Planning Poker is deeply embedded in Agile/Scrum ceremonies; Delphi is methodology-agnostic

Think of Planning Poker as Wideband Delphi optimized for the Agile context—same principles, different packaging.

Putting It All Together

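These methods aren't mutually exclusive. A mature estimation practice might use T-shirt sizing for early, rough estimates, story points with Planning Poker during sprint planning, and Wideband Delphi for larger bodies of work.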

The common thread: use relative sizing instead of absolute, and aggregate multiple perspectives instead of relying on one. These two principles, applied consistently, will improve your estimation accuracy more than any single technique.

Key Takeaways

Lesson 06 Summary

  • Proxy-based estimation works because relative sizing is far easier and more accurate than absolute sizing for humans
  • Fuzzy logic classifies features into size buckets (VS to VL) with factor-of-2 differences; requires 20+ features and historical data for the averages
  • Standard components use known component types (reports, screens, APIs) as estimation units—powerful for repetitive business applications
  • Story points with Fibonacci scale and velocity-based forecasting are the Agile standard; they self-correct over time as velocity stabilizes
  • T-shirt sizing (XS-XL) is fast and rough—ideal for early-stage estimation when precision isn't needed
  • Wideband Delphi is the gold standard for group estimation: independent estimates, simultaneous reveal, outlier discussion, re-vote. Reduces error by ~40%
  • Planning Poker adapts Delphi for Agile. The simultaneous reveal prevents anchoring bias—the single most important mechanism
  • Two universal principles: prefer relative over absolute sizing, and aggregate multiple perspectives over solo estimation

Up Next: Lesson 07

We'll explore data-driven estimation models—COCOMO II, function points, and regression-based approaches that use historical project databases.