NWLB Think Tank Knowledge Hub · Flagship

Productivity Theatre: Why 'Output' Stats Lie and What Replaces Them

Most workforce-productivity software measures activity, not output. The two correlate weakly and sometimes inversely. This piece offers a 2×2 framework for thinking about productivity metrics and describes the operational upgrade made by the firms that are actually improving output.

Section 01: The Metric Almost Everyone Is Measuring Is Not the One That Matters

When a CEO says the firm needs to improve productivity, they almost always mean output per worker. When their workforce-management software measures productivity, it almost always measures activity per worker — keystrokes, hours logged, badge swipes, Slack messages, lines of code committed. The two numbers correlate weakly and sometimes inversely. The systems designed to improve the second often degrade the first.

This is the central problem with the current generation of "productivity" metrics, and it is the reason that the most-instrumented workplaces in the world are not, on the available evidence, the most productive ones. This piece is about the difference between the two — and what the firms actually improving productive output do instead.

Section 02: The 2×2 That Every Workforce Metric Belongs On

| | Activity (what the worker is doing) | Outcome (what the worker is producing) |
|---|---|---|
| Individual | Keystrokes; hours active; Slack messages; commits; emails sent; calls made; tickets touched | Customer NPS attributable to this worker; revenue closed; defect rate of the worker's code; resolved tickets weighted by complexity |
| Team | Total hours worked; meetings held; documents created; "team velocity" in activity units | Cycle time; customer outcomes; product quality; revenue and retention; engineering DORA metrics; team eNPS |

The left column is what almost every workforce-productivity software product measures. The right column is what almost every business case for productivity improvement actually requires. The transformation from left-column measurement to right-column measurement is the productivity-management upgrade that the 2020s have made possible and that most firms have not made.

Three observations about the left column are worth dwelling on.

First, activity metrics are easy to game. Mouse-jigglers, scripted Slack messages, AI-generated email drafts, padded standup updates, and "ghost commits" are all responses to activity-based surveillance, and the use of all of them has risen observably since 2022. The firms that try hardest to measure activity get the most-gamed activity metrics, and lose the ability to read the underlying signal.

Second, activity metrics correlate with output mostly in jobs where the activity is the output. A customer service agent's call count is a reasonable activity proxy for output when calls are similar in complexity. The same metric is meaningless for a software engineer, where the variance in output across a given activity volume is several orders of magnitude.
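One way to test whether an activity count is a usable proxy in a given role is to correlate it with an outcome measure for the same workers over the same period. The sketch below is a minimal illustration of that check, not a method taken from the studies cited here. The function name and all the numbers are made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up numbers, for illustration only (one entry per worker).
# Support agents: calls of similar complexity, so activity tracks output.
calls_handled = [10, 20, 30, 40]
issues_resolved = [9, 18, 27, 36]

# Engineers: commit counts say little about what was delivered.
commits = [1, 2, 3, 4, 5]
features_delivered = [2, 1, 3, 1, 2]

support_r = pearson(calls_handled, issues_resolved)
engineering_r = pearson(commits, features_delivered)
```

A coefficient near 1 suggests the activity count can stand in for output in that role. A coefficient near 0 suggests it cannot, however precisely the activity is logged.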

Third, activity surveillance produces measurable performance costs. The 2024 MIT Sloan study of keystroke-monitoring deployments found a measurable decline in self-reported intrinsic motivation, a 23% increase in turnover among the most-monitored quartile of workers, and no productivity benefit at the team-output level [1]. The 2025 Microsoft analysis of similar deployments at six enterprise customers reached comparable conclusions.

Section 03: What the Firms Actually Improving Output Measure Instead

Three categories of outcome metric are well-established in the engineering and product literature. The principles generalize to other knowledge-work functions.

1. Engineering: DORA + DX + customer outcomes

The DevOps Research and Assessment (DORA) metrics (deployment frequency, lead time for changes, change failure rate, and time to restore service) are the most carefully validated team-outcome metrics in any knowledge-work discipline. The foundational research was published in Accelerate (Forsgren, Humble & Kim, 2018), and the DORA program, now part of Google, has updated the body of evidence annually since [2]. DORA-strong teams ship more, ship faster, and ship more safely than DORA-weak teams in the same firms, and the cross-firm benchmarks are consistent enough to be actionable.
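To make the four metrics concrete, here is a minimal sketch of how they might be computed from a team's deployment log. The `Deployment` record, its field names, and the choice of medians are assumptions made for illustration. They are not DORA's official definitions or tooling.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_summary(deployments: list[Deployment], window_days: int) -> dict:
    """Summarize the four DORA metrics over a reporting window."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    restore_times = [
        d.restored_at - d.deployed_at
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployments_per_day": len(deployments) / window_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }
```

Every input here is an event the delivery pipeline already records about the team's work. None of it requires observing what any individual is doing at their keyboard.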

DORA is necessary but not sufficient. The DX framework (Developer Experience, the GitHub / Microsoft developer-productivity research stream) adds the dimension that DORA misses: how the engineers themselves experience the work. The integrated set — DORA + DX + customer outcomes — is what the best engineering organizations track [3].

2. Sales: weighted pipeline outcomes, not activity counts

The temptation in sales management is to measure calls, meetings, and emails. The temptation has been there for forty years. The data is unambiguous that weighted pipeline outcomes — opportunities reaching defined stages, conversion rates between stages, average deal size, time in stage — are stronger predictors of subsequent revenue than activity metrics are. The activity metrics matter only as leading indicators when conversion rates are stable; in any sales environment where market conditions or product fit are changing, activity is misleading and outcome metrics are what management requires.
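As a rough sketch of what pipeline-outcome measurement involves, the code below computes stage-to-stage conversion rates and a probability-weighted pipeline value. The stage names, the `Opportunity` record, and the per-stage win probabilities are hypothetical. A real CRM would supply its own stages and history.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical stage names, in pipeline order.
STAGES = ["qualified", "proposal", "negotiation", "closed_won"]


@dataclass
class Opportunity:
    value: float          # expected deal size
    furthest_stage: str   # furthest stage this opportunity has reached


def stage_conversion_rates(opps: list[Opportunity]) -> dict:
    """Share of opportunities reaching each stage that also reached the next."""
    reached = {stage: 0 for stage in STAGES}
    for opp in opps:
        idx = STAGES.index(opp.furthest_stage)
        # An opportunity at a later stage has passed through every earlier one.
        for stage in STAGES[: idx + 1]:
            reached[stage] += 1
    rates = {}
    for current, nxt in zip(STAGES, STAGES[1:]):
        key = f"{current}->{nxt}"
        rates[key] = reached[nxt] / reached[current] if reached[current] else None
    return rates


def weighted_pipeline(open_opps: list[Opportunity], win_prob: dict) -> float:
    """Open opportunity value weighted by an assumed win probability per stage."""
    return sum(opp.value * win_prob[opp.furthest_stage] for opp in open_opps)
```

If the conversion rates hold steady from quarter to quarter, activity counts can serve as leading indicators. If they drift, the drift itself is the signal management needs, and call volume will not show it.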

3. Customer-facing operations: CSAT and outcome-weighted resolution

Call-center work is the canonical "activity-measurable" job. The firms that have moved beyond average handle time as the dominant metric, to resolution rate, customer-effort scores, and customer-outcome composites, consistently outperform the activity-optimized firms on both customer outcomes and cost. The activity-optimization equilibrium is a local maximum; the outcome-optimization equilibrium is a higher one, reachable only with a different measurement instrument.
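The difference between a raw resolution count and an outcome-weighted one can be sketched in a few lines. The `Ticket` record and its 1-to-5 complexity weight are assumptions for illustration. Each operation would need to define its own weighting.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Ticket:
    complexity: int   # hypothetical weight: 1 (simple) to 5 (hard)
    resolved: bool    # was the customer's issue actually resolved?


def resolution_rate(tickets: list[Ticket]) -> float:
    """Plain share of tickets resolved, ignoring difficulty."""
    if not tickets:
        raise ValueError("need at least one ticket")
    return sum(t.resolved for t in tickets) / len(tickets)


def weighted_resolution_rate(tickets: list[Ticket]) -> float:
    """Share of total complexity resolved, so hard tickets count for more."""
    total = sum(t.complexity for t in tickets)
    if total == 0:
        raise ValueError("total complexity must be positive")
    return sum(t.complexity for t in tickets if t.resolved) / total
```

An agent who closes three simple tickets and leaves one hard ticket unresolved scores well on the plain rate and poorly on the weighted one. That gap is what a handle-time or ticket-count metric hides.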

Section 04: Why Cal Newport Keeps Getting This Right

Cal Newport's recurring argument — most fully developed in Slow Productivity (2024) — is that the 2010s and 2020s instrumented knowledge-work in a way that optimized for the wrong objective: visible activity rather than completed valuable work [4]. The arguments are not novel; Peter Drucker said much the same in 1967. What is novel is the visibility of the failure mode at organizational scale.

Newport's diagnosis maps cleanly onto the 2×2 above: knowledge workers are managed on left-column metrics in workplaces that are then surprised when right-column outcomes underperform. His prescription — fewer concurrent threads, slower work at a deliberate pace, ruthlessly fewer meetings — is operationally specific and has been adopted by enough firms (notably 37signals, several open-source-foundation teams, and a cluster of design and editorial firms) to produce a small but growing body of practitioner case studies.

"Pseudo-productivity uses visible activity as the primary means of approximating actual productive effort. It was sustainable in the industrial era because most work was visible by definition. It is not sustainable in knowledge work, and the data shows it."
Cal Newport, Slow Productivity (Portfolio, 2024)

The Newport argument is the worker's-side argument. The Drucker-Forsgren-DX argument is the firm's-side argument. They converge on the same prescription.

Section 05: What an Upgrade Actually Looks Like

Concretely, the move from activity measurement to outcome measurement in a typical knowledge-work organization has four steps.

  1. Identify the outcome each team is accountable for. A real outcome, in business terms, with a number that moves over time and a target that means something. "Engineering velocity" is not a real outcome; "median PR cycle time," "deployment frequency," "customer-reported defect rate," and "DORA score" are.
  2. Stop measuring the activity proxies and reporting them. Even if the data is collected for other reasons, do not report it as the team's productivity. The act of reporting it shapes the work toward gaming it.
  3. Train managers on outcome-based 1:1s. The conversation between a manager and an individual contributor about "are you doing enough" is a different conversation when grounded in outcomes vs. activity. The first is about what the work needs; the second is about what the worker is doing. Most managers default to the second because it feels more knowable. Outcome-based 1:1s require manager training; the training pays back within one quarter.
  4. Pay attention to leading indicators inside the outcome metric. Outcome metrics by their nature lag. To manage them you need a small number of leading indicators within the outcome — for engineering teams that is usually a subset of DX metrics; for sales it is usually opportunity-stage progression; for customer ops it is usually first-touch resolution. The leading indicators give the manager something to act on without re-introducing activity surveillance.
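As an illustration of step 1, "a number that moves over time," the sketch below computes median PR cycle time per ISO week from open and merge timestamps. The input shape and function name are assumptions. The timestamps would come from whatever the team's code-hosting platform exports.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def weekly_median_cycle_hours(prs):
    """Median hours from PR open to merge, grouped by ISO week of merge.

    prs: iterable of (opened_at, merged_at) datetime pairs.
    Returns {(iso_year, iso_week): median_hours}, ordered by week.
    """
    by_week = defaultdict(list)
    for opened_at, merged_at in prs:
        iso = merged_at.isocalendar()
        hours = (merged_at - opened_at).total_seconds() / 3600
        by_week[(iso[0], iso[1])].append(hours)
    return {week: median(hours) for week, hours in sorted(by_week.items())}
```

The result is a team-level trend line. It says nothing about which individual was "active," which is the point: the team has a number to manage against without reintroducing per-person activity reporting.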

Three things to stop:

  • Stop the keystroke-monitoring deployment. The evidence is unambiguous that the instrument is net-negative and that the workforce response degrades the very productivity the instrument was meant to measure.
  • Stop measuring "hours active." Active hours are an activity metric pretending to be an effort metric. They are gameable, they correlate weakly with output, and they signal a lack of trust that has cultural costs.
  • Stop comparing across roles using single-number "productivity scores." The output of a customer-support agent and the output of a senior engineer are not the same kind of thing, and the score that flattens them into a single number flattens away the variance that matters most.

Section 06: What Workers Should Do in a Theatre Workplace

The honest situation: many workers in 2026 are in firms whose productivity management is still anchored in activity surveillance, and changing that from below is hard. Three things are within the worker's control.

  1. Make your outcomes visible. The fastest way to escape activity-based evaluation is to produce a flow of outcome evidence that your manager can point at when they are asked. A weekly written summary of "what I shipped, what moved, what changed" — three to five sentences — is the single highest-ROI piece of personal infrastructure most knowledge workers can build. It also accumulates into a performance-review document by year-end without additional work.
  2. Negotiate measurement in the role-design conversation, not after. When you take a role, the moment to clarify how performance will be evaluated is at hire, not at the first review. "What does success in this role look like at the six-month mark, in concrete terms?" is a perfectly normal interview question, and the answer tells you which kind of workplace you are joining.
  3. Treat activity-surveillance as a signal about the firm, not just the role. If your firm deploys keystroke-monitoring, mouse-tracking, or always-on screen recording, it is telling you something about how it sees its workers. The signal correlates with other measurable workplace characteristics — high turnover, low trust, weak manager training — that you will live with regardless of whether you personally are flagged by the surveillance system.

Productivity is not how busy you look. It is what you produce — and the measurement instrument that pretends otherwise is corroding the work it claims to measure.

Sources & further reading

  [1] MIT Sloan Management Review, 'The Hidden Costs of Worker Surveillance' (2024)
  [2] Forsgren, Humble & Kim, Accelerate: The Science of Lean Software and DevOps (IT Revolution, 2018); DORA State of DevOps annual reports
  [3] Forsgren, Storey, Maddila, Zimmermann, Houck & Butler, 'The SPACE of Developer Productivity,' ACM Queue (2021)
  [4] Cal Newport, Slow Productivity: The Lost Art of Accomplishment Without Burnout (Portfolio, 2024)
  [5] Microsoft, Work Trend Index 2023 ('Will AI Fix Work?')
  [6] Peter Drucker, The Effective Executive (Harper & Row, 1967)
  [7] Atlassian, State of Teams 2024
  [8] GitHub DX Survey (multiple editions)

Frequently asked

What is "productivity theatre"?

The pattern of organizations measuring activity (keystrokes, hours active, badge swipes, Slack messages, lines of code committed) when they intend to measure output. Activity metrics correlate weakly with output and are easily gamed. The most-instrumented workplaces are not the most productive ones, on the available evidence.

What's wrong with keystroke monitoring and similar surveillance tools?

The 2024 MIT Sloan study of keystroke-monitoring deployments found a measurable decline in self-reported intrinsic motivation, a 23% increase in turnover among the most-monitored quartile of workers, and no productivity benefit at the team-output level. The tools are gameable (mouse-jigglers, scripted activity), produce cultural costs, and don't generate the signal they promise.

What should be measured instead?

Outcome metrics, by function. Engineering: DORA (deployment frequency, lead time, change failure rate, time to restore) + DX (developer experience) + customer outcomes. Sales: weighted pipeline outcomes (stage progression, conversion rates) rather than activity counts. Customer ops: CSAT, customer-effort, and outcome-weighted resolution rates rather than average-handle-time.

What's the operational upgrade for a typical knowledge-work organization?

Four steps: (1) identify the outcome each team is accountable for, with a real number that moves; (2) stop measuring and reporting activity proxies; (3) train managers on outcome-based 1:1s; (4) attend to leading indicators inside the outcome metric (not activity surveillance). Cost: weeks of management training, not millions in software.
