Trust the Process, Doubt the Procedure

From Information to Action with Quantitative Decision Science

Decision theory & NBA playoff strategy

Daniel McNichol
Published in Coεmeta · Jan 1, 2020 · 23 min read


This is a parable about simple, straightforward questions of fact, & how they often devolve into complex matters of data processing, analysis & decision-making under fragile epistemic limits, in the real world.

It uses NBA data as a toy framing device to explore more complex data science concepts & techniques. Obviously, this is not remotely how actual NBA strategy or analytics works.

(This is Part 4 of 4, exploring decision theoretic approaches. Parts 1, 2 & 3 are summarized in the brief recap below. Each post stands on its own.)

Run it Back: The Threepeat (Parts 1–3 recap)

Wagers, Probability, Data Wrangling & Frequentist vs Bayesian Inference

via clutchpoints

In Part 1, I described a bet between myself & my friend / colleague Nat, in which he asserted that away teams in NBA Playoff series are more likely to win the series if they WIN & THEN LOSE the first two away games, than if they split the first two games in the opposite order, despite an identical series score heading into game 3. I demurred, sensing a gambler’s fallacy at play:

I go on to describe my reasoning & the fundamental probability theory underlying it, which I won’t repeat here.

I also describe the semi-complex data collection & processing required to address the question at hand, utilizing 538’s excellent Historical NBA Elo dataset (CC BY license), then surface the results & initial summary statistics in an interactive Data Studio dashboard, reproduced below (with default filters set to conditions best suited to the original wager, under modern, post-2002 NBA Playoff rules).

As can be seen, between 2003–2015, over 84 playoff series, away teams splitting the first 2 games of the series had essentially the same win %, regardless of the order of the win & loss (39.1% to 39.5%).

I was tempted to declare victory here, but decided to try to salvage the 1984–2002 best-of-7 playoff series to increase the sample size. You can see for yourself by adjusting the filters above, but here are the topline results:

Now things get a bit muddled. At a sample size of 134 total series, the win-then-lose segment has pulled into a 43% to 36% advantage, all of which accrued between 1984–2002 (since we’ve already seen it was tied since 2002):

Between 1984–2002, first-round NBA Playoff series were best of 5, & are excluded here, which explains some of the lower incidence of series beginning with split results over that period. (The first round is also by nature the round with the most total series in play, so excluding it costs our sample the most series.)

But is this advantage real? Or simply expected random fluctuation around otherwise equal winning chances?

This sounds like a question about statistical significance, but, as you might have heard, scientists are rising up against it, as the (mostly Bayesian) statisticians have long advocated.

In Part 2 I explored why, showing how traditional frequentist statistical tests unanimously support my side of the bet, but suffer from chronic issues of hidden assumptions, misunderstanding & mechanical misapplication.

via xkcd

Cool cool. Traditional so-called Null Hypothesis Significance Testing (NHST) is on the outs, but surely the Bayesian alternatives (explored in Part 3) will save us?

oh…oh no.

Well then.

How to proceed?

Bets must be settled. Decisions must be made. Science must advance.

Enter the hairy field of decision science / theory / analysis.

This post will introduce the basic elements of quantitative decision science, then demonstrate what a very basic application might look like, given our question of interest. We’ll then suggest directions & resources for further investigation.

Intro to Decision {Science, Theory, Analysis}

via clutchpoints

Action vs Understanding

Statistics & data analysis help us make sense of information, to learn from it. This is often passive learning, for the purpose of advancing scientific understanding or calibrating our own beliefs (…or settling bets). But sometimes we’re interested in more than passive, academic learning: we must make a decision & drive some action. As alluded to above, this is the province of decision science (variously termed decision theory or decision analysis).

Descriptive vs Prescriptive

Decision science / theory can be broadly described as “the study of decision-making”.

… It has both descriptive & prescriptive approaches.

Descriptive decision science is primarily a philosophical & psychological field of social science, studying how decisions are made in practice.

Prescriptive decision science (often called decision analysis) aims to provide a principled (usually quantitative) framework for optimal decision-making given some objectives, constraints & available information. It attempts to capture the universe of possible decisions & outcomes, then evaluates the consequences of each decision-outcome combination. The decision / action with the optimal estimated consequences, given potential outcomes & objectives, is then chosen.

Approaches

Decision science has a long history with diverse applications across many disciplines: from statistics to operations research / management science / decision support to game theory, economics, finance & risk management to machine learning, etc.

Its various approaches are again complicated by divergent (though reconcilable) frequentist & Bayesian methods, as well as alternate approaches inspired by heuristics & machine learning.

Alternate approaches also exist for probabilistic & non-probabilistic scenarios, as well as scenarios with & without empirical data.

We’ll largely avoid these complications by adopting as generic a framework as possible, but see the “further reading” section at the bottom of this piece to go deeper.

Procedural framework

As mentioned above, a typical decision analysis proceeds as follows:

  1. Specify possible decisions / actions
  2. Specify potential outcomes
  3. Quantify consequences of possible action / outcome combinations
    (sometimes requires estimation from data, as well as specifying loss / cost / utility / objective functions)
  4. Specify & apply an appropriate decision criterion or “rule”, given particular context, risk tolerance, etc

If parts of this sound eerily similar to traditional statistics, optimization & machine learning approaches, there’s a good reason: many of the same techniques apply.

The main distinction here is the explicit evaluation of multiple possible decisions / actions, as well as an explicit choice of a decision rule, which provides a deliberate, logical basis for choosing between different actions & their estimated consequences.

Let’s make this more concrete by applying this framework to our original wager.

Decision Analysis Example: NBA Playoff Strategy

(To reiterate the disclaimer from the beginning of this piece: what follows is, obviously, not remotely how actual NBA strategy or analytics works.)

So far in this series we’ve been discussing our wager & question of interest as a matter of fact or belief. We’ve seen how various applications of frequentist & Bayesian statistical hypothesis testing provided varying, unsatisfying & ultimately inconclusive findings.

Such is often the world of academic pondering. But what if we treated our question as a concrete, real world problem? What if some decision had to be made & a corresponding action taken, with actual consequences to follow? This moves us beyond mere hypothesis testing, into the world of decision science.

NBA Eastern Conference semi-finals: Game 3 approaches

Brett Brown, via wip

In order to translate our original wager into a matter of decision analysis, let’s put ourselves in the shoes of our heroes’ head coach, Brett Brown, in the scenario that gave rise to our initial wager: having lost then won the first 2 away games of a best-of-7 playoff series.

One assistant coach, we’ll call him Coach Nat, is in our ear lamenting the fact that if we’d only split the first 2 games in the reverse order, we’d be more likely to win the series. Thus we should undertake a shift in strategy to alter these odds.

Another (far handsomer & more modest) assistant coach disagrees, saying we should stay our current course, because our odds should be the same either way. Switching strategies now means we’ll expend needless effort & lose the chance to double down on a strategy that has produced a desirable outcome already: having won 1 out of 2 initial away games, restoring home court advantage to ourselves.

Whose advice do we follow?

Being the empirical-minded bearers of Hinkie’s legacy that we are, we summon our analytics nerd staff to help settle the question. Following the simplified decision analytic framework above, they lay out the following:

  1. Specify possible decisions / actions
    decision_c = change in strategy
    decision_n = no change in strategy
  2. Specify potential outcomes
    outcome_w = win the series
    outcome_l = lose the series
  3. Quantify consequences of possible action / outcome combinations
    (sometimes requires estimation from data, as well as specifying loss / cost / utility / objective functions)
    …here’s where art meets science…

Expected value & base rates

via Julian Hochgesang on Unsplash

Assuming we trust the historical record since 1984, we can establish our base rates of expected success. As we’ve repeatedly noted, away teams that lose then win (L_W) have a 36% historical win rate, while win then lose (W_L) teams have a 43% win rate.

This data allows us to set some initial expected value baselines for success, per basic probability theory.

In our scenario, our outcome is binary (win or lose the series), which we can value as win = 1 & lose = 0. (A quintessential zero-sum game).

Expected value is simply defined as the value of an outcome, weighted by its probability of occurring:

𝔼[outcome_w|L_W] = outcome_w * 𝐏(outcome_w|L_W) = 1 * .36 = .36

𝔼[outcome_l|L_W] = outcome_l * 𝐏(outcome_l|L_W) = 0 * .64 = 0

All we’ve done here is define the expected value (𝔼) as the value of a series win or loss (1 or 0) multiplied by its probability of occurring (aka historical win rate), given the fact that we lost then won the first 2 games (L_W).

Notice that the expected value of winning the series is simply equal to the probability of winning it, since the total value of a series win = 1, and 1 * x = x. Similarly, the value of a series loss = 0, & 0 * x = 0. If we had outcomes with non-binary values (e.g. point differentials or expected wins), things would be different. As it is, we can simplify this all:

𝔼[outcome_w|L_W] = 𝐏(outcome_w|L_W) = .36

𝔼[outcome_l|L_W] = outcome_l = 0

For Coach Nat’s sake, we can also compute the expected value of a W_L team in our situation:

𝔼[outcome_w|W_L] = 𝐏(outcome_w|W_L) = .43

𝔼[outcome_l|W_L] = outcome_l = 0

So far we haven’t discovered much more than we already knew: given historical data, we have an estimated 36% chance of winning the series, while we’d have a 43% chance if we split the first 2 games in the reverse order.
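For the code-inclined, here’s a minimal R sketch of the expected value arithmetic above (the variable names are my own shorthand, just for illustration):

# historical base rates from the data above
p_win_lw <- 0.36   # P(win series | lost then won games 1 & 2)
p_win_wl <- 0.43   # P(win series | won then lost games 1 & 2)

# outcome values: series win = 1, series loss = 0
v_win  <- 1
v_loss <- 0

# expected values, per the formulas above
v_win * p_win_lw          # E[win | L_W]  = 0.36
v_loss * (1 - p_win_lw)   # E[loss | L_W] = 0
v_win * p_win_wl          # E[win | W_L]  = 0.43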

So what do we do?

Actions & Cost functions

via Scott Webb on Unsplash

To answer that, we have to quantify some assumptions about our possible decisions / actions (change strategy or not), & how those decisions will interact with the base rates above to produce ultimate consequences, given potential outcomes (win or lose the series). We call this sort of quantification a “cost” function (among other things).

Recall our potential actions:
decision_c = change in strategy
decision_n = no change in strategy

…And also remember that this is the real world, where actions are not free but require effort, incur some costs, & prevent us from taking other contrary actions.

For instance, if we decide to change our game strategy mid-series, we can’t simply flip a switch & execute a different strategy. Rather, we need to learn the new strategy, practice it, & unlearn our old strategy. These things all have costs.

We can define such costs in conventional economic lingo as:

  • switching costs: the time & effort (i.e. practice, study) required to change from one strategy to another
  • opportunity costs: since we’re now spending time & effort on switching, we also incur the cost of what we might have done with this time & effort instead, such as reinforcing an existing strategy that had some success & would require less effort than switching, or using that time to rest & recover, etc

So we might define a simple cost function (𝒞) of a decision as follows:

𝒞(decision, s, o) = decision * (s + o)

where: s = the switching cost & o = the opportunity cost.

This simply specifies that the cost of a given action is the sum of the costs, multiplied (i.e. scaled) by the decision.

Then we must give them values.

So let:

decision_c = 1 (since an action is taken, this will activate the cost function)
decision_n = 0 (no action, negates the cost associated with the action)

s = .05 (the switching cost, to be applied to the expected value)
o = .05 (the opportunity cost, to be applied to the expected value)

This would result in the following costs:

𝒞(decision_c, s, o) = decision_c * (s + o) = 1 * (.05 + .05) = .1
𝒞(decision_n, s, o) = decision_n * (s + o) = 0 * (.05 + .05) = 0

But this isn’t quite fair, since the change decision incurs only costs but no benefits. In reality, we (read: Coach Nat) might conjecture that a change of strategy will have the intended benefit of swapping the L_W base rate win % for the W_L win %, at the price of switching & opportunity costs.

So then our cost function would become:

𝒞(decision_c, s, o) = decision_c * (s + o - [𝐏(outcome_w | W_L) - 𝐏(outcome_w | L_W)])

This looks gnarly, but it simply takes the difference between the W_L & L_W win probabilities, then subtracts it from the other costs, as an offsetting benefit.

Plugging in our values gives:

𝒞(decision_c, s, o) = 1 * (.05 + .05 - [.43 - .36]) = .1 - .07 = .03

And since we already know that decision_n = 0 & negates the rest of the cost function, we can just compare costs:

𝒞(decision_c, s, o) = .03
𝒞(decision_n, s, o) = 0

Thus the total estimated cost of the decision to change strategies is greater than the cost of staying the course.
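Here’s the same cost function as a minimal R sketch (the function name & benefit term are mine, just to mirror the formulas above):

# cost of a decision: C(decision, s, o) = decision * (s + o - benefit)
# where the conjectured benefit is swapping the L_W win rate for the W_L one
cost <- function(decision, s, o, benefit = 0) {
  decision * (s + o - benefit)
}

s <- 0.05                # switching cost
o <- 0.05                # opportunity cost
benefit <- 0.43 - 0.36   # P(win | W_L) - P(win | L_W)

cost(1, s, o, benefit)   # change strategy: 0.03
cost(0, s, o, benefit)   # no change: 0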

You can probably see where this is going, but recall that we’re still on step 3 out of 4. So we’ll bring it all home in the final step for completeness:

4. Specify & apply an appropriate decision criterion / rule, given particular context, risk tolerance, etc

As we’ve seen, this sort of probabilistic decision analysis enables the use of expected value estimates for each action taken, which implies an especially simple & straightforward decision rule: choose the option with the greatest expected value.

We haven’t yet incorporated our cost function from step 3 into our expected values, so let’s do that now:

𝔼[L_W | decision_c] = 𝐏(outcome_w | L_W) - 𝒞(decision_c, s, o)

This is the expected value of a L_W team given that they decide to change strategies, which we arrive at by subtracting the cost function from the base rate win probability.

Plugging in values from above gives:

𝔼[L_W | decision_c] = .36 - .03 = .33

Doing the same for our decision to make “no change”:

𝔼[L_W | decision_n] = .36 - 0 = .36

Thus, 𝔼[L_W | decision_n] > 𝔼[L_W | decision_c]

...or “no change” gives a greater expected win % than the “change” decision, thus the expected value decision rule compels us to choose “no change”.
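Continuing the R sketch, the expected value rule takes just a few lines (toy numbers from above):

p_win_lw      <- 0.36   # base rate win probability for L_W teams
cost_change   <- 0.03   # cost of changing strategy (from the cost function above)
cost_nochange <- 0      # cost of staying the course

ev_change   <- p_win_lw - cost_change     # 0.33
ev_nochange <- p_win_lw - cost_nochange   # 0.36

# decision rule: choose the action with the greatest expected value
c("change", "no change")[which.max(c(ev_change, ev_nochange))]   # "no change"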

Evaluating multiple “states of nature”

via João Silas on Unsplash

The preceding analysis might strike you as little more than a simple cost-benefit analysis (CBA), where probabilistic expected value is substituted for traditional monetary value. You would be correct, in that CBA can be thought of as a special case of decision analysis. But decision theoretic frameworks provide a much richer extension of this sort of evaluation, through the inclusion of probability theory, alternate decision rules, flexible cost functions & consideration of multiple outcomes or states of nature.

Referring to outcomes as states of nature evokes the fact that these outcomes are typically beyond our control, or at least cannot be perfectly predicted. Explicit inclusion of states of nature in our decision framework forces us to consider the range of potential outcomes, & their potential impact on ultimate consequences. This informs our decision-making, even influencing the decision rule we choose, contextualizing it all in terms of the larger consequence-space we’re operating within.

How can we incorporate more states of nature into our analysis, for a richer result?

NBA states of nature

via Dave on Unsplash

So far, we’ve only considered two outcomes:

outcome_w = win the series
outcome_l = lose the series

But we can approach these with greater granularity if we think one level deeper, in terms of what sub-outcomes affect these outcomes. An obvious option here would be the most direct cause of wins or losses: total points scored, or more specifically, the point differential. This is perhaps too granular for our purposes (but I invite readers to undertake this exercise, utilizing any version of the data made available in Part 1).

A more appropriate level of abstraction might be our team’s quality of play, or how well the team executes the strategy. This could be impacted by all sorts of factors, such as individual players’ physical health, mindstate, gym conditions, etc. All of which are not entirely within our control (otherwise we’d never see subpar play in the NBA). We might define 3 levels for our quality of play states of nature:

  • below average play
  • average play
  • above average play

But how to quantify their expected values?

Recall that our previous expected value of a series win for L_W teams simplified down to their historical win rate: 36%. We might naively or arbitrarily assume performance contributes +/- 5% around the average expected value, but we can be a bit more empirical than this by leveraging the estimation from our Bayesian models in Part 3.

Revisiting results from our application of an informative prior to our data via the bayesAB package:

library(bayesAB)

# fit model with informative priors
AB_strongprior <- bayesTest(w_l_wins, l_w_wins,
                            priors = c('alpha' = 12, 'beta' = 17),
                            distribution = 'bernoulli')

# display summary output
summary(AB_strongprior)

# produce plots
plot(AB_strongprior)

Output:

Quantiles of posteriors for A and B:

$Probability
$Probability$A
       0%       25%       50%       75%      100%
0.2270689 0.3930053 0.4264228 0.4609739 0.6679833

$Probability$B
       0%       25%       50%       75%      100%
0.1842002 0.3410363 0.3739977 0.4076374 0.6115122

--------------------------------------------

P(A > B) by (0)%:

$Probability
[1] 0.77255

--------------------------------------------

Credible Interval on (A - B) / B for interval length(s) (0.9):

$Probability
        5%        95%
-0.1473586  0.5354154

--------------------------------------------

Posterior Expected Loss for choosing B over A:

$Probability
[1] 0.02552246

This model produced full posterior distributions around our sample historical win rates for W_L (“A”) & L_W (“B”) groups (see the posterior plots produced by plot() above). The summary output also helpfully annotates the quartile thresholds of these distributions:

$Probability$A
       0%       25%       50%       75%      100%
0.2270689 0.3930053 0.4264228 0.4609739 0.6679833

$Probability$B
       0%       25%       50%       75%      100%
0.1842002 0.3410363 0.3739977 0.4076374 0.6115122

With this more empirical information, let’s define our states of nature as follows:

  • below average play = 1st quartile (𝑸1) aka 25th percentile
  • average play = 2nd quartile (𝑸2) aka 50th percentile
  • above average play = 3rd quartile (𝑸3) aka 75th percentile

This allows us to construct a payoff table, comparing the estimated win probability of each decision under different states of nature:

Extrapolating our cost function & plugging in values gives:

Due to the symmetry of our posterior distributions (modeled binomially) & our linear (additive) cost function, we end up with boring, predictable payoffs which don’t vary much from our original single-state-of-nature analysis (equivalent to the average state here). But if we had a more complex cost function with more interactions, or less docile distributions, we could see considerable variance here.
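Here’s roughly what that extrapolation looks like in R, using the posterior quartiles from the bayesAB output above (applying the cost function per state of nature is my own simplification):

# posterior quartiles (25th / 50th / 75th percentiles) from the summary output
q_wl <- c(0.3930053, 0.4264228, 0.4609739)   # W_L ("A") group
q_lw <- c(0.3410363, 0.3739977, 0.4076374)   # L_W ("B") group

s <- 0.05   # switching cost
o <- 0.05   # opportunity cost

# payoffs under each state of nature (below avg, avg, above avg play)
no_change_payoff <- q_lw                             # ~0.34, 0.37, 0.41
change_payoff    <- q_lw - (s + o - (q_wl - q_lw))   # ~0.29, 0.33, 0.36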

Since all No Change payoffs are higher than their Change counterpart under each state of nature, we aren’t left with a very interesting decision problem, as all imaginable rational decision rules would lead us to again choose the No Change decision. So I’m going to fudge some of these numbers for the sake of more meaningfully demonstrating alternate decision criteria.

Selecting a decision criterion under many states of nature

via Lena Bell on Unsplash

Let’s pretend these are our payoffs:
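To make the criteria below easy to check, here are those pretend payoffs laid out as an R matrix (rows = states of nature, columns = decisions; the values are the ones quoted in the walkthroughs that follow):

# hypothetical payoffs: series win probability for each state of nature & decision
payoffs <- matrix(
  c(0.35, 0.34,    # below average play
    0.36, 0.37,    # average play
    0.37, 0.41),   # above average play
  nrow = 3, byrow = TRUE,
  dimnames = list(c("below_avg", "avg", "above_avg"),
                  c("change", "no_change"))
)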

This breakdown of win probabilities under different decision + outcome combinations paints a much murkier picture as to the “best” option. On the one hand, No Change contains the highest possible value, but on the other hand, there is less variance & a higher floor on the Change side.

Which is optimal? It depends on our risk tolerance, gambling disposition, etc.

These are encapsulated in so-called decision criteria or decision rules.

At first blush, with monikers like minimax, maximin & maximax, the most common decision criteria can appear inscrutable (or indistinguishable). But they’re actually quite lucidly & logically named, once understood. They also help us explicitly consider, select & express a decision-making logic appropriate for our context.

Here’s a quick rundown, which we’ll make more concrete by applying to our data below:

  • Maximax: Choose the decision which contains the maximum of all maximum payoffs across decision alternatives.
    (This is an optimistic strategy, which optimizes for the best case scenario, ensuring maximum possible payoff if all goes well).
  • Maximin: Choose the decision which contains the maximum of all minimum payoffs across decision alternatives.
    (This is a pessimistic, or risk-averse strategy, which optimizes for the worst case scenario, ensuring the best of the worst possible payoffs).
  • Minimax: Choose the decision which contains the minimum of all maximum payoffs across decision alternatives.
    (This doesn’t actually make sense for payoffs, but as the inverse of the maximin criterion, it is used as a pessimistic strategy when dealing with losses rather than payoffs).
  • Minimax Regret: Choose the decision which contains the minimum of all maximum regrets across decision alternatives. Regret or “opportunity loss” is defined as the difference between a given payoff & the highest possible payoff for a given state of nature, if a different decision was made.
    (This becomes clearer with a concrete example below, but is again a more conservative approach, optimizing for the decision whose maximum possible “regret” across all states of nature is the smallest).

This is a short list of non-deterministic or non-probabilistic decision criteria, because they do not attempt to account for the likelihood of possible states of nature. (And there are more).

Once we bring in probabilities, we get into a richer world of expected value (as seen above), risk & perfect information, which we’ll also explore below.

So let’s see how they work.

Maximax

We’ll start with the optimist’s criterion, the maximax:

Choose the decision which contains the maximum of all maximum payoffs across decision alternatives. (This is an optimistic strategy, which optimizes for the best case scenario, ensuring maximum possible payoff if all goes well).

On our payoff table, this can be demonstrated like this:

The maximum payoff under the Change decision is .37, while the maximum No Change payoff is .41. The maximax is the maximum of these two maximums, or .41, which pertains to the No Change decision. So the maximax criterion would lead us to choose No Change to optimize for the largest possible payoff, or best of the best scenarios.

Maximin

Now to the pessimist’s criterion, the maximin:

Choose the decision which contains the maximum of all minimum payoffs across decision alternatives.
(This is a pessimistic, or risk-averse strategy, which optimizes for the worst case scenario, ensuring the best of the worst possible payoffs).

The minimum Change payoff is .35, while the minimum No Change payoff is .34. The maximin is the maximum of these two minimums, or .35, which pertains to the Change decision. So the maximin criterion would lead us to choose Change to optimize for the largest minimum payoff, or the best of the worst scenarios.

Minimax Regret

As mentioned above, the minimax criterion only applies to losses, & as such is the counterpart to maximin for rewards (seen immediately above). So we move on to a slightly different sort of criterion, the minimax regret:

Choose the decision which contains the minimum of all maximum regrets across decision alternatives. Regret or “opportunity loss” is defined as the difference between a given payoff & the highest possible payoff for a given state of nature, if a different decision was made.
(This is again a conservative approach (at least psychologically), optimizing for the decision whose maximum possible “regret” across all states of nature is the smallest).

There’s a bit more going on here, since we have to first compute the regrets, then choose the minimum of the maximum regrets. So one step at a time:

  1. The regret is the difference between a given payoff & the highest possible payoff for a given state of nature, if a different decision was made. This is represented in the blue boxes. For instance, within the Below Avg state of nature (first row), there is no regret associated with the Change decision, because it is the highest possible payoff across decisions. But if we chose No Change within that state of nature, we’d experience a regret of .01, because its payoff is .34, but we could have a payoff of .35 if we chose Change. We perform this calculation for each row.
  2. Next we find the maximum regret within each decision (red boxes). We see that the Change decision has a maximum regret of .04, while No Change has a maximum regret of .01.
  3. We then choose the minimum among these maximum regrets, to limit the maximum regret we can feel, given our decision. That leads us to choose the No Change decision, with a maximum regret of .01 vs .04 for Change.
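Here’s a quick R sketch of all three criteria, reusing the payoffs matrix defined above (a toy illustration, not a general-purpose implementation):

# maximax: maximum of the per-decision (column) maxima
apply(payoffs, 2, max)    # change 0.37, no_change 0.41 -> choose no_change

# maximin: maximum of the per-decision (column) minima
apply(payoffs, 2, min)    # change 0.35, no_change 0.34 -> choose change

# minimax regret: regret = best payoff in each state of nature minus the payoff received
regret <- apply(payoffs, 1, max) - payoffs   # row maxima recycle down each column
apply(regret, 2, max)     # max regret: change 0.04, no_change 0.01 -> choose no_change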

That’s it for our basic non-probabilistic decision criteria, but what if we brought in probability for finer grain estimation?

Expected Value

We’ve already seen a basic application of the expected value criterion in our initial demo analysis with only one state of nature (technically there were two: “winning” & “losing” the series, but since losing brought a value of 0, it negated itself).

If we want to apply expected value to our current framing with multiple states of nature, the first thing we have to do is assign probabilities to each state. In lieu of a more empirical approach, we might reason: this is the playoffs & our team is healthy, we know they are disciplined & motivated enough to give their full efforts for the rest of the series, thus a below average performance should be less likely than an average or above average showing. So we arrive at these probabilities:

  • below average: 20%
  • average: 40%
  • above average: 40%

We can now compute expected values for each decision, across states of nature, which is simply the mean weighted by probabilities:

So the E.V. of our Change decision is simply:
(.35 * 0.2)+(.36 * 0.4)+(.37 * 0.4) = .36

The expected value criterion requires us to simply choose the decision with the greatest value, which is No Change here (.38 vs .36).
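In R, continuing from the payoffs matrix above:

# assumed probabilities for each state of nature
state_probs <- c(below_avg = 0.2, avg = 0.4, above_avg = 0.4)

# expected value of each decision: payoffs weighted by state probabilities, then summed
ev <- colSums(payoffs * state_probs)
ev    # change ~0.36, no_change ~0.38 -> the expected value rule picks no_change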

Expected Value of Perfect Information

In all our action-oriented decision-making mania, as good empiricists we should never lose sight of one ever-present (if under-utilized) option: to collect more information for an even better decision.

There are many principled approaches to this decision, but the application of expected value leads naturally to one in particular, encapsulated by the (frankly, *chef’s kiss*) phrase: the expected value of perfect information (EVPI).

We’re already familiar with the expected value part, so what is perfect information?

Somewhat related to the concept of regret, perfect information simply implies the decision we would make for each state of nature, if we knew perfectly that it would occur. In other words, it is simply the decision with the greatest payoff (i.e. no regret) in each state of nature.

We can then take the expected value of those decisions under perfect information, which gives the expected value with perfect information, from which we can determine the expected value of perfect information:

So here the EV w/ PI is .382. This is essentially the payoff we’d expect if we always chose perfectly, regardless of which state of nature occurred. We can treat this as the upper bound of expected value, given an uncertain future in which we choose correctly every time.

We can then use this to determine how much additional information is worth to us, by finding the difference between this E.V. under perfect conditions (.382) & our original E.V. without perfect information (.38):

.382 - .38 = .002

So our EVPI is .2%.

Notice what this is saying: we know the best we can do with perfect information is .382, this is our E.V. ceiling, given uncertain states of nature occurring with the specified probabilities. We also know that the expected value criterion already advised us to choose No Change, which carries an expected value of .38. This is our E.V. baseline. So the EVPI is simply the difference between this baseline & the ceiling. In other words: what would we gain from perfect information?
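And the last two steps in R, continuing the sketch from above:

# expected value WITH perfect information:
# always take the best payoff in each state of nature, weighted by its probability
ev_with_pi <- sum(apply(payoffs, 1, max) * state_probs)   # 0.382

# expected value OF perfect information: the ceiling minus our best EV without it
ev_with_pi - max(ev)                                      # 0.002, i.e. +0.2% win probability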

This gives us a rational basis to decide whether collecting more information is worth it. In this case, the absolute best that more information can provide to us (given our conditions) is + .2% in additional win probability. So, in practice, more information will almost always get us less than that, as perfect information rarely exists in the real world.

Decision Criterion recap

Let’s round up our various decision determinations across criteria:

  • Maximax (optimistic): choose No Change
  • Maximin (pessimistic): choose Change
  • Minimax regret (opportunistic/conservative): choose No Change
  • Expected Value (realistic): choose No Change
  • Expected Value of Perfect Info: gain at most .2% with more info

(Of course, given our example, we shouldn’t take any of these too seriously, since the info was mostly made up!)

Part 4 Conclusion, Qualifications & Addendums

via clutchpoints

This was an oversimplified demonstration of a decision analytic approach to what had previously been cast as a more academic matter. My hope was to show how this approach can help frame, concretize & rationalize “simple questions of fact” that devolve into much more complex matters, especially those which require a decision be made, with real consequences to follow.

That said, this post barely scratched the surface, & the following qualifications & possible further directions must be noted:

  • Clearly, assuming a different cost structure would likely lead to different ‘optimal’ decisions
  • We might attempt to learn cost functions or state of nature probabilities more empirically, looking to the historical record for clues
  • We might add more nuanced decision options, e.g. a “partial strategic shift”, which cuts the costs in half but also shrinks the benefit. Or else a continuous “shift” range (between 0–1), with proportionate costs & benefits, etc.
  • I glossed over & fudged many details of decision theoretic frameworks & probability machinations, given the triviality of our toy example & for the sake of concision & a desperate attempt at clarity. See the further resources section below for deeper accounts.

🎬 🎬 🎬 Series Conclusion 🎬 🎬 🎬

The title of this series is: Trust the Process, Doubt the Procedure. This is both an allusion to our protagonists & an overarching theme & mantra for sound empirical work, imo.

The Process is the scientific method, broadly & robustly construed. Procedures are narrow, usually fragile applications or approaches to (a piece of) that process.

Understood this way, we must doubt our procedures, if we are to truly trust the process.

  • In Part 1, we introduced our wager & question of interest, framed it via basic probability theory, collected & processed our data, & produced simple summary statistics, which we determined were insufficient to settle the matter.
  • In Part 2, we approached the question through the framework of traditional statistics, via frequentist Null Hypothesis Significance Testing. These tests unanimously supported my side of the bet, but we showed how they suffer from chronic issues of hidden assumptions, misunderstanding & mechanical misapplication.
  • In Part 3, we explored Bayesian alternatives to frequentist NHST, which provided much richer & more interpretable probabilistic assessments of our question of interest, but still left us without an unequivocal binary determination: am I right or is Nat?
  • In Part 4 (this post), we showed how translating our question from an idle academic matter to a matter of action & decision helps to clarify, concretize & rationalize our approach. Basic decision theoretic frameworks allowed us to cast our question in terms of costs & consequences, & gave us a rigorous method to assess the relative value of decisions & information.

It has been a journey. And yet, we still haven’t put our question to rest:

So who wins the bet?

Here’s where I make my confession, if it hasn’t been obvious already: I’ve only been using this question as a thin conceit to enable detailed exploration of these analytic concepts & approaches, which I’ve wanted to undertake for a while. So, really, we all win. (*troll face*)

That said, I will just reiterate that, from the outset, our two groups’ win % was actually tied since 2002 (with slight advantage to L_W teams). So, if you really want to argue about NBA dynamics between 1984–2002, I have but one response:

Ok boomer.

Thanks for reading, see below for further resources, follow me & check out my other posts. Also, please comment with thoughts, objections or corrections!


Follow on twitter: @dnlmc
LinkedIn: linkedin.com/in/dnlmc
Github: https://github.com/dnlmc

Further Decision Science Resources
