Your Churn Threshold Is a Pricing Resolution

says “this buyer’s chance of leaving is 0.4” and your code does predict(X) >= 0.5, you have got simply made a pricing determination: you determined that the price of sending a retention supply to a buyer who would have stayed is precisely equal to the price of shedding one who would have left, and on the IBM Telco dataset (arguably the most-recycled churn dataset on Kaggle and GitHub), that call is mistaken by an element of 13.

I assembled a corpus of 36 publicly accessible IBM Telco churn analyses (Kaggle notebooks, GitHub repositories, weblog posts, peer-reviewed papers), and the reporting sample is hanging: roughly 9 in ten report classification accuracy or F1, simply over one in seven report a revenue curve, and none use survival evaluation to compute lifetime worth.

The result’s a literature the place the identical dataset has been re-modelled a whole bunch of occasions, and each default-threshold mannequin leaves cash on the ground: about $86 per buyer in avoidable burn on the usual 20% take a look at break up, and scaled to a 100,000-subscriber e book with the identical churn profile that will symbolize $8.6 million in recoverable value; the IBM Telco churn price (26.5% annual) is unusually excessive, and a more healthy B2C SaaS e book with 5–8% annual churn would see the per-customer determine drop by roughly 3–4×, so what stays invariant throughout any cost-sensitive setting isn’t the headline greenback quantity however the asymmetry — 13× dearer to overlook a churner than to over-treat a loyalist.

Diagram showing how default churn-threshold losses scale into an estimated $8.6M avoidable burn for a 100k subscriber book. — Picture by creator.

This piece lays out three issues, so as: first, what the IBM Telco literature studies and what it leaves out; second, learn how to compute the greenback value of a misclassification utilizing public 2026 B2C SaaS benchmarks and Kaplan-Meier survival evaluation, with no hand-waved CAC; third, why the textbook Bayes-optimal threshold formulation loses to a brute-force sweep when the mannequin is educated on SMOTE-balanced information, and what to do about it.

Each quantity on this article is reproducible from the scripts linked on the finish.

1. The 36-article hole

The IBM Telco Buyer Churn dataset is small (7,032 cleaned rows), tidy, labelled, and has been the canonical introductory churn dataset on Kaggle for practically a decade.

To get a really feel for what the general public corpus really measures, I listed 36 analyses throughout Kaggle, GitHub, and the foremost data-science blogs, scoring every one on ten reporting dimensions starting from F1 rating to a CAC-and-LTV-grounded revenue curve.

The sample that emerged is in Determine 2.

Coverage of ten reporting dimensions across 36 publicly available IBM Telco churn analyses. The high bars are saturated, and the low bars are where the field has work to do. — Picture by creator.

Three findings price pulling out:

Saturated: F1, accuracy, AUC, confusion-matrix screenshots, and SMOTE-vs-no-SMOTE comparisons seem in 80–90% of the corpus, with hyperparameter tuning through Optuna or grid search a near-universal trope.
Unusual: a revenue curve (whole greenback value of misclassifications as a perform of determination threshold) seems in fewer than 15% of the analyses I reviewed, and when it does seem, the FN/FP value numbers are normally picked from a textbook instance with out anchoring them to actual CAC or LTV.
Absent: not one of the 36 analyses I listed compute buyer lifetime worth through survival evaluation on tenure; most both skip LTV completely or use the steady-state Skok formulation LTV = ARPU / monthly_churn_rate, which assumes a homogeneous buyer base — a powerful declare for a dataset the place contract sort, fee technique, and tenure all materially shift retention.

Skipping survival evaluation issues as a result of the edge determination is a perform of LTV: when you misjudge LTV by 2x you misjudge the price of a missed churner by 2x, and the cost-optimal threshold strikes with it.

The following two sections construct the lacking piece, then plug it again into the edge downside.

2. The price of an error, in {dollars}

Three numbers decide the greenback value of each prediction (ARPU, gross margin, and CAC); two come straight out of the dataset, one from public 2026 business benchmarks.

import pandas as pd

df = pd.read_csv("telco.csv")
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df = df.dropna().reset_index(drop=True)

arpu = df["MonthlyCharges"].imply()                  # $64.80
mean_tenure = df["tenure"].imply()                   # 32.42 months
churn_rate = (df["Churn"] == "Sure").imply()          # 26.58 %
realised_ltv_churned = df.loc[df["Churn"] == "Sure",
                              "TotalCharges"].imply() # $1,531.80

ARPU and tenure are dataset-native: imply month-to-month cost is $64.80, imply noticed tenure is 32.4 months, and the realised LTV of churners is $1,531.80 (simply the typical TotalCharges over clients who already left), so holding ARPU mounted, three frequent LTV framings give very totally different ceilings:

Table showing three lifetime-value framings using the dataset’s mean ARPU. The Simple framing, ARPU times mean tenure, gives $2,100.87. The Skok steady-state framing, ARPU divided by monthly churn rate, gives $7,904.41. The Realized framing, mean total charges for the already-churned cohort, gives $1,531.80.

None of those three is the suitable reply, and the primary one is actively mistaken. ARPU × mean_tenure is the framing most tutorials attain for, and it’s damaged on the root. ARPU isn’t a property of a buyer; it’s the joint consequence of each characteristic that additionally drives churn — contract sort, fee technique, product bundle, family construction. The income leaves with the client, so ARPU and tenure are usually not unbiased portions, and the textbook decomposition LTV ≈ E[ARPU] × E[lifetime] is just legitimate when Cov(ARPU, lifetime) = 0. In any churn dataset price modelling that covariance is non-zero — if it had been zero, the mannequin would don’t have anything to foretell — and a buyer with $80/month Month-to-month Expenses and eight months of tenure isn’t a “high-revenue buyer”; their $80 and their 8 months are two readings of the identical underlying danger profile.

Layer on the truth that ARPU varies inside a single buyer’s lifetime — a promo-onboarding cohort paying $30/month for the primary three months and $80/month thereafter, a long-tenured loyalty buyer grandfathered right into a $50/month price whereas new clients pay $90/month for a similar product — and multiplying the imply of 1 distribution by the imply of one other describes a buyer who exists nowhere within the information.

The Skok formulation at the very least makes use of a steady-state churn price, but it surely assumes that price is fixed without end. The realised LTV is an actual quantity, but it surely solely describes the purchasers you have got already misplaced.

Not one of the three tells you when a new buyer breaks even on acquisition value — for that you just want an actual retention curve, which is the subsequent part, however first a phrase on CAC.

For B2C SaaS within the 2026 benchmarks, CAC ranges from about $68 (eCommerce) to over $200 (fintech), with mid-market subscription merchandise clustering round $150 ([1], [2]); telecom subscriber acquisition value is materially larger ($300+ as soon as handset subsidies are amortised), so $150 is a conservative anchor for this dataset, and selecting the next quantity would solely make the burn calculation in Part 4 bigger.

Where the $150 anchor sits on the 2026 B2C subscription CAC spectrum. Telecom (the actual industry of the dataset) clusters at $300–$500, so a $150 CAC is the conservative floor. — Picture by creator. Knowledge: Genesys Development Advertising (2026); Confirmed SaaS (2026).

Gross margin for B2C SaaS sits within the 70–85% band, with 75% as the same old midpoint that matches David Skok’s modelling assumptions for steady-state SaaS economics ([3]).

That offers us the constructing blocks for the price of a single prediction error.

CAC = 150
ARPU = 64.80
REMAINING_TENURE_MONTHS = 18

FN_COST = CAC + ARPU * REMAINING_TENURE_MONTHS   # $1,316.40
FP_COST = 100              # typical marketing campaign value (midpoint)
ratio = FN_COST / FP_COST                        # 13.2 : 1

A false unfavorable (telling a buyer they may keep once they really go away) prices you the brand new acquisition spend ($150) plus 18 months of foregone income ($64.80 × 18 = $1,166.40), for $1,316.40 whole, whereas a false optimistic (flagging a buyer as a churn danger once they had been going to remain) prices roughly $100 of marketing campaign and low cost expense, leaving a value ratio of 13.2 to 1.

A notice on what the framework is and isn’t. The $150 CAC and the $100 false-positive value on this article are placeholders; CAC varies materially by acquisition channel, and the $100 is shorthand for no matter your actual retention intervention prices — a reduction, a CSM name, a bundle improve, a product investigation. None of those are interchangeable, and a blanket low cost isn’t a retention technique: it’s a deferral mechanism that retains clients solely till the low cost expires whereas paying to retain clients who had been by no means going to depart (and, worse, coaching them to count on the subsequent low cost). Actual retention technique maps churn drivers — a buyer leaving as a result of backup_online retains failing is retained by fixing backup_online, not by a ten% off e-mail — and allocates funds towards product enchancment, with the marketing campaign value as a short-term bridge whereas engineering catches up. The revenue curve here’s a threshold-setting software that operates after you have got determined your retention playbook (what intervention applies to whom, at what actual value); it’s not an alternative to that call. Deal with the $150 and the $100 as a single consultant pair; phase them, and the framework segments with them.

That ratio is the whole cause threshold = 0.5 is the mistaken default: the choice boundary ought to mirror the asymmetry, and we are going to get to the precise formulation, however first comes the LTV piece.

3. The LTV revenue curve

Most churn writeups deal with lifetime worth as a static greenback quantity you multiply by a hazard price.

Survival evaluation does higher.

It measures retention straight from the info and turns LTV right into a curve: the cumulative contribution margin per buyer as a perform of months since acquisition, beginning at −CAC on day zero (you’ve paid to accumulate the client and earned nothing) and climbing as every surviving month provides ARPU × gross_margin × P(nonetheless alive) to the stability.

The Kaplan-Meier estimator does the heavy lifting, with tenure because the length and Churn == "Sure" because the occasion, producing the general curve in Determine 4.

from lifelines import KaplanMeierFitter
import numpy as np

CAC, GROSS_MARGIN, HORIZON = 150, 0.75, 72

kmf = KaplanMeierFitter().match(df["tenure"], (df["Churn"] == "Sure").astype(int))
months = np.arange(0, HORIZON + 1)
S = kmf.survival_function_at_times(months).values

month-to-month = ARPU * GROSS_MARGIN * S
ltv_curve = np.cumsum(month-to-month) - CAC
ltv_curve[0] = -CAC              # day zero, solely CAC is sunk

Figure 4 — Kaplan-Meier-derived LTV profit curve: cumulative contribution margin minus CAC, with -10% and -20% churn-reduction scenarios, and breakeven (cumulative margin crosses zero) sitting at month 3. — Picture by creator.

Three readings price pulling out:

Breakeven at month 3 (in expectation, not per buyer): throughout the unique acquired cohort, the survival-weighted cumulative contribution covers the $150 acquisition value by month 3, and CAC payback below twelve months is the David Skok rule of thumb that Telco beats by an element of 4. That is the suitable quantity for budgeting cohort-level retention spend, however it’s a cohort common that hides bimodal variance: a buyer who churns in month 1 individually contributes one month of margin ($48.60) and by no means recoups their CAC, whereas a 70-month survivor contributes properly over $3,400. The Kaplan-Meier weighting bakes these early losses in appropriately — they simply don’t get a star on the curve.
LTV on the 72-month horizon ≈ $2,527 per buyer: mixed with the $150 CAC, that’s an LTV:CAC ratio of about 17.8:1, properly above the three:1 flooring most SaaS traders search for, and a helpful sanity test that the dataset describes a wholesome unit-economics enterprise quite than a death-spiral one.
Churn-reduction uplift is modest on the cohort degree: a ten% discount in churn lifts terminal LTV by ~2.8% and a 20% discount lifts it by ~5.7%, so the elevate is actual however not heroic, and the acquisition determination issues greater than the retention intervention.

Segmenting the identical calculation by Contract (Determine 5) is the place the framework earns its hold.

Figure 5 — LTV profit curve segmented by contract type: same CAC, same ARPU, with the difference being pure retention. — Picture by creator.

A two-year-contract buyer is price about $3,372 over the 72-month horizon, whereas a month-to-month buyer is price $1,620 (lower than half), with the identical ARPU and the identical CAC, so the whole delta is retention; from a marketing-spend perspective, the right-side acquisition goal (the client you have to be prepared to pay extra to accumulate) is the contract-locked one, despite the fact that they appear “much less worthwhile monthly” in any given snapshot.

That is the sort of determination the usual IBM Telco evaluation can not make, as a result of it by no means computes survival-conditional LTV within the first place.

4. The classification revenue curve

With FN value, FP value, and survival-based LTV in hand, the edge query turns into a one-dimensional optimization: practice a mannequin, get predicted possibilities on the take a look at set, sweep the edge from 0 to 1, compute whole greenback value at every threshold, and decide the minimal.

The mannequin here’s a tuned XGBoost educated with SMOTE on the practice fold solely, the usual Telco recipe.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier

y = (df["Churn"] == "Sure").astype(int)
X = pd.get_dummies(df.drop(columns=["customerID", "Churn"]),
                   drop_first=True).astype(float)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

scaler = StandardScaler().match(X_tr)
X_tr_s, X_te_s = scaler.remodel(X_tr), scaler.remodel(X_te)
X_tr_b, y_tr_b = SMOTE(random_state=42).fit_resample(X_tr_s, y_tr)

mannequin = XGBClassifier(n_estimators=400, max_depth=5, 
                     learning_rate=0.05,
                     subsample=0.9, colsample_bytree=0.9,
                     random_state=42, eval_metric="logloss")
mannequin.match(X_tr_b, y_tr_b)
probs = mannequin.predict_proba(X_te_s)[:, 1]

thresholds = np.linspace(0.01, 0.99, 99)
totals = []
for t in thresholds:
    pred = (probs >= t).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    totals.append(fn * 1316.40 + fp * 100)

The result’s in Determine 6.

Figure 6 — Classification profit curve on the IBM Telco test set: false-negative cost (red) dominates because of the 13:1 ratio, and the default 0.5 threshold sits well to the right of the cost-minimum. — Picture by creator.

The numbers, on a 1,407-row take a look at set:

Table comparing the cost of three classification thresholds on a 1,407-row test set. The 0.50 default threshold costs $199,575, with 139 false negatives and 166 false positives, balanced but the wrong metric. The 0.07 Bayes-optimal threshold costs $87,076, which theory predicts should win. The 0.03 empirical-minimum threshold costs $78,415, with 7 false negatives and 692 false positives, beating the theoretical optimum by $8,661.

Shifting from 0.5 to the empirical minimal recovers $121,160 on the take a look at set, which is $86.11 per buyer, and making use of that to a 100,000-subscriber e book offers the headline $8.6M; your mileage will fluctuate together with your CAC, your ARPU, and your retention curve, however the multiplier (13× dearer to overlook a churner than to over-treat a loyalist) is what makes the hole giant.

When the textbook formulation loses to the edge sweep

Open any cost-sensitive classification reference (Provost and Fawcett’s Knowledge Science for Enterprise is the canonical one [4]) and you will see the Bayes-optimal threshold formulation:

t* = C_FP / (C_FP + C_FN)

Plug in our value ratio: t* = 100 / (100 + 1316.40) ≈ 0.0706, which is appropriate math, and on a mannequin with calibrated possibilities that threshold minimizes anticipated value; however the sweep offers t = 0.03, and at that threshold the test-set value is $8,661 decrease than at 0.07, so the place is the hole?

The Bayes Optimum formulation assumes the mannequin’s predicted possibilities are calibrated: a prediction of 0.5 ought to correspond to a 50% true churn chance, however our mannequin is educated on a SMOTE-balanced set, which inflates the minority class to 50% throughout coaching, and tree-based learners then output possibilities biased towards larger values, with the mannequin’s “0.07” mapping to roughly the true 0.03 in calibrated chance area; the textbook formulation isn’t mistaken, it’s being utilized to an out-of-spec enter.

There are two clear fixes:

Calibrate the possibilities first: apply Platt scaling or isotonic regression on a held-out set, then use the Bayes-optimal threshold on the calibrated output, with scikit-learn’s CalibratedClassifierCV doing it in a single line.
Skip calibration and sweep: it’s low-cost, it tolerates calibration drift, and on a small dataset like Telco the test-set sweep is extra dependable than a calibration mannequin match on a whole bunch of held-out rows.

In follow, for manufacturing programs with common re-training, the sweep is what most groups ship; the formulation is the suitable factor to show (with calibration because the asterisk), and each ought to seem in any trustworthy writeup of a cost-sensitive churn mannequin.

Neither one exhibits up within the IBM Telco corpus I listed.

5. What the subsequent IBM Telco article ought to report

Three concrete shifts would make the subsequent 36 IBM Telco analyses extra helpful than the final 36:

Report a revenue curve, not a confusion matrix. F1 at threshold 0.5 is a event metric (helpful for rating fashions when you need to decide one, ineffective for deciding learn how to ship one), and the curve in Determine 6 has extra decision-relevant info than each accuracy comparability within the corpus mixed.

Anchor LTV in survival evaluation, not steady-state assumptions. Kaplan-Meier on tenure is 30 traces of Python; the breakeven quantity, the LTV at horizon, and the contract-segmented curve give advertising operations a usable funds to spend on retention plus a defensible reply to “which buyer ought to we purchase more durable?”, and the Skok formulation stays a fantastic sanity test quite than the load-bearing LTV estimate.

Disclose the calibration assumption while you quote a Bayes-optimal threshold. Both calibrate first or notice explicitly that the edge reported is the empirical minimal from a sweep, and Wang et al. ([5]) make a carefully associated argument with a extra elaborate metric (e-Earnings) that makes use of survival evaluation end-to-end, with the identical core concept.

Section the intervention, not simply the rating. A revenue curve assumes a single FP value (the worth of “treating” one false alarm), however in follow the most affordable intervention for a high-value loyalist is totally different from the most affordable for an at-risk new account: a bundle improve for one, a price-pain investigation for the opposite, do-nothing for the third; segment-aware FP prices (and segment-aware thresholds) are the pure follow-up to the framework right here.

I went into this anticipating the hole can be within the modeling. It isn’t.

The IBM Telco dataset has been mined to bedrock for predictive accuracy, and what it may possibly nonetheless educate is whether or not our pipelines result in good selections, not simply correct predictions.

That requires three issues: greenback prices on errors, actual retention curves on clients, and an trustworthy threshold on the classifier — 4 scripts and a Kaplan-Meier match get you there.

References

[1] Genesys Development Advertising, Buyer Acquisition Value Benchmarks: 44 Statistics Each Advertising Chief Ought to Know in 2026 (2026), genesysgrowth.com.

[2] Confirmed SaaS, CAC Payback Benchmarks 2026: SaaS Buyer Acquisition Value (2026), proven-saas.com.

[3] D. Skok, SaaS Metrics 2.0: Detailed Definitions (2014, up to date 2024), forentrepreneurs.com.

[4] F. Provost and T. Fawcett, Knowledge Science for Enterprise (2013), O’Reilly Media, ch. 7–8.

[5] Y. Wang, S. Albrecht, et al., e-Earnings: A Enterprise-Aligned Analysis Metric for Revenue-Delicate Buyer Churn Prediction (2025), arXiv:2507.08860.

[6] C. Davidson-Pilon, lifelines: survival evaluation in Python (2019), Journal of Open Supply Software program 4(40), 1317.

[7] W. Verbeke, T. Verdonck and S. Maldonado, Revenue-driven determination bushes for churn prediction (2018), European Journal of Operational Analysis, 284(3).

[8] N. El Attar and M. El-Hajj, A scientific overview of buyer churn prediction approaches in telecommunications (2026), Frontiers in Synthetic Intelligence.

Code, information, and reproducible scripts for each determine can be found on request. The dataset is the IBM Telco Buyer Churn dataset, absolutely artificial pattern information revealed by IBM in its official repository (github.com/IBM/telco-customer-churn-on-icp4d) below the Apache License 2.0, which allows use, spinoff evaluation, and publication with attribution. The information is artificial and accommodates no actual clients or PII.

Thanks for studying! If in case you have any questions or wish to join, be at liberty to succeed in out to me on LinkedIn. 👋

Your Churn Threshold Is a Pricing Resolution

Amazon SageMaker AI Async Inference now helps inline request payloads

Get again hours every single day with autonomous brokers in Amazon Fast

Get again hours every single day with autonomous brokers in Amazon Fast

Leave a Reply Cancel reply

Popular News

Greatest practices for Amazon SageMaker HyperPod activity governance

How Cursor Really Indexes Your Codebase

Construct a serverless audio summarization resolution with Amazon Bedrock and Whisper

Context Engineering — A Complete Fingers-On Tutorial with DSPy

Speed up edge AI improvement with SiMa.ai Edgematic with a seamless AWS integration

About Us

Category

Recent Posts