Transport for London (TfL) is a statutory body responsible for London's public transport network, managing buses, the Underground, the Docklands Light Railway, the Overground, and major roads. Their 'Open Data' policy means that they share much of their internal data with the public, which they say currently powers over 600 apps for Londoners.
One interesting data source they share with the public is Santander Cycles (also known colloquially as Boris Bikes) usage data. Every bike trip is recorded, and the data runs from 2015 all the way up to 2025. The data is organised into unwieldy weekly CSV files to download: https://cycling.data.tfl.gov.uk/#!usage-stats%2F. Each row of this data is one bike trip, with each bike trip starting from a particular bike station. This amounts to 9.2 million station-hours, 800 bike stations, and 144 weekly CSVs. See an example of the data below.
| Start Date | StartStation Name | End Date | EndStation Name | Duration |
|:-----------------|:---------------------------------|:-----------------|:------------------------------------|-----------:|
| 10/01/2016 00:00 | Drury Lane, Covent Garden | 10/01/2016 00:04 | Frith Street, Soho | 240 |
| 10/01/2016 00:00 | Pott Street, Bethnal Green | 10/01/2016 00:05 | Victoria Park Road, Hackney Central | 300 |
| 10/01/2016 00:00 | Harrington Square 2, Camden Town | 10/01/2016 00:20 | Baylis Road, Waterloo | 1200 |
| 10/01/2016 00:01 | Canton Street, Poplar | 10/01/2016 00:14 | Hewison Street, Old Ford | 780 |
| 10/01/2016 00:01 | Cephas Street, Bethnal Green | 10/01/2016 00:11 | Brick Lane Market, Shoreditch | 600 |
We can take each row and aggregate this data up so we can see the seasonality trends across a few years:

This dataset now gives us a glimpse into bike usage across London (this data doesn't contain every bike trip in London, but we can expect that Boris Bike usage is related to overall bike usage). For a Causal Data Science enthusiast, the natural next question is: how can we use this dataset to answer some interesting causal questions? What events occur that have a large impact on cycle journeys? What are some common large-scale disruptions that cause people to not be able to take the tube? How do workers show the value of their labour to their employers by withholding it? Strikes!
In this article I will be analysing the causal impact of major tube strikes on cycle usage in London. Historic strikes are somewhat hard to pin down across the internet, but luckily for me there is an FOI request into strike action, which gives us dates of strike action at a line level, between 2014-18.
As the data starts off as one row for every bike trip across all bike stations across London, we have some work to do to get it into a format we can use. We have 144 weekly CSVs that we convert to Parquet files to help with memory constraints. We then combine all these Parquet files into one large dataframe and group by bike station and hour.
| station_id | trips_start | ts |
|-------------:|:--------------------|-----:|
| 1 | 2016-01-10 09:00:00 | 4 |
| 1 | 2016-01-10 10:00:00 | 1 |
| 1 | 2016-01-10 11:00:00 | 2 |
| 1 | 2016-01-10 12:00:00 | 2 |
| 1 | 2016-01-10 13:00:00 | 2 |
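As a minimal sketch of this trip-to-station-hour aggregation (the column names `station_id`, `trips_start`, and `ts` match the sample table above, but the trip-to-`station_id` mapping and the in-memory demo dataframe are simplified assumptions; the real pipeline works over 144 Parquet files):

```python
import pandas as pd

def to_station_hour(trips: pd.DataFrame) -> pd.DataFrame:
    """Aggregate one-row-per-trip data to trip counts per station per hour."""
    df = trips.copy()
    # parse the TfL-style day-first timestamps and floor them to the hour
    df["trips_start"] = pd.to_datetime(
        df["Start Date"], format="%d/%m/%Y %H:%M"
    ).dt.floor("h")
    return (
        df.groupby(["station_id", "trips_start"])
          .size()
          .reset_index(name="ts")
    )

# tiny demo: two trips in the 09:00 hour, one in the 10:00 hour
demo = pd.DataFrame({
    "station_id": [1, 1, 1],
    "Start Date": ["10/01/2016 09:05", "10/01/2016 09:40", "10/01/2016 10:01"],
})
hourly = to_station_hour(demo)
print(hourly["ts"].tolist())  # [2, 1]
```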
TfL also provide coordinates for each bike station. We join the coordinates to their corresponding H3 cell. H3 is a hexagonal grid system used by Uber and is useful for many spatial analysis tasks. The plot below shows how bike trips are distributed across London.

We can now aggregate the trip data up to H3 cell-day level along with some confounders that we think also affect cycling usage in London. These include weather and seasonality features.
# Process in chunks to avoid a memory spike
chunk_size = 100_000
h3_cells = []
for i in range(0, len(bf), chunk_size):
    chunk = bf.iloc[i:i+chunk_size]
    h3_cells.extend([h3.latlng_to_cell(lat, lon, 8) for lat, lon in zip(chunk["lat"], chunk["lon"])])
    print(f"  Processed {min(i+chunk_size, len(bf)):,} / {len(bf):,}")
bf["h3_cell"] = h3_cells

# Aggregate to cell-day
bf["day"] = pd.to_datetime(bf["trips_start"]).dt.date
cell_day = (
    bf.groupby(["h3_cell", "day"])
    .agg(
        total_trips           = ("ts", "sum"),
        frac_exposed          = ("strike_exposed", "mean"),
        n_stations            = ("station_id", "nunique"),
        temperature_2m        = ("temperature_2m", "mean"),
        precipitation         = ("precipitation", "mean"),
        is_weekend            = ("is_weekend", "first"),
        is_bank_holiday       = ("is_bank_holiday", "first"),
        is_school_holiday     = ("is_school_holiday", "first"),
        days_to_next_strike   = ("days_to_next_strike", "first"),
        days_since_last_strike= ("days_since_last_strike", "first"),
        month                 = ("month", "first"),
        year                  = ("year", "first"),
        doy                   = ("doy", "first"),
        lat                   = ("lat", "mean"),
        lon                   = ("lon", "mean"),
    )
    .reset_index()
)
This means that every row of our dataset now contains all Santander bike trips for each day and each H3 cell. We have 172 cells observed across 1,192 days.
We also filtered so that only cells with at least one tube stop within 500m were included – this is necessary to satisfy the Positivity Assumption. This assumption states that every unit has to have a non-zero probability of both treatment and control. A cell with no tube stops within 500m has effectively no chance of ever being treated, so it cannot support the comparison (we can reasonably assume that a commuter who can't use the tube because of strikes would walk up to 500m to use a Santander bike).
cell_day = cell_day[cell_day["n_tube_within_500m"] >= 1].copy()
This gives us a cell-day dataset with 62 H3 cells, 66,039 rows and 98.4% of cells ever treated.
Next we can define our outcome and treatment variables. As each cell can have differing levels of expected bike usage, we create our outcome variable to be relative to each cell's capacity – the total trips for each cell on each day divided by the number of bike stations in that cell. We take the log so that our coefficient tells us about proportional changes rather than absolute ones, and so that the statistical assumptions of the regression are satisfied, and we add one so that quiet cell-days with zero recorded trips are included in the analysis rather than silently dropped.
\[
Y_{i,t} = \log\left(1 + \frac{\text{Total Bike Trips in cell } i \text{ on day } t}{\text{Number of Bike Stations in cell } i}\right)
\]
We can calculate the outcome variable in Python with the following code.
cell_day["y_per_station_log1p"] = np.log1p(cell_day["total_trips"] / cell_day["n_stations"])
Defining the treatment variable for strike exposure isn't as straightforward. We know which tube lines were striking on each day – but this information doesn't neatly map to each cell, as each tube line snakes across London. When we consider the question of what happens to bike usage when tube lines are not operational, it's helpful to first identify when bike stations are "near" tube stations that are being affected by strikes. We have defined a bike station to be affected by a strike if it is within 400m of a tube station that serves one of the striking lines.
We then define an H3 cell to be strike-affected if any bike station within that H3 cell is strike-affected. This is now our treatment variable.
\[
T_{i,t} =
\begin{cases}
1, & \text{if cell } i \text{ is strike-exposed on day } t \\
0, & \text{otherwise}
\end{cases}
\]
To construct this treatment variable for our dataset, we first must create a strike-affected column for our station-level data. We do this using the following function, which takes in our station-hour data, a dataframe telling us which lines were striking on each day, and a dataframe telling us which stations are served by each striking line.
def attach_strikes_to_base(
    base: pd.DataFrame,
    strikes_daily: pd.DataFrame,
    station_line_map: pd.DataFrame,
) -> pd.DataFrame:
    """
    Attach a binary strike_exposed indicator to the station-hour panel.
    A station-hour is treated (strike_exposed = 1) if any Underground line
    serving that station is on strike on that day.
    base must have columns: station_id, trips_start (datetime), ts (numeric trip count).
    """
    df = base.copy()
    df["date"] = pd.to_datetime(df["trips_start"]).dt.floor("D")
    station_day_treat = (
        strikes_daily
        .merge(station_line_map[["station_id", "affected_line"]], on="affected_line", how="inner")
        .drop_duplicates(subset=["station_id", "date"])
        .assign(strike_exposed=1)
        [["station_id", "date", "strike_exposed"]]
    )
    df = df.merge(station_day_treat, on=["station_id", "date"], how="left")
    df["strike_exposed"] = df["strike_exposed"].fillna(0).astype(int)
    return df.drop(columns=["date"])
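The `station_line_map` input encodes the 400m spatial rule described earlier. The article doesn't show how it was built, but one plausible sketch is a haversine cross-join between bike stations and tube stations (the `bike_stations` and `tube_stations` dataframes, their columns, and the demo coordinates here are all my assumptions, not the article's actual schema):

```python
import numpy as np
import pandas as pd

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (vectorised over array inputs)."""
    r = 6_371_000.0
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlam = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlam / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))

# hypothetical inputs: one bike station, two tube stations on one line;
# station A is a few hundred metres away, station B well over a kilometre
bike_stations = pd.DataFrame(
    {"station_id": [1], "lat": [51.5074], "lon": [-0.1278]}
)
tube_stations = pd.DataFrame(
    {"tube_name": ["A", "B"], "affected_line": ["victoria", "victoria"],
     "lat": [51.5074, 51.5200], "lon": [-0.1240, -0.1000]}
)

# cross-join every bike station against every tube station, keep pairs within 400m
pairs = bike_stations.merge(tube_stations, how="cross", suffixes=("_bike", "_tube"))
pairs["dist_m"] = haversine_m(
    pairs["lat_bike"], pairs["lon_bike"], pairs["lat_tube"], pairs["lon_tube"]
)
station_line_map = (
    pairs.loc[pairs["dist_m"] <= 400, ["station_id", "affected_line"]]
    .drop_duplicates()
)
print(station_line_map.to_dict("records"))
```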
When we aggregate the station-hour dataframe to cell-day level we take the mean of the strike_exposed column into a new column frac_exposed, and any cells with a positive frac_exposed become treated cells.
cell_day["treated"] = (cell_day["frac_exposed"] > 0).astype(int)
More detail on the data wrangling can be found at https://github.com/stucsk99/tfl_bike_casual/blob/main/01_data_pipeline.ipynb
Now we've defined our outcome and treatment variables, let's take a step back and talk about the underlying causal theory that underpins all the results we'll arrive at in this article.
What is the question we want to ask?
The causal mechanism underlying our analysis is substitution. When a tube line strikes, commuters who would normally travel underground are displaced and must find an alternative. We argue that for commuters near major interchange stations, Santander Bikes represent the most accessible alternative: they are available without pre-registration, priced for short journeys, and physically present at the stations where displaced commuters emerge. This substitution story is what connects our treatment variable to our outcome through a credible causal pathway rather than mere correlation.
Strike occurs → tube commuters cannot travel → these commuters look for alternatives → some walk to a nearby Santander dock → bike trips increase. Each arrow in that chain is a step in the mechanism. Without it, even a statistically significant result is just a correlation with a story attached. With it, you have a reason to believe the effect is real.
The causal mechanism we're describing can be represented by the following structural causal model.

Because strike timing is determined by labour negotiations rather than by anything related to cycling demand, we have good reason to believe that strike days are not systematically different from non-strike days in ways that would independently affect bike usage. A strike called on a Tuesday in January isn't called because January Tuesdays are unusually good or bad for cycling – it's called because a wage negotiation broke down. This makes the counterfactual comparison credible: the bike usage we observe on comparable non-strike days is a reasonable approximation of what would have happened on strike days had the strike not occurred.
Now that we have our causal mechanism stated, we can carry on with our causal analysis. But before we do that, let's go through some of the important building blocks of causal inference – the potential outcomes framework.
Potential Outcomes
The fundamental problem of causal inference is that we don't observe the counterfactual outcomes – we never know what would have happened to bike usage on a strike day, had that strike not occurred. This is by definition unobservable.
In an ideal world, we would observe both potential outcomes for each unit: \(Y_{i,t}(0)\), which is the potential outcome if cell \(i\) had not experienced a strike on day \(t\), and \(Y_{i,t}(1)\), which is the potential outcome if it did experience a strike. From here we can define the individual treatment effect for cell \(i\) on day \(t\), which is the difference between the two potential outcomes:
\[
\tau_{i,t} = Y_{i,t}(1) - Y_{i,t}(0)
\]
We would love to know this quantity for each observation, but as mentioned above, we only ever observe one of the two potential outcomes. The logical next step is to average this effect over all units. This is the Average Treatment Effect (ATE):
\[
ATE = E[Y_{i,t}(1) - Y_{i,t}(0)] = E[\tau_{i,t}]
\]
This is the expected treatment effect for a randomly chosen unit from the full panel. In our setting, it answers: for a randomly chosen cell-day in our panel, what is the expected change in log bike trips per station if that cell-day were to become strike-exposed?
We can also define another treatment effect: the Average Treatment Effect on the Treated (ATT):
\[
ATT = E[Y_{i,t}(1) - Y_{i,t}(0) \mid D_i = 1] = E[\tau_{i,t} \mid D_i = 1]
\]
Where \(D_i\) is the treatment indicator. This shifts focus onto units that were actually treated: for a cell-day that was actually strike-exposed, what was the causal effect of that exposure?
Naive Treatment Effect
Before we get into how we estimate these quantities using robust causal methods, we can first illustrate what goes wrong when we estimate the ATE naively. To do this as simply as possible, we could estimate the ATE as the difference in sample means between the treated and control observations. That is,
\[
\tau^{naive} = \overline{Y}_{D=1} - \overline{Y}_{D=0}
\]
naive_diff = (
    cell_day.loc[cell_day["treated"] == 1, "y_per_station_log1p"].mean()
    - cell_day.loc[cell_day["treated"] == 0, "y_per_station_log1p"].mean()
)
print(f"Naive diff: {np.expm1(naive_diff) * 100:+.1f}%")
In our data, this gives a naive difference of +5.5%. Cells with any strike exposure have considerably higher log bike trips per station than cells without. But this is not a credible causal estimate. We can decompose the naive difference algebraically to see exactly what it is estimating:
\[
\overline{Y}_{D=1} - \overline{Y}_{D=0} = \underbrace{E[Y_{i,t}(1) - Y_{i,t}(0) \mid D_i = 1]}_{ATT} + \underbrace{E[Y_{i,t}(0) \mid D_i = 1] - E[Y_{i,t}(0) \mid D_i = 0]}_{\text{selection bias}}
\]
The first term is the ATT, which is what we want. The second term is selection bias – the difference in control potential outcomes between treated and untreated units. In our case, this bias is likely positive: cells that are strike-exposed are near tube lines, which means they are in denser, more central areas of London that have higher baseline bike usage regardless of any strike. The naive estimate conflates the effect of strikes with the pre-existing advantage of centrally located cells.
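We can verify this decomposition numerically with toy potential outcomes (all the numbers below are invented purely for illustration):

```python
import numpy as np

# six cell-days: the treated units sit in central London, so their
# baseline Y(0) is higher, and the true treatment effect is +0.05
y0 = np.array([1.00, 1.05, 0.95,   # control cells
               1.20, 1.25, 1.15])  # treated cells (higher baseline)
y1 = y0 + 0.05
d = np.array([0, 0, 0, 1, 1, 1])

naive = y1[d == 1].mean() - y0[d == 0].mean()           # difference in observed means
att = (y1 - y0)[d == 1].mean()                          # true effect on the treated
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()  # baseline gap

print(naive, att + selection_bias)  # the two quantities are identical
```

Here the naive difference (0.25) is five times the true effect (0.05), because the 0.20 baseline gap between central and peripheral cells is absorbed into the comparison.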
Eliminating this selection bias is the entire job of the methods that follow.
Panel Data
Our dataset has a structure that is particularly well-suited to addressing selection bias. It is a panel. A panel dataset observes the same units repeatedly over time. Our particular panel has the following structure:
\[
\{\, X_{i,t},\, D_{i,t},\, Y_{i,t} \,\}_{\,i = 1,\dots,N;\; t = 1,\dots,T}
\]
Where \(i\) represents our H3 cells and \(t\) represents the days observed in our dataset. With the 500m tube-stop filter applied we have N = 62 cells and T = 1,192 days, giving up to N × T total observations (66,039 rows in practice, since not every cell is observed on every day).
The key insight that panel data provides is this: if we observe the same cell on multiple days, we can separate the time-invariant component of that cell's outcome from the day-specific variation. A cell near Bank station is always going to be busier than a cell near Pimlico – that is a permanent feature of the cell's location, not something that changes with strikes. Panel methods let us account for this permanent feature without ever having to measure it directly.
We can use the inherent setup of the panel data to model the treatment effect using a two-way fixed effects (TWFE) model. This is a generalisation of the traditional Difference-in-Differences method. The model is set up in the following way:
\[
Y_{i,t} = \alpha_{i} + \lambda_{t} + \tau D_{i,t} + \beta X_{i,t} + \epsilon_{i,t}
\]
Where \(Y_{i,t}\) is our outcome variable for cell \(i\) on day \(t\), \(\alpha_i\) is the fixed effect for cell \(i\), \(\lambda_t\) is the fixed effect for day \(t\), \(\tau\) is the causal treatment effect, \(D_{i,t}\) is the treatment indicator, \(\beta\) holds the coefficients for the covariates \(X_{i,t}\), and \(\epsilon_{i,t}\) is the error term.
In this model, the two fixed effects \(\alpha_i\) and \(\lambda_t\) act as dummy variables for each cell and each day. The cell fixed effect absorbs all time-invariant cell characteristics (all the geographical features of cell \(i\) that don't change over time) and the date fixed effect absorbs all cell-invariant variation (day-specific shocks common to all cells). This is equivalent to demeaning within each cell and within each date, which removes all time-invariant cell characteristics and common day-level shocks.
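We can sanity-check this demeaning equivalence on a toy balanced panel (all values below are synthetic; via the Frisch-Waugh-Lovell theorem, regressing the doubly-demeaned outcome on the doubly-demeaned treatment recovers the TWFE coefficient):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_cells, n_days = 6, 10
panel = pd.DataFrame({
    "cell": np.repeat(np.arange(n_cells), n_days),
    "day": np.tile(np.arange(n_days), n_cells),
})
panel["d"] = (rng.random(len(panel)) < 0.3).astype(float)
# outcome = cell effect + day effect + true treatment effect of 0.5 + tiny noise
panel["y"] = (
    1.0 * panel["cell"] + 0.2 * panel["day"] + 0.5 * panel["d"]
    + rng.normal(0.0, 0.01, len(panel))
)

def two_way_demean(s: pd.Series, df: pd.DataFrame) -> pd.Series:
    # subtract cell means and day means, then add back the grand mean
    return (s
            - s.groupby(df["cell"]).transform("mean")
            - s.groupby(df["day"]).transform("mean")
            + s.mean())

y_tilde = two_way_demean(panel["y"], panel)
d_tilde = two_way_demean(panel["d"], panel)
tau_hat = (y_tilde @ d_tilde) / (d_tilde @ d_tilde)
print(tau_hat)  # close to the true effect of 0.5
```

The large cell and day effects never need to be estimated explicitly: double demeaning wipes them out, which is exactly what the `C(h3_cell)` and `C(date_str)` dummies accomplish in the regression below.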
We can simply run this regression analysis using the ols function from the statsmodels.formula.api library:
twfe = smf.ols(
    """y_per_station_log1p ~ treated
       + temperature_2m + precipitation
       + is_weekend + is_bank_holiday + is_school_holiday
       + days_to_next_strike + days_since_last_strike
       + C(h3_cell) + C(date_str)""",
    data=cell_day,
).fit(
    cov_type="cluster",
    cov_kwds={"groups": cell_day["h3_cell"]},
)
Note that we can't rely on ordinary OLS standard errors, since observations from the same cell across different days are correlated. If we ignored this correlation, we would systematically understate the uncertainty in \(\hat{\tau}\), producing confidence intervals that are too narrow and p-values that are too small. The standard solution is to cluster errors at the cell level. This allows for arbitrary correlation between the residuals of the same cell \(i\) across any two dates \(t\) and \(t'\), while maintaining the assumption of independence across cells.
Results
Our TWFE model gives us an increase of 3.95% in Santander bike usage on strike days, with a p-value of 0.097.
Before we dive deeper into these results, we first address some changes we made to our data to tighten the causal mechanisms we want to understand.
Having established that every cell in our analysis must have at least one tube station within 500 metres – our positivity condition – we apply a stronger restriction motivated by the causal mechanism itself. Not all tube stations generate equal commuter displacement when they strike. The 42 stations we focus on are the major interchange stations of central London: Bank, Liverpool Street, King's Cross, Waterloo, Victoria, and their neighbours. These are the stations where thousands of commuters converge each morning, where Santander Bike docks are densest, and where the substitution from tube to bike is most frictionless – a displaced commuter walks out of a closed station and finds a rack of bikes within metres.
At more peripheral stations, even where a Santander dock exists nearby, the displacement mechanism is weaker. Fewer commuters are purely tube-dependent, and the walking distance to a bike dock is more likely to exceed what a time-pressured commuter will tolerate. Restricting to the 32 cells within 800 metres of these 42 major interchange stations is therefore a deliberate focus on the geographic population where both the demand shock from the strike and the supply response from the bike network are sufficiently concentrated for the substitution effect to be detectable.
# Get centroids of all unique cells in cell_day
unique_cells = cell_day["h3_cell"].unique()
cell_centroids = pd.DataFrame([
    {"h3_cell": c,
     "lat": h3.cell_to_latlng(c)[0],
     "lon": h3.cell_to_latlng(c)[1]}
    for c in unique_cells
])

# Build a KD-tree over the 42 station coordinates
station_coords = np.radians(CENTRAL_42[["lat", "lon"]].values)
tree = cKDTree(station_coords)

# Query each cell centroid
cell_coords = np.radians(cell_centroids[["lat", "lon"]].values)
radius_rad = 0.8 / 6371.0  # 800m in radians

# For each cell, find the distance to the nearest of the 42 stations
nearest_dist_rad, _ = tree.query(cell_coords, k=1)
cell_centroids["dist_to_central_42_km"] = nearest_dist_rad * 6371.0
cell_centroids["near_central_42"] = nearest_dist_rad <= radius_rad

central_cells = set(
    cell_centroids.loc[cell_centroids["near_central_42"], "h3_cell"]
)

# ── Filter ─────────────────────────────────────────────────────
cell_day_central = cell_day[
    cell_day["h3_cell"].isin(central_cells)
].copy()
Days that are 300 days away from any strike have very different seasonal characteristics from strike days, and no causal relevance to the comparison. Including them forces the date fixed effects to span a wide seasonal range, and the cell fixed effects get estimated from a period that is not directly relevant to the comparison. By restricting to a local window of 45 days around each strike date we create a cleaner experiment: the control days look more like the counterfactual for the treated days, and seasonal confounding is reduced.
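The days_to_nearest column used in the filter below isn't shown in the pipeline excerpt; a minimal sketch of how it could be computed from a list of strike dates (the function name, demo dates, and argument shapes are my assumptions):

```python
import numpy as np
import pandas as pd

def days_to_nearest_strike(dates: pd.Series, strike_dates) -> pd.Series:
    """Absolute gap in days between each observation date and the nearest strike date."""
    # convert both sides to integer day counts since the epoch
    d = pd.to_datetime(dates).values.astype("datetime64[D]").astype("int64")
    s = pd.to_datetime(pd.Series(strike_dates)).values.astype("datetime64[D]").astype("int64")
    # pairwise absolute differences, then the minimum over all strikes
    gaps = np.abs(d[:, None] - s[None, :]).min(axis=1)
    return pd.Series(gaps, index=dates.index)

# demo: strikes on 10 Jan and 6 Feb 2016
demo_days = pd.Series(["2016-01-08", "2016-01-10", "2016-02-01"])
demo_strikes = ["2016-01-10", "2016-02-06"]
print(days_to_nearest_strike(demo_days, demo_strikes).tolist())  # [2, 0, 5]
```

The pairwise broadcast is fine here because the number of strike dates in the FOI data is small; a `searchsorted`-based approach would scale better for long strike lists.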
sub = cell_day_central[cell_day_central["days_to_nearest"] <= 45].copy()
We now have three progressively restricted versions of the basefile, each with a more powerful signal-to-noise ratio.
| Basefile Version | Rows | Treatment % |
|:------------------------------------------|-------:|------------:|
| Only cells within 500m of a tube stop | 66,039 | 0.82 |
| Only cells close to central stations | 34,590 | 0.94 |
| Only days within 45 days of a strike | 16,799 | 1.95 |
The plot below shows the TWFE estimates across the different basefile specifications. The most causally focused setup of our panel data achieves an estimated treatment effect of 3.95% with a p-value of 0.097.

Our p-value is above the standard p=0.05 threshold. This means that a result as large as our 3.95% increase could arise by chance 9.7% of the time if there were no true effect. Although our p-value falls short of the commonly used benchmark, our three estimates are consistently positive, and the width of the confidence interval reflects the limited number of strike events in the FOI data, not the absence of an effect.
Causal Inference Assumptions
Before getting too carried away with these results, we have to stop and consider the assumptions that must hold for the TWFE estimate to have a causal interpretation.
Positivity/Overlap requires that every unit has a non-zero probability of being treated. We have addressed this by ensuring every cell in the panel has at least one tube stop within 500m.
Parallel trends requires that, in the absence of strikes, treated and control cells would have experienced the same time trend in bike usage. This is plausible in our setting because strike timing is determined by labour negotiation dynamics — the decision to strike on a particular date is driven by bargaining outcomes between TfL management and unions, not by anything related to the underlying trajectory of bike usage.
No anticipation requires that cells don't change their behaviour before treatment occurs — that the announcement of a strike doesn't itself alter bike usage in the days before the strike. This is partially addressed by the inclusion of days_to_next_strike as a covariate in the controlled specification, which captures any systematic pre-strike trend. We note that for truly unannounced strikes, the no-anticipation assumption is automatically satisfied.
SUTVA (Stable Unit Treatment Value Assumption, Rubin 1980) requires that the potential outcomes of one cell don't depend on the treatment status of other cells. This is the assumption most likely to be violated in our setting: a strike displaces commuters across a wide geographic area, potentially affecting bike usage at cells beyond those directly adjacent to striking lines. Such spillovers raise usage in "control" cells too, attenuating our estimate towards zero, so our +3.95% should be interpreted as a lower bound on the true effect for the most directly exposed cells.
Closing Remarks
This article set out to answer a simple question: do London tube strikes push commuters onto Santander Bikes? The answer, based on a two-way fixed effects analysis of four years of TfL open data, is yes – but arriving at that answer was considerably less straightforward than the clean result might suggest.
Working with real-life data is never simple. While parsing 144 weekly CSVs into a usable format, I had to reconcile inconsistent column schemas across data releases, correct a silent naming mismatch between strike line identifiers, and rebuild the spatial mapping between bike stations and tube stops multiple times.
That was all before considering the different causal assumptions necessary to build a credible argument. Coming from an ML background, I also spent a non-trivial amount of time investigating meta-learners (S-, T-, and X-learners, a family of predictive machine learning methods for estimating treatment effects) for this problem. These could have given us richer insight – the conditional average treatment effect, or CATE, which would tell us how the treatment effect varies across London.
I learned the hard way that the tool didn't match the problem. Panel data with a recurring binary treatment and a strong geographic identification story wants a fixed effects regression, not a cross-sectional ML estimator.

