Machine Learning · Urban Mobility · Open Data · 2023

Teaching One City's AI to Predict Another City's Scooters

What if you could forecast e-scooter demand in a city you've never seen before — using only free, publicly available data? Researchers from TU Munich just showed it's possible.

Authors: Abouelela, Lyu & Antoniou
Journal: Data Science for Transportation, 2023
Best Model: LightGBM (15.9% lower error with transfer learning)
7M+ trips analyzed in Austin
390K trips in Louisville
4 ML models compared
67% of prediction power from time-series
15.9% error reduction with transfer learning
100% open-source data
The Problem

Scooters Arrived Without a Manual. Here's the Operating Guide.

Shared e-scooters are notoriously hard to run efficiently. Operators deploy a fixed number of vehicles and hope for the best — even though demand swings wildly by season, day of week, weather, and local events. Too many scooters clog sidewalks; too few mean missed revenue and unhappy riders.

The challenge is especially acute for new deployments: a city launching scooters has no historical data of its own. This paper tackles that problem directly by asking: can we train a machine learning model on one city's data and use it to predict demand in another?

The answer, using Austin (source) and Louisville (target), is yes — with the right techniques.

7M+ Austin trips, Apr 2018–Jan 2020
390K Louisville trips, Aug 2018–Jan 2020
4 ML models tested head-to-head
100% open-source data sources used
What They Built
The Framework

From Raw Data to Daily Fleet Predictions

The researchers built a pipeline that takes four types of freely available data — scooter trips, weather, census demographics, and built environment info from OpenStreetMap — and combines them to predict the single most operationally useful number: trips per vehicle per day (fleet utilization).

Predicting utilization rate (rather than raw trip count) is smart design. It controls for fleet size differences between cities, avoids the supply-demand chicken-and-egg problem, and directly informs how many vehicles to deploy on any given day.
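To make the target variable concrete, here is a minimal sketch of the utilization calculation. The function name and the daily counts are invented for illustration and do not come from the paper.

```python
def fleet_utilization(trips: int, vehicles: int) -> float:
    """Trips per vehicle for a single day."""
    if vehicles <= 0:
        raise ValueError("vehicles must be positive")
    return trips / vehicles


# Hypothetical daily counts for two cities with very different fleet sizes.
large_city = fleet_utilization(trips=12_000, vehicles=15_000)
small_city = fleet_utilization(trips=960, vehicles=1_200)
print(large_city, small_city)
```

Raw trip counts differ by more than a factor of ten in this toy example, yet both cities land on the same utilization value, which is the sense in which the metric controls for fleet size.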

Model Transfer Pipeline: Austin → Louisville
Source City: Austin, TX (long-term data · 21 months · 15,000 max fleet)
Train model + transfer learning
Target City: Louisville, KY (short pilot data · 3 months · 1,200 max fleet)
Step 01: Feature Engineering. Extract time-series features: recent demand, weekly patterns, trend differences.
Step 02: Sample Normalization. Rescale each time window to mean=0, std=1 to bridge the gap between cities.
Step 03: Label Differencing. Remove trend from demand data so the model predicts changes, not absolute levels.
Step 04: Predict Louisville. Apply the Austin-trained model to Louisville's pilot data and evaluate accuracy.
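The feature engineering in Step 01 could look roughly like the sketch below. The feature names (`lag_1`, `lag_7`, and so on) and the exact choice of lags are assumptions made for illustration; the paper's own feature set may differ.

```python
from statistics import mean


def time_series_features(history: list[float]) -> dict[str, float]:
    """Build simple lag features from daily utilization, oldest value first.

    history[-1] is yesterday; the day being predicted is not in the list.
    """
    if len(history) < 8:
        raise ValueError("need at least 8 days of history")
    return {
        "lag_1": history[-1],                 # recent demand: yesterday
        "lag_7": history[-7],                 # weekly pattern: same weekday last week
        "mean_7": mean(history[-7:]),         # average over the last 7 days
        "diff_1": history[-1] - history[-2],  # day-over-day trend difference
        "diff_7": history[-1] - history[-8],  # week-over-week trend difference
    }


# Hypothetical utilization history (trips per vehicle per day).
features = time_series_features([0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2])
print(features)
```

Each row of training data would pair one such feature dictionary with the utilization observed on the following day.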

The key technical challenge is called covariate shift: Austin and Louisville have different demand scales, fleet sizes, and rider populations. A model trained naively on Austin data would underestimate Louisville's higher utilization rates. The two-step fix, sample normalization plus label differencing, aligns the two cities' distributions without needing to retrain.
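One way to picture the two-step fix is the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the helper names are invented, and dividing the differenced label by the window's standard deviation is a choice made here so that inputs and label share one scale.

```python
from statistics import mean, pstdev


def normalize_window(window, eps=1e-8):
    """Rescale one window of past utilization values to mean 0, std 1."""
    mu = mean(window)
    sigma = pstdev(window)
    scale = sigma if sigma > eps else 1.0  # guard against flat windows
    return [(x - mu) / scale for x in window], scale


def make_sample(window, next_value):
    """Turn (window, next day) into normalized inputs and a differenced label."""
    inputs, scale = normalize_window(window)
    # Label differencing: predict the change from the last observed day,
    # expressed in units of the window's own spread (an assumption here).
    label = (next_value - window[-1]) / scale
    return inputs, label, scale


def invert_label(predicted_label, last_value, scale):
    """Map a predicted label back to an absolute utilization value."""
    return last_value + predicted_label * scale


# Hypothetical windows: same shape, but the second city runs at twice the level.
city_a = make_sample([0.8, 1.0, 1.2], next_value=1.4)
city_b = make_sample([1.6, 2.0, 2.4], next_value=2.8)
print(city_a[:2])
print(city_b[:2])
```

After the transformation the two toy cities produce matching inputs and labels even though their absolute levels differ by a factor of two, which is the kind of alignment the normalization and differencing steps aim for.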

The Models
Head-to-Head

Four Algorithms Walk Into Louisville. One Wins.

Four machine learning models were tested, ranging from classical statistics to cutting-edge deep learning. The results challenge a common assumption: more complexity doesn't always win.

Model | Type | Test RMSE (×10⁻⁵) | Test MAE | Verdict
LightGBM | Gradient Boosting · Decision Trees | 1845.6 | 346.8 | 🏆 Winner
Linear Regression | Classical Statistics · Fast | 2054.4 | 381.2 | Runner-up
SVR | Support Vector · Kernel Methods | 2208.3 | 371.3 | Inconsistent
LSTM Neural Net | Deep Learning · Sequential | 2376.0 | 436.4 | Worst
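For readers unfamiliar with the two error metrics in the table, here is a short sketch of how RMSE and MAE are computed. The sample values are made up and are not taken from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalizes large misses more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: every unit of error counts the same."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical days: three perfect predictions and one large miss.
actual = [1.0, 1.0, 1.0, 5.0]
predicted = [1.0, 1.0, 1.0, 1.0]
print(rmse(actual, predicted), mae(actual, predicted))  # 2.0 1.0
```

A single large miss doubles RMSE relative to MAE in this toy case, which is why the two columns can rank models differently, as they do for SVR and Linear Regression.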

LSTM — the deep learning champion often used for time-series prediction — came in last. The authors explain: for tabular data with mixed static and dynamic features, decision-tree-based models like LightGBM consistently outperform neural networks. This is a well-known pattern in data science competitions, but worth emphasizing for practitioners excited about deep learning.

Without Transfer Learning: RMSE 2195.7 (×10⁻⁵)
With Transfer Learning: RMSE 1845.6 (×10⁻⁵)
Error reduction with sample normalization + label differencing: −15.9%

Neither transfer strategy alone was enough. Label differencing without normalization didn't help; normalization without differencing didn't help either. Only combining both reduced the cross-city generalization error — by 15.9% on the best model.
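The 15.9% figure follows directly from the two RMSE values reported above. A quick check, with a helper name chosen here for illustration:

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# RMSE without and with transfer learning, as reported for LightGBM.
reduction = relative_reduction(baseline=2195.7, improved=1845.6)
print(f"{reduction:.1%}")  # 15.9%
```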

What Drives the Prediction
Feature Importance

Yesterday's Demand Is the Best Predictor of Tomorrow's.

The LightGBM model ranks its features by how often it splits on them. The results confirm intuition — but also reveal some surprises about what doesn't matter as much as you'd think.

Time Series Features: 67%
Temporal (Day/Season): 9.8%
Sociodemographics: 9.6%
Meteorological: 7.1%
Built Environment: 6.6%

The dominance of time-series features (67%) reflects a fundamental property of urban mobility: tomorrow looks a lot like today, and a lot like last week. The top individual predictor is yesterday's demand (6.6%), followed by elapsed days since service launch (6.3%) — a proxy for the service maturity effect where early users behave differently than regular users.
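Group-level shares like these can be obtained by summing per-feature split counts within each group. The sketch below assumes the split counts are already available as a dictionary (in LightGBM they can be read with `Booster.feature_importance(importance_type="split")`); the feature names, groups, and counts are invented for illustration.

```python
def group_shares(split_counts, feature_group):
    """Percentage of all splits attributed to each feature group."""
    totals = {}
    for feature, count in split_counts.items():
        group = feature_group[feature]
        totals[group] = totals.get(group, 0) + count
    grand_total = sum(totals.values())
    return {group: 100 * count / grand_total for group, count in totals.items()}


# Hypothetical split counts, not the paper's numbers.
split_counts = {
    "lag_1": 30, "lag_7": 20, "day_of_week": 20,
    "temperature": 10, "median_income": 10, "poi_density": 10,
}
feature_group = {
    "lag_1": "time_series", "lag_7": "time_series", "day_of_week": "temporal",
    "temperature": "meteorological", "median_income": "sociodemographics",
    "poi_density": "built_environment",
}
shares = group_shares(split_counts, feature_group)
print(shares)
```

Split-count importance measures how often a feature is used, not how much error it removes, so it is best read alongside an ablation test.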

Removing time-series features caused a 43% jump in prediction error. Removing built environment or sociodemographic features each caused less than 2% degradation — yet both still matter for spatial prediction accuracy.

The practical takeaway: you don't need a massive feature set to build a working demand predictor. You need the last 30 days of trips, the temperature forecast, and basic census data. All of it is free.

Spatial Patterns
Space & Time

Downtown Belongs to Weekends. Campuses Own Weekdays.

Both Austin and Louisville exhibit the same spatial split that the companion study of five cities found. University areas — UT Austin, University of Louisville — dominate weekday demand at all hours. Downtown entertainment districts flip to dominate on weekends and early mornings.

The models' spatial error analysis reveals something important: prediction errors concentrate around downtown and university zones. These high-demand areas are where the model underestimates peaks, because they are also where unpredictable events (festivals, games, concerts) spike demand beyond what historical patterns suggest.
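A simple way to surface this kind of pattern is to average the signed error per zone. The sketch below is an illustration only; the zone names and values are hypothetical.

```python
from collections import defaultdict


def mean_bias_by_zone(records):
    """Average signed error (predicted - actual) per zone.

    Negative values mean the model underestimates demand in that zone.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for zone, actual, predicted in records:
        totals[zone] += predicted - actual
        counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


# Hypothetical (zone, actual, predicted) utilization records.
records = [
    ("downtown", 3.0, 2.0),
    ("downtown", 5.0, 3.0),
    ("suburb", 1.0, 1.0),
]
bias = mean_bias_by_zone(records)
print(bias)  # {'downtown': -1.5, 'suburb': 0.0}
```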

🗺️

Different Urban Structures, Same Spatial Logic

Austin and Louisville have completely different city layouts and sizes — yet their scooter demand is spatially concentrated in the same types of zones (educational and entertainment hubs). This suggests the framework can generalize broadly.

🌡️

Seasonal Synchrony Across Cities

Both cities show demand increasing through spring and summer, dropping from October, and hitting lows in January — despite different climates. The scaled demand trends are nearly identical once fleet size differences are controlled.

📉

Under One Trip Per Vehicle Per Day — and That's a Problem

The median fleet utilization in both cities is below 1 trip/vehicle/day. Most scooters sit idle most of the time. The authors argue fleet sizes should be dynamically adjusted — ideally daily — to match predicted demand, not held constant.

For Cities & Operators
Policy Implications

A Practical Tool, Not Just a Research Exercise

The paper is explicit about practical applicability. The entire methodology uses publicly available data sources that any city or operator can access: open city trip portals, census.gov, openstreetmap.org, and visualcrossing.com. No proprietary data required.

🚀

Deploy New Cities Without Historical Data

The transfer learning approach means a city launching scooters for the first time can borrow demand patterns from a similar city. Only ~3 months of pilot data is needed in the target city before the model adapts.

📅

Dynamic Fleet Sizing by Season and Forecast

Instead of deploying a fixed 1,200 or 15,000 scooters year-round, operators can use daily utilization predictions to right-size fleets. Fewer idle scooters means less sidewalk clutter, lower redistribution costs, and improved sustainability metrics.
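As a rough illustration of right-sizing, the sketch below converts a predicted utilization into a recommended fleet size. The function, the target utilization, and the fleet bounds are all hypothetical, and the sketch makes a strong simplifying assumption: that total trips on a given day do not change when the fleet size changes.

```python
import math


def recommend_fleet_size(predicted_utilization, current_fleet,
                         target_utilization, min_fleet, max_fleet):
    """Fleet size that would hit the target utilization, within bounds.

    Simplifying assumption: expected trips are fixed for the day and
    independent of how many vehicles are deployed.
    """
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    expected_trips = predicted_utilization * current_fleet
    needed = math.ceil(expected_trips / target_utilization)
    return max(min_fleet, min(max_fleet, needed))


# Hypothetical slow day: 0.5 trips per vehicle predicted for a 1,200-vehicle fleet.
slow_day = recommend_fleet_size(0.5, 1200, target_utilization=1.0,
                                min_fleet=300, max_fleet=1200)
# Hypothetical busy day: demand exceeds what the permitted fleet can absorb.
busy_day = recommend_fleet_size(2.0, 1000, target_utilization=1.0,
                                min_fleet=300, max_fleet=1200)
print(slow_day, busy_day)  # 600 1200
```

In practice supply and demand influence each other, so a rule like this would be a starting point for operator judgment, not a replacement for it.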

🎉

Special Events Need Special Planning

Austin's SXSW festival drove 5–6× normal demand — an extreme outlier the model struggled with. Event calendars should be integrated as explicit features in future model versions, with dedicated redistribution protocols.
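Adding an event calendar as an explicit feature could be as simple as a 0/1 flag per day. The sketch below assumes daily feature rows stored as dictionaries with a `day` key; the dates are placeholders and do not correspond to any real event.

```python
from datetime import date


def with_event_flag(rows, event_days):
    """Return copies of daily feature rows with an added 0/1 `is_event` column."""
    return [{**row, "is_event": int(row["day"] in event_days)} for row in rows]


# Placeholder dates for illustration only, not a real event calendar.
event_days = {date(2030, 1, 2)}
rows = [
    {"day": date(2030, 1, 1), "lag_1": 0.9},
    {"day": date(2030, 1, 2), "lag_1": 1.0},
]
flagged = with_event_flag(rows, event_days)
print([row["is_event"] for row in flagged])  # [0, 1]
```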

📖

Open Data Publication Is a Policy Tool

The study would be impossible without cities publishing trip data. The authors explicitly call for more cities to follow suit — not just for research, but because transparency creates accountability and improves service quality.

The Full Technical Story

The paper contains the complete model specifications, all coefficient tables, spatial error maps, and full feature engineering procedures. Everything is reproducible with open data.

Abouelela, M., Lyu, C., & Antoniou, C. (2023). Exploring the potentials of open-source big data and machine learning in shared mobility fleet utilization prediction. Data Science for Transportation, 5, 5. https://doi.org/10.1007/s42421-023-00068-9