Crude oil is one of the most-watched time series in finance — and one of the hardest to forecast. Geopolitics, OPEC+ quota changes, inventories, the dollar, and the demand cycle all push prices around on very different time scales.
This post is a short, reproducible walk-through of how I’d approach a fresh oil price series: look at it, make it stationary, summarise it, and produce a naive probabilistic forecast. Everything below is computed in the browser from the raw price array, so you can open the page source and trace every number.
Data
Monthly average spot prices for West Texas Intermediate (WTI) crude from January 2020 through December 2024, rounded from the FRED series DCOILWTICO published by the Federal Reserve Bank of St. Louis (monthly averages of daily closes). That gives sixty observations: enough to see regime shifts without drowning in noise.
Three regimes jump out:
- 2020: the COVID demand shock. WTI collapsed from the high 50s in January to a monthly average near $17 in April (on 20 April the front-month futures contract even settled below zero). Recovery was fast but partial.
- 2021–2022: reopening plus war premium. Prices climbed through 2021 and spiked above $110 in mid-2022 as the invasion of Ukraine layered a geopolitical risk premium on top of an already tight physical market.
- 2023–2024: range-bound. Roughly $70–$90, with OPEC+ managing supply against softer global demand.
Making it stationary
Price levels are non-stationary — the mean clearly shifts across regimes and the variance explodes around March–April 2020. Before fitting any statistical model you need a stationary transformation. The standard move for asset prices is the log return:
r[t] = ln( P[t] / P[t-1] )
Log returns approximately centre around zero, are dimensionless (comparable across assets and across time), and are additive over time (a useful algebraic property you don’t get with simple returns).
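As a minimal sketch of the transformation in Python (separate from the in-browser code behind this page), the snippet below computes log returns and checks the additivity property. The `prices` list is made up for illustration and is not the FRED data, and `log_returns` is a hypothetical helper name.

```python
import math

def log_returns(prices):
    """Return r[t] = ln(P[t] / P[t-1]) for each consecutive pair of prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("log returns need strictly positive prices")
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

# Illustrative prices only; NOT the FRED series used in this post.
prices = [60.0, 50.0, 30.0, 20.0, 40.0]
returns = log_returns(prices)

# Additivity: per-period log returns sum to the log return over the whole span.
total = math.log(prices[-1] / prices[0])
assert math.isclose(sum(returns), total)
```

The positivity check is not just defensive: a daily WTI series that includes the April 2020 negative print would fail it, while the monthly averages used here are all positive.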
The April 2020 observation is visibly an outlier: a monthly log return of roughly −0.55, which corresponds to a simple return of about −42%. Outside of that, returns look roughly stationary with occasional volatility clusters (a well-known stylised fact that motivates GARCH-style models).
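One informal way to look for volatility clustering is to compare the lag-1 autocorrelation of the returns with that of the squared returns; clustering tends to show up as positive autocorrelation in the squares. The sketch below is an assumption about how one might check this, not something computed for the chart above. The `returns` values are invented and `autocorr` is a hypothetical helper.

```python
import statistics

def autocorr(xs, lag=1):
    """Sample autocorrelation of xs at the given lag."""
    n = len(xs)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(xs) - 1")
    mean = statistics.fmean(xs)
    denom = sum((x - mean) ** 2 for x in xs)
    if denom == 0:
        raise ValueError("series is constant")
    num = sum((xs[i] - mean) * (xs[i - lag] - mean) for i in range(lag, n))
    return num / denom

# Invented returns: a calm stretch followed by a turbulent one.
returns = [0.01, -0.02, 0.01, -0.01, 0.02, -0.01, 0.30, -0.40, 0.35, -0.30]
squared = [r * r for r in returns]

print(f"lag-1 autocorr of returns:         {autocorr(returns):+.2f}")
print(f"lag-1 autocorr of squared returns: {autocorr(squared):+.2f}")
```

With only 59 monthly returns, any such estimate is noisy, so treat it as a prompt for a closer look rather than a test.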
Summary statistics
Sharpe-style ratios on commodity spot prices are usually poor — crude is not a buy-and-hold asset, it’s an exposure you take a view on. But knowing the empirical drift and volatility is exactly what you need to build the simplest honest forecast.
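A rough sketch of how the drift and volatility could be estimated from a list of monthly log returns is below. The input values are invented, `summarise` is a hypothetical helper, and the annualisation (mean times 12, standard deviation times sqrt(12)) is my assumption that leans on returns being roughly i.i.d.

```python
import math
import statistics

def summarise(returns, periods_per_year=12):
    """Basic summary of per-period log returns, with naive annualisation."""
    mu = statistics.fmean(returns)
    sigma = statistics.stdev(returns)  # sample std (n - 1); needs >= 2 points
    return {
        "n": len(returns),
        "mu_monthly": mu,
        "sigma_monthly": sigma,
        # Annualisation below assumes i.i.d. returns (an assumption, not a fact).
        "mu_annual": mu * periods_per_year,
        "sigma_annual": sigma * math.sqrt(periods_per_year),
        "min": min(returns),
        "max": max(returns),
    }

# Invented log returns for illustration; NOT computed from the FRED data.
returns = [-0.18, -0.51, -0.41, 0.69, 0.05, -0.02]
stats = summarise(returns)
for name, value in stats.items():
    print(f"{name:>14}: {value:.4f}")
```

Reporting the minimum and maximum alongside the mean makes it obvious when one or two observations are dominating the volatility estimate.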
A naive forecast: random walk with drift
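If we’re willing to assume log returns are approximately i.i.d. normal with mean μ̂ and standard deviation σ̂ (a classic random walk with drift in log prices), then the h-step-ahead point forecast and 95% prediction band are:
point: P[t+h] = P[t] · exp( h · μ̂ )
95% PI: P[t] · exp( h · μ̂ ± 1.96 · σ̂ · sqrt(h) )
Strictly, the point forecast is the median of the implied lognormal distribution; the mean is higher by a factor of exp( h · σ̂² / 2 ).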
The sqrt(h) term is the hallmark of the diffusion: uncertainty grows with the square root of the horizon, not linearly. The chart below projects the next 12 months from the end of our sample.
That band is wide on purpose. Crude is volatile enough that a one-year 95% interval easily spans $40–$120, which is honest about the limits of a simple model. Any tighter forecast has to earn its narrower band by bringing in structure the random walk ignores: supply curves, inventory, macro factors, options-implied volatility, or regime-switching dynamics.
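The two forecast formulas translate almost line for line into code. The sketch below is one way to write them in Python; `rw_drift_forecast` is a hypothetical name, and the inputs (`last_price`, `mu`, `sigma`) are placeholders rather than estimates from the data in this post.

```python
import math

def rw_drift_forecast(last_price, mu, sigma, horizon, z=1.96):
    """Lower bound, point forecast and upper bound, `horizon` steps ahead.

    Assumes per-period log returns are i.i.d. normal with mean `mu` and
    standard deviation `sigma`, both treated as known.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    centre = horizon * mu
    half_width = z * sigma * math.sqrt(horizon)  # grows with sqrt(h)
    return (
        last_price * math.exp(centre - half_width),
        last_price * math.exp(centre),
        last_price * math.exp(centre + half_width),
    )

# Placeholder inputs for illustration; NOT estimates from the FRED series.
last_price, mu, sigma = 70.0, 0.0, 0.10
for h in (1, 3, 6, 12):
    lo, mid, hi = rw_drift_forecast(last_price, mu, sigma, h)
    print(f"h={h:>2}: {lo:6.2f}  {mid:6.2f}  {hi:6.2f}")
```

The band is symmetric in log space but lopsided in dollars, with more room on the upside than the downside. It also ignores estimation error in μ̂ and σ̂, so if anything it understates the true uncertainty.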
Takeaways
- Always transform before modelling. Prices are non-stationary; log returns are a far better canvas for any statistical model.
- Know your empirical μ̂ and σ̂. Before you add complexity (AR, ARIMA, GARCH, VAR, machine learning), the drift-and-vol pair is the benchmark everything else has to beat.
- Respect the uncertainty. A wide forecast band isn’t a failure of the model; it’s the model being honest about what a random walk with stationary increments can know.
In future posts I’ll extend this into a proper ARIMA fit, compare against a GARCH variance model, and layer in exogenous regressors (the DXY, inventories, refinery utilisation). For a one-pager, though, random walk with drift is a surprisingly strong baseline.