Feature Engineering for Financial Machine Learning: Turning Raw Prices into Signal

Hand a machine-learning model raw closing prices and it will learn almost nothing useful — or worse, it will learn something that looks brilliant in backtest and collapses live. In quantitative trading, the model is rarely the hard part. The hard part is feature engineering: transforming raw market data into inputs that actually carry predictive information, without smuggling in the future. This is where most of the real edge — and most of the real mistakes — live.

Why raw prices are bad features

Price levels are non-stationary: their mean and variance drift over time, so a level that meant "expensive" five years ago is meaningless today. Models trained on raw levels effectively memorize a specific price range and fail when the market moves out of it. The fix is to transform prices into something more stable:

Returns (especially log returns) instead of levels — roughly stationary and comparable across time and instruments.
Normalized or standardized values — z-scores, or scaling to a fixed range, so features live on comparable scales.
Relative measures — distance from a moving average, percent rank within a rolling window, ratios rather than absolutes.
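The transforms above can be sketched in a few lines of dependency-free Python. This is a minimal illustration that assumes prices arrive as a plain list of floats ordered oldest to newest. The helper names (`log_returns`, `rolling_zscore`, `distance_from_ma`, `rolling_pct_rank`) and the window lengths are hypothetical, not a library API. Every window is trailing and ends at the current bar, so none of these features can see the future.

```python
import math
from statistics import mean, pstdev

# Hypothetical helpers for illustration; not a library API.

def log_returns(prices):
    """r_t = ln(P_t / P_{t-1}); the first bar has no prior price, so it is None."""
    return [None] + [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def _trailing(values, t, window):
    """The `window` values ending at bar t (inclusive), or None if history is too short."""
    return values[t + 1 - window : t + 1] if t + 1 >= window else None

def rolling_zscore(values, window):
    """(x_t - mean) / std over the trailing window only, so it is causal."""
    out = []
    for t in range(len(values)):
        w = _trailing(values, t, window)
        if w is None:
            out.append(None)
        else:
            sd = pstdev(w)
            out.append((values[t] - mean(w)) / sd if sd > 0 else 0.0)
    return out

def distance_from_ma(prices, window):
    """Price relative to its trailing simple moving average (0.02 = 2% above)."""
    out = []
    for t in range(len(prices)):
        w = _trailing(prices, t, window)
        out.append(None if w is None else prices[t] / mean(w) - 1.0)
    return out

def rolling_pct_rank(values, window):
    """Share of the trailing window at or below the current value (1.0 = window high)."""
    out = []
    for t in range(len(values)):
        w = _trailing(values, t, window)
        out.append(None if w is None else sum(v <= values[t] for v in w) / window)
    return out

prices = [100.0, 101.0, 102.0, 101.0, 103.0, 104.0]
print(rolling_pct_rank(prices, 3))  # → [None, None, 1.0, 0.6666666666666666, 1.0, 1.0]
```

In pandas the same idea is usually expressed with `Series.rolling(window)`, which is trailing by default; note that its `.std()` uses the sample estimator (ddof=1), whereas this sketch uses the population one.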

Useful families of features

Good features encode different facets of market behavior:

Momentum / trend — returns over multiple horizons, moving-average slopes, the kind of information behind indicators like RSI or MACD.
Volatility — rolling standard deviation, ATR, realized volatility; regime matters as much as direction.
Volume / liquidity — relative volume, order-flow imbalance, spread.
Calendar / seasonality — time of day, day of week, sessions, encoded cyclically (sine/cosine) rather than as raw integers.
Cross-sectional — how an instrument ranks against its peers at the same moment.
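A few of these families can be sketched directly. The snippet below is a minimal, standard-library illustration assuming per-bar close prices and per-bar returns as lists; the names `momentum`, `realized_vol`, `cyclical` and `cross_sectional_rank` are hypothetical. It shows momentum at several horizons, a trailing volatility, a sine/cosine time encoding, and a peer rank taken at a single timestamp.

```python
import math
from statistics import pstdev

# Hypothetical feature helpers for illustration; not a library API.

def momentum(prices, horizon):
    """Log return over the last `horizon` bars: ln(P_t / P_{t-horizon})."""
    return [None if t < horizon else math.log(prices[t] / prices[t - horizon])
            for t in range(len(prices))]

def realized_vol(returns, window):
    """Trailing (population) std of per-bar returns; not annualized."""
    out = []
    for t in range(len(returns)):
        w = returns[t + 1 - window : t + 1] if t + 1 >= window else []
        out.append(pstdev(w) if w and None not in w else None)
    return out

def cyclical(value, period):
    """Encode a periodic integer (hour 0-23, weekday 0-6) as a point on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def cross_sectional_rank(snapshot):
    """Rank each instrument's value against its peers at one timestamp, scaled to [0, 1].

    Ties keep their input order instead of sharing a rank in this sketch.
    """
    names = sorted(snapshot, key=snapshot.get)
    n = len(names)
    return {name: (i / (n - 1) if n > 1 else 0.5) for i, name in enumerate(names)}

# Multi-horizon momentum is the same feature computed at several lookbacks.
prices = [100.0, 102.0, 101.0, 105.0, 107.0]
features = {h: momentum(prices, h) for h in (1, 2, 4)}
```

The cyclical encoding is why hour 23 ends up close to hour 0: as raw integers they are 23 apart, but on the unit circle they are neighbours.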

The deadly sin: lookahead bias and data leakage

The single most dangerous error in financial ML is letting future information leak into a feature. It produces spectacular backtests that are pure fiction. Guard against it relentlessly:

Only use information available at the bar you are predicting from. A feature computed at time T must use data up to T and not one tick later.
Fit transforms on training data only. Computing a mean or standard deviation, or fitting a scaler, over the whole dataset (test period included) leaks the future into the past. Fit on train, apply to test.
Beware indicator windows that peek. Some "centered" smoothers and repainting indicators use future bars by construction. Confirm every feature is causal.
Mind survivorship and point-in-time data. Use the data as it actually was at the time, not as later revised or with delisted names removed.
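Two of these guards can be made concrete. The sketch below is a minimal illustration under stated assumptions: a hypothetical `TrainOnlyScaler` that estimates its statistics from a chronological training slice only, and a crude, hypothetical `is_causal` check that perturbs every bar after t and confirms the feature value at t does not move. A trailing mean passes that check; a centered mean, used here only as a counterexample, fails it.

```python
from statistics import mean, pstdev

class TrainOnlyScaler:
    """Hypothetical scaler: standardizes with statistics estimated on the training slice only."""

    def fit(self, train):
        self.mu = mean(train)
        self.sd = pstdev(train) or 1.0  # guard against a constant series
        return self

    def transform(self, values):
        return [(v - self.mu) / self.sd for v in values]

def trailing_mean(values, window):
    """Causal: the value at t averages bars t-window+1 .. t."""
    return [None if t + 1 < window else mean(values[t + 1 - window : t + 1])
            for t in range(len(values))]

def centered_mean(values, window):
    """NOT causal: the value at t also averages bars after t. A counterexample only."""
    half = window // 2
    return [None if t < half or t + half >= len(values)
            else mean(values[t - half : t + half + 1]) for t in range(len(values))]

def is_causal(feature_fn, values, t, *args):
    """Crude check: change everything after t and confirm the feature at t is unchanged."""
    perturbed = values[: t + 1] + [v * 10 + 1 for v in values[t + 1 :]]
    return feature_fn(values, *args)[t] == feature_fn(perturbed, *args)[t]

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
split = 6                               # chronological split: no shuffling
train, test = data[:split], data[split:]
scaler = TrainOnlyScaler().fit(train)   # statistics come from train only
test_scaled = scaler.transform(test)
print(is_causal(trailing_mean, data, 4, 3), is_causal(centered_mean, data, 4, 3))  # → True False
```

A passing perturbation test does not prove a feature is causal for every input, but a failing one is a reliable sign that something is peeking.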

Labels are features too

How you define the thing you are predicting matters as much as the inputs. A naive "next bar up or down" label is noisy and ignores risk. More robust labeling — for example, whether a profit target is hit before a stop within a horizon (the triple-barrier idea) — produces targets that reflect how the strategy would actually trade.
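A simplified version of the triple-barrier label can be sketched as follows. This is a minimal illustration assuming close prices only and fractional, symmetric-or-not barriers; the function name and parameters are hypothetical. On OHLC bars both barriers can be touched inside a single bar, which needs an explicit tie rule that this sketch does not model.

```python
def triple_barrier_label(prices, t, take_profit, stop_loss, horizon):
    """Hypothetical labeler: classify an entry at bar t by the first barrier later closes touch.

    Returns +1 if the profit target is hit first, -1 if the stop is hit first,
    0 if neither is hit within `horizon` bars (the time barrier), and None when
    fewer than `horizon` future bars exist, so the label is not yet knowable.
    """
    if t + horizon >= len(prices):
        return None
    entry = prices[t]
    upper = entry * (1.0 + take_profit)   # profit-target barrier
    lower = entry * (1.0 - stop_loss)     # stop-loss barrier
    for i in range(t + 1, t + horizon + 1):
        if prices[i] >= upper:
            return 1
        if prices[i] <= lower:
            return -1
    return 0                              # time barrier reached first

prices = [100.0, 101.0, 103.0, 99.0, 97.0, 100.0]
labels = [triple_barrier_label(prices, t, 0.02, 0.02, 3) for t in range(len(prices))]
print(labels)  # → [1, -1, -1, None, None, None]
```

Because each label looks `horizon` bars ahead, a training sample near the train/test boundary can carry information from the test period; dropping (purging) those samples is a common precaution.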

Less is often more

With enough features, a model will always find spurious patterns in the noise — the multiple-comparisons trap again. Prefer a smaller set of economically meaningful features, check their stability across time, and validate with proper out-of-sample and walk-forward testing rather than trusting in-sample fit.
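Walk-forward testing amounts to a sequence of chronological train/test windows that roll forward through time. The generator below is a minimal sketch of a rolling-window variant; the name `walk_forward_splits` and its parameters are hypothetical. The optional `gap` leaves bars unused between train and test, which helps when labels look several bars ahead.

```python
def walk_forward_splits(n_samples, train_size, test_size, step=None, gap=0):
    """Hypothetical generator of (train_indices, test_indices) for rolling walk-forward tests.

    Every test window lies strictly after its training window, and `gap` bars
    between them are left out of both.
    """
    step = step or test_size
    start = 0
    while start + train_size + gap + test_size <= n_samples:
        test_start = start + train_size + gap
        yield range(start, start + train_size), range(test_start, test_start + test_size)
        start += step

for train_idx, test_idx in walk_forward_splits(10, train_size=4, test_size=2):
    print(train_idx, test_idx)
# → range(0, 4) range(4, 6)
# → range(2, 6) range(6, 8)
# → range(4, 8) range(8, 10)
```

Fitting scalers and models inside each fold, and comparing feature behaviour across folds, is one way to check the stability mentioned above. scikit-learn's `TimeSeriesSplit` offers a similar expanding-window scheme with `test_size`, `gap` and `max_train_size` parameters.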

Bottom line

In financial machine learning, feature engineering is the strategy. Make features stationary, scale them honestly, encode genuine market behavior, and — above all — be paranoid about leakage and lookahead, because the market will not pay you for an edge that only existed because your backtest could see the future. Build causal, stable, economically sensible features and the model becomes the easy part.