Beyond Sentiment: Event Extraction from Earnings Calls and Filings
Headline sentiment — "is this article bullish or bearish?" — is the entry-level NLP task for traders, and it is genuinely useful. But a single positive/negative score throws away most of the information in a document. The more valuable, and harder, job is event extraction: pulling out the specific, structured facts that actually move prices. This post is about that next step up.
Sentiment tells you the mood; events tell you what happened
A sentiment model might score an earnings press release as mildly positive. Event extraction instead reads it as a set of facts: guidance raised, dividend cut, CFO resigned, buyback announced. Those are tradeable, comparable, and far more specific than a mood. Two filings with the same sentiment score can carry opposite events.
What you are actually extracting
- Entities. Companies, tickers, people, products, places — via Named-Entity Recognition (NER). The first hard problem is entity linking: mapping "the company", "Apple", and "AAPL" to one canonical identifier.
- Events. The structured happenings: M&A, guidance changes, litigation, management changes, product launches, regulatory actions.
- Arguments / roles. Who did what to whom, and by how much: which company is the acquirer, what the new guidance number is, when it takes effect.
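To make the three layers concrete, here is a minimal sketch of entity linking feeding an event record. The alias table, the `link_entity` helper, and the `Event` fields are all hypothetical names chosen for illustration; a real system would likely resolve mentions against a maintained security master rather than a hand-written dictionary.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical alias table (assumption): surface forms -> canonical ticker.
ALIASES = {
    "apple": "AAPL",
    "apple inc.": "AAPL",
    "aapl": "AAPL",
}

# Phrases that refer back to whoever issued the document.
SELF_REFERENCES = {"the company", "we", "our company"}


def link_entity(mention: str, doc_ticker: Optional[str] = None) -> Optional[str]:
    """Map a surface mention to a canonical ticker, or None if unknown."""
    key = mention.strip().lower()
    if key in SELF_REFERENCES:
        # Self-references resolve to the issuer of the document, if known.
        return doc_ticker
    return ALIASES.get(key)


@dataclass
class Event:
    event_type: str                              # e.g. "acquisition"
    roles: dict = field(default_factory=dict)    # role name -> canonical entity
    value: Optional[float] = None                # e.g. the new guidance number
    effective: Optional[str] = None              # when it takes effect, if stated


# Three different mentions inside a document issued by AAPL.
mentions = ["the company", "Apple", "AAPL"]
linked = [link_entity(m, doc_ticker="AAPL") for m in mentions]

# An event whose "acquirer" role is filled by a linked entity.
event = Event(event_type="acquisition", roles={"acquirer": linked[1]})
```

All three mentions collapse to one identifier, which is what lets events from different documents be compared. Returning `None` for an unknown mention, rather than guessing, keeps unresolved entities visible downstream.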
The pipeline, roughly
- Source and clean. Earnings call transcripts, 8-K / 10-Q filings, regulatory feeds. Strip boilerplate, segment into sentences, and keep the timestamp — it matters more than the text (see below).
- Tag. Run NER and a classifier (or, increasingly, a fine-tuned transformer like FinBERT or a general LLM) to label entities and candidate events.
- Link and structure. Resolve entities to tickers, attach the event arguments, and emit a structured record: {ticker, event_type, value, timestamp, source}.
- Turn into a feature. Aggregate those records into signals: event surprise vs expectation, event frequency, or a same-day directional flag.
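The last step above, turning records into features, can be sketched in a few lines. The records, tickers, and the `expected` table below are made-up illustration data, and `surprise` and `event_counts` are hypothetical helper names; the relative-difference definition of surprise is one reasonable choice among several, not the only one.

```python
from collections import Counter
from typing import Optional

# Made-up records in the {ticker, event_type, value, timestamp, source} shape.
records = [
    {"ticker": "AAA", "event_type": "guidance_change", "value": 2.10,
     "timestamp": "2024-05-01T20:05:00+00:00", "source": "8-K"},
    {"ticker": "AAA", "event_type": "buyback", "value": None,
     "timestamp": "2024-05-01T20:05:00+00:00", "source": "8-K"},
    {"ticker": "BBB", "event_type": "guidance_change", "value": 0.95,
     "timestamp": "2024-05-02T12:30:00+00:00", "source": "call"},
]

# Hypothetical prior expectations (assumption), keyed by (ticker, event_type).
expected = {
    ("AAA", "guidance_change"): 2.00,
    ("BBB", "guidance_change"): 1.00,
}


def surprise(record: dict, expected: dict) -> Optional[float]:
    """Relative difference between the extracted value and the expectation."""
    key = (record["ticker"], record["event_type"])
    base = expected.get(key)
    if base is None or base == 0 or record["value"] is None:
        return None  # no expectation or no number: no surprise feature
    return (record["value"] - base) / abs(base)


def event_counts(records: list) -> Counter:
    """Event frequency per (ticker, event_type)."""
    return Counter((r["ticker"], r["event_type"]) for r in records)


surprises = [surprise(r, expected) for r in records]
counts = event_counts(records)
```

Events without a numeric value, such as the buyback record here, yield no surprise feature but still contribute to the frequency counts.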
The trap that silently ruins everything: timestamps
This is the look-ahead bias of the NLP world. If you train or backtest on a filing using the date you downloaded it rather than the moment it became public, you are trading on information from the future. Earnings call start times, filing acceptance times, and news wire timestamps must be recorded to the minute, in one consistent time zone, and your backtest must only ever use a document after its genuine public-availability time. Get this wrong and you will build a spectacular strategy that cannot be traded.
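One way to enforce this in a backtest is a point-in-time filter that every document must pass before the strategy can see it. The sketch below assumes each document carries a timezone-aware `public_at` field (a hypothetical name) separate from the download time; the documents and times are made up for illustration.

```python
from datetime import datetime, timezone


def visible_documents(docs: list, as_of: datetime) -> list:
    """Return only documents that were already public strictly before as_of."""
    if as_of.tzinfo is None:
        # Naive datetimes invite silent time-zone mistakes, so refuse them.
        raise ValueError("as_of must be timezone-aware")
    return [d for d in docs if d["public_at"] < as_of]


# Made-up documents: both were downloaded later than they became public.
docs = [
    {"id": "filing-1",
     "public_at": datetime(2024, 5, 1, 20, 5, tzinfo=timezone.utc),
     "downloaded_at": datetime(2024, 5, 3, 9, 0, tzinfo=timezone.utc)},
    {"id": "filing-2",
     "public_at": datetime(2024, 5, 2, 21, 30, tzinfo=timezone.utc),
     "downloaded_at": datetime(2024, 5, 3, 9, 0, tzinfo=timezone.utc)},
]

# The simulated decision happens between the two publication times.
decision_time = datetime(2024, 5, 2, 14, 30, tzinfo=timezone.utc)
usable = visible_documents(docs, decision_time)
usable_ids = [d["id"] for d in usable]
```

Only `filing-1` is usable at the decision time. Filtering on `downloaded_at` would have excluded both documents here, and in the opposite situation (a backfilled archive stamped with the filing date only) it can let future information in, which is why the filter keys on public availability alone.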
LLMs help — but verify
Large language models are excellent at reading a messy transcript and returning structured JSON of events. They are also confident when they are wrong: they will hallucinate a dividend cut that was never announced. For anything that drives an order, constrain the model to extract only what is in the text, validate the output against a schema, and spot-check against the source. A wrong extraction is worse than no extraction.
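A minimal sketch of the "validate and ground" step, using only the standard library. It assumes the model was asked to return a JSON list of objects with an `event_type` and a verbatim `evidence` quote; that output shape, the allowed event types, and the `validate_events` name are all assumptions for illustration, not a standard.

```python
import json

# Hypothetical closed set of event types the pipeline accepts (assumption).
ALLOWED_TYPES = {
    "guidance_raised", "guidance_lowered", "dividend_cut",
    "buyback_announced", "executive_departure",
}
REQUIRED_KEYS = {"event_type", "evidence"}


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so quotes match across line breaks."""
    return " ".join(text.split()).lower()


def validate_events(raw_json: str, source_text: str) -> list:
    """Keep only events that fit the schema and quote the source verbatim."""
    try:
        payload = json.loads(raw_json)
    except json.JSONDecodeError:
        return []  # unparseable output is treated as no extraction
    if not isinstance(payload, list):
        return []
    source = _normalize(source_text)
    accepted = []
    for item in payload:
        if not isinstance(item, dict) or not REQUIRED_KEYS.issubset(item):
            continue
        event_type, evidence = item["event_type"], item["evidence"]
        if not isinstance(event_type, str) or event_type not in ALLOWED_TYPES:
            continue
        if not isinstance(evidence, str) or not evidence.strip():
            continue
        if _normalize(evidence) not in source:
            continue  # the quoted evidence does not appear in the document
        accepted.append(item)
    return accepted


source = ("The board approved a new $5 billion share repurchase program. "
          "Full-year guidance is unchanged.")
model_output = json.dumps([
    {"event_type": "buyback_announced",
     "evidence": "approved a new $5 billion share repurchase program"},
    {"event_type": "dividend_cut",
     "evidence": "the quarterly dividend was reduced"},
])
accepted = validate_events(model_output, source)
```

The invented dividend cut is dropped because its evidence never appears in the source. The quote check is only a heuristic: it catches fabricated passages, not a real passage that has been misread, so the spot-check against the source still matters.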
Bottom line
Sentiment is a thermometer; event extraction is a stethoscope. It is more work — entity linking, schema design, and obsessive timestamp hygiene — but it produces specific, comparable, tradeable facts instead of a vague mood. Start with one event type you care about, get the timestamps provably correct, and only then widen the net.
What financial text are you parsing, and how do you guarantee your timestamps are honest? Share your pipeline below.