When the LLM Gets Called (And When It Doesn’t)

The LLM is a confirmation tool, not a discovery tool. It is called when cheaper methods have narrowed the problem to a specific, bounded question. It is never called when a deterministic method produces correct results.

This boundary is enforced by pipeline structure, not by discipline. L0 and L1 have no LLM code paths. L2 has none. The LLM is reachable only from L3 (entity resolution and tier 4 office classification) and L4 (entity auditing). A developer cannot accidentally add an LLM call to the parser — the parser runs at L1, which has no API client.

When the LLM Is Called

Three situations invoke the LLM. Each is a bounded question with structured input and a constrained output format.

1. Ambiguous Entity Matches (L3, Step 4)

Trigger: Embedding cosine similarity between 0.35 and 0.95 AND the name similarity gate passed (JW on last names ≥ 0.50) AND both candidates are in the same state.

Input: Structured name components for both candidates, embedding score, JW score, vote counts, office, state, party.

Output: match/no-match, confidence (0.0–1.0), free-text reasoning.

Model: Claude Sonnet.

Volume: 3.5% of candidate pairs in our prototype (30 calls out of ~850 comparisons). With the step 2.5 gate in place, this drops to near-zero for within-source matching and rises for cross-source matching where name formats diverge.

Real examples:

Pair	Cosine	LLM Decision	Why LLM was needed
Charlie Crist / CRIST, CHARLES JOSEPH	0.451	match (0.95)	Nickname below any safe auto-accept threshold
Robert Williams / Robert Williams Jr	0.862	no match (0.85)	Suffix above old auto-accept; only LLM catches generational distinction
Nicole Fried / FRIED, NIKKI	0.642	match (0.92)	Nickname in ambiguous zone

2. Tier 4 Office Classification (L2→L3 boundary)

Trigger: Office name was not classified by keyword (tier 1), regex (tier 2), or embedding nearest-neighbor with cosine ≥ 0.60 (tier 3).

Input: Office name string, state, county, the full taxonomy of (office_level, office_branch) pairs.

Output: Classification pair, confidence (0.0–1.0), reasoning.

Model: Claude Sonnet.

Volume: ~0.5% of unique office names in MEDSL 2022 (~42 of 8,387). By record count, far less — these are the rarest, most obscure offices.

Real examples:

Office Name	State	LLM Classification	Confidence
Santa Rosa Island Authority	FL	special_district / infrastructure	0.90
Register of Mesne Conveyances	SC	county / judicial	0.88
Hog Reeve	NH	municipal / regulatory	0.60

3. L4 Entity Auditing

Trigger: An entity cluster contains records from multiple sources, multiple elections, or multiple office types. In the current design, every multi-member entity is audited (budget is not a constraint).

Input: The full entity cluster — canonical name, all aliases, all elections, all vote counts, all states, all offices.

Output: Plausibility assessment: plausible / suspicious / error, with reasoning.

Model: Claude Sonnet (Opus-class for flagged entities).

Volume: In the prototype, 50 entities were audited. The LLM flagged 43 as suspicious (precinct-level records inflating temporal chains — a bug in our aggregation, not in the data) and 4 as errors (“For” and “Against” classified as person entities). At production scale, the volume scales with the number of multi-member entities, not with total records.

When the LLM Is Not Called

Everything else. Specifically:

Operation	Layer	Method	Why not LLM
CSV/TSV/XML parsing	L1	Source-specific parser	Deterministic; format is fixed per source
Name decomposition	L1	Rule-based parser	Deterministic; name formats are enumerable
Nickname dictionary lookup	L1	Hash table	O(1) lookup; no reasoning needed
FIPS code enrichment	L1	Census reference table	Exact match on (state, county_name)
Vote share computation	L1	Arithmetic	Division is deterministic
Hash computation	L1–L4	SHA-256	Cryptographic function; no reasoning needed
Office classification (tiers 1–2)	L1	Keyword + regex	Deterministic; handles 62% of unique names
Office classification (tier 3)	L2	Embedding nearest-neighbor	Deterministic given model version; handles 4.5% more
Embedding generation	L2	OpenAI API	Deterministic given model version; not an LLM call
Exact name matching (step 1)	L3	Structured field equality	Handles 70% of entity resolution
Jaro-Winkler matching (step 2)	L3	String similarity	Deterministic; handles 0.1% more
Name gate (step 2.5)	L3	JW on last names	Eliminates obvious non-matches
High-confidence embedding match (step 3)	L3	Cosine ≥ 0.95	Auto-accept; no ambiguity to resolve
Canonical name selection	L4	Fixed algorithm	Most-complete + most-authoritative; no judgment needed
Temporal chain aggregation	L4	Group-by on (entity_id, election_date)	SQL-style aggregation
Hash chain verification	L4	SHA-256 recomputation	Cryptographic verification
Cross-source vote reconciliation	L4	Arithmetic comparison	Exact or percentage-based comparison

The Principle

If a deterministic method handles it, do not add LLM latency and non-determinism.

This is not a cost argument. Budget is not a constraint. It is an accuracy and reproducibility argument:

Deterministic methods do not hallucinate. SHA-256 always returns the same hash. FIPS lookup always returns the same code. An LLM might return a different FIPS code on a second call — not because it is wrong, but because it is probabilistic. For operations with known-correct deterministic solutions, adding an LLM is adding risk, not capability.
Deterministic methods are reproducible. Re-running L1 on the same L0 files with the same parser version produces bit-identical output. Re-running an LLM-based parser may produce different field values. For a pipeline that serves journalists and researchers who need to cite specific numbers, reproducibility is non-negotiable for the operations that support it.
Deterministic methods are fast. L1 processes 200 records in under a second. An LLM call takes 200–2,000ms. For the 70% of entity resolution handled by exact match and the 62% of office classification handled by keywords, the LLM adds latency with zero accuracy benefit.

The LLM is powerful. It correctly identified all 12 test pairs in entity resolution, including the Crist nickname case (0.451 cosine) that no threshold-based system could safely auto-resolve. It classified all 9 tier-4 office names correctly, including obscure offices like “Hog Reeve” that no reference set could anticipate.

But it is called only for the cases that need it: the 3.5% of entity comparisons in the ambiguous zone, the 0.5% of office names that no pattern matches, and the entity audit that catches contamination like ballot-measure choices misclassified as people. For everything else, the answer is already known — deterministically, reproducibly, and instantly.

Cross-References

Design Principles — “Deterministic first” as principle #1
L3: Matched — where LLM calls happen for entity resolution
The Four-Tier Classifier — where LLM calls happen for office classification
Budget Is Not a Constraint — why the cascade exists despite unlimited budget

Keyboard shortcuts

Election Aggregation