Non-Candidate Records
Not every row in an election results file is a candidate. Sources routinely embed turnout metadata, ballot measure choices, vote quality indicators, and aggregation artifacts alongside candidate results — using the same columns, the same format, and no reliable flag to distinguish them.
If your system treats every row as a candidate, you will create entity records for people named “Registered Voters”, “For”, “BLANK”, and “TOTAL VOTES”. The L4 LLM audit in our prototype caught exactly this: “For” and “Against” were classified as person entities. They are not people.
The Four Categories
1. Turnout Metadata
Rows recording registration and participation counts at the precinct level:
| Pseudo-candidate | Meaning | Source |
|---|---|---|
| Registered Voters | Total registered voters in precinct | FL OpenElections, NC SBE |
| Ballots Cast | Total ballots submitted | FL OpenElections, NC SBE |
| Cards Cast | Total ballot cards (may differ from ballots in multi-card elections) | FL OpenElections |
Florida OpenElections is the most prolific source of these rows. Of the “other” records in our FL 2022 ingest, 6,013 rows are “Registered Voters”, accounting for 67.9% of all non-candidate records in that source. These are not errors in the source data; they are genuine turnout figures published alongside contest results in the same file format.
2. Ballot Measure Choices
Rows representing choices on referenda, bond issues, and constitutional amendments:
| Pseudo-candidate | Meaning | Source |
|---|---|---|
| For | Yes vote on ballot measure | OpenElections, MEDSL |
| Against | No vote on ballot measure | OpenElections, MEDSL |
| Yes | Yes vote on ballot measure | NC SBE, MEDSL |
| No | No vote on ballot measure | NC SBE, MEDSL |
These are legitimate vote counts — but the “candidate” is not a person. Detection requires examining both the candidate name (a single common word) and the contest name (bond, referendum, amendment, proposition). See Ballot Measure Choices.
3. Vote Quality Indicators
Rows recording ballots that did not produce a valid vote for any candidate:
| Pseudo-candidate | Meaning | Source |
|---|---|---|
| Over Votes | Voter selected more candidates than allowed | MEDSL, NC SBE |
| Under Votes | Voter selected fewer candidates than allowed | MEDSL, NC SBE |
| BLANK | No selection made (Maine’s term for undervote) | MEDSL (ME) |
| Write-in | Aggregate write-in count (no specific candidate) | Multiple sources |
Over votes and under votes are important data quality signals. A contest with 15% over votes may indicate a confusing ballot design. But they are not candidates and must not be counted as such.
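As a rough illustration of how an over-vote share like the 15% figure could be computed, here is a minimal sketch. It assumes a vote-for-one contest, where every ballot yields exactly one outcome (a valid vote, an over vote, or an under vote). The function name `over_vote_rate` and the sample counts are invented for this example and do not come from the prototype.

```python
def over_vote_rate(candidate_votes, over_votes, under_votes):
    """Share of ballots in a vote-for-one contest that were over votes.

    Assumption: each ballot contributes exactly one of a valid vote,
    an over vote, or an under vote, so the three counts sum to ballots.
    """
    ballots = sum(candidate_votes) + over_votes + under_votes
    if ballots == 0:
        return 0.0  # empty contest: nothing to measure
    return over_votes / ballots


# Invented sample counts, for illustration only.
rate = over_vote_rate(candidate_votes=[520, 310], over_votes=150, under_votes=20)
print(f"{rate:.1%}")  # prints 15.0%
```

For multi-seat contests the denominator would need a different definition, which this sketch does not attempt.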
4. Aggregation Artifacts
Rows that are computational summaries, not individual results:
| Pseudo-candidate | Meaning | Source |
|---|---|---|
| TOTAL VOTES | Sum of all candidates in the contest | MEDSL (UT) |
| Scattering | Aggregate of write-in candidates below reporting threshold | MEDSL (IA, MN) |
| TOTAL | Another sum variant | OpenElections |
These rows are redundant with the candidate-level data. Including them double-counts votes and inflates totals.
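The double-counting effect is easy to reproduce. The sketch below uses invented rows for a single contest and a hypothetical helper, `contest_total`; it only shows why a total row must be excluded before summing.

```python
# Invented sample rows for one contest: (candidate name, votes).
rows = [
    ("Alice Example", 120),
    ("Bob Example", 80),
    ("TOTAL VOTES", 200),  # aggregation artifact published by the source
]

# Only the sum-style artifacts named in the table above.
AGGREGATION_ARTIFACTS = frozenset({"TOTAL VOTES", "TOTAL"})


def contest_total(rows, skip=frozenset()):
    """Sum votes over rows whose candidate name is not in `skip`."""
    return sum(votes for name, votes in rows if name not in skip)


naive = contest_total(rows)                                # 400: total row counted too
correct = contest_total(rows, skip=AGGREGATION_ARTIFACTS)  # 200: candidates only
print(naive, correct)
```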
The Detection Strategy
Non-candidate records are detected at L1 — the earliest possible point. The principle is extract before filter: non-candidate rows often contain valuable information (registered voter counts, undervote rates) that should be captured in the correct schema object before the row is excluded from contest analysis.
Detection uses a three-part check:
- Exact match on candidate name. A lookup table of ~40 known pseudo-candidate strings: “Registered Voters”, “Ballots Cast”, “Over Votes”, “Under Votes”, “BLANK”, “TOTAL VOTES”, “Scattering”, “For”, “Against”, “Yes”, “No”, etc.
- Contest name pattern. For ambiguous names like “For” and “Against”, check whether the contest name contains ballot measure keywords: bond, referendum, amendment, proposition, measure, question, initiative, charter.
- Source-specific rules. Some sources use unique pseudo-candidates. Maine uses “BLANK”. Iowa uses “Scattering”. Utah includes “TOTAL VOTES” rows. Each source parser knows its own ghosts.
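The three checks can be sketched as a single classification function. This is a minimal illustration under stated assumptions, not the prototype's code: the names `Category`, `EXACT`, `AMBIGUOUS`, `MEASURE_RE`, and `classify` are hypothetical, the lookup table is a small excerpt of the ~40 strings, matching is case-insensitive by assumption, and source-specific names are supplied by the caller as a plain dict.

```python
import re
from enum import Enum


class Category(Enum):
    TURNOUT = "turnout_metadata"
    BALLOT_MEASURE = "ballot_measure_choice"
    VOTE_QUALITY = "vote_quality"
    AGGREGATION = "aggregation_artifact"


# Check 1: exact-match lookup (excerpt only; keys are casefolded).
EXACT = {
    "registered voters": Category.TURNOUT,
    "ballots cast": Category.TURNOUT,
    "cards cast": Category.TURNOUT,
    "over votes": Category.VOTE_QUALITY,
    "under votes": Category.VOTE_QUALITY,
    "total votes": Category.AGGREGATION,
    "total": Category.AGGREGATION,
}

# Check 2: names that only count as non-candidates in a ballot measure contest.
AMBIGUOUS = {"for", "against", "yes", "no"}
MEASURE_RE = re.compile(
    r"\b(?:bond|referendum|amendment|proposition|measure|question|initiative|charter)s?\b",
    re.IGNORECASE,
)


def classify(candidate, contest, source_rules=None):
    """Return a Category for a non-candidate row, or None for a real candidate."""
    name = candidate.strip().casefold()
    if name in EXACT:
        return EXACT[name]
    if name in AMBIGUOUS and MEASURE_RE.search(contest):
        return Category.BALLOT_MEASURE
    # Check 3: pseudo-candidates known only to this source's parser.
    if source_rules and name in source_rules:
        return source_rules[name]
    return None


# A Maine parser might pass its own rule for "BLANK".
maine_rules = {"blank": Category.VOTE_QUALITY}
print(classify("BLANK", "Governor", maine_rules))  # Category.VOTE_QUALITY
print(classify("For", "School Bond Referendum"))   # Category.BALLOT_MEASURE
print(classify("For", "Governor"))                 # None
```

Keeping the ambiguous names out of the exact-match table means “For” in a candidate contest falls through as an ordinary row, which leaves it visible to later audits rather than silently dropping it.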
Routing
Detected non-candidate records are routed to the appropriate schema object:
| Category | Route to | Details |
|---|---|---|
| Turnout metadata | TurnoutMetadata | Attached to sibling precinct records |
| Ballot measure choices | BallotMeasure | MeasureChoice with For/Against/Yes/No |
| Vote quality indicators | VoteQuality | Attached to parent contest record |
| Aggregation artifacts | Discarded | Redundant with candidate-level sums |
Records routed to TurnoutMetadata and VoteQuality are preserved in the L1 output — they are valuable data, just not candidate data. Aggregation artifacts are discarded with a note in the cleaning report.
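A minimal sketch of this routing step follows, assuming each row has already been labelled with a category (or `None` for a real candidate). The container `L1Output`, the function `route`, the string labels, and the sample rows are all hypothetical; only the destination names TurnoutMetadata, BallotMeasure, and VoteQuality come from the table above.

```python
from dataclasses import dataclass, field


@dataclass
class L1Output:
    candidates: list = field(default_factory=list)
    turnout_metadata: list = field(default_factory=list)  # TurnoutMetadata
    ballot_measures: list = field(default_factory=list)   # BallotMeasure
    vote_quality: list = field(default_factory=list)      # VoteQuality
    cleaning_report: list = field(default_factory=list)


def route(labelled_rows):
    """Extract before filter: every row is captured somewhere or reported."""
    out = L1Output()
    for category, row in labelled_rows:
        if category is None:
            out.candidates.append(row)
        elif category == "turnout":
            out.turnout_metadata.append(row)
        elif category == "ballot_measure":
            out.ballot_measures.append(row)
        elif category == "vote_quality":
            out.vote_quality.append(row)
        elif category == "aggregation":
            # Discarded, but noted so the drop is auditable.
            out.cleaning_report.append(
                f"discarded aggregation row {row['candidate']!r} "
                f"in contest {row['contest']!r}"
            )
        else:
            raise ValueError(f"unknown category: {category!r}")
    return out


# Invented sample rows, for illustration only.
sample = [
    (None, {"candidate": "Alice Example", "contest": "Governor", "votes": 120}),
    ("turnout", {"candidate": "Registered Voters", "contest": "Governor", "votes": 900}),
    ("vote_quality", {"candidate": "Over Votes", "contest": "Governor", "votes": 3}),
    ("aggregation", {"candidate": "TOTAL VOTES", "contest": "Governor", "votes": 123}),
]
result = route(sample)
print(len(result.candidates), result.cleaning_report)
```

Raising on an unknown label is a deliberate choice in this sketch: a row that matches no route should fail loudly rather than vanish.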
What Happens Without Detection
If non-candidate rows pass through to L2 and L3:
- “Registered Voters” gets an embedding vector, a candidate entity ID, and appears in 6,013 precinct-level records as the most prolific “candidate” in Florida.
- “For” and “Against” become person entities. The L4 LLM audit flagged exactly this in our prototype: “‘For’ is not a plausible person name.”
- “TOTAL VOTES” inflates vote counts when aggregated, because the total row is summed alongside the individual candidate rows.
- “Over Votes” appears as a candidate who received votes in every contest — the busiest politician in America.
Detection at L1 prevents all of these downstream errors.
Sub-Chapters
- Registered Voters, Ballots Cast, Over/Under Votes — turnout and vote quality rows, the “extract before filter” principle
- Ballot Measure Choices: For/Against/Yes/No — detecting ballot measures from candidate name + contest name patterns