Registered Voters, Ballots Cast, Over/Under Votes

Some election data files embed turnout metadata and vote-quality indicators directly alongside candidate results. A row labeled “Registered Voters” is not a contest — it is a count of eligible voters in that precinct. A row labeled “Over Votes” is not a candidate — it is a count of ballots where the voter marked too many choices.

These rows are valuable. They are also poison if treated as candidates.

The Four Categories

Label	What it means	Found in
Registered Voters	Eligible voters in the precinct	NC SBE, FL OpenElections
Ballots Cast	Ballots submitted (any contest)	NC SBE, some MEDSL records
Over Votes	Ballots with too many selections for a contest	NC SBE, ME, UT
Under Votes	Contests where the voter made no selection	NC SBE, ME, UT

NC SBE includes all four in every precinct file. MEDSL includes over/under votes for some states but not others. OpenElections varies by state and contributor. There is no standard.

The Extract-Before-Filter Principle

The instinct is to filter these rows out immediately — they are not candidates, so drop them. This is wrong. The registered voter count is the denominator for turnout computation. Dropping it before extraction destroys the only turnout signal available at the precinct level.

The correct sequence:

Detect the row by candidate name pattern (Registered Voters, BALLOTS CAST, OVER VOTES, UNDER VOTES, BLANK).
Extract the value into the appropriate field on sibling contest records in the same precinct.
Route the row to TurnoutMetadata contest kind — not CandidateRace.
Exclude the row from candidate-level analysis (margins, competitiveness, entity resolution).

Step 2 is the key. The registered voter count attaches to every contest in the same precinct as a turnout.registered_voters field. The ballots cast count becomes turnout.ballots_cast. Only after extraction is the metadata row itself reclassified.

NC SBE Row Format

In the NC SBE precinct results file (results_pct_20221108.txt), a registered voter row looks like:

Column	Value
Contest Name	`REGISTERED VOTERS - TOTAL`
Choice	(empty)
Choice Party	(empty)
Total Votes	`4,217`
Election Day	`4,217`
One Stop	`0`
Absentee by Mail	`0`
Provisional	`0`

The “Total Votes” column contains the registered voter count, not a vote total. The vote-type breakdown is meaningless (registered voters do not have an election-day vs. early split). L1 extracts 4,217 into turnout.registered_voters for precinct P17 in Columbus County, then classifies this row as TurnoutMetadata.

The corresponding L1 output:

{
  "contest": {
    "kind": "turnout_metadata",
    "raw_name": "REGISTERED VOTERS - TOTAL"
  },
  "results": [{
    "candidate_name": { "raw": "Registered Voters" },
    "votes_total": 4217
  }],
  "turnout": {
    "registered_voters": 4217
  }
}

Sibling contest records in the same precinct (e.g., the school board race) receive:

{
  "turnout": {
    "registered_voters": 4217,
    "ballots_cast": null
  }
}

Scale of the Problem

In the Florida OpenElections dataset, 6,013 rows are labeled “Registered Voters” — representing 67.9% of all non-candidate records in that file. Without detection, these rows enter the candidate pipeline as if “Registered Voters” were a person running for office. The L4 LLM audit flagged exactly this pattern in our prototype.

Over Votes and Under Votes are less numerous but equally disruptive. Maine labels its under votes as BLANK. Utah includes TOTAL VOTES aggregation rows. Each source has its own vocabulary for the same concept.

Detection Rules

L1 applies pattern matching on the candidate name field before any other processing:

Pattern	Classification	Action
`registered voters`	TurnoutMetadata	Extract to `turnout.registered_voters`
`ballots cast`	TurnoutMetadata	Extract to `turnout.ballots_cast`
`over ?votes?`	TurnoutMetadata	Extract to `turnout.over_votes`
`under ?votes?`	TurnoutMetadata	Extract to `turnout.under_votes`
`^blank$`	TurnoutMetadata	Extract to `turnout.under_votes` (ME)
`total votes`	TurnoutMetadata	Discard (aggregation artifact)
`scattering`	TurnoutMetadata	Extract to `turnout.write_in_scattering` (IA)

These patterns are checked case-insensitively. They run as the first operation in the L1 pipeline — before name decomposition, before office classification, before FIPS enrichment. A row that matches is routed immediately and never enters the candidate pipeline.