Contest Kinds: CandidateRace, BallotMeasure, TurnoutMetadata
Every record in the pipeline belongs to exactly one of three contest kinds. This is modeled as a type-level enum — not a string field — so that invalid combinations are rejected at compile time rather than discovered at query time.
Why three kinds
Election data files mix three fundamentally different things in the same tabular format:
- A candidate running for office and receiving votes.
- A ballot measure (bond, referendum, constitutional amendment) where voters choose “Yes” or “No.”
- A metadata row recording registered voters or ballots cast for a precinct, masquerading as a contest.
Sources do not distinguish these. MEDSL puts REGISTERED VOTERS in the office column as if it were a race. NC SBE creates a “contest” called Registered Voters - Total with a “candidate” whose vote count is actually the registration total. Florida OpenElections has 6,013 rows where office = "Registered Voters" — 67.9% of all non-candidate records in the initial FL load.
If these are not separated at parse time, downstream analysis produces nonsense: “Registered Voters” appears as the most popular candidate in America, “For” shows up as a person’s name in entity resolution, and vote totals are inflated by turnout metadata.
The enum
```rust
enum ContestKind {
    CandidateRace {
        results: Vec<CandidateResult>,
    },
    BallotMeasure {
        choices: Vec<BallotChoice>,
        measure_type: BallotMeasureType,
        passage_threshold: Option<f64>,
    },
    TurnoutMetadata {
        registered_voters: Option<u64>,
        ballots_cast: Option<u64>,
    },
}
```
Each variant carries different fields. You cannot accidentally attach a candidate_name to a ballot measure or a passage_threshold to a candidate race.
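As a rough illustration of what the variant-specific fields buy you, the sketch below sums votes per contest with an exhaustive `match`. The stand-in structs carry only `votes_total`, the `BallotMeasureType` enum is shortened to three variants, and `contest_votes` is a hypothetical helper, not part of the pipeline described here.

```rust
// Simplified stand-ins for the document's types (assumption: only the
// fields this sketch needs are modeled).
#[derive(Debug)]
struct CandidateResult {
    votes_total: u64,
}

#[derive(Debug)]
struct BallotChoice {
    votes_total: u64,
}

// Shortened for the sketch; the full list of measure types is longer.
#[derive(Debug)]
enum BallotMeasureType {
    Bond,
    Referendum,
    Other,
}

#[derive(Debug)]
enum ContestKind {
    CandidateRace {
        results: Vec<CandidateResult>,
    },
    BallotMeasure {
        choices: Vec<BallotChoice>,
        measure_type: BallotMeasureType,
        passage_threshold: Option<f64>,
    },
    TurnoutMetadata {
        registered_voters: Option<u64>,
        ballots_cast: Option<u64>,
    },
}

/// Hypothetical helper: total votes cast in a contest.
/// Turnout rows return None, so their counts can never be added to vote totals.
fn contest_votes(kind: &ContestKind) -> Option<u64> {
    match kind {
        ContestKind::CandidateRace { results } => {
            Some(results.iter().map(|r| r.votes_total).sum())
        }
        ContestKind::BallotMeasure { choices, .. } => {
            Some(choices.iter().map(|c| c.votes_total).sum())
        }
        ContestKind::TurnoutMetadata { .. } => None,
    }
}
```

Because the `match` is exhaustive, adding a fourth variant later would make this function fail to compile until the new case is handled.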
CandidateRace
The common case. A person is running for an office and received votes.
| Field | Type | Description |
|---|---|---|
| results | Vec<CandidateResult> | One entry per candidate in the contest |
Each CandidateResult contains:
| Field | Type | Description |
|---|---|---|
| candidate_name | CandidateName | Decomposed name (raw, first, middle, last, suffix, nickname, canonical_first) |
| party | Party | Raw string + normalized enum |
| votes_total | u64 | Total votes received |
| vote_share | Option<f64> | Percentage of total contest votes |
| vote_counts_by_type | VoteCountsByType | Breakdown: election_day, early, absentee_mail, provisional |
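One plausible reason `vote_share` is an `Option<f64>` is that a share is undefined when a contest has no votes. The sketch below illustrates that reading. `vote_shares` is a hypothetical helper, and it assumes shares are stored as fractions between 0 and 1; the text does not say whether the pipeline uses fractions or 0–100 percentages.

```rust
/// Hypothetical helper: each candidate's share of the contest total.
/// Assumption: shares are fractions in [0, 1]. Returns None for every
/// candidate when the contest total is zero, since the share is undefined.
fn vote_shares(votes: &[u64]) -> Vec<Option<f64>> {
    let total: u64 = votes.iter().sum();
    votes
        .iter()
        .map(|&v| {
            if total == 0 {
                None
            } else {
                Some(v as f64 / total as f64)
            }
        })
        .collect()
}
```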
Examples of CandidateRace contests:
- US SENATE (federal)
- GOVERNOR (state)
- COLUMBUS COUNTY SCHOOLS BOARD OF EDUCATION DISTRICT 02 (local)
- SHERIFF (county)
BallotMeasure
Voters choose between options (typically “For”/“Against” or “Yes”/“No”) on a proposition, bond, amendment, or referendum.
| Field | Type | Description |
|---|---|---|
| choices | Vec<BallotChoice> | One entry per option |
| measure_type | BallotMeasureType | Bond, amendment, referendum, etc. |
| passage_threshold | Option<f64> | Required vote share for passage (e.g., 0.60 for a bond requiring 60%) |
Each BallotChoice contains:
| Field | Type | Description |
|---|---|---|
| choice_text | String | “For”, “Against”, “Yes”, “No”, or other option text |
| votes_total | u64 | Votes for this choice |
| vote_share | Option<f64> | Percentage of total votes |
The BallotMeasureType enum: Bond, ConstitutionalAmendment, Referendum, Initiative, Recall, Retention, Levy, Advisory, Other.
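To make the role of `passage_threshold` concrete, here is a minimal sketch of a passage check. Everything in it beyond the field names is an assumption: `measure_passed` and `is_affirmative` are hypothetical helpers, "For", "Yes", and "Bonds" are treated as the affirmative choices, the threshold is compared inclusively, and the share is computed over the listed choices only.

```rust
// Stand-in for the document's BallotChoice, without vote_share.
struct BallotChoice {
    choice_text: String,
    votes_total: u64,
}

/// Assumption: these three texts are the affirmative options.
fn is_affirmative(text: &str) -> bool {
    matches!(
        text.trim().to_ascii_lowercase().as_str(),
        "for" | "yes" | "bonds"
    )
}

/// Hypothetical helper: Some(true) if the affirmative share meets the
/// threshold, Some(false) if it does not, and None when the threshold is
/// unknown or no votes were cast.
fn measure_passed(choices: &[BallotChoice], passage_threshold: Option<f64>) -> Option<bool> {
    let threshold = passage_threshold?;
    let total: u64 = choices.iter().map(|c| c.votes_total).sum();
    if total == 0 {
        return None;
    }
    let affirmative: u64 = choices
        .iter()
        .filter(|c| is_affirmative(&c.choice_text))
        .map(|c| c.votes_total)
        .sum();
    Some(affirmative as f64 / total as f64 >= threshold)
}
```

Returning `Option<bool>` keeps "did not pass" distinct from "cannot be determined", which mirrors the optional threshold in the data model.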
Why this prevents name confusion
Without the BallotMeasure variant, the L1 parser would treat “For” and “Against” as candidate names. They would flow into entity resolution at L3, where the system would try to find other elections where “For” ran for office. By assigning ballot measures to their own variant at parse time, the choice_text field is never passed to the name decomposition or embedding logic.
Detection at L1 uses two signals:
- The contest name contains keywords: “bond”, “amendment”, “referendum”, “proposition”, “measure”, “levy”, “question”.
- The choice values are in the set {“For”, “Against”, “Yes”, “No”, “Bonds”, “No Bonds”}.
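The two signals could be checked along these lines. This is a sketch under stated assumptions, not the parser's actual code: the function names are hypothetical, matching is case-insensitive, keywords are matched as substrings, and the choice signal requires every choice in the contest to be in the known set.

```rust
// Keyword and choice lists taken from the two signals listed above.
const MEASURE_KEYWORDS: [&str; 7] = [
    "bond", "amendment", "referendum", "proposition", "measure", "levy", "question",
];
const MEASURE_CHOICES: [&str; 6] = ["for", "against", "yes", "no", "bonds", "no bonds"];

/// Signal 1: the contest name contains a ballot measure keyword.
/// Assumption: case-insensitive substring match.
fn name_has_measure_keyword(contest_name: &str) -> bool {
    let lower = contest_name.to_lowercase();
    MEASURE_KEYWORDS.iter().any(|kw| lower.contains(*kw))
}

/// Signal 2: the choice values come from the known set.
/// Assumption: all choices must match, and an empty list does not count.
fn choices_look_like_measure(choices: &[&str]) -> bool {
    !choices.is_empty()
        && choices.iter().all(|choice| {
            let normalized = choice.trim().to_lowercase();
            MEASURE_CHOICES
                .iter()
                .any(|known| *known == normalized.as_str())
        })
}
```

Substring matching is the simplest reading of "contains keywords"; a production parser might prefer word-boundary matching to avoid hits inside longer words.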
TurnoutMetadata
Not a contest at all. These rows carry precinct-level registration and turnout counts that sources embed in the results file as pseudo-contests.
| Field | Type | Description |
|---|---|---|
| registered_voters | Option<u64> | Registered voter count for this precinct |
| ballots_cast | Option<u64> | Total ballots cast in this precinct |
Source examples that produce TurnoutMetadata records:
| Source | office / Contest Name value | candidate / Choice value |
|---|---|---|
| MEDSL | REGISTERED VOTERS | REGISTERED VOTERS |
| MEDSL | BALLOTS CAST - TOTAL | BALLOTS CAST |
| NC SBE | Registered Voters - Total | (numeric total in vote column) |
| OpenElections FL | Registered Voters | (numeric total) |
Detection at L1: the contest name matches a known set of turnout keywords (REGISTERED VOTERS, BALLOTS CAST, BALLOTS CAST - TOTAL, BALLOTS CAST - BLANK). When detected, the vote count is extracted into registered_voters or ballots_cast, and the record is tagged as TurnoutMetadata rather than CandidateRace.
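The extraction step might look like the following sketch. `Turnout` and `extract_turnout` are hypothetical names. The sketch assumes names are compared case-insensitively after trimming, and it adds "REGISTERED VOTERS - TOTAL" so the NC SBE example in the table matches. How blank-ballot rows are stored is not stated in the text, so here they are tagged as turnout rows without extracting a count.

```rust
#[derive(Debug, PartialEq)]
struct Turnout {
    registered_voters: Option<u64>,
    ballots_cast: Option<u64>,
}

/// Hypothetical helper: returns Some(Turnout) when the contest name is a
/// known turnout pseudo-contest, and None for real contests.
fn extract_turnout(contest_name: &str, votes: u64) -> Option<Turnout> {
    // Assumption: case-insensitive exact match on the trimmed name.
    let name = contest_name.trim().to_uppercase();
    match name.as_str() {
        "REGISTERED VOTERS" | "REGISTERED VOTERS - TOTAL" => Some(Turnout {
            registered_voters: Some(votes),
            ballots_cast: None,
        }),
        "BALLOTS CAST" | "BALLOTS CAST - TOTAL" => Some(Turnout {
            registered_voters: None,
            ballots_cast: Some(votes),
        }),
        // Assumption: blank-ballot rows are turnout metadata, but their
        // count is neither a registration total nor a ballots-cast total.
        "BALLOTS CAST - BLANK" => Some(Turnout {
            registered_voters: None,
            ballots_cast: None,
        }),
        _ => None,
    }
}
```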
These extracted turnout values backfill the turnout section of other records in the same precinct. Currently, turnout data is populated for less than 5% of records because most MEDSL state files do not include registration count rows.
Classification at L1
Contest kind assignment happens during L1 parsing, the deterministic layer: no ML, no embeddings, no API calls. The checks run in order, and the first match wins:
- Does the contest name match a turnout keyword? → TurnoutMetadata
- Do the choice values match ballot measure patterns (“For”/“Against”/“Yes”/“No”)? → BallotMeasure
- Does the contest name contain ballot measure keywords? → BallotMeasure
- Otherwise → CandidateRace
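The decision tree above can be sketched as a single function whose checks run in order. `ContestKindTag` and `classify` are hypothetical names: the tag enum stands in for the payload-carrying `ContestKind`, and the matching rules (case-insensitive, exact turnout names, substring measure keywords, all choices in the known set) are assumptions rather than documented behavior.

```rust
#[derive(Debug, PartialEq)]
enum ContestKindTag {
    TurnoutMetadata,
    BallotMeasure,
    CandidateRace,
}

const TURNOUT_NAMES: [&str; 5] = [
    "REGISTERED VOTERS",
    "REGISTERED VOTERS - TOTAL", // assumption: added to cover the NC SBE example
    "BALLOTS CAST",
    "BALLOTS CAST - TOTAL",
    "BALLOTS CAST - BLANK",
];
const MEASURE_KEYWORDS: [&str; 7] = [
    "BOND", "AMENDMENT", "REFERENDUM", "PROPOSITION", "MEASURE", "LEVY", "QUESTION",
];
const MEASURE_CHOICES: [&str; 6] = ["FOR", "AGAINST", "YES", "NO", "BONDS", "NO BONDS"];

/// Hypothetical classifier: deterministic, first matching rule wins.
fn classify(contest_name: &str, choices: &[&str]) -> ContestKindTag {
    let name = contest_name.trim().to_uppercase();

    // 1. Turnout pseudo-contests take priority over everything else.
    if TURNOUT_NAMES.iter().any(|known| *known == name.as_str()) {
        return ContestKindTag::TurnoutMetadata;
    }

    // 2. Choice values that all look like ballot measure options.
    let choices_match = !choices.is_empty()
        && choices.iter().all(|choice| {
            let normalized = choice.trim().to_uppercase();
            MEASURE_CHOICES
                .iter()
                .any(|known| *known == normalized.as_str())
        });
    if choices_match {
        return ContestKindTag::BallotMeasure;
    }

    // 3. Contest name containing a ballot measure keyword.
    if MEASURE_KEYWORDS.iter().any(|kw| name.contains(*kw)) {
        return ContestKindTag::BallotMeasure;
    }

    // 4. Everything else is treated as a candidate race.
    ContestKindTag::CandidateRace
}
```

Checking turnout names first matters because a source may repeat the pseudo-contest name in the candidate column, as in the MEDSL example, which would otherwise fall through to the candidate race default.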
This classification is stored in the record and carried through all subsequent layers. L2 embeds only CandidateRace records for entity resolution. L3 matches only CandidateRace records. BallotMeasure and TurnoutMetadata records pass through L2–L4 without modification beyond provenance tracking.