Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

MEDSL — MIT Election Data + Science Lab

The MIT Election Data + Science Lab publishes precinct-level election returns for all 50 states and the District of Columbia. The data is hosted on the Harvard Dataverse (electionscience collection) and mirrored on GitHub for recent cycles. It is the most complete single source of US election data available without a paywall or API key.

What MEDSL contains

MEDSL provides one CSV or tab-delimited file per state per election cycle. Each row represents one candidate in one precinct for one vote mode (election day, absentee, early voting, provisional, etc.). To obtain the total votes for a candidate in a precinct, you must sum across all rows for that candidate and precinct.

Available election cycles:

CycleLocationFormatDOI
2022GitHubCSV, one ZIP per state
2020Harvard DataverseCSV/TAB, one file per state10.7910/DVN/NT66Z3
2018Harvard DataverseCSV/TAB, one file per state10.7910/DVN/NVQYMG
2016Harvard DataverseCSV/TAB10.7910/DVN/NH5S2I
2019 (odd-year)Harvard DataverseCSV/TAB10.7910/DVN/2AJUII
2017 (odd-year)Harvard DataverseCSV/TAB10.7910/DVN/VNJAB1

We have downloaded and loaded 2018, 2020, and 2022. Together they contain approximately 36.5 million rows.

Schema

MEDSL files have 25 columns. The delimiter is comma for most states but tab for some; auto-detection handles this.

ColumnTypeDescriptionExample
precinctstringPrecinct identifier from the source12-13
officestringContest name, ALL CAPSCABARRUS COUNTY SCHOOLS BOARD OF EDUCATION
party_detailedstringFull party nameNONPARTISAN
party_simplifiedstringNormalized partyNONPARTISAN
modestringVote type for this rowELECTION DAY
votesintegerVote count for this mode79
candidatestringCandidate name, ALL CAPSGREG MILLS
districtstringDistrict identifier or blankSTATEWIDE, 003, ``
dataversestringRace level tag — see belowSTATE, SENATE, HOUSE, ``
stagestringElection stageGEN
specialstringSpecial election flagFALSE
writeinstringWrite-in flagFALSE
datedateElection date2022-11-08
yearintegerElection year2022
county_namestringCounty name, ALL CAPSCABARRUS
county_fipsstring5-digit county FIPS37025
jurisdiction_namestringJurisdiction nameCABARRUS
jurisdiction_fipsstringJurisdiction FIPS37025
statestringFull state nameNORTH CAROLINA
state_postring2-letter postal codeNC
state_fipsstring2-digit state FIPS37
state_censtringCensus state code56
state_icstringICPSR state code47
readme_checkstringData quality flagFALSE
magnitudeintegerNumber of seats in this contest3

The dataverse column and local races

MEDSL tags each row with a dataverse value indicating which Harvard Dataverse sub-collection the race belongs to:

ValueMeaningExample offices
PRESIDENTPresidential racePresident
SENATEUS SenateUS Senate
HOUSEUS HouseUS House District 7
STATEState-level officesGovernor, State Senate, Attorney General
(blank)Everything else — including all local racesCounty Commissioner, School Board, Sheriff, Soil and Water

Local races are identified by a blank dataverse column, not by the value LOCAL. This is a frequent source of confusion. In the 2022 North Carolina file, 385,260 of 684,712 rows (56%) have a blank dataverse value. These rows contain school board races, county commissioner races, soil and water conservation districts, district court judges, mayors, city councils, and other local offices.

In the full 2022 national dataset (12.3 million rows), approximately 5.1 million rows (41.5%) have a blank dataverse value.

The mode column and vote totals

Each row in MEDSL represents one candidate’s votes for one vote mode. A single candidate in a single precinct may have multiple rows:

12-13,US SENATE,LIBERTARIAN,LIBERTARIAN,ELECTION DAY,47,SHANNON W BRAY,...
12-13,US SENATE,LIBERTARIAN,LIBERTARIAN,ABSENTEE BY MAIL,5,SHANNON W BRAY,...
12-13,US SENATE,LIBERTARIAN,LIBERTARIAN,EARLY VOTING,38,SHANNON W BRAY,...
12-13,US SENATE,LIBERTARIAN,LIBERTARIAN,PROVISIONAL,0,SHANNON W BRAY,...

To get Shannon W. Bray’s total votes in precinct 12-13, sum the votes column across all modes: 47 + 5 + 38 + 0 = 90.

Some states include a TOTAL mode row that pre-sums the other modes. Some do not. Your aggregation logic must handle both cases. If TOTAL rows are present, either use them directly and skip the individual mode rows, or skip TOTAL and sum the modes yourself. Do not double-count.

Common mode values: ELECTION DAY, ABSENTEE BY MAIL, EARLY VOTING, ONE STOP, PROVISIONAL, TOTAL.

Name formatting

MEDSL candidate names are ALL CAPS with no periods after initials:

MEDSLActual name
SHANNON W BRAYShannon W. Bray
VICTORIA P PORTERVictoria P. Porter
MICHAEL "STEVE" HUBERMichael “Steve” Huber
ROBERT VAN FLETCHER JRRobert Van Fletcher, Jr.
LM "MICKEY" SIMMONSL.M. “Mickey” Simmons

Nicknames appear in double quotes within the name string. Suffixes (JR, SR, III) appear without a preceding comma.

Write-in candidates are aggregated into a single row with candidate = WRITEIN and writein = TRUE.

Non-candidate rows

Some states include metadata rows in the data that are not candidate results:

office valueMeaningAction
REGISTERED VOTERSVoter registration countExtract as turnout metadata, do not treat as a contest
BALLOTS CASTBallots cast countExtract as turnout metadata
BALLOTS CAST - TOTALSameExtract
BALLOTS CAST - BLANKBlank ballot countExtract
STRAIGHT PARTYStraight-ticket party voteTypically excluded from contest analysis
OVER VOTESOvervote countExtract as quality metadata
UNDER VOTESUndervote countExtract as quality metadata

These rows are present in some states and absent in others. Florida OpenElections data contains 6,013 “Registered Voters” rows — 67.9% of all records classified as “other” in initial processing.

Known coverage gaps

MEDSL 2022 contains local race data for 44 of 51 jurisdictions. Seven states have zero rows with a blank dataverse column:

StateLikely reason
CaliforniaLocal results published separately by each county; not aggregated by MEDSL
IowaLocal results not included in the MEDSL state file
KansasSame
New JerseySame
PennsylvaniaSame
TennesseeSame
WisconsinSame

This does not mean these states lack local elections. It means MEDSL’s curation process did not capture them for 2022. Coverage may differ in other years.

The votes column type

The votes column is predominantly integer, but some state files contain non-integer values. We observed:

  • Floating-point values (likely vote shares erroneously placed in the votes column)
  • Asterisks (*) indicating suppressed data
  • Empty strings

Parse with TRY_CAST or equivalent. In our load of the full 2022 dataset, 12,782 rows had non-integer votes values out of 12.3 million total (0.1%).

Download

# 2022 — All 51 files from GitHub
mkdir -p local-data/sources/medsl/2022
for state in ak al ar az ca co ct dc de fl ga hi ia id il in ks ky la \
            ma md me mi mn mo ms mt nc nd ne nh nj nm nv ny oh ok or pa \
            ri sc sd tn tx ut va vt wa wi wv wy; do
  curl -L -o "local-data/sources/medsl/2022/2022-${state}-local-precinct-general.zip" \
    "https://raw.githubusercontent.com/MEDSL/2022-elections-official/main/individual_states/2022-${state}-local-precinct-general.zip"
done

# Unzip
for f in local-data/sources/medsl/2022/*.zip; do
  unzip -o "$f" -d "${f%.zip}"
done
# 2020 — NC example from Harvard Dataverse (file ID 6100444)
mkdir -p local-data/sources/medsl/2020
curl -L -o local-data/sources/medsl/2020/2020-nc-precinct-general.csv \
  "https://dataverse.harvard.edu/api/access/datafile/6100444"

File IDs for all 51 jurisdictions in 2020 and 2018 are documented in the download instructions.

Cross-source overlap

For the 2022 North Carolina general election, MEDSL and NC SBE share 640 contests where both sources report results:

  • 579 (90.5%) have exactly matching vote totals
  • 47 (7.3%) match within 1%
  • 14 (2.2%) disagree by more than 1%
  • 401 (63%) have different candidate name formatting between the two sources

This overlap is the basis for our entity resolution validation. See Cross-Source Reconciliation.