CLI Reference

The `election-aggregation` binary provides a command-line interface for pipeline execution and data source management. No commands are implemented yet; this chapter documents the planned interface.

Planned Commands

Command	Pipeline stage	Description
`election-aggregation process`	L0 → L1	Parse raw source files into cleaned JSONL records
`election-aggregation embed`	L1 → L2	Generate text-embedding-3-large vectors for candidate names, contest names, and jurisdictions
`election-aggregation match`	L2 → L3	Run entity resolution: exact → Jaro-Winkler → embedding → LLM confirmation
`election-aggregation canonicalize`	L3 → L4	Assign canonical names, build temporal chains, produce verification status
`election-aggregation verify`	L4	Walk the hash chain from L4 back to L0 source bytes and report any breaks
`election-aggregation sources`	—	List all data sources with download URLs and instructions
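
The mapping from subcommand to pipeline stage in the table above can be sketched as a pair of enums. This is an illustration only: the type names `Layer` and `Command` and the methods `from_name` and `stage` are hypothetical and not taken from the project's code, which has no subcommands wired up yet.

```rust
/// Pipeline layers as named in the command table (L0 raw source files through L4).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Layer {
    L0,
    L1,
    L2,
    L3,
    L4,
}

/// Planned subcommands. Variant names are illustrative, not the project's types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Command {
    Process,
    Embed,
    Match,
    Canonicalize,
    Verify,
    Sources,
}

impl Command {
    /// Look up a subcommand by the name used on the command line.
    fn from_name(name: &str) -> Option<Command> {
        match name {
            "process" => Some(Command::Process),
            "embed" => Some(Command::Embed),
            "match" => Some(Command::Match),
            "canonicalize" => Some(Command::Canonicalize),
            "verify" => Some(Command::Verify),
            "sources" => Some(Command::Sources),
            _ => None,
        }
    }

    /// Input and output layer for each command, following the table.
    /// `verify` operates on L4 only, so both ends are L4 in this sketch.
    /// `sources` is not a pipeline stage and yields `None`.
    fn stage(self) -> Option<(Layer, Layer)> {
        match self {
            Command::Process => Some((Layer::L0, Layer::L1)),
            Command::Embed => Some((Layer::L1, Layer::L2)),
            Command::Match => Some((Layer::L2, Layer::L3)),
            Command::Canonicalize => Some((Layer::L3, Layer::L4)),
            Command::Verify => Some((Layer::L4, Layer::L4)),
            Command::Sources => None,
        }
    }
}
```

Keeping the stage mapping in one place would let a driver check that each command's input layer exists before it runs.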

Common Options

All pipeline commands will accept:

--state <STATE> — Process a single state (two-letter postal code). Without this flag, all states are processed.
--year <YEAR> — Process a single election year. Without this flag, all loaded years are processed.
--data-dir <PATH> — Root directory for source files and pipeline output. Defaults to ./local-data.
--jobs <N> — Number of parallel state/year partitions to process. Defaults to 1.
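
As a rough sketch of how these options and their defaults fit together, the following parses them with the standard library alone. The planned CLI is expected to use clap instead, so treat this as an illustration of the documented behavior rather than the implementation. The names `CommonOptions` and `parse_common_options`, the `u16` year type, and the exact validation of the state code are assumptions.

```rust
use std::path::PathBuf;

/// Options shared by the pipeline commands. Field names are hypothetical.
#[derive(Debug, PartialEq)]
struct CommonOptions {
    state: Option<String>, // None means all states are processed
    year: Option<u16>,     // None means all loaded years are processed
    data_dir: PathBuf,     // root for source files and pipeline output
    jobs: usize,           // parallel state/year partitions
}

impl Default for CommonOptions {
    fn default() -> Self {
        CommonOptions {
            state: None,
            year: None,
            data_dir: PathBuf::from("./local-data"),
            jobs: 1,
        }
    }
}

/// Parse `--flag value` pairs from an argument list (program name excluded).
fn parse_common_options<I>(args: I) -> Result<CommonOptions, String>
where
    I: IntoIterator<Item = String>,
{
    let mut opts = CommonOptions::default();
    let mut args = args.into_iter();
    while let Some(flag) = args.next() {
        // Every option documented above takes exactly one value.
        let value = args
            .next()
            .ok_or_else(|| format!("{} requires a value", flag))?;
        match flag.as_str() {
            "--state" => {
                // Assumed check: a two-letter postal code such as "NC".
                if value.len() != 2 || !value.chars().all(|c| c.is_ascii_alphabetic()) {
                    return Err(format!("invalid state code: {}", value));
                }
                opts.state = Some(value);
            }
            "--year" => {
                let year: u16 = value
                    .parse()
                    .map_err(|_| format!("invalid year: {}", value))?;
                opts.year = Some(year);
            }
            "--data-dir" => opts.data_dir = PathBuf::from(value),
            "--jobs" => {
                let jobs: usize = value
                    .parse()
                    .map_err(|_| format!("invalid job count: {}", value))?;
                opts.jobs = jobs;
            }
            other => return Err(format!("unknown option: {}", other)),
        }
    }
    Ok(opts)
}
```

Calling `parse_common_options(std::env::args().skip(2))` would skip the program name and subcommand; an empty argument list yields the defaults listed above.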

API Key Configuration

L2 (embed) requires an OpenAI API key for text-embedding-3-large. L3 (match) requires an Anthropic API key for Claude Sonnet confirmation calls. Keys are read from environment variables:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

The `process` and `canonicalize` commands do not call external APIs.
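
A minimal sketch of a pre-flight check for these keys follows. The function names `required_key_var` and `check_api_key` are hypothetical. Commands other than `embed` and `match` are treated here as needing no key, which the text above confirms only for `process` and `canonicalize`.

```rust
/// Environment variable a command needs, if any, per the section above.
/// Any command not listed is assumed to need no key.
fn required_key_var(command: &str) -> Option<&'static str> {
    match command {
        "embed" => Some("OPENAI_API_KEY"),
        "match" => Some("ANTHROPIC_API_KEY"),
        _ => None,
    }
}

/// Check that the key for `command` is present and non-blank.
/// The variable lookup is passed in as a closure so the check can be
/// exercised without reading or modifying the real process environment.
fn check_api_key<F>(command: &str, lookup: F) -> Result<(), String>
where
    F: Fn(&str) -> Option<String>,
{
    match required_key_var(command) {
        None => Ok(()),
        Some(var) => match lookup(var) {
            Some(value) if !value.trim().is_empty() => Ok(()),
            _ => Err(format!("{} must be set to run `{}`", var, command)),
        },
    }
}
```

In the binary, the lookup would typically be `|name| std::env::var(name).ok()`, so that a missing key fails before any records are processed.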

Implementation Status

The binary currently prints a version banner and documentation pointer. No subcommands are wired up. The CLI will use clap for argument parsing once pipeline modules are functional.
