Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

CLI Reference

The election-aggregation binary provides a command-line interface for pipeline execution and data source management. Commands are not yet implemented — this chapter documents the planned interface.

Planned Commands

CommandPipeline stageDescription
election-aggregation processL0 → L1Parse raw source files into cleaned JSONL records
election-aggregation embedL1 → L2Generate text-embedding-3-large vectors for candidate names, contest names, and jurisdictions
election-aggregation matchL2 → L3Run entity resolution: exact → Jaro-Winkler → embedding → LLM confirmation
election-aggregation canonicalizeL3 → L4Assign canonical names, build temporal chains, produce verification status
election-aggregation verifyL4Walk the hash chain from L4 back to L0 source bytes and report any breaks
election-aggregation sourcesList all data sources with download URLs and instructions

Common Options

All pipeline commands will accept:

  • --state <STATE> — Process a single state (two-letter postal code). Without this flag, all states are processed.
  • --year <YEAR> — Process a single election year. Without this flag, all loaded years are processed.
  • --data-dir <PATH> — Root directory for source files and pipeline output. Defaults to ./local-data.
  • --jobs <N> — Number of parallel state/year partitions to process. Defaults to 1.

API Key Configuration

L2 (embed) requires an OpenAI API key for text-embedding-3-large. L3 (match) requires an Anthropic API key for Claude Sonnet confirmation calls. Keys are read from environment variables:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

The process and canonicalize commands do not call external APIs.

Implementation Status

The binary currently prints a version banner and documentation pointer. No subcommands are wired up. The CLI will use clap for argument parsing once pipeline modules are functional.