CLI Command Reference

This is the complete reference for every congress-approp command and flag. For tutorials and worked examples, see the Tutorials section. For task-oriented guides, see How-To Guides.

Global Options

These flags can be used with any command:

| Flag | Short | Description |
|---|---|---|
| --verbose | -v | Enable verbose (debug-level) logging. Shows detailed progress, file paths, and internal state. |
| --help | -h | Print help for the command |
| --version | -V | Print version (top-level only) |

summary

Show a per-bill overview of all extracted data: provision counts, budget authority, rescissions, and net budget authority.

congress-approp summary [OPTIONS]

| Flag | Type | Default | Description |
|---|---|---|---|
| --dir | path | ./data | Data directory containing extracted bills. Try data for included FY2019–FY2026 dataset. Walks recursively to find all extraction.json files. |
| --format | string | table | Output format: table, json, jsonl, csv |
| --by-agency | flag | | Append a second table showing budget authority totals by parent department, sorted descending |
| --fy | integer | | Filter to bills covering this fiscal year (e.g., 2026). Uses bill.fiscal_years from extraction data — works without enrich. |
| --subcommittee | string | | Filter by subcommittee jurisdiction (e.g., defense, thud, cjs). Requires bill_meta.json — run enrich first. See Enrich Bills with Metadata for valid slugs. |

Examples

# FY2026 bills only
congress-approp summary --dir data --fy 2026

# FY2026 THUD subcommittee only (requires enrich)
congress-approp summary --dir data --fy 2026 --subcommittee thud

# Basic summary of included example data
congress-approp summary --dir data

# JSON output for scripting
congress-approp summary --dir data --format json

# Show department-level rollup
congress-approp summary --dir data --by-agency

# CSV for spreadsheet import
congress-approp summary --dir data --format csv > bill_summary.csv

Output

The summary table shows one row per loaded bill plus a TOTAL row:

┌───────────┬───────────────────────┬────────────┬─────────────────┬─────────────────┬─────────────────┐
│ Bill      ┆ Classification        ┆ Provisions ┆ Budget Auth ($) ┆ Rescissions ($) ┆      Net BA ($) │
╞═══════════╪═══════════════════════╪════════════╪═════════════════╪═════════════════╪═════════════════╡
│ H.R. 4366 ┆ Omnibus               ┆       2364 ┆ 846,137,099,554 ┆  24,659,349,709 ┆ 821,477,749,845 │
│ H.R. 5860 ┆ Continuing Resolution ┆        130 ┆  16,000,000,000 ┆               0 ┆  16,000,000,000 │
│ H.R. 9468 ┆ Supplemental          ┆          7 ┆   2,882,482,000 ┆               0 ┆   2,882,482,000 │
│ TOTAL     ┆                       ┆       2501 ┆ 865,019,581,554 ┆  24,659,349,709 ┆ 840,360,231,845 │
└───────────┴───────────────────────┴────────────┴─────────────────┴─────────────────┴─────────────────┘

0 dollar amounts unverified across all bills. Run `congress-approp audit` for detailed verification.

Budget Authority is computed from provisions (not from any LLM-generated summary). See Budget Authority Calculation for the formula.

The --by-agency flag appends a second table with columns: Department, Budget Auth ($), Rescissions ($), Provisions.
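
The Net BA column in the sample table is consistent with budget authority minus rescissions. The short check below reproduces that arithmetic with the figures from the table. The dictionary layout is only for illustration and is not the tool's data model; the full formula is the one described in Budget Authority Calculation.

```python
# Figures copied from the sample summary table: (budget authority, rescissions).
bills = {
    "H.R. 4366": (846_137_099_554, 24_659_349_709),
    "H.R. 5860": (16_000_000_000, 0),
    "H.R. 9468": (2_882_482_000, 0),
}

# Net BA per bill: budget authority minus rescissions.
net = {bill: ba - resc for bill, (ba, resc) in bills.items()}

# The TOTAL row is the column-wise sum over all loaded bills.
total_ba = sum(ba for ba, _ in bills.values())
total_resc = sum(resc for _, resc in bills.values())
total_net = total_ba - total_resc

for bill, value in net.items():
    print(f"{bill}: {value:,}")
print(f"TOTAL: {total_net:,}")
```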


search

Search provisions across all extracted bills. Supports filtering by type, agency, account, keyword, division, dollar range, and meaning-based semantic search.

congress-approp search [OPTIONS]

Filter Flags

| Flag | Short | Type | Description |
|---|---|---|---|
| --dir | | path | Data directory containing extracted bills. Default: ./data |
| --type | -t | string | Filter by provision type. Use --list-types to see valid values. |
| --agency | -a | string | Filter by agency name (case-insensitive substring match) |
| --account | | string | Filter by account name (case-insensitive substring match) |
| --keyword | -k | string | Search in raw_text field (case-insensitive substring match) |
| --bill | | string | Filter to a specific bill identifier (e.g., "H.R. 4366") |
| --division | | string | Filter by division letter (e.g., A, B, C) |
| --min-dollars | | integer | Minimum dollar amount (absolute value) |
| --max-dollars | | integer | Maximum dollar amount (absolute value) |
| --fy | | integer | Filter to bills covering this fiscal year (e.g., 2026). Works without enrich. |
| --subcommittee | | string | Filter by subcommittee jurisdiction (e.g., thud, defense). Requires enrich. |

All filters use AND logic — every provision in the result must match every specified filter. Filter order on the command line has no effect on results.
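
The AND semantics can be sketched in a few lines of Python. This is a minimal illustration and not the tool's implementation: the `matches` helper and the sample records are hypothetical, the field names follow the JSON output fields listed later on this page, and treating a provision without a dollar amount as failing a dollar filter is an assumption.

```python
def matches(provision, *, type_=None, agency=None, keyword=None,
            min_dollars=None, max_dollars=None):
    """Return True only if the provision passes every filter that was set."""
    checks = []
    if type_ is not None:
        checks.append(provision.get("provision_type") == type_)
    if agency is not None:  # case-insensitive substring match
        checks.append(agency.lower() in (provision.get("agency") or "").lower())
    if keyword is not None:  # case-insensitive substring match on raw_text
        checks.append(keyword.lower() in (provision.get("raw_text") or "").lower())
    dollars = provision.get("dollars")
    if min_dollars is not None:  # compared on absolute value
        checks.append(dollars is not None and abs(dollars) >= min_dollars)
    if max_dollars is not None:
        checks.append(dollars is not None and abs(dollars) <= max_dollars)
    return all(checks)  # AND logic: order of the checks does not matter


# Made-up records for illustration only.
provisions = [
    {"provision_type": "appropriation", "agency": "Department of Veterans Affairs",
     "dollars": 2_000_000_000, "raw_text": "For compensation and pensions"},
    {"provision_type": "appropriation", "agency": "Department of Veterans Affairs",
     "dollars": 5_000_000, "raw_text": "For general operating expenses"},
    {"provision_type": "rider", "agency": "Department of Veterans Affairs",
     "dollars": None, "raw_text": "None of the funds may be used"},
]

hits = [p for p in provisions
        if matches(p, type_="appropriation", agency="veterans",
                   min_dollars=1_000_000_000)]
print(len(hits))
```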

Semantic Search Flags

| Flag | Type | Description |
|---|---|---|
| --semantic | string | Rank results by meaning similarity to this query text. Requires pre-computed embeddings and OPENAI_API_KEY. |
| --similar | string | Find provisions similar to the one specified. Format: <bill_directory>:<provision_index> (e.g., 118-hr9468:0). Uses stored vectors — no API call needed. |
| --top | integer | Maximum number of results for --semantic or --similar searches. Default: 20. Has no effect on non-semantic searches (which return all matching provisions). |

Output Flags

| Flag | Type | Default | Description |
|---|---|---|---|
| --format | string | table | Output format: table, json, jsonl, csv |
| --list-types | flag | | Print all valid provision types and exit (ignores other flags) |

Examples

# All appropriations across all example bills
congress-approp search --dir data --type appropriation

# VA appropriations over $1 billion in Division A
congress-approp search --dir data --type appropriation --agency "Veterans" --division A --min-dollars 1000000000

# FEMA-related provisions by keyword
congress-approp search --dir data --keyword "Federal Emergency Management"

# CR substitutions (table auto-adapts to show New/Old/Delta columns)
congress-approp search --dir data/118-hr5860 --type cr_substitution

# All directives in the VA supplemental
congress-approp search --dir data/118-hr9468 --type directive

# Semantic search — find by meaning, not keywords
congress-approp search --dir data --semantic "school lunch programs for kids" --top 5

# Find provisions similar to a specific one across all bills
congress-approp search --dir data --similar 118-hr9468:0 --top 5

# Combine semantic with hard filters
congress-approp search --dir data --semantic "clean energy" --type appropriation --min-dollars 100000000 --top 10

# Export to CSV for spreadsheet analysis
congress-approp search --dir data --type appropriation --format csv > appropriations.csv

# Export to JSON for programmatic use
congress-approp search --dir data --type rescission --format json

# List all valid provision types
congress-approp search --dir data --list-types

Available Provision Types

  appropriation                    Budget authority grant
  rescission                       Cancellation of prior budget authority
  cr_substitution                  CR anomaly (substituting $X for $Y)
  transfer_authority               Permission to move funds between accounts
  limitation                       Cap or prohibition on spending
  directed_spending                Earmark / community project funding
  mandatory_spending_extension     Amendment to authorizing statute
  directive                        Reporting requirement or instruction
  rider                            Policy provision (no direct spending)
  continuing_resolution_baseline   Core CR funding mechanism
  other                            Unclassified provisions

Table Output Columns

The table adapts its shape based on the provision types in the results.

Standard search table:

| Column | Description |
|---|---|
| $ | Verification status symbol, one each for found unique, found multiple, and not found; blank when the provision has no dollar amount |
| Bill | Bill identifier |
| Type | Provision type |
| Description / Account | Account name for appropriations/rescissions, description for other types |
| Amount ($) | Dollar amount, or a placeholder for provisions without amounts |
| Section | Section reference from the bill (e.g., SEC. 101) |
| Div | Division letter for omnibus bills |

CR substitution table: Replaces Amount ($) with New ($), Old ($), and Delta ($).

Semantic/similar table: Adds a Sim column at the left showing cosine similarity (0.0–1.0).
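
For readers unfamiliar with the Sim column, the sketch below shows how a cosine similarity score and a top-N ranking can be computed over stored vectors. It is a plain-Python illustration with toy two-dimensional vectors; the function names are hypothetical and the tool's own ranking code may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_matches(query, vectors, top=20):
    """Return up to `top` (similarity, index) pairs, best match first."""
    scored = [(cosine(query, vec), idx) for idx, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top]


# Toy vectors; real embeddings have thousands of dimensions.
stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ranked = top_matches([1.0, 0.0], stored, top=2)
for sim, idx in ranked:
    print(f"provision {idx}: {sim:.2f}")
```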

JSON/CSV Output Fields

JSON and CSV output include more fields than the table:

| Field | Type | Description |
|---|---|---|
| bill | string | Bill identifier |
| provision_type | string | Provision type |
| account_name | string | Account name |
| description | string | Description |
| agency | string | Agency name |
| dollars | integer or null | Dollar amount |
| old_dollars | integer or null | Old amount (CR substitutions only) |
| semantics | string | Amount semantics (e.g., new_budget_authority) |
| section | string | Section reference |
| division | string | Division letter |
| raw_text | string | Bill text excerpt |
| amount_status | string or null | found, found_multiple, not_found, or null |
| match_tier | string | exact, normalized, spaceless, no_match |
| quality | string | strong, moderate, weak, or n/a |
| provision_index | integer | Index in the bill’s provision array (zero-based) |
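
As an example of consuming these fields in a script, the sketch below totals appropriation dollars by agency. It assumes that --format json emits a JSON array of objects with the fields above, which this page does not state explicitly, and the sample records are invented for illustration.

```python
import json
from collections import defaultdict

# Hypothetical records shaped like the field table; not real bill data.
sample_output = """[
  {"bill": "H.R. 0001", "provision_type": "appropriation",
   "agency": "Agency A", "dollars": 1500, "provision_index": 0},
  {"bill": "H.R. 0001", "provision_type": "rider",
   "agency": "Agency A", "dollars": null, "provision_index": 1},
  {"bill": "H.R. 0002", "provision_type": "appropriation",
   "agency": "Agency B", "dollars": 700, "provision_index": 0},
  {"bill": "H.R. 0002", "provision_type": "appropriation",
   "agency": "Agency A", "dollars": 500, "provision_index": 1}
]"""

totals = defaultdict(int)
for provision in json.loads(sample_output):
    # dollars is "integer or null", so skip provisions without an amount.
    if provision["provision_type"] == "appropriation" and provision["dollars"] is not None:
        totals[provision["agency"]] += provision["dollars"]

for agency, amount in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{agency}: {amount:,}")
```

In practice the string would come from the command's standard output, for example by redirecting it to a file and reading that file.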

compare

Compare provisions between two sets of bills. Matches accounts by (agency, account_name) and computes dollar deltas. Account names are matched case-insensitively with em-dash prefix stripping. If a dataset.json file exists in the data directory, agency groups and account aliases are applied for cross-bill matching. Use --exact to disable all normalization and match on exact lowercased strings only. See Resolve Agency and Account Name Differences for details.

There are two ways to specify what to compare:

Directory-based (compare two specific directories):

congress-approp compare --base <BASE> --current <CURRENT> [OPTIONS]

FY-based (compare all bills for one fiscal year against another):

congress-approp compare --base-fy <YEAR> --current-fy <YEAR> --dir <DIR> [OPTIONS]

| Flag | Short | Type | Default | Description |
|---|---|---|---|---|
| --base | | path | | Base directory for comparison (e.g., prior fiscal year) |
| --current | | path | | Current directory for comparison (e.g., current fiscal year) |
| --base-fy | | integer | | Use all bills covering this FY as the base set (alternative to --base) |
| --current-fy | | integer | | Use all bills covering this FY as the current set (alternative to --current) |
| --dir | | path | ./data | Data directory (required with --base-fy/--current-fy) |
| --subcommittee | | string | | Scope comparison to one subcommittee jurisdiction. Requires enrich. |
| --agency | -a | string | | Filter by agency name (case-insensitive substring) |
| --real | | flag | | Add inflation-adjusted “Real Δ %*” column using CPI-U. Shows which programs beat inflation (▲) and which fell behind (▼). |
| --cpi-file | | path | | Path to a custom CPI/deflator JSON file. Overrides the bundled CPI-U data. See Adjust for Inflation for the file format. |
| --format | | string | table | Output format: table, json, csv |

You must provide either --base + --current (directory paths) or --base-fy + --current-fy + --dir.

Examples

# Compare omnibus to supplemental (directory-based)
congress-approp compare --base data/118-hr4366 --current data/118-hr9468

# Compare THUD funding: FY2024 → FY2026 (FY-based with subcommittee scope)
congress-approp compare --base-fy 2024 --current-fy 2026 --subcommittee thud --dir data

# Compare all FY2024 vs FY2026 (no subcommittee scope)
congress-approp compare --base-fy 2024 --current-fy 2026 --dir data

# Show inflation-adjusted changes (which programs beat inflation?)
congress-approp compare --base-fy 2024 --current-fy 2026 --subcommittee thud --dir data --real

# Filter to VA accounts only
congress-approp compare --base data/118-hr4366 --current data/118-hr9468 --agency "Veterans"

# Export comparison to CSV
congress-approp compare --base-fy 2024 --current-fy 2026 --subcommittee thud --dir data --format csv > thud_compare.csv

Matching Behavior

Account matching uses several normalization layers:

  • Case-insensitive: “Grants-In-Aid for Airports” matches “Grants-in-Aid for Airports”
  • Em-dash prefix stripping: “Department of VA—Compensation and Pensions” matches “Compensation and Pensions”
  • Sub-agency normalization: “Maritime Administration” matches “Department of Transportation” for the same account name
  • Hierarchical CR name matching: “Federal Emergency Management Agency—Disaster Relief Fund” matches “Disaster Relief Fund”
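
Two of these layers, case-insensitive matching and em-dash prefix stripping, can be sketched as a single normalization function. This is an illustration under assumptions: `normalize_account` is a hypothetical helper, keeping the text after the last em dash is a guess at the stripping rule, and sub-agency normalization is not covered.

```python
EM_DASH = "\u2014"  # the long dash that separates agency prefix and account name


def normalize_account(name: str) -> str:
    """Lowercase the name and keep only the part after the last em dash."""
    tail = name.split(EM_DASH)[-1]
    return " ".join(tail.lower().split())  # also collapses repeated whitespace


pairs = [
    ("Grants-In-Aid for Airports", "Grants-in-Aid for Airports"),
    (f"Department of VA{EM_DASH}Compensation and Pensions", "Compensation and Pensions"),
    (f"Federal Emergency Management Agency{EM_DASH}Disaster Relief Fund", "Disaster Relief Fund"),
]
for left, right in pairs:
    print(normalize_account(left) == normalize_account(right))
```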

Output Columns

| Column | Description |
|---|---|
| Account | Account name, matched between bills |
| Agency | Parent department or agency |
| Base ($) | Budget authority in the --base or --base-fy bills |
| Current ($) | Budget authority in the --current or --current-fy bills |
| Delta ($) | Current minus Base |
| Δ % | Percentage change |
| Status | changed, unchanged, only in base, or only in current |

Results are sorted by absolute delta, largest changes first. The tool warns when comparing different bill classifications (e.g., Omnibus vs. Supplemental).
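
The delta, percentage, status, and sort order described above can be sketched as follows. The `compare` function and the account totals are hypothetical, and counting a missing side as zero when computing the delta is an assumption rather than documented behavior.

```python
def compare(base, current):
    """Build comparison rows from two {account: dollars} mappings."""
    rows = []
    for account in sorted(set(base) | set(current)):
        b, c = base.get(account), current.get(account)
        if b is None:
            status = "only in current"
        elif c is None:
            status = "only in base"
        else:
            status = "unchanged" if b == c else "changed"
        delta = (c or 0) - (b or 0)            # Current minus Base
        pct = delta / b * 100 if b else None   # undefined without a base amount
        rows.append({"account": account, "delta": delta, "pct": pct, "status": status})
    # Largest absolute change first.
    rows.sort(key=lambda row: abs(row["delta"]), reverse=True)
    return rows


# Made-up totals for illustration only.
rows = compare({"A": 100, "B": 50, "C": 10}, {"A": 130, "B": 50, "D": 5})
for row in rows:
    print(row)
```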


audit

Show a detailed verification and quality report for all extracted bills.

congress-approp audit [OPTIONS]

| Flag | Type | Default | Description |
|---|---|---|---|
| --dir | path | ./data | Data directory to audit. Try data for included FY2019–FY2026 dataset. |
| --verbose | flag | | Show individual problematic provisions (those with not_found amounts or no_match raw text) |

Examples

# Standard audit
congress-approp audit --dir data

# Verbose — see individual problematic provisions
congress-approp audit --dir data --verbose

Output

┌───────────┬────────────┬──────────┬──────────┬───────┬───────┬──────────┬───────────┬──────────┬──────────┐
│ Bill      ┆ Provisions ┆ Verified ┆ NotFound ┆ Ambig ┆ Exact ┆ NormText ┆ Spaceless ┆ TextMiss ┆ Coverage │
╞═══════════╪════════════╪══════════╪══════════╪═══════╪═══════╪══════════╪═══════════╪══════════╪══════════╡
│ H.R. 4366 ┆       2364 ┆      762 ┆        0 ┆   723 ┆  2285 ┆       59 ┆         0 ┆       20 ┆    94.2% │
│ H.R. 5860 ┆        130 ┆       33 ┆        0 ┆     2 ┆   102 ┆       12 ┆         0 ┆       16 ┆    61.1% │
│ H.R. 9468 ┆          7 ┆        2 ┆        0 ┆     0 ┆     5 ┆        0 ┆         0 ┆        2 ┆   100.0% │
│ TOTAL     ┆       2501 ┆      797 ┆        0 ┆   725 ┆  2392 ┆       71 ┆         0 ┆       38 ┆          │
└───────────┴────────────┴──────────┴──────────┴───────┴───────┴──────────┴───────────┴──────────┴──────────┘

Column Reference

Amount verification (left side):

| Column | Description |
|---|---|
| Verified | Dollar amount found at exactly one position in source text |
| NotFound | Dollar amount NOT found in source — should be 0; review manually if > 0 |
| Ambig | Dollar amount found at multiple positions — correct but location is uncertain |
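
The three amount outcomes come down to counting how often the dollar string appears in the source text. The sketch below is a simplified illustration with a hypothetical `amount_status` helper and invented sample text; a plain substring count would also match "$1,000" inside "$1,000,000", which a real verifier would have to guard against.

```python
def amount_status(dollar_string, source_text):
    """Classify a dollar string by how many times it occurs in the source."""
    hits = source_text.count(dollar_string)
    if hits == 0:
        return "not_found"
    return "found" if hits == 1 else "found_multiple"


# Invented sample text for illustration only.
source_text = (
    "For necessary expenses, $2,882,482,000, to remain available; "
    "of which $500,000 is for audits and $500,000 is for oversight."
)
for amount in ("$2,882,482,000", "$500,000", "$750,000"):
    print(amount, amount_status(amount, source_text))
```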

Raw text verification (right side):

| Column | Description |
|---|---|
| Exact | raw_text is byte-identical substring of source text |
| NormText | raw_text matches after whitespace/quote/dash normalization |
| Spaceless | raw_text matches only after removing all spaces |
| TextMiss | raw_text not found at any tier — may be paraphrased or truncated |
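
The tiers are tried from strictest to loosest. A minimal sketch of that cascade is shown below; the exact normalization rules are assumptions (curly quotes and long dashes mapped to ASCII, whitespace runs collapsed), and `match_tier` is a hypothetical name.

```python
import re

# Assumed normalization: curly quotes and en/em dashes become ASCII equivalents.
_PUNCT = str.maketrans({
    "\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"',
    "\u2013": "-", "\u2014": "-",
})


def normalize(text):
    """Map typographic punctuation to ASCII and collapse whitespace runs."""
    return re.sub(r"\s+", " ", text.translate(_PUNCT)).strip()


def spaceless(text):
    """Normalize, then drop all whitespace."""
    return re.sub(r"\s+", "", normalize(text))


def match_tier(raw_text, source):
    """Return the first (strictest) tier at which raw_text is found in source."""
    if raw_text in source:
        return "exact"
    if normalize(raw_text) in normalize(source):
        return "normalized"
    if spaceless(raw_text) in spaceless(source):
        return "spaceless"
    return "no_match"


source = "For  necessary expenses of the \u201cOffice\u201d, $5,000."
for raw in ("expenses of the", 'the "Office"', "necessaryexpenses", "unrelated text"):
    print(repr(raw), match_tier(raw, source))
```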

Completeness:

| Column | Description |
|---|---|
| Coverage | Percentage of dollar strings in source text matched to a provision. See What Coverage Means. |

See Understanding the Output and Verify Extraction Accuracy for detailed interpretation guidance.


download

Download appropriations bill XML from Congress.gov.

congress-approp download [OPTIONS] --congress <CONGRESS>

| Flag | Type | Default | Description |
|---|---|---|---|
| --congress | integer | (required) | Congress number (e.g., 118 for 2023–2024) |
| --type | string | | Bill type code: hr, s, hjres, sjres |
| --number | integer | | Bill number (used with --type for single-bill download) |
| --output-dir | path | ./data | Output directory. Intermediate directories are created as needed. |
| --enacted-only | flag | | Only download bills signed into law |
| --format | string | xml | Download format: xml (for extraction), pdf (for reading). Comma-separated for multiple. |
| --version | string | | Text version filter: enr (enrolled/final), ih (introduced), eh (engrossed). When omitted, only enrolled is downloaded. |
| --all-versions | flag | | Download all text versions (introduced, engrossed, enrolled, etc.) instead of just enrolled |
| --dry-run | flag | | Show what would be downloaded without fetching |

Requires: CONGRESS_API_KEY environment variable.

Examples

# Download a specific bill (enrolled version only, by default)
congress-approp download --congress 118 --type hr --number 4366 --output-dir data

# Download all enacted bills for a congress (enrolled versions only)
congress-approp download --congress 118 --enacted-only --output-dir data

# Preview without downloading
congress-approp download --congress 118 --enacted-only --output-dir data --dry-run

# Download both XML and PDF
congress-approp download --congress 118 --type hr --number 4366 --output-dir data --format xml,pdf

# Download all text versions (introduced, engrossed, enrolled, etc.)
congress-approp download --congress 118 --type hr --number 4366 --output-dir data --all-versions

extract

Extract spending provisions from bill XML using Claude. Parses the XML, sends text chunks to the LLM in parallel, merges results, and runs deterministic verification.

congress-approp extract [OPTIONS]

| Flag | Type | Default | Description |
|---|---|---|---|
| --dir | path | ./data | Data directory containing downloaded bill XML |
| --dry-run | flag | | Show chunk count and estimated tokens without calling the LLM |
| --parallel | integer | 5 | Number of concurrent LLM API calls. Higher is faster but uses more API quota. |
| --model | string | claude-opus-4-6 | LLM model for extraction. Can also be set via APPROP_MODEL env var. Flag takes precedence. |
| --force | flag | | Re-extract bills even if extraction.json already exists. Without this flag, already-extracted bills are skipped. |
| --continue-on-error | flag | | Save partial results when some chunks fail. Without this flag, the tool aborts a bill if any chunk permanently fails and does not write extraction.json. |

Requires: ANTHROPIC_API_KEY environment variable (not required if all bills are already extracted).

Behavior notes:

  • Aborts on chunk failure by default. If any chunk permanently fails (after all retries), the bill’s extraction is aborted and no extraction.json is written. This prevents garbage partial extractions from being saved to disk. Use --continue-on-error to save partial results instead.
  • Per-bill error handling. In a multi-bill run, a failure on one bill does not abort the entire run. The failed bill is skipped (no files written) and extraction continues with the remaining bills. Re-running the same command retries only the failed bills.
  • Skips already-extracted bills by default. If every bill in --dir already has extraction.json, the command exits without requiring an API key. Use --force to re-extract.
  • Prefers enrolled XML. When a directory has multiple BILLS-*.xml files, only the enrolled version (*enr.xml) is processed. Non-enrolled versions are ignored.
  • Resilient to parse failures. If an XML file fails to parse (e.g., a non-enrolled version with a different structure), the tool logs a warning and continues to the next bill instead of aborting.
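
The interaction between --parallel and --continue-on-error can be pictured with a small thread-pool sketch. It is not the tool's code: `extract_bill` and `fake_llm` are hypothetical stand-ins, retries are left out, and returning None represents "no extraction.json is written".

```python
from concurrent.futures import ThreadPoolExecutor


def extract_bill(chunks, call_llm, parallel=5, continue_on_error=False):
    """Process chunks concurrently; return merged provisions or None if aborted."""
    def safe_call(chunk):
        try:
            return call_llm(chunk), None
        except Exception as exc:  # a chunk that failed permanently
            return None, exc

    # At most `parallel` calls are in flight at once; map() keeps chunk order.
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        outcomes = list(pool.map(safe_call, chunks))

    failed = [exc for _, exc in outcomes if exc is not None]
    if failed and not continue_on_error:
        return None  # abort the bill: nothing is saved
    # Merge the provisions from every chunk that succeeded.
    return [p for result, exc in outcomes if exc is None for p in result]


def fake_llm(chunk):
    """Stand-in for the model call: fails on one specific chunk."""
    if chunk == "bad":
        raise RuntimeError("chunk failed")
    return [chunk.upper()]


aborted = extract_bill(["a", "bad", "b"], fake_llm)
partial = extract_bill(["a", "bad", "b"], fake_llm, continue_on_error=True)
print(aborted, partial)
```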

Examples

# Preview extraction (no API calls)
congress-approp extract --dir data/118/hr/9468 --dry-run

# Extract a single bill
congress-approp extract --dir data/118/hr/9468

# Extract with higher parallelism for large bills
congress-approp extract --dir data/118/hr/4366 --parallel 8

# Extract all bills under a directory (skips already-extracted bills)
congress-approp extract --dir data --parallel 6

# Re-extract a bill that was already processed
congress-approp extract --dir data/118/hr/9468 --force

# Save partial results even when some chunks fail (rate limiting, etc.)
congress-approp extract --dir data/118/hr/2882 --parallel 6 --continue-on-error

# Use a different model
congress-approp extract --dir data/118/hr/9468 --model claude-sonnet-4-20250514

Output Files

| File | Description |
|---|---|
| extraction.json | All provisions with structured fields |
| verification.json | Deterministic verification against source text |
| metadata.json | Model, prompt version, timestamps, source XML hash |
| tokens.json | Token usage (input, output, cache) |
| chunks/ | Per-chunk LLM artifacts (gitignored) |

embed

Generate semantic embedding vectors for extracted provisions using OpenAI’s embedding model. Enables --semantic and --similar on the search command.

congress-approp embed [OPTIONS]

| Flag | Type | Default | Description |
|---|---|---|---|
| --dir | path | ./data | Data directory containing extracted bills |
| --model | string | text-embedding-3-large | OpenAI embedding model |
| --dimensions | integer | 3072 | Number of dimensions to request from the API |
| --batch-size | integer | 100 | Provisions per API batch call |
| --dry-run | flag | | Preview token counts without calling the API |

Requires: OPENAI_API_KEY environment variable.

Bills with up-to-date embeddings are automatically skipped (detected via hash chain).

Examples

# Generate embeddings for all bills
congress-approp embed --dir data

# Preview without calling API
congress-approp embed --dir data --dry-run

# Generate for a single bill
congress-approp embed --dir data/118/hr/9468

# Use fewer dimensions (not recommended — see Generate Embeddings guide)
congress-approp embed --dir data --dimensions 1024

Output Files

| File | Description |
|---|---|
| embeddings.json | Metadata: model, dimensions, count, SHA-256 hashes |
| vectors.bin | Raw little-endian float32 vectors (count × dimensions × 4 bytes) |
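
The vectors.bin layout described above (little-endian float32, count × dimensions × 4 bytes) can be decoded with the standard struct module. The sketch below works on an in-memory byte string; `pack_vectors` and `read_vectors` are hypothetical helpers, and in practice the count and dimensions would be taken from embeddings.json, whose exact key names are not given on this page.

```python
import struct


def pack_vectors(vectors):
    """Encode vectors as consecutive little-endian float32 values."""
    return b"".join(struct.pack(f"<{len(vec)}f", *vec) for vec in vectors)


def read_vectors(blob, count, dimensions):
    """Decode a vectors.bin-style blob into a list of `count` vectors."""
    expected = count * dimensions * 4  # 4 bytes per float32
    if len(blob) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(blob)}")
    flat = struct.unpack(f"<{count * dimensions}f", blob)
    return [list(flat[i * dimensions:(i + 1) * dimensions]) for i in range(count)]


# Values chosen to be exactly representable in float32.
blob = pack_vectors([[0.5, -1.0], [2.0, 0.25]])
vectors = read_vectors(blob, count=2, dimensions=2)
print(len(blob), vectors)
```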

enrich

Generate bill metadata for fiscal year filtering, subcommittee scoping, and advance appropriation classification. This command parses the source XML and analyzes the extraction output — no API keys are required.

congress-approp enrich [OPTIONS]

| Flag | Type | Default | Description |
|---|---|---|---|
| --dir | path | ./data | Data directory containing extracted bills |
| --dry-run | flag | | Preview what would be generated without writing files |
| --force | flag | | Re-enrich even if bill_meta.json already exists |

What It Generates

For each bill directory, enrich creates a bill_meta.json file containing:

  • Congress number — parsed from the XML filename
  • Subcommittee mappings — division letter → jurisdiction (e.g., Division A → Defense)
  • Bill nature — enriched classification (omnibus, minibus, full-year CR with appropriations, etc.)
  • Advance appropriation classification — each budget authority provision classified as current-year, advance, or supplemental using a fiscal-year-aware algorithm
  • Canonical account names — case-normalized, prefix-stripped names for cross-bill matching

Examples

# Enrich all bills
congress-approp enrich --dir data

# Preview without writing files
congress-approp enrich --dir data --dry-run

# Force re-enrichment
congress-approp enrich --dir data --force

When to Run

Run enrich once after extracting bills, before using --subcommittee filters. The --fy flag on other commands works without enrich (it uses fiscal year data already in extraction.json), but --subcommittee requires the division-to-jurisdiction mapping that only enrich provides.

The tool warns when bill_meta.json is stale (when extraction.json has changed since enrichment). Run enrich --force to regenerate.

See Enrich Bills with Metadata for a detailed guide including subcommittee slugs, advance classification algorithm, and provenance tracking.


verify-text

Check that every provision’s raw_text is a verbatim substring of the enrolled bill source text. Optionally repair mismatches and add source_span byte positions. No API key required.

congress-approp verify-text [OPTIONS]
  --dir <DIR>       Data directory [default: ./data]
  --repair          Fix broken raw_text and add source_span to every provision
  --bill <BILL>     Single bill directory (e.g., 118-hr2882)
  --format <FMT>    Output format: table, json [default: table]

Examples

# Analyze all bills (no changes)
congress-approp verify-text --dir data

# Repair and add source spans
congress-approp verify-text --dir data --repair

# Single bill
congress-approp verify-text --dir data --bill 118-hr2882 --repair

Output

Reports the number of provisions at each match tier:

34568 provisions: 34568 exact, 0 repaired (0 prefix, 0 substring, 0 normalized), 0 unverified
Traceable: 34568/34568 (100.000%)

✅ Every provision is traceable to the enrolled bill source text.

When --repair is used, a backup is created at extraction.json.pre-repair before any modifications. Each provision gets a source_span field with UTF-8 byte offsets into the source .txt file.
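
Because the offsets are UTF-8 byte positions rather than character positions, a span has to be applied to the encoded bytes of the source file. The sketch below shows the difference on invented text; the `start`/`end` key names and the exclusive end offset are assumptions, since the exact source_span layout is not shown on this page.

```python
# Invented source text containing multi-byte characters (em dash, curly quotes).
source = "SEC. 101. \u2014 For \u201cOperations\u201d, $1,000."
raw_text = "For \u201cOperations\u201d"

data = source.encode("utf-8")
start = data.find(raw_text.encode("utf-8"))
end = start + len(raw_text.encode("utf-8"))
span = {"start": start, "end": end}  # hypothetical field names

# Slicing the bytes with the span recovers raw_text exactly.
recovered = data[span["start"]:span["end"]].decode("utf-8")
print(span, recovered == raw_text)

# The character index differs because the em dash occupies three bytes.
print(source.find(raw_text), start)
```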

See Verifying Extraction Data for details on the 3-tier repair algorithm and the source span invariant.


resolve-tas

Map each top-level budget authority provision to a Federal Account Symbol (FAS) code from the Treasury’s FAST Book. Uses deterministic string matching for unambiguous names and Claude Opus for the rest.

congress-approp resolve-tas [OPTIONS]
  --dir <DIR>              Data directory [default: ./data]
  --bill <BILL>            Single bill directory (e.g., 118-hr2882)
  --dry-run                Show what would be resolved and estimated cost
  --no-llm                 Deterministic matching only (no API key needed)
  --force                  Re-resolve even if tas_mapping.json exists
  --batch-size <N>         Provisions per LLM batch [default: 40]
  --fas-reference <PATH>   Path to FAS reference JSON [default: data/fas_reference.json]

Requires ANTHROPIC_API_KEY for the LLM tier. With --no-llm, no API key is needed (resolves ~56% of provisions).

Examples

# Preview cost before running
congress-approp resolve-tas --dir data --dry-run

# Full resolution (deterministic + LLM)
congress-approp resolve-tas --dir data

# Free mode (deterministic only, no API key)
congress-approp resolve-tas --dir data --no-llm

# Single bill
congress-approp resolve-tas --dir data --bill 118-hr2882

Output

Produces tas_mapping.json per bill with one mapping per top-level budget authority provision. Reports match rates:

6685 provisions: 6645 matched (99.4%), 40 unmatched
  Deterministic: 3731, LLM: 2914

See Resolving Treasury Account Symbols for details on the two-tier matching algorithm, confidence levels, and the FAST Book reference.


authority build

Aggregate all tas_mapping.json files into a single authorities.json account registry at the data root. Groups provisions by FAS code, collects name variants, and detects rename events.

congress-approp authority build [OPTIONS]
  --dir <DIR>       Data directory [default: ./data]
  --force           Rebuild even if authorities.json already exists

No API key required. Runs in ~1 second.

Example

congress-approp authority build --dir data

# Output:
# Built authorities.json:
#   1051 authorities, 6645 provisions, 24 bills, FYs [2019, 2020, ..., 2026]
#   937 in multiple bills, 443 with name variants

authority list

Browse the account authority registry. Shows FAS code, bill count, fiscal years, total budget authority, and official title for each authority.

congress-approp authority list [OPTIONS]
  --dir <DIR>       Data directory [default: ./data]
  --agency <CODE>   Filter by CGAC agency code (e.g., 070 for DHS)
  --format <FMT>    Output format: table, json [default: table]

Examples

# List all authorities
congress-approp authority list --dir data

# Filter to DHS accounts
congress-approp authority list --dir data --agency 070

# JSON for programmatic use
congress-approp authority list --dir data --format json

trace

Show the funding timeline for a federal budget account across all fiscal years in the dataset. Accepts a FAS code or a name search query.

congress-approp trace <QUERY> [OPTIONS]
  <QUERY>           FAS code (e.g., 070-0400) or account name fragment
  --dir <DIR>       Data directory [default: ./data]
  --format <FMT>    Output format: table, json [default: table]

Name search splits the query into words and matches authorities where all words appear across the title, agency name, FAS code, and name variants. If multiple authorities match, the command lists candidates and asks you to be more specific.
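
The word-level matching rule can be sketched as below. This is only an illustration: `find_authorities` and the record layout are hypothetical, the second record uses a placeholder FAS code, and matching words as case-insensitive substrings is an assumption.

```python
def find_authorities(query, authorities):
    """Return authorities where every query word appears in some searchable field."""
    words = query.lower().split()
    hits = []
    for auth in authorities:
        haystack = " ".join(
            [auth["fas_code"], auth["title"], auth["agency"], *auth["name_variants"]]
        ).lower()
        if all(word in haystack for word in words):
            hits.append(auth)
    return hits


authorities = [
    {"fas_code": "070-0400",
     "title": "Operations and Support, United States Secret Service, Homeland Security",
     "agency": "Department of Homeland Security",
     "name_variants": ["Operations and Support"]},
    {"fas_code": "000-0000",  # placeholder code, not a real FAS code
     "title": "Disaster Relief Fund",
     "agency": "Department of Homeland Security",
     "name_variants": []},
]

unique = find_authorities("secret service", authorities)
ambiguous = find_authorities("homeland", authorities)
print(len(unique), len(ambiguous))
```

A query that matches more than one authority, like the second one here, corresponds to the case where the command lists candidates and asks for a more specific query.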

Examples

# By FAS code (exact)
congress-approp trace 070-0400 --dir data

# By name (word-level search)
congress-approp trace "coast guard operations" --dir data
congress-approp trace "disaster relief" --dir data

# JSON output
congress-approp trace 070-0400 --dir data --format json

Output

TAS 070-0400: Operations and Support, United States Secret Service, Homeland Security
  Agency: Department of Homeland Security

┌──────┬──────────────────────┬────────────────┬──────────────────────────────┐
│ FY   ┆ Budget Authority ($) ┆ Bill(s)        ┆ Account Name(s)              │
╞══════╪══════════════════════╪════════════════╪══════════════════════════════╡
│ 2020 ┆        2,336,401,000 ┆ H.R. 1158      ┆ United States Secret Servi…  │
│ 2021 ┆        2,373,109,000 ┆ H.R. 133       ┆ United States Secret Servi…  │
│ 2022 ┆        2,554,729,000 ┆ H.R. 2471      ┆ Operations and Support       │
│ 2024 ┆        3,007,982,000 ┆ H.R. 2882      ┆ Operations and Support       │
│ 2025 ┆          231,000,000 ┆ H.R. 9747 (CR) ┆ United States Secret Servi…  │
└──────┴──────────────────────┴────────────────┴──────────────────────────────┘

Bill classification labels — (CR), (supplemental), (full-year CR) — are shown when the bill is not a regular or omnibus appropriation. Detected rename events are shown below the timeline. Name variants are listed with their classification type.

See The Authority System for details on how account tracking works across fiscal years.


normalize suggest-text-match

Discover agency and account naming variants using orphan-pair analysis and structural regex patterns. Scans all bills for cross-FY orphan pairs (same account name, different agency) and common naming patterns (prefix expansion, preposition variants, abbreviation differences). Results are cached for the normalize accept command.

No API calls. No network access. Runs in milliseconds.

congress-approp normalize suggest-text-match [OPTIONS]
  --dir <DIR>            Data directory [default: ./data]
  --format <FORMAT>      Output format: table, json, hashes [default: table]
  --min-accounts <N>     Minimum shared accounts to include a suggestion [default: 1]

Use --format hashes to output one hash per line for scripting. Use --min-accounts 3 to filter to stronger suggestions (pairs sharing 3+ account names).

Suggestions are cached in ~/.congress-approp/cache/ and consumed by normalize accept.


normalize suggest-llm

Discover agency and account naming variants using LLM classification with XML heading context. Sends unresolved ambiguous account clusters to Claude with the bill’s XML organizational structure, dollar amounts, and fiscal year information. The LLM classifies agency pairs as SAME or DIFFERENT.

Requires ANTHROPIC_API_KEY. Uses Claude Opus.

congress-approp normalize suggest-llm [OPTIONS]
  --dir <DIR>            Data directory [default: ./data]
  --batch-size <N>       Maximum clusters per API call [default: 15]
  --format <FORMAT>      Output format: table, json, hashes [default: table]

Only processes clusters not already resolved by suggest-text-match or existing dataset.json entries. Results are cached for the normalize accept command.


normalize accept

Accept suggested normalizations by hash. Reads from the suggestion cache populated by suggest-text-match or suggest-llm, matches the specified hashes, and writes the accepted groups to dataset.json.

congress-approp normalize accept [OPTIONS] [HASHES]...
  --dir <DIR>            Data directory [default: ./data]
  --auto                 Accept all cached suggestions without specifying hashes

If no cache exists, prints an error suggesting to run suggest-text-match first.


normalize list

Display current entity resolution rules from dataset.json.

congress-approp normalize list [OPTIONS]
  --dir <DIR>            Data directory [default: ./data]

Shows all agency groups and account aliases. If no dataset.json exists, shows a helpful message suggesting how to create one.


relate

Deep-dive on one provision across all bills. Finds similar provisions by embedding similarity, groups them by confidence tier, and optionally builds a fiscal year timeline with advance/current/supplemental split. Requires pre-computed embeddings but no API keys (uses stored vectors).

congress-approp relate <SOURCE> [OPTIONS]

The <SOURCE> argument is a provision reference in the format bill_directory:index (e.g., 118-hr9468:0). Use the provision_index from search output.
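
When building these references in a script, it can help to validate them first. A minimal sketch, assuming the index is a non-negative integer after the last colon (`parse_provision_ref` is a hypothetical helper):

```python
def parse_provision_ref(ref):
    """Split 'bill_directory:index' into (bill_directory, index)."""
    bill_dir, sep, index = ref.rpartition(":")
    if not sep or not bill_dir or not index.isdigit():
        raise ValueError(f"expected bill_directory:index, got {ref!r}")
    return bill_dir, int(index)


print(parse_provision_ref("118-hr9468:0"))
```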

| Flag | Type | Default | Description |
|---|---|---|---|
| --dir | path | ./data | Data directory |
| --top | integer | 10 | Max related provisions per confidence tier |
| --format | string | table | Output format: table, json, hashes |
| --fy-timeline | flag | | Show fiscal year timeline with advance/current/supplemental split |

Output

The table output shows two sections:

  • Same Account — high-confidence matches (verified name match or high similarity + same agency). Each row includes a deterministic 8-char hash, similarity score, bill, account name, dollar amount, funding timing, and confidence label.
  • Related — lower-confidence matches (uncertain zone, 0.55–0.65 similarity or name mismatch).

With --fy-timeline, a third section shows the fiscal year timeline: current-year BA, advance BA, supplemental BA, and contributing bills for each fiscal year.

Examples

# Deep-dive on VA Compensation and Pensions
congress-approp relate 118-hr9468:0 --dir data --fy-timeline

# Get just the link hashes for piping to `link accept`
congress-approp relate 118-hr9468:0 --dir data --format hashes

# JSON output with timeline
congress-approp relate 118-hr9468:0 --dir data --format json --fy-timeline

Each match includes a deterministic 8-character hex hash (e.g., b7e688d7). These hashes are computed from the source provision, target provision, and embedding model — the same inputs always produce the same hash. Use --format hashes to output just the hashes of same-account matches, suitable for piping to link accept:

congress-approp relate 118-hr9468:0 --dir data --format hashes | \
  xargs congress-approp link accept --dir data
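
When piping hashes between commands, it can help to drop anything that is not a well-formed hash before it reaches link accept. The filter below is a hypothetical sketch, not part of congress-approp. It assumes that --format hashes prints one hash per line and that hashes are lowercase hex, as in the examples shown here.

```shell
# Hypothetical filter, not part of congress-approp.
# Pass through only lines that are exactly 8 lowercase hex characters.
# `|| true` keeps the function from failing when nothing matches.
only_link_hashes() {
  grep -E '^[0-9a-f]{8}$' || true
}

# Usage (requires the CLI and a populated data directory):
#   congress-approp relate 118-hr9468:0 --dir data --format hashes |
#     only_link_hashes |
#     xargs congress-approp link accept --dir data
```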

link suggest

Compute cross-bill link candidates from embeddings. For each top-level budget authority provision, finds the best match in every other bill above the similarity threshold and classifies by confidence tier.

congress-approp link suggest [OPTIONS]

| Flag | Type | Default | Description |
|---|---|---|---|
| --dir | path | ./data | Data directory |
| --threshold | float | 0.55 | Minimum similarity for candidates |
| --scope | string | all | Which bill pairs to compare: intra (within same FY), cross (across FYs), all |
| --limit | integer | 100 | Max candidates to output |
| --format | string | table | Output format: table, json, hashes |

Confidence Tiers

Based on empirically calibrated thresholds from analysis of 6.7M pairwise comparisons:

| Tier | Criteria | Meaning |
|---|---|---|
| verified | Canonical account name match (case-insensitive, prefix-stripped) | Almost certainly the same account |
| high | Similarity ≥ 0.65 AND same normalized agency | Very likely the same account |
| uncertain | Similarity 0.55–0.65, or name mismatch above 0.65 | Needs manual review |
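
One way to read the tier table is as a sequence of checks applied top to bottom. The sketch below restates that reading in shell; it is illustrative only and is not the tool's actual implementation. The function name classify_tier, its argument order, and the below-threshold label for pairs under 0.55 are all assumptions made for this example.

```shell
# Illustrative only: one reading of the tier table, not the tool's code.
# Arguments: name_match (1 or 0), similarity (e.g. 0.72), same_agency (1 or 0).
classify_tier() {
  awk -v name="$1" -v sim="$2" -v agency="$3" 'BEGIN {
    if (name == 1) {
      print "verified"
    } else if (sim >= 0.65 && agency == 1) {
      print "high"
    } else if (sim >= 0.55) {
      print "uncertain"
    } else {
      print "below-threshold"   # hypothetical label: not a candidate
    }
  }'
}

# Example calls:
#   classify_tier 0 0.70 1
#   classify_tier 0 0.70 0
```

Under this reading, a pair above 0.65 whose names differ lands in high only when the normalized agencies agree, and in uncertain otherwise.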

Examples

# Cross-fiscal-year candidates (year-over-year tracking)
congress-approp link suggest --dir data --scope cross --limit 20

# All candidates above 0.65 similarity
congress-approp link suggest --dir data --threshold 0.65 --limit 50

# Output just the hashes of new (un-accepted) candidates
congress-approp link suggest --dir data --format hashes

link accept

Persist link candidates by accepting them into links/links.json at the data root.

congress-approp link accept [OPTIONS] [HASHES...]

| Flag | Type | Default | Description |
|---|---|---|---|
| --dir | path | ./data | Data directory |
| --note | string | | Optional annotation (e.g., “Account renamed from X to Y”) |
| --auto | flag | | Accept all verified + high-confidence candidates without specifying hashes |
| HASHES | positional | | One or more 8-char link hashes to accept |

Examples

# Accept specific links by hash
congress-approp link accept --dir data a3f7b2c4 e5d1c8a9

# Accept with a note
congress-approp link accept --dir data a3f7b2c4 --note "Same VA account, different bill vehicles"

# Auto-accept all verified and high-confidence candidates
congress-approp link accept --dir data --auto

# Pipe from relate output
congress-approp relate 118-hr9468:0 --dir data --format hashes | \
  xargs congress-approp link accept --dir data

link remove

Remove accepted links by hash.

congress-approp link remove --dir <DIR> <HASHES...>

| Flag | Type | Default | Description |
|---|---|---|---|
| --dir | path | ./data | Data directory |
| HASHES | positional | (required) | One or more 8-char link hashes to remove |

Example

congress-approp link remove --dir data a3f7b2c4

link list

Show accepted links, optionally filtered by bill.

congress-approp link list [OPTIONS]

| Flag | Type | Default | Description |
|---|---|---|---|
| --dir | path | ./data | Data directory |
| --format | string | table | Output format: table, json |
| --bill | string | | Filter to links involving this bill (case-insensitive substring) |

Examples

# Show all accepted links
congress-approp link list --dir data

# Filter to links involving H.R. 4366
congress-approp link list --dir data --bill hr4366

# JSON output for programmatic use
congress-approp link list --dir data --format json

compare --use-authorities

The compare command accepts a --use-authorities flag that rescues orphan provisions by matching on FAS code instead of account name. When two provisions have the same FAS code but different names or agency attributions, they are recognized as the same account.

congress-approp compare --base-fy 2024 --current-fy 2026 \
    --subcommittee thud --dir data --use-authorities

Requires tas_mapping.json files for the bills being compared (run resolve-tas first). Orphan provisions rescued via TAS matching are labeled with their FAS code in the status column (e.g., matched (TAS 069-1775)).

This flag can be combined with --use-links, --real, and --exact. Entity resolution via dataset.json still applies unless --exact is specified.


upgrade

Upgrade extraction data to the latest schema version. Re-deserializes existing data through the current parsing logic and re-runs verification. No LLM API calls.

congress-approp upgrade [OPTIONS]

| Flag | Type | Default | Description |
|---|---|---|---|
| --dir | path | ./data | Data directory to upgrade |
| --dry-run | flag | | Show what would change without writing files |

Examples

# Preview changes
congress-approp upgrade --dir data --dry-run

# Upgrade all bills
congress-approp upgrade --dir data

# Upgrade a single bill
congress-approp upgrade --dir data/118/hr/9468

api test

Test API connectivity for Congress.gov and Anthropic.

congress-approp api test

Verifies that CONGRESS_API_KEY and ANTHROPIC_API_KEY are set and that both APIs are reachable. This command has no flags of its own.


api bill list

List appropriations bills for a given congress.

congress-approp api bill list [OPTIONS]

| Flag | Type | Default | Description |
|---|---|---|---|
| --congress | integer | (required) | Congress number |
| --type | string | | Filter by bill type (hr, s, hjres, sjres) |
| --offset | integer | 0 | Pagination offset |
| --limit | integer | 20 | Maximum results per page |
| --enacted-only | flag | | Only show enacted (signed into law) bills |

Requires: CONGRESS_API_KEY

Examples

# All appropriations bills for the 118th Congress
congress-approp api bill list --congress 118

# Only enacted bills
congress-approp api bill list --congress 118 --enacted-only
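
The --offset and --limit flags page through longer result sets. The helper below is a hypothetical sketch, not part of congress-approp, that prints the offsets needed to cover a given number of results. It assumes you already know roughly how many results to expect; this reference does not say whether the command reports a total.

```shell
# Hypothetical helper, not part of congress-approp.
# Print the offsets needed to page through `total` results,
# `limit` results at a time.
page_offsets() {
  total=$1
  limit=$2
  [ "$limit" -gt 0 ] || return 1   # avoid an endless loop on a zero limit
  offset=0
  while [ "$offset" -lt "$total" ]; do
    echo "$offset"
    offset=$((offset + limit))
  done
}

# Usage (requires the CLI and CONGRESS_API_KEY):
#   for off in $(page_offsets 45 20); do
#     congress-approp api bill list --congress 118 --offset "$off" --limit 20
#   done
```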

api bill get

Get metadata for a specific bill.

congress-approp api bill get --congress <N> --type <TYPE> --number <N>

| Flag | Type | Description |
|---|---|---|
| --congress | integer | Congress number |
| --type | string | Bill type (hr, s, hjres, sjres) |
| --number | integer | Bill number |

Requires: CONGRESS_API_KEY


api bill text

Get text versions and download URLs for a bill.

congress-approp api bill text --congress <N> --type <TYPE> --number <N>

| Flag | Type | Description |
|---|---|---|
| --congress | integer | Congress number |
| --type | string | Bill type (hr, s, hjres, sjres) |
| --number | integer | Bill number |

Requires: CONGRESS_API_KEY

Lists every text version (introduced, engrossed, enrolled, etc.) with available formats (XML, PDF, HTML) and download URLs.

Example

congress-approp api bill text --congress 118 --type hr --number 4366

Common Patterns

Query pre-extracted example data (no API keys needed)

congress-approp summary --dir data
congress-approp search --dir data --type appropriation
congress-approp audit --dir data
congress-approp compare --base data/118-hr4366 --current data/118-hr9468

Full extraction pipeline

export CONGRESS_API_KEY="..."
export ANTHROPIC_API_KEY="..."
export OPENAI_API_KEY="..."

congress-approp download --congress 118 --enacted-only --output-dir data
congress-approp extract --dir data --parallel 6
congress-approp audit --dir data
congress-approp embed --dir data
congress-approp summary --dir data
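
Since the pipeline needs all three keys, a script may want to confirm they are set before starting a long run. The function below is a hypothetical preflight sketch, not part of congress-approp. It only checks that each variable is non-empty; it does not contact any API (api test covers connectivity for Congress.gov and Anthropic).

```shell
# Hypothetical preflight check, not part of congress-approp.
# Report each unset or empty variable on stderr; fail if any is missing.
check_api_keys() {
  missing=0
  for name in CONGRESS_API_KEY ANTHROPIC_API_KEY OPENAI_API_KEY; do
    # Indirect lookup of the variable named by $name (names are fixed above).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (requires the CLI):
#   check_api_keys && congress-approp download --congress 118 --enacted-only --output-dir data
```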

Export workflows

# All appropriations to CSV
congress-approp search --dir data --type appropriation --format csv > all.csv

# JSON for jq processing
congress-approp search --dir data --format json | jq '.[].account_name' | sort -u

# JSONL for streaming
congress-approp search --dir data --format jsonl | while IFS= read -r line; do echo "$line" | jq '.dollars'; done

Environment Variables

| Variable | Used By | Description |
|---|---|---|
| CONGRESS_API_KEY | download, api commands | Congress.gov API key (free signup) |
| ANTHROPIC_API_KEY | extract | Anthropic API key for Claude |
| OPENAI_API_KEY | embed, search --semantic | OpenAI API key for embeddings |
| APPROP_MODEL | extract | Override default LLM model (flag takes precedence) |

See Environment Variables and API Keys for details.

Next Steps