verification.json Fields

Complete reference for every field in verification.json — the deterministic verification report produced by the extract and upgrade commands. No LLM is involved in generating this file; it is pure string matching and arithmetic against the source bill text.

Top-Level Structure

{
  "amount_checks": [ ... ],
  "raw_text_checks": [ ... ],
  "arithmetic_checks": [ ... ],
  "completeness": { ... },
  "summary": { ... }
}

Field	Type	Description
`amount_checks`	array of AmountCheck	One entry per provision with a dollar amount
`raw_text_checks`	array of RawTextCheck	One entry per provision
`arithmetic_checks`	array of ArithmeticCheck	Group-level sum verification (deprecated in newer files)
`completeness`	Completeness	Dollar amount coverage analysis
`summary`	VerificationSummary	Roll-up metrics for the entire bill

Amount Checks (`amount_checks`)

One entry for each provision that has a text_as_written dollar string. Checks whether that exact string exists in the source bill text.

Field	Type	Description
`provision_index`	integer	Index into the `provisions` array in `extraction.json` (0-based)
`text_as_written`	string	The dollar string being checked (e.g., `"$2,285,513,000"`)
`found_in_source`	boolean	Whether the string was found anywhere in the source text
`source_positions`	array of integers	Character offset(s) where the string was found. Empty if not found.
`status`	string	Verification result (see below)

Status Values

Status	Meaning	Action
`verified`	Dollar string found at exactly one position in the source text. Highest confidence — amount is real and location is unambiguous.	None needed
`ambiguous`	Dollar string found at multiple positions. Amount is correct but location is uncertain (common for round numbers like `$5,000,000`).	Acceptable — not an error
`not_found`	Dollar string not found anywhere in the source text. The LLM may have hallucinated or misformatted the amount.	Review manually — check the source XML
`mismatch`	Internal consistency check failed — the parsed `dollars` integer doesn’t match the `text_as_written` string.	Review manually — likely a parsing issue

Example

{
  "provision_index": 0,
  "text_as_written": "$2,285,513,000",
  "found_in_source": true,
  "source_positions": [431],
  "status": "verified"
}

Counts in Example Data

Bill	Verified	Ambiguous
H.R. 4366	762	723
H.R. 5860	33	2
H.R. 9468	2	0
Total	797	725

Raw Text Checks (`raw_text_checks`)

One entry per provision. Checks whether the provision’s raw_text excerpt is a substring of the source bill text, using tiered matching.

Field	Type	Description
`provision_index`	integer	Index into the `provisions` array (0-based)
`raw_text_preview`	string	First ~80 characters of the raw text being checked
`is_verbatim_substring`	boolean	True only for `exact` tier matches
`match_tier`	string	How closely the raw text matched (see below)
`found_at_position`	integer or null	Character offset if exact match; null otherwise

Match Tiers

Tier	Method	What It Handles	Count in Example Data
`exact`	Byte-identical substring match	Clean, faithful extractions	2,392 (95.6%)
`normalized`	Matches after collapsing whitespace and normalizing curly quotes (`"` → `"`) and dashes (`—` → `-`)	Unicode formatting differences from XML-to-text conversion	71 (2.8%)
`spaceless`	Matches after removing all spaces	Word-joining artifacts from XML tag stripping	0 (0.0%)
`no_match`	Not found at any tier	Paraphrased, truncated, or concatenated text from adjacent sections	38 (1.5%)

Example

{
  "provision_index": 0,
  "raw_text_preview": "For an additional amount for ''Compensation and Pensions'', $2,285,513,000, to r",
  "is_verbatim_substring": true,
  "match_tier": "exact",
  "found_at_position": 371
}

Arithmetic Checks (`arithmetic_checks`)

Group-level sum verification — checks whether line items within a section or title sum to a stated total.

Note: This field is deprecated in newer extraction files. It may be absent or empty. When present, it uses this structure:

Field	Type	Description
`scope`	string	What’s being summed (e.g., a title or division)
`extracted_sum`	integer	Sum of extracted provisions in this scope
`stated_total`	integer or null	Total stated in the bill, if any
`status`	string	`verified`, `not_found`, `mismatch`, or `no_reference`

Old files that include this field still load correctly. New extractions and upgrades omit it.

Completeness (`completeness`)

Checks whether every dollar-sign pattern in the source bill text is accounted for by at least one extracted provision.

Field	Type	Description
`total_dollar_amounts_in_text`	integer	How many dollar patterns the text index found in the source bill text
`accounted_for`	integer	How many of those patterns were matched to an extracted provision’s `text_as_written`
`unaccounted`	array of UnaccountedAmount	Dollar amounts in the bill that no provision captured

UnaccountedAmount

Each entry represents a dollar string found in the source text that wasn’t matched to any extracted provision:

Field	Type	Description
`text`	string	The dollar string (e.g., `"$500,000"`)
`value`	integer	Parsed dollar value
`position`	integer	Character offset in the source text
`context`	string	Surrounding text (~100 characters) for identification

Example

{
  "total_dollar_amounts_in_text": 2,
  "accounted_for": 2,
  "unaccounted": []
}

For a bill with unaccounted amounts:

{
  "total_dollar_amounts_in_text": 1734,
  "accounted_for": 1634,
  "unaccounted": [
    {
      "text": "$500,000",
      "value": 500000,
      "position": 45023,
      "context": "pursuant to section 502(b) of the Agricultural Credit Act, $500,000 for each State"
    }
  ]
}

The unaccounted amounts are typically statutory cross-references, loan guarantee ceilings, struck amounts in amendments, or prior-year references in CRs. See What Coverage Means (and Doesn’t) for detailed interpretation.

Coverage Calculation

Coverage = (accounted_for / total_dollar_amounts_in_text) × 100%

Bill	Total	Accounted	Coverage
H.R. 4366	~1,734	~1,634	94.2%
H.R. 5860	~36	~22	61.1%
H.R. 9468	2	2	100.0%

Verification Summary (`summary`)

Roll-up metrics for the entire bill — these are the numbers displayed by the audit command.

Field	Type	Description
`total_provisions`	integer	Total provisions checked
`amounts_verified`	integer	Provisions whose dollar amount was found at exactly one position
`amounts_not_found`	integer	Provisions whose dollar amount was NOT found in source text
`amounts_ambiguous`	integer	Provisions whose dollar amount appeared at multiple positions
`raw_text_exact`	integer	Provisions with exact (byte-identical) raw text match
`raw_text_normalized`	integer	Provisions with normalized match
`raw_text_spaceless`	integer	Provisions with spaceless match
`raw_text_no_match`	integer	Provisions with no raw text match at any tier
`completeness_pct`	float	Percentage of source dollar amounts accounted for (100.0 = all captured)
`provisions_by_detail_level`	object	Count of provisions at each detail level (e.g., `{"top_level": 483, "sub_allocation": 396}`)

Example (H.R. 9468)

{
  "total_provisions": 7,
  "amounts_verified": 2,
  "amounts_not_found": 0,
  "amounts_ambiguous": 0,
  "raw_text_exact": 5,
  "raw_text_normalized": 0,
  "raw_text_spaceless": 0,
  "raw_text_no_match": 2,
  "completeness_pct": 100.0,
  "provisions_by_detail_level": {
    "top_level": 2
  }
}

Mapping to Audit Table Columns

Audit Column	Summary Field
Provisions	`total_provisions`
Verified	`amounts_verified`
NotFound	`amounts_not_found`
Ambig	`amounts_ambiguous`
Exact	`raw_text_exact`
NormText	`raw_text_normalized`
Spaceless	`raw_text_spaceless`
TextMiss	`raw_text_no_match`
Coverage	`completeness_pct`

How verification.json Is Used

By the `audit` command

The audit command reads verification.json for each bill and renders the summary metrics as the audit table.

By the `search` command

Search uses verification data to populate these output fields:

Search Output Field	Source in verification.json
`amount_status`	`amount_checks[i].status` — mapped to `"found"`, `"found_multiple"`, or `"not_found"`
`match_tier`	`raw_text_checks[i].match_tier` — `"exact"`, `"normalized"`, `"spaceless"`, or `"no_match"`
`quality`	Derived from both: `"strong"` if amount verified + text exact; `"moderate"` if either is imperfect; `"weak"` if amount not found; `"n/a"` for provisions without dollar amounts

By the `summary` command

The summary footer (“0 dollar amounts unverified across all bills”) counts the total amounts_not_found across all loaded bills.

When verification.json Is Generated

By extract: Automatically after LLM extraction completes. Verification runs against the source XML with no LLM involvement.
By upgrade: Re-generated when upgrading extraction data to a new schema version. The source XML must be present in the bill directory for verification to run.

If the source XML (BILLS-*.xml) is not present, verification is skipped and verification.json is not created or updated.

import json

with open("data/118-hr9468/verification.json") as f:
    v = json.load(f)

# Summary metrics
print(f"Not found: {v['summary']['amounts_not_found']}")
print(f"Coverage: {v['summary']['completeness_pct']:.1f}%")
print(f"Exact text matches: {v['summary']['raw_text_exact']}")

# Check individual provisions
for check in v["amount_checks"]:
    if check["status"] == "not_found":
        print(f"WARNING: Provision {check['provision_index']}: {check['text_as_written']} not found in source")

# See unaccounted dollar amounts
for ua in v["completeness"]["unaccounted"]:
    print(f"Unaccounted: {ua['text']} at position {ua['position']}")
    print(f"  Context: {ua['context']}")

How Verification Works — detailed explanation of the three verification checks
What Coverage Means (and Doesn’t) — interpreting the completeness metric
Verify Extraction Accuracy — practical guide for running and interpreting the audit
extraction.json Fields — the extraction data that verification checks against

Congressional Appropriations Analyzer

verification.json Fields

Top-Level Structure

Amount Checks (`amount_checks`)

Status Values

Example

Counts in Example Data

Raw Text Checks (`raw_text_checks`)

Match Tiers

Example

Arithmetic Checks (`arithmetic_checks`)

Completeness (`completeness`)

UnaccountedAmount

Example

Coverage Calculation

Verification Summary (`summary`)

Example (H.R. 9468)

Mapping to Audit Table Columns

How verification.json Is Used

By the `audit` command

By the `search` command

By the `summary` command

When verification.json Is Generated

Accessing verification.json

From the CLI

From Python

Keyboard shortcuts

Congressional Appropriations Analyzer