verification.json Fields

Complete reference for every field in verification.json — the deterministic verification report produced by the extract and upgrade commands. No LLM is involved in generating this file; it is pure string matching and arithmetic against the source bill text.

Top-Level Structure

{
  "amount_checks": [ ... ],
  "raw_text_checks": [ ... ],
  "arithmetic_checks": [ ... ],
  "completeness": { ... },
  "summary": { ... }
}

| Field | Type | Description |
| --- | --- | --- |
| amount_checks | array of AmountCheck | One entry per provision with a dollar amount |
| raw_text_checks | array of RawTextCheck | One entry per provision |
| arithmetic_checks | array of ArithmeticCheck | Group-level sum verification (deprecated in newer files) |
| completeness | Completeness | Dollar amount coverage analysis |
| summary | VerificationSummary | Roll-up metrics for the entire bill |

Amount Checks (amount_checks)

One entry for each provision that has a text_as_written dollar string. Checks whether that exact string exists in the source bill text.

| Field | Type | Description |
| --- | --- | --- |
| provision_index | integer | Index into the provisions array in extraction.json (0-based) |
| text_as_written | string | The dollar string being checked (e.g., "$2,285,513,000") |
| found_in_source | boolean | Whether the string was found anywhere in the source text |
| source_positions | array of integers | Character offset(s) where the string was found. Empty if not found. |
| status | string | Verification result (see below) |

Status Values

| Status | Meaning | Action |
| --- | --- | --- |
| verified | Dollar string found at exactly one position in the source text. Highest confidence: the amount is real and its location is unambiguous. | None needed |
| ambiguous | Dollar string found at multiple positions. Amount is correct but location is uncertain (common for round numbers like $5,000,000). | Acceptable; not an error |
| not_found | Dollar string not found anywhere in the source text. The LLM may have hallucinated or misformatted the amount. | Review manually: check the source XML |
| mismatch | Internal consistency check failed: the parsed dollars integer doesn't match the text_as_written string. | Review manually: likely a parsing issue |

Example

{
  "provision_index": 0,
  "text_as_written": "$2,285,513,000",
  "found_in_source": true,
  "source_positions": [431],
  "status": "verified"
}
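
The status rules above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function name `check_amount` is hypothetical, and it assumes a plain substring search over the source text. The `mismatch` status depends on the parsed dollar value and is left out of this sketch.

```python
def check_amount(provision_index: int, text_as_written: str, source_text: str) -> dict:
    """Classify a dollar string by how many times it occurs in the source text."""
    positions = []
    start = source_text.find(text_as_written)
    while start != -1:
        positions.append(start)
        # Resume one character later so every occurrence is recorded.
        start = source_text.find(text_as_written, start + 1)

    if not positions:
        status = "not_found"
    elif len(positions) == 1:
        status = "verified"
    else:
        status = "ambiguous"

    return {
        "provision_index": provision_index,
        "text_as_written": text_as_written,
        "found_in_source": bool(positions),
        "source_positions": positions,
        "status": status,
    }
```

Because this is a raw substring search, a short string such as "$5,000" would also match inside "$5,000,000". Whether the real verifier guards against that is not stated here.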

Counts in Example Data

| Bill | Verified | Ambiguous | Not Found |
| --- | --- | --- | --- |
| H.R. 4366 | 762 | 723 | 0 |
| H.R. 5860 | 33 | 2 | 0 |
| H.R. 9468 | 2 | 0 | 0 |
| Total | 797 | 725 | 0 |

Raw Text Checks (raw_text_checks)

One entry per provision. Checks whether the provision’s raw_text excerpt is a substring of the source bill text, using tiered matching.

| Field | Type | Description |
| --- | --- | --- |
| provision_index | integer | Index into the provisions array (0-based) |
| raw_text_preview | string | First ~80 characters of the raw text being checked |
| is_verbatim_substring | boolean | True only for exact tier matches |
| match_tier | string | How closely the raw text matched (see below) |
| found_at_position | integer or null | Character offset if exact match; null otherwise |

Match Tiers

| Tier | Method | What It Handles | Count in Example Data |
| --- | --- | --- | --- |
| exact | Byte-identical substring match | Clean, faithful extractions | 2,392 (95.6%) |
| normalized | Matches after collapsing whitespace and normalizing curly quotes ("") and dashes (-) | Unicode formatting differences from XML-to-text conversion | 71 (2.8%) |
| spaceless | Matches after removing all spaces | Word-joining artifacts from XML tag stripping | 0 (0.0%) |
| no_match | Not found at any tier | Paraphrased, truncated, or concatenated text from adjacent sections | 38 (1.5%) |

Example

{
  "provision_index": 0,
  "raw_text_preview": "For an additional amount for ''Compensation and Pensions'', $2,285,513,000, to r",
  "is_verbatim_substring": true,
  "match_tier": "exact",
  "found_at_position": 371
}
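
The tiered matching described above can be approximated as follows. This is a sketch under stated assumptions rather than the tool's implementation: the names `normalize` and `match_tier` are hypothetical, the exact set of characters normalized is a guess (curly quotes to straight quotes, en and em dashes to hyphens), and applying the spaceless tier on top of the normalized text is also an assumption.

```python
import re

# Assumed normalization map: curly quotes -> straight quotes, Unicode dashes -> hyphen.
_TRANSLATE = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
    "\u2013": "-", "\u2014": "-",
})


def normalize(text: str) -> str:
    """Replace curly quotes and dashes, then collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.translate(_TRANSLATE)).strip()


def match_tier(raw_text: str, source_text: str):
    """Return (tier, position); position is only set for exact matches."""
    pos = source_text.find(raw_text)
    if pos != -1:
        return "exact", pos

    norm_raw, norm_src = normalize(raw_text), normalize(source_text)
    if norm_raw in norm_src:
        return "normalized", None

    # Last resort: compare with every space removed.
    if norm_raw.replace(" ", "") in norm_src.replace(" ", ""):
        return "spaceless", None

    return "no_match", None
```

The tiers are tried from strictest to loosest, so a provision is always reported at the strictest tier it satisfies.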

Arithmetic Checks (arithmetic_checks)

Group-level sum verification — checks whether line items within a section or title sum to a stated total.

Note: This field is deprecated in newer extraction files. It may be absent or empty. When present, it uses this structure:

| Field | Type | Description |
| --- | --- | --- |
| scope | string | What's being summed (e.g., a title or division) |
| extracted_sum | integer | Sum of extracted provisions in this scope |
| stated_total | integer or null | Total stated in the bill, if any |
| status | string | verified, not_found, mismatch, or no_reference |

Old files that include this field still load correctly. New extractions and upgrades omit it.
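
Since the field may be absent, empty, or populated, code that reads it should tolerate all three cases. A minimal sketch (the helper name `failed_arithmetic_checks` is hypothetical and not part of the tool):

```python
def failed_arithmetic_checks(verification: dict) -> list:
    """Return arithmetic checks with status "mismatch"; tolerate a missing field."""
    # .get() covers files that omit the key; `or []` covers an explicit null.
    checks = verification.get("arithmetic_checks") or []
    return [c for c in checks if c.get("status") == "mismatch"]
```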


Completeness (completeness)

Checks whether every dollar-sign pattern in the source bill text is accounted for by at least one extracted provision.

| Field | Type | Description |
| --- | --- | --- |
| total_dollar_amounts_in_text | integer | How many dollar patterns the text index found in the source bill text |
| accounted_for | integer | How many of those patterns were matched to an extracted provision's text_as_written |
| unaccounted | array of UnaccountedAmount | Dollar amounts in the bill that no provision captured |

UnaccountedAmount

Each entry represents a dollar string found in the source text that wasn’t matched to any extracted provision:

| Field | Type | Description |
| --- | --- | --- |
| text | string | The dollar string (e.g., "$500,000") |
| value | integer | Parsed dollar value |
| position | integer | Character offset in the source text |
| context | string | Surrounding text (~100 characters) for identification |

Example

{
  "total_dollar_amounts_in_text": 2,
  "accounted_for": 2,
  "unaccounted": []
}

For a bill with unaccounted amounts (only one entry of the unaccounted array is shown here):

{
  "total_dollar_amounts_in_text": 1734,
  "accounted_for": 1634,
  "unaccounted": [
    {
      "text": "$500,000",
      "value": 500000,
      "position": 45023,
      "context": "pursuant to section 502(b) of the Agricultural Credit Act, $500,000 for each State"
    }
  ]
}

The unaccounted amounts are typically statutory cross-references, loan guarantee ceilings, struck amounts in amendments, or prior-year references in CRs. See What Coverage Means (and Doesn’t) for detailed interpretation.
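
To make the completeness check concrete, here is a rough sketch of how such a scan could work. Everything named here is an assumption for illustration: the regular expression `DOLLAR_RE`, the function `completeness`, the rule that a pattern counts as accounted for when its text equals some provision's text_as_written, and the 50-character context window on each side.

```python
import re

# Hypothetical pattern: "$" followed by digits with optional thousands groups.
DOLLAR_RE = re.compile(r"\$\d+(?:,\d{3})*")


def completeness(source_text: str, extracted_amounts, context_chars: int = 50) -> dict:
    """Count dollar patterns in the source and list those no provision captured."""
    extracted = set(extracted_amounts)
    total = 0
    accounted_for = 0
    unaccounted = []

    for m in DOLLAR_RE.finditer(source_text):
        total += 1
        text = m.group()
        if text in extracted:
            accounted_for += 1
            continue
        lo = max(0, m.start() - context_chars)
        hi = m.end() + context_chars
        unaccounted.append({
            "text": text,
            "value": int(text[1:].replace(",", "")),  # drop "$" and commas
            "position": m.start(),
            "context": source_text[lo:hi],
        })

    return {
        "total_dollar_amounts_in_text": total,
        "accounted_for": accounted_for,
        "unaccounted": unaccounted,
    }
```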

Coverage Calculation

Coverage = (accounted_for / total_dollar_amounts_in_text) × 100%

| Bill | Total | Accounted | Coverage |
| --- | --- | --- | --- |
| H.R. 4366 | ~1,734 | ~1,634 | 94.2% |
| H.R. 5860 | ~36 | ~22 | 61.1% |
| H.R. 9468 | 2 | 2 | 100.0% |
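
The formula can be checked against the table with a small helper. The function name `coverage_pct` is hypothetical, and returning None when a bill contains no dollar patterns is an assumption; how the tool handles that case is not documented here.

```python
def coverage_pct(accounted_for: int, total_dollar_amounts_in_text: int):
    """Percentage of source dollar patterns matched to an extracted provision."""
    if total_dollar_amounts_in_text == 0:
        # Undefined ratio; the real tool's behaviour for this case is an open question.
        return None
    return accounted_for / total_dollar_amounts_in_text * 100
```

For H.R. 5860, `round(coverage_pct(22, 36), 1)` gives 61.1, matching the table row.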

Verification Summary (summary)

Roll-up metrics for the entire bill — these are the numbers displayed by the audit command.

| Field | Type | Description |
| --- | --- | --- |
| total_provisions | integer | Total provisions checked |
| amounts_verified | integer | Provisions whose dollar amount was found at exactly one position |
| amounts_not_found | integer | Provisions whose dollar amount was NOT found in source text |
| amounts_ambiguous | integer | Provisions whose dollar amount appeared at multiple positions |
| raw_text_exact | integer | Provisions with exact (byte-identical) raw text match |
| raw_text_normalized | integer | Provisions with normalized match |
| raw_text_spaceless | integer | Provisions with spaceless match |
| raw_text_no_match | integer | Provisions with no raw text match at any tier |
| completeness_pct | float | Percentage of source dollar amounts accounted for (100.0 = all captured) |
| provisions_by_detail_level | object | Count of provisions at each detail level (e.g., {"top_level": 483, "sub_allocation": 396}) |

Example (H.R. 9468)

{
  "total_provisions": 7,
  "amounts_verified": 2,
  "amounts_not_found": 0,
  "amounts_ambiguous": 0,
  "raw_text_exact": 5,
  "raw_text_normalized": 0,
  "raw_text_spaceless": 0,
  "raw_text_no_match": 2,
  "completeness_pct": 100.0,
  "provisions_by_detail_level": {
    "top_level": 2
  }
}
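
Most summary fields are plain counts over the per-provision checks, which the following sketch makes explicit. The function `summarize` is hypothetical. It assumes total_provisions equals the number of raw_text_checks entries (one per provision), and it omits provisions_by_detail_level because that count needs data from extraction.json.

```python
from collections import Counter


def summarize(amount_checks: list, raw_text_checks: list, completeness: dict) -> dict:
    """Roll per-provision check results up into summary-style counters."""
    statuses = Counter(c["status"] for c in amount_checks)
    tiers = Counter(c["match_tier"] for c in raw_text_checks)

    total = completeness["total_dollar_amounts_in_text"]
    # Assumption: leave the percentage undefined when the bill has no dollar patterns.
    pct = completeness["accounted_for"] / total * 100 if total else None

    return {
        "total_provisions": len(raw_text_checks),
        "amounts_verified": statuses["verified"],
        "amounts_not_found": statuses["not_found"],
        "amounts_ambiguous": statuses["ambiguous"],
        "raw_text_exact": tiers["exact"],
        "raw_text_normalized": tiers["normalized"],
        "raw_text_spaceless": tiers["spaceless"],
        "raw_text_no_match": tiers["no_match"],
        "completeness_pct": pct,
    }
```

Fed two verified amount checks, five exact and two no_match text checks, and a 2-of-2 completeness record, this reproduces the counts in the H.R. 9468 example.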

Mapping to Audit Table Columns

| Audit Column | Summary Field |
| --- | --- |
| Provisions | total_provisions |
| Verified | amounts_verified |
| NotFound | amounts_not_found |
| Ambig | amounts_ambiguous |
| Exact | raw_text_exact |
| NormText | raw_text_normalized |
| Spaceless | raw_text_spaceless |
| TextMiss | raw_text_no_match |
| Coverage | completeness_pct |

How verification.json Is Used

By the audit command

The audit command reads verification.json for each bill and renders the summary metrics as the audit table.

By the search command

Search uses verification data to populate these output fields:

| Search Output Field | Source in verification.json |
| --- | --- |
| amount_status | amount_checks[i].status, mapped to "found", "found_multiple", or "not_found" |
| match_tier | raw_text_checks[i].match_tier: "exact", "normalized", "spaceless", or "no_match" |
| quality | Derived from both: "strong" if amount verified + text exact; "moderate" if either is imperfect; "weak" if amount not found; "n/a" for provisions without dollar amounts |
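
One way to read the quality rule is sketched below. The names `AMOUNT_STATUS` and `quality` are hypothetical, and several details are assumptions: that "verified" maps to "found" and "ambiguous" to "found_multiple", that a provision without a dollar amount is represented by None, that the checks are evaluated in this order, and that any status not listed (such as mismatch) falls through to "moderate".

```python
# Assumed mapping from amount_checks[i].status to the search output's amount_status.
AMOUNT_STATUS = {
    "verified": "found",
    "ambiguous": "found_multiple",
    "not_found": "not_found",
}


def quality(amount_check_status, match_tier: str) -> str:
    """Combine the amount status and text match tier into one quality label."""
    if amount_check_status is None:
        return "n/a"        # provision has no dollar amount to check
    if amount_check_status == "not_found":
        return "weak"
    if amount_check_status == "verified" and match_tier == "exact":
        return "strong"
    return "moderate"       # one of the two signals is imperfect
```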

By the summary command

The summary footer (“0 dollar amounts unverified across all bills”) reports the sum of amounts_not_found across all loaded bills.
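
The footer number can be reproduced outside the tool with a short script. This is a sketch: the helper names are hypothetical, and the directory layout (one subdirectory per bill under a data directory, each holding a verification.json) is assumed from the example path used elsewhere in this reference.

```python
import json
from pathlib import Path


def load_summaries(data_dir) -> list:
    """Load the summary object from every <data_dir>/<bill>/verification.json."""
    summaries = []
    for path in sorted(Path(data_dir).glob("*/verification.json")):
        with path.open(encoding="utf-8") as f:
            summaries.append(json.load(f)["summary"])
    return summaries


def total_not_found(summaries: list) -> int:
    """Sum amounts_not_found across all loaded bills."""
    return sum(s["amounts_not_found"] for s in summaries)
```

Calling `total_not_found(load_summaries("data"))` would then give the figure shown in the footer, assuming the same set of bills is loaded.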


When verification.json Is Generated

  • By extract: Automatically after LLM extraction completes. Verification runs against the source XML with no LLM involvement.
  • By upgrade: Re-generated when upgrading extraction data to a new schema version. The source XML must be present in the bill directory for verification to run.

If the source XML (BILLS-*.xml) is not present, verification is skipped and verification.json is not created or updated.
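
Before relying on a fresh verification.json, a script can check for the source XML itself. A minimal sketch, with `has_source_xml` as a hypothetical helper that mirrors the BILLS-*.xml pattern named above:

```python
from pathlib import Path


def has_source_xml(bill_dir) -> bool:
    """True if the bill directory contains at least one BILLS-*.xml file."""
    return any(Path(bill_dir).glob("BILLS-*.xml"))
```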


Accessing verification.json

From the CLI

You don’t need to read this file directly — the audit and search commands surface its data in user-friendly formats.

From Python

import json

with open("data/118-hr9468/verification.json") as f:
    v = json.load(f)

# Summary metrics
print(f"Not found: {v['summary']['amounts_not_found']}")
print(f"Coverage: {v['summary']['completeness_pct']:.1f}%")
print(f"Exact text matches: {v['summary']['raw_text_exact']}")

# Check individual provisions
for check in v["amount_checks"]:
    if check["status"] == "not_found":
        print(f"WARNING: Provision {check['provision_index']}: {check['text_as_written']} not found in source")

# See unaccounted dollar amounts
for ua in v["completeness"]["unaccounted"]:
    print(f"Unaccounted: {ua['text']} at position {ua['position']}")
    print(f"  Context: {ua['context']}")
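
Because provision_index points into the provisions array of extraction.json, the two files can be joined to pull up the full provision behind a failed check. The sketch below is illustrative only: the function name is hypothetical, and it assumes the array sits under a top-level "provisions" key.

```python
def provisions_needing_review(extraction: dict, verification: dict) -> list:
    """Pair each not_found or mismatch amount check with its extracted provision."""
    provisions = extraction["provisions"]  # assumed top-level key
    flagged = []
    for check in verification["amount_checks"]:
        if check["status"] in ("not_found", "mismatch"):
            # provision_index is 0-based, so it indexes the list directly.
            flagged.append((check, provisions[check["provision_index"]]))
    return flagged
```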