Definitions
- signal
- A discrete, independently-queryable data dimension. The catalog unit. 'Texas STAAR' is one signal; 'STAAR + TAPR' is two.
- row
- A record in the underlying collection. Implementation detail. BANNED from headline framing.
- coverage_universe
- Plain-language description of the LARGEST defensible external scope of the primary signal (e.g. '~160M US parcels').
- coverage_universe_count_estimate
- Integer estimate of the universe size. Required when countable.
- coverage_universe_source
- Citation: Census, BLS, FBI UCR, World Bank API, etc. Required.
- coverage_is_estimate
- Boolean. True when the universe is an external estimate rather than hard enumeration. Default true.
- coverage_pct
- Integer 0-100. Fraction of the universe deployed and live. Honest — picking a smaller universe to inflate the % is a violation.
- signal_direction
- One of: read, write, both. Whether this domain supports read, write, or both — proxies count.
- write_capability
- Object with: available (bool), direct (bool: API/portal we operate), via_proxy (bool: third-party intermediary), proxy_examples (list of named proxies, e.g. registered agents, expeditors, customs brokers), friction (low|medium|high), status (live|pilot|none). Acknowledging proxy friction is required; 'we cannot' is rarely true.
- tool_status_live
- Tools with status in {active, live}.
- tool_status_staging
- Tools with status in {staging, maturation}.
Maturity Tiers
| Tier | Range | Meaning |
|---|---|---|
| Pilot | 0–25% | Capability exists for limited scope; expansion gated on customer pull. |
| Proven, pending demand | 25–50% | Beyond pilot, partial; ready to scale on request. |
| Partial — actively expanding | 50–75% | Likely ready for most engagements. |
| Production | 75–95% | Comprehensive coverage; rare gaps. |
| Complete | 95–100% | Full universe deployed. |
Signal Depth
| Level | Meaning |
|---|---|
| Shallow | Index/headline only. |
| Standard | Core fields + recent timestamps. |
| Deep | Full historical panel + gold rollups. |
| Enriched | Cross-source joins, methodological re-classification, annotations. |
Universe Selection Rules
- Anchor on the LARGEST defensible external scope of the primary signal (e.g. parcels → ~160M, not '50 states'). Tier comes out lower; that's honest.
- Always cite the source for the universe number.
- If sub-signals have wildly different coverage, anchor on primary, note secondaries in `note`. Don't average to inflate.
- Estimates fine when no exact count exists — mark `coverage_is_estimate: true`.
- Refresh estimates when authoritative sources publish new universe counts.
Direction Rules (read · write · both)
- Every catalog row declares one of: read, write, both.
- Most government/agency feeds are READ-only research data (Census ACS, FRED, NAEP).
- Domains that admit submission (permits, court filings, grants applications, customs entries) are BOTH — even if the write goes through a proxy.
- Internal pipelines we operate (LedgerWell oracle, attestations) are BOTH and direct.
- When `signal_direction = both` and `write_capability.direct = false`, name the proxy class (registered agents, expeditors, customs brokers, filing services).
- Friction reflects the realistic UX: low (API-mediated), medium (proxy-mediated, named partner), high (manual, human-in-loop, jurisdiction-specific paperwork).
Headline Framing Rules
- Lead chip is `maturity_tier`.
- Show `signal_count` second.
- Coverage chip shows universe count + percentage together (e.g. '26.3M of ~160M US parcels · 16%').
- Direction chip indicates read/write capability ('Read · Write via proxy').
- Tools chip shows live + staging counts together.
- NEVER use raw row counts in headline framing.
Required Catalog Fields
signal_countsignal_depthcoverage_universecoverage_universe_count_estimatecoverage_universe_sourcecoverage_is_estimatecoverage_pctmaturity_tiertools_livetools_stagingfreshness_cadencesignal_directionwrite_capability