# Data Dictionary / Codebook — `mh_gap_youth_outcomes_v1_state.csv`
**Dataset**: `analysis_outputs.mh_gap_youth_outcomes_v1` → MinIO `s3://data-stories/youth_mental_health_outcomes/dataset_v1/mh_gap_youth_outcomes_v1_state.csv`
**Rows**: 35 states (after Youth V1 population threshold)
**Fields**: 11 (excluding the derived `residual_class` enum; 12 columns in total)
**License**: CC-BY-4.0 (planned)
**Phase**: B — outcomes correlation follow-up
**Citation**: Trellison Institute. *State-Level Youth Mental Health Outcomes Correlation Dataset V1* (May 2026). DOI: [pending Zenodo registration]
## Field reference
| # | Field | Type | Unit | Source | Description | Null semantics |
|---|---|---|---|---|---|---|
| 1 | `state_abbr` | string(2) | — | postal abbreviation | 2-letter state code. Primary key. | never null |
| 2 | `need_value` | float | percent (0-100) | YRBSS 2023 (joined from Youth V1) | State-level prevalence of high-school students reporting feeling sad or hopeless almost every day for 2+ weeks in a row during the past 12 months. | not applicable: states without YRBSS data are excluded, so never null in published rows |
| 3 | `covariate_value` | float | percent (0-100) | ACS S2701_C05_002E 2023 | Percent uninsured under-19 in state. | never null in published rows |
| 4 | `access_value` | float | providers per 100K under-18 | NPPES youth-serving rollup | State count of youth-serving NPPES providers ÷ under-18 population × 100,000. | never null |
| 5 | `log_gap_ratio` | float | dimensionless (natural log) | Need-vs-Access framework | `ln(need_value × 1000 / access_value)`. Higher = more need per unit supply. | null if `access_value` = 0 |
| 6 | `residual_z` | float | σ | Framework national OLS | Z-score of the residual from the national OLS fit `log_gap_ratio = α + β × covariate_value + ε`, where `covariate_value` is the under-19 uninsured percent. | null if the regression is undefined |
| 7 | `residual_class` | enum | — | Framework | `positive_outlier` (z>1.5) · `negative_outlier` (z<-1.5) · `expected` · `insufficient_data` | always assigned |
| 8 | `yrbss_considered_suicide` | float | percent (0-100) | YRBSS 2023 Total | "Seriously considered attempting suicide in past 12 months." Grades 9-12. | null where state non-participating |
| 9 | `yrbss_made_plan` | float | percent (0-100) | YRBSS 2023 Total | "Made a plan about how you would attempt suicide in past 12 months." | null where state non-participating |
| 10 | `yrbss_attempted_suicide` | float | percent (0-100) | YRBSS 2023 Total | "Actually attempted suicide in past 12 months." | null where state non-participating |
| 11 | `all_age_suicide_aadr_2017` | float | deaths per 100K, age-adjusted | NCHS bi63-dtpu | All-age suicide age-adjusted death rate for 2017. | null for territories |
| 12 | `drug_od_rate_per_100k_2023` | float | deaths per 100K | NCHS xkb8-kh2a + ACS B01001 | 12-month-ending drug overdose deaths ÷ ACS 2023 state population × 100,000. | null where the source `data_value` is empty |
(The header count of 11 excludes field 7, `residual_class`, a categorical enum derived from `residual_z`. The remaining 11 fields are the `state_abbr` key plus 10 numeric fields; the published CSV has 12 columns in total.)
## Type conventions
- `string`: UTF-8 text.
- `int`: 32-bit unsigned.
- `float`: IEEE 754 double precision, CSV-formatted to ~6 decimal places.
- `enum`: fixed vocabulary inline.
- `datetime`: ISO-8601 with `+00:00` UTC.
## Coverage
- 35 states from the Youth V1 analysis (pop threshold ≥ 50,000 under-18).
- 41.8M under-18 residents covered.
- 17 U.S. states absent because YRBSS 2023 did not release state-level data for them.
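The coverage rule can be expressed as a simple filter. This sketch assumes the two criteria above (YRBSS 2023 state-level data available, and at least 50,000 under-18 residents) are combined as a plain conjunction; the function names and input shapes are hypothetical.

```python
# Population threshold from the coverage notes above.
MIN_UNDER18_POP = 50_000


def covered_states(under18_pop: dict[str, int], yrbss_states: set[str]) -> dict[str, int]:
    """Keep states with YRBSS 2023 state data and >= 50,000 under-18 residents."""
    return {
        state: pop
        for state, pop in under18_pop.items()
        if state in yrbss_states and pop >= MIN_UNDER18_POP
    }


def covered_population(under18_pop: dict[str, int], yrbss_states: set[str]) -> int:
    """Total under-18 population across the covered states."""
    return sum(covered_states(under18_pop, yrbss_states).values())
```

Applied to the real inputs, `covered_population` would correspond to the 41.8M figure above.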
## Known issues
- **YRBSS state coverage gaps**: 17 states absent from 2023 release.
- **AADR is from 2017 (a 6-year lag behind the 2023 YRBSS data)**: 2017 is the most recent year in bi63-dtpu. State-level suicide AADR is fairly stable year to year (year-on-year r ≈ 0.95), but the temporal mismatch should be kept in mind when interpreting results.
- **Drug OD denominator** uses ACS 2023 total state population, whereas NCHS published rates use bridged-race population estimates; the two differ by ~1-2%, which we treat as within the noise floor.
- **Youth-specific (ages 10-19) mortality** is not in this dataset: retrieving it requires an XML POST request to CDC WONDER plus acceptance of its data-use terms. Queued for v1.1.
- **Crime data** (FBI juvenile arrests) is not yet integrated; the API key is pending issuance.
## Reproducibility
This dataset is a state-level cross-sectional join of:
- `analysis_outputs.mh_gap_youth_v1_state_v1` (gap measures, 35 states)
- `connector_data.yrbss_mental_health_v1` (YRBSS suicide questions, 2023 Total)
- `connector_data.nchs_suicide_state_v1` (NCHS Leading Causes, 2017)
- `connector_data.vsrr_drug_od_state_v1` (NCHS VSRR + ACS population, 2023)
See `mh_gap_youth_outcomes_v1_methodology_supplement.md` §7 for the deterministic reproduction recipe.
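For orientation only, the shape of the join can be sketched with in-memory dictionaries. This is not the §7 recipe: it assumes each source is keyed by `state_abbr`, that the 35-row gap table is the spine, and that the other sources are left-joined so a state missing from a source gets nulls, consistent with the null semantics in the field table. `join_on_state` and the `Table` alias are hypothetical names.

```python
from typing import Any

# A source table keyed by state_abbr, each value a dict of column -> value.
Table = dict[str, dict[str, Any]]


def join_on_state(spine: Table, others: list[tuple[Table, list[str]]]) -> list[dict[str, Any]]:
    """Left-join each (table, columns) source onto the spine by state_abbr.

    Columns absent for a state are filled with None; output is sorted by state.
    """
    rows: list[dict[str, Any]] = []
    for state in sorted(spine):
        row: dict[str, Any] = {"state_abbr": state, **spine[state]}
        for table, columns in others:
            source = table.get(state, {})
            for col in columns:
                row[col] = source.get(col)
        rows.append(row)
    return rows
```

Using the gap table as the spine keeps the output at the gap table's row count no matter how many states the outcome sources cover.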