# Methodology Supplement — Youth Need-vs-Access (state geography)
**Companion to**: `mh_gap_youth_v1_article.md`
**Framework tool**: `atlas.need_vs_access_framework_v1` v1.1.0 (DaedArch tool registry)
**Version**: v1.0 draft · May 2026
**Trellison Institute · methodology-rated**
This supplement documents the framework adaptation from the tract-level adult analysis to the state-level youth analysis. The pipeline code is unchanged; one input parameter (`regression_grouping`) toggles between within-state OLS (default for tract/county studies) and a single national OLS (default for state studies).
---
## 1. Inputs
| Parameter | Value | Source |
|---|---|---|
| `need.collection` | `connector_data.yrbss_state_need_v1` | CDC YRBSS 2023 (Mental Health Indicators, data.cdc.gov/resource/nu3s-3dwd) |
| `need.measure_field` | `sad_hopeless_2wk_pct` | YRBSS Q: "During the past 12 months, did you ever feel so sad or hopeless almost every day for two weeks or more in a row that you stopped doing some usual activities?" |
| `need.geography_id_field` | `state_abbr` | State 2-letter abbreviation |
| `access.collection` | `connector_data.cms_nppes_youth_serving_v1` | CMS NPPES (May 2026), filtered to 8 youth-serving mental-health taxonomies (primary taxonomy = youth-serving) |
| `access.rollup_to` | `state` | Provider count per state |
| `access.geography_id_field` | `state_abbr` | State 2-letter abbreviation |
| `population.collection` | `connector_data.acs_under18_state_v1` | ACS 1-year 2023, variable B09001_001E |
| `population.pop_field` | `under_18_population` | Total under-18 population by state |
| `covariate.collection` | `connector_data.acs_uninsured_under19_state_v1` | ACS 1-year 2023, variable S2701_C05_002E |
| `covariate.measure_field` | `uninsured_under19_pct` | Percent uninsured under-19 by state |
| `geography_level` | `state` | New supported geography in v1.1 |
| `population_threshold` | 50,000 | Excludes only the smallest non-state territories |
| `outlier_threshold_sigma` | 1.5 | Same as the adult study default |
| `dartboard_n_per_class` | 4 | At most 12 hits (4 × 3 residual classes); the actual count depends on how many geographies fall in each class |
| `regression_grouping` | `national` | New v1.1 parameter — single OLS over all geographies |
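
The four collections above share `state_abbr` as their join key. As a minimal sketch of how the joins and the `population_threshold` filter could combine, assuming an inner join and an inclusive threshold (neither is confirmed by the framework documentation), consider the following. The function name `join_inputs`, the `provider_count` field, and all toy values are hypothetical.

```python
def join_inputs(need, access, population, covariate, population_threshold=50_000):
    """Inner-join four {state_abbr: value} mappings and apply the population floor."""
    # Inner join: keep only geographies present in all four inputs.
    common = set(need) & set(access) & set(population) & set(covariate)
    rows = []
    for abbr in sorted(common):
        # Assumption: the threshold is inclusive (population >= threshold is kept).
        if population[abbr] < population_threshold:
            continue
        rows.append({
            "state_abbr": abbr,
            "sad_hopeless_2wk_pct": need[abbr],
            "provider_count": access[abbr],  # hypothetical field name
            "under_18_population": population[abbr],
            "uninsured_under19_pct": covariate[abbr],
        })
    return rows


# Toy inputs with fictional geography codes; none of these values come from the study.
need = {"AA": 40.0, "BB": 35.0, "CC": 42.0}
access = {"AA": 1200, "BB": 300, "CC": 15, "DD": 80}
population = {"AA": 900_000, "BB": 250_000, "CC": 30_000, "DD": 120_000}
covariate = {"AA": 5.1, "BB": 7.4, "CC": 9.0, "DD": 6.2}

rows = join_inputs(need, access, population, covariate)
print([r["state_abbr"] for r in rows])  # ['AA', 'BB']
```

In this toy run, `DD` drops out because it has no need record, and `CC` drops out because it falls below the population floor.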
## 2. Why state geography
YRBSS publishes prevalence at state and national resolution only. There is no tract-level, county-level, or ZCTA-level YRBSS-equivalent release. The closest alternative source of under-18 surveillance is the National Survey of Children's Health (NSCH), which is parent-reported, covers ages 0-17, and is likewise published at state level only; sub-state granularity is not authorized.
A tract-level companion to the adult MH Gap V1 study is therefore not currently buildable from public federal sources for the under-18 population. Custom-instrument microdata access (BRFSS Child Module, where states elect to participate) is available, but only at the cost of limited state coverage and a loss of population-level generalizability.
The state-level published dataset reflects what the public federal surveillance infrastructure can support today.
## 3. The framework adaptation
The framework's residual analysis was originally written for tract-level studies, where each state contains hundreds to thousands of tracts. Within-state OLS works there because each state has enough observations to support its own fit. At state geography:
- The U.S. proper has only 50 states, so there are at most 50 observations. The 2023 YRBSS pool of participating states is 39 (some states do not participate; some have non-releasable data).
- After all four joins succeed and `population_threshold=50,000` is applied, **35 states** remain.
- Within-state grouping leaves 1 observation per "state", so the fit is undefined.
The single-line adaptation: add a `regression_grouping` parameter to the framework. Default `state` for tract/county studies; `national` for state studies. With `regression_grouping="national"`, the framework fits one OLS regression over all 35 states and computes z-scores in a single residual distribution.
Statistical adequacy: 35 observations with a 2-parameter (intercept + slope) model leave 33 residual degrees of freedom. The residual standard deviation is well-defined, and the z-score classification at ±1.5σ yields a sensible class distribution (3 positive_outlier · 2 negative_outlier · 30 expected) without degenerate edge cases.
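
The grouping switch and the ±1.5σ classification can be sketched as follows. This is an illustration under stated assumptions, not the framework's code: it assumes the regression is `log_gap_ratio` on the single covariate, that σ is the sample standard deviation of the residuals, that the threshold comparison is strict, and that a positive residual maps to `positive_outlier`. The names `ols_fit`, `classify`, `geo_id`, and `covariate` are hypothetical, and the data are toy values.

```python
from collections import defaultdict
from statistics import mean, stdev


def ols_fit(xs, ys):
    """Return (intercept, slope) of a simple least-squares line y = a + b*x."""
    if len(xs) < 3:
        # One point gives no line; two points fit exactly and leave no residual spread.
        raise ValueError("need at least 3 observations per regression group")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("covariate is constant; slope is undefined")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def classify(rows, regression_grouping="national", outlier_threshold_sigma=1.5):
    """Map each geo_id to a residual class using one fit per regression group."""
    groups = defaultdict(list)
    for row in rows:
        key = "ALL" if regression_grouping == "national" else row["state_abbr"]
        groups[key].append(row)

    classes = {}
    for members in groups.values():
        xs = [m["covariate"] for m in members]
        ys = [m["log_gap_ratio"] for m in members]
        intercept, slope = ols_fit(xs, ys)
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        sigma = stdev(residuals)  # assumption: sample standard deviation (n - 1)
        for m, e in zip(members, residuals):
            z = e / sigma if sigma > 0 else 0.0
            if z > outlier_threshold_sigma:  # assumption: strict inequality
                classes[m["geo_id"]] = "positive_outlier"
            elif z < -outlier_threshold_sigma:
                classes[m["geo_id"]] = "negative_outlier"
            else:
                classes[m["geo_id"]] = "expected"
    return classes


# Toy data: seven fictional geographies on the line y = x, except "G4".
rows = [
    {"geo_id": f"G{i}", "state_abbr": f"S{i}", "covariate": float(i), "log_gap_ratio": float(i)}
    for i in range(1, 8)
]
rows[3]["log_gap_ratio"] = 10.0  # G4 sits far above the fitted line

national = classify(rows, regression_grouping="national")
print(national["G4"])  # positive_outlier
```

Calling `classify(rows, regression_grouping="state")` on the same rows raises `ValueError`, because each group then holds a single observation; this is the failure mode the `national` setting avoids.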
## 4. Validation against the tract-level methodology
The framework's outputs are functionally identical between geography levels:
- Same `gap_ratio` and `log_gap_ratio` formulas.
- Same z-score classification.
- Same dartboard sampling (population-weighted, stratified by residual class).
- Same persistence schema: `analysis_outputs.<study_id>_<geography>_v1`.
The only output difference is per-record granularity: in a tract-level study a single state contributes ~1,500 records, whereas in a state-level study it contributes exactly 1.
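
To make the dartboard step concrete, here is a minimal sketch of population-weighted sampling stratified by residual class. It assumes sampling without replacement within each class, which the framework documentation does not confirm, and uses the exponential-race formulation of weighted sampling (equivalent to the Efraimidis-Spirakis method) as a stand-in technique. The function name, the `seed` argument, and the synthetic rows are hypothetical.

```python
import random
from collections import defaultdict


def dartboard_sample(rows, n_per_class=4, seed=0):
    """Draw up to n_per_class rows from each residual class, weighted by population."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    by_class = defaultdict(list)
    for row in rows:
        by_class[row["residual_class"]].append(row)

    hits = []
    for cls in sorted(by_class):
        members = by_class[cls]
        # Exponential race: each member gets an arrival time Exp(1) / weight;
        # the earliest arrivals form a weighted sample without replacement.
        timed = [(rng.expovariate(1.0) / m["population"], m) for m in members]
        timed.sort(key=lambda pair: pair[0])
        hits.extend(m for _, m in timed[:n_per_class])
    return hits


# Synthetic rows mirroring the published class sizes (3 / 2 / 30); populations are made up.
sizes = {"positive_outlier": 3, "negative_outlier": 2, "expected": 30}
rows = [
    {"geo_id": f"{cls[:3]}{i:02d}", "residual_class": cls, "population": 100_000 * (i + 1)}
    for cls, n in sizes.items()
    for i in range(n)
]

hits = dartboard_sample(rows, n_per_class=4, seed=42)
print(len(hits))  # 9 = min(4, 3) + min(4, 2) + min(4, 30)
```

Under the without-replacement assumption, the 12-hit ceiling is reached only when every class holds at least four geographies; with class sizes of 3, 2, and 30 the draw would yield 9 hits.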
## 5. Limitations specific to state-level analysis
1. **Within-state variation is invisible**. The dramatic intra-state heterogeneity that the adult tract-level analysis surfaced (Brewster TX vs Houston) is not detectable in the youth state-level data.
2. **No drive-time analysis is meaningful**. State centroids are not a sensible point of reference; the adult paper's 2.6%-over-60-minute headline has no youth equivalent.
3. **35 observations** is sufficient for a 2-parameter regression but limits the statistical power of residual classification. Negative-outlier class has only 2 states (VT, AK); positive-outlier 3 (PR, NC, NJ). Bootstrap sensitivity is reported in the sensitivity supplement.
4. **State-level instruments do not capture state-internal policy variation**. New Jersey shows up as a single residual class even though youth mental-health policy varies substantially between northern and southern New Jersey counties.
## 6. Sensitivity parameters tested
The framework exposes:
| Parameter | Default | Range tested |
|---|---|---|
| `outlier_threshold_sigma` | 1.5 | 1.0, 1.5, 2.0 |
| Taxonomy set (narrow vs broad) | Broad (79,868 providers) | Broad, Narrow (4,507 providers) |
| `population_threshold` | 50,000 | 25,000, 50,000, 100,000 |
| `dartboard_n_per_class` | 4 | 3, 4, 6 |
Detailed sensitivity tables are in `mh_gap_youth_v1_sensitivity_analysis.md`. The headline finding (39.4% population-weighted youth prevalence, 2.4× adults, 85× supply variation, PR/NC/NJ as positive outliers and VT/AK as negative outliers) is robust across all parameter variants.
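
One way to enumerate the variants in the table above is a one-at-a-time sweep, varying a single parameter while holding the others at their defaults. Whether the sensitivity supplement uses this design or a full grid is not stated here, so treat the following as a sketch; `taxonomy_set` is a hypothetical parameter name, since the table gives no identifier for the narrow-vs-broad switch.

```python
DEFAULTS = {
    "outlier_threshold_sigma": 1.5,
    "taxonomy_set": "broad",  # hypothetical parameter name
    "population_threshold": 50_000,
    "dartboard_n_per_class": 4,
}

RANGES = {
    "outlier_threshold_sigma": [1.0, 1.5, 2.0],
    "taxonomy_set": ["broad", "narrow"],
    "population_threshold": [25_000, 50_000, 100_000],
    "dartboard_n_per_class": [3, 4, 6],
}


def one_at_a_time(defaults, ranges):
    """Return the baseline config plus one config per non-default value."""
    variants = [dict(defaults)]
    for name, values in ranges.items():
        for value in values:
            if value != defaults[name]:
                # Copy the defaults and override exactly one parameter.
                variants.append({**defaults, name: value})
    return variants


variants = one_at_a_time(DEFAULTS, RANGES)
print(len(variants))  # 8 = 1 baseline + 2 + 1 + 2 + 2 single-parameter changes
```

A full grid over the same ranges would instead produce 3 × 2 × 3 × 3 = 54 configurations.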
## 7. Reproducibility
```http
POST /api/v4/execute

{
  "tool_id": "atlas.need_vs_access_framework_v1",
  "inputs": {
    "study_id": "mh_gap_youth_v1",
    "need": {"collection": "connector_data.yrbss_state_need_v1", "measure_field": "sad_hopeless_2wk_pct", "geography_id_field": "state_abbr"},
    "access": {"collection": "connector_data.cms_nppes_youth_serving_v1", "rollup_to": "state", "geography_id_field": "state_abbr"},
    "population": {"collection": "connector_data.acs_under18_state_v1", "geography_id_field": "state_abbr", "pop_field": "under_18_population"},
    "covariate": {"collection": "connector_data.acs_uninsured_under19_state_v1", "measure_field": "uninsured_under19_pct", "geography_id_field": "state_abbr"},
    "geography_level": "state",
    "population_threshold": 50000,
    "outlier_threshold_sigma": 1.5,
    "dartboard_n_per_class": 4,
    "regression_grouping": "national"
  }
}
```
This call reproduces the published dataset exactly, apart from differences in CSV float formatting.
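
For readers who prefer to script the call, the request can be assembled with Python's standard library. This is a sketch only: the base URL is a placeholder, no authentication is shown because the source does not describe any, and the assumption that the endpoint returns JSON is unverified. The helper names `build_request` and `execute` are hypothetical.

```python
import json
import urllib.request

PAYLOAD = {
    "tool_id": "atlas.need_vs_access_framework_v1",
    "inputs": {
        "study_id": "mh_gap_youth_v1",
        "need": {"collection": "connector_data.yrbss_state_need_v1",
                 "measure_field": "sad_hopeless_2wk_pct",
                 "geography_id_field": "state_abbr"},
        "access": {"collection": "connector_data.cms_nppes_youth_serving_v1",
                   "rollup_to": "state",
                   "geography_id_field": "state_abbr"},
        "population": {"collection": "connector_data.acs_under18_state_v1",
                       "geography_id_field": "state_abbr",
                       "pop_field": "under_18_population"},
        "covariate": {"collection": "connector_data.acs_uninsured_under19_state_v1",
                      "measure_field": "uninsured_under19_pct",
                      "geography_id_field": "state_abbr"},
        "geography_level": "state",
        "population_threshold": 50000,
        "outlier_threshold_sigma": 1.5,
        "dartboard_n_per_class": 4,
        "regression_grouping": "national",
    },
}


def build_request(base_url):
    """Build (but do not send) the POST request for the execute endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v4/execute",
        data=json.dumps(PAYLOAD).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute(base_url, timeout=60):
    """Send the request; assumes (unverified) that the response body is JSON."""
    with urllib.request.urlopen(build_request(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Placeholder host: replace with the real deployment before calling execute().
req = build_request("https://example.invalid")
print(req.get_method(), req.full_url)  # POST https://example.invalid/api/v4/execute
```

Separating `build_request` from `execute` lets the payload be inspected or logged before anything is sent over the network.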
## 8. Versioning of the framework
- v1.0.0 (May 2026): initial release; tract-level only with within-state OLS.
- **v1.1.0 (May 2026): adds `regression_grouping="national"` for state-level studies. Youth study is the first published application.**
- v1.2.0 (planned): multivariate residual regression (uninsured + income).
- v2.0.0 (planned): hierarchical Bayesian shrinkage for low-N residual groupings.
The framework is open-source by design; the methodology audit is the product. The youth analysis is its second showcase application after the tract-level adult MH Gap V1.