
Check whether synthetic data is code-compatible with the original
Source: R/code-readiness.R
Evaluates whether code written against the synthetic development twin will
run against the original data without errors. Checks column presence, R
class compatibility, factor level coverage, all-NA columns, zero-variance
columns, missingness spikes, and ID uniqueness. haven_labelled columns
currently round-trip as character in synthetic data, so that class change is
expected for now.
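To make the first three checks concrete, here is a minimal base-R sketch of column presence, class compatibility, and factor level coverage. The helper sketch_checks(), its check names, and its output columns are hypothetical and for illustration only; this is not the package's implementation, and the direction of the factor-level comparison (levels in the original that are absent from the synthetic data) is an assumption.

```r
# Hypothetical helper: NOT part of the package, illustration only.
sketch_checks <- function(original, synthetic) {
  rows <- list()
  # Append one result row to the accumulator in the enclosing function
  add <- function(check, column, status) {
    rows[[length(rows) + 1]] <<- data.frame(
      check = check, column = column, status = status,
      stringsAsFactors = FALSE
    )
  }
  for (col in names(original)) {
    # Column presence: code written on the twin must find the same names
    if (!col %in% names(synthetic)) {
      add("column_presence", col, "fail")
      next
    }
    add("column_presence", col, "pass")
    # Class compatibility: compare the full class vectors
    same_class <- identical(class(original[[col]]), class(synthetic[[col]]))
    add("class_match", col, if (same_class) "pass" else "fail")
    # Factor level coverage (assumed direction: original levels missing in synthetic)
    if (is.factor(original[[col]]) && is.factor(synthetic[[col]])) {
      missing_levels <- setdiff(levels(original[[col]]), levels(synthetic[[col]]))
      add("factor_levels", col, if (length(missing_levels) == 0) "pass" else "warn")
    }
  }
  do.call(rbind, rows)
}

# Toy data: x changes from integer to numeric, y loses the level "c"
orig <- data.frame(x = 1:3, y = factor(c("a", "b", "c")))
syn  <- data.frame(x = c(1, 2, 3), y = factor(c("a", "a", "b")))
result <- sketch_checks(orig, syn)
print(result)
```

In this toy case the class check on x fails because 1:3 is integer while c(1, 2, 3) is numeric, which mirrors the kind of mismatch the function reports.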
Arguments
- original
The original data frame.
- synthetic
The synthetic data frame (from synthesize_data()).
- roles
Optional; a dataganger_roles object from detect_roles(). Used for ID-uniqueness checks. Computed internally if NULL.
Value
An S3 object of class dataganger_code_readiness with
components:
- checks
A tibble with one row per check: check, scope, column, status ("pass"/"warn"/"fail"), message.
- summary
List with n_pass, n_warn, n_fail, ready (TRUE when n_fail == 0).
- meta
List with dimensions and generated_at.
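The components above can be consumed programmatically, for example to gate a script on readiness. The sketch below uses a hand-built stand-in list that mirrors the documented structure, so it runs without the package; the check names, scope values, and messages in it are placeholders, and a base data.frame stands in for the tibble. With a real result, res would come from check_code_readiness().

```r
# Hand-built stand-in mirroring the documented components (placeholder values).
res <- list(
  checks = data.frame(
    check   = c("class_match", "zero_variance", "column_presence"),
    scope   = c("column", "column", "column"),
    column  = c("x", "y", "x"),
    status  = c("fail", "warn", "pass"),
    message = c("class mismatch", "<= 1 unique value", "present"),
    stringsAsFactors = FALSE
  )
)
res$summary <- list(
  n_pass = sum(res$checks$status == "pass"),
  n_warn = sum(res$checks$status == "warn"),
  n_fail = sum(res$checks$status == "fail")
)
# Documented rule: ready is TRUE when n_fail == 0
res$summary$ready <- res$summary$n_fail == 0

# Pull out the rows that block readiness and report them
blocking <- res$checks[res$checks$status == "fail", ]
if (!res$summary$ready) {
  message(nrow(blocking), " blocking issue(s) in: ",
          paste(blocking$column, collapse = ", "))
}
```

Filtering checks by status keeps warnings visible without treating them as blocking, which matches the documented definition of ready.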
Examples
orig <- data.frame(x = 1:10, y = factor(letters[1:10]))
spec <- synth_spec(purpose = "demo")
syn <- synthesize_data(orig, spec)
check_code_readiness(orig, syn)
#>
#> ── DataGangeR Code Readiness ───────────────────────────────────────────────────
#> 9 pass, 1 warn, 1 fail
#> ✖ Not ready: 1 blocking issue
#>
#>
#> ── Failures ──
#>
#> ✖ [x] class mismatch: original 'integer' vs synthetic 'numeric'
#>
#> ── Warnings ──
#>
#> ! [y] Synthetic column has <= 1 unique value; model formulas using this column may fail
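The warning on y refers to a real failure mode in base R: a model formula that uses a factor with only one observed level cannot build contrasts. The following sketch reproduces that situation with toy data (the data frame d is made up for illustration and is not output from synthesize_data()), capturing the error message instead of stopping.

```r
# A factor column with a single observed level, as in the warning above
d <- data.frame(x = 1:10, y = factor(rep("a", 10)))

# Try to fit a model; return the error message if fitting fails
fit_error <- tryCatch(
  {
    lm(x ~ y, data = d)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(fit_error)
```

Code developed against such a synthetic column may therefore error on the twin even though it would run on the original data, which is why the check flags it as a warning.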