Compares an original dataset with its synthetic double across dataset-level
dimensions, numeric distributions, categorical distributions, and numeric
correlations. Returns a structured dataganger_comparison object.
Arguments
- original
The original data frame.
- synthetic
The synthetic data frame (from synthesize_data()).
- roles
Optional; a dataganger_roles object from detect_roles().
Value
An S3 object of class dataganger_comparison, a list with
components dataset, numeric, categorical, relationship, interaction,
privacy_flags, and meta.
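Because the return value is described as a list, its components can presumably be inspected with ordinary base R tools. The following is a minimal sketch, assuming the package that provides compare_synthetic() is already attached and that the component names match those listed above. The internal structure of each component is not documented here, so str() is used to explore it rather than assuming particular fields.

```r
# Assumes the package providing synth_spec(), synthesize_data() and
# compare_synthetic() is already attached.
dat  <- data.frame(x = 1:10, y = letters[1:10])
spec <- synth_spec(purpose = "demo")
syn  <- synthesize_data(dat, spec)

# Keep the result instead of only printing it.
cmp <- compare_synthetic(dat, syn)

# Check the documented class and list the top-level components.
inherits(cmp, "dataganger_comparison")
names(cmp)

# Explore one component without assuming its internal layout.
str(cmp$dataset, max.level = 1)
```

Storing the result in a variable makes it possible to examine individual components after the printed summary has scrolled past.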
Examples
dat <- data.frame(x = 1:10, y = letters[1:10])
spec <- synth_spec(purpose = "demo")
syn <- synthesize_data(dat, spec)
compare_synthetic(dat, syn)
#> ℹ Not enough numeric columns (1) for correlation comparison.
#> Need at least 2 numeric columns with non-zero variance.
#>
#> ── DataGangeR Comparison ───────────────────────────────────────────────────────
#>
#> ── Dataset ──
#>
#> • Rows: 10 (original) -> 10 (synthetic)
#> • Columns: 2 (original) -> 2 (synthetic)
#> • Type match: 50%
#> • Missing: 0% (original) -> 0% (synthetic)
#>
#> ── Numeric -- top 3 by |standardized difference| ──
#>
#> • x: std diff = -0.727
#> Orig mean (SD): 5.5 (3.03)
#>
#> ── Categorical -- top 3 by distributional difference ──
#>
#> • y: p = 0.0293, TVD = 1
#> Levels: 10 (orig) -> 1 (syn)
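
The TVD column in the categorical summary presumably refers to total variation distance. How the package computes it internally is not shown here, so the following base R sketch assumes the standard definition: half the sum of absolute differences between the level proportions of the two columns. The helper tvd() is hypothetical and is not part of the package.

```r
# Hypothetical helper (not from the package): total variation distance
# between the empirical distributions of two categorical vectors.
tvd <- function(a, b) {
  # Use the union of levels so both proportion vectors line up.
  lv <- union(unique(a), unique(b))
  p <- as.numeric(table(factor(a, levels = lv))) / length(a)
  q <- as.numeric(table(factor(b, levels = lv))) / length(b)
  0.5 * sum(abs(p - q))
}

orig <- letters[1:10]

same     <- tvd(orig, orig)            # identical distributions
one_lvl  <- tvd(orig, rep("a", 10))    # synthetic collapsed onto an original level
disjoint <- tvd(orig, rep("z", 10))    # synthetic level absent from the original

print(c(same = same, one_lvl = one_lvl, disjoint = disjoint))
```

Under this definition the distance is 0 for identical distributions and reaches 1 only when the two columns share no levels, so a value of 1 indicates that the synthetic column does not resemble the original at all.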
