
Scans the original and, optionally, the synthetic data for disclosure-risk flags. Supports two stages: "pre" (before synthesis; needs only the original data, ideally with roles) and "post" (after synthesis; needs both the original and the synthetic data).

Usage

privacy_check(
  original,
  synthetic = NULL,
  roles = NULL,
  stage = c("pre", "post"),
  spec = NULL
)
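The stage rules above ("pre" needs only original, "post" also needs synthetic) can be expressed as a small argument check. The sketch below is a hypothetical helper, check_stage_inputs(), written for illustration only; it is not part of the package, and it assumes the stage = c("pre", "post") default follows the usual match.arg() idiom, where an omitted stage resolves to "pre".

```r
# Hypothetical helper, NOT part of the package: illustrates the documented
# stage requirements ("pre" needs only `original`, "post" also needs `synthetic`).
check_stage_inputs <- function(original, synthetic = NULL, stage = c("pre", "post")) {
  # With the full default vector, match.arg() returns the first choice, "pre"
  stage <- match.arg(stage)
  if (!is.data.frame(original)) {
    stop("`original` must be a data frame.", call. = FALSE)
  }
  if (stage == "post" && is.null(synthetic)) {
    stop("`synthetic` is required when stage = \"post\".", call. = FALSE)
  }
  if (!is.null(synthetic) && !is.data.frame(synthetic)) {
    stop("`synthetic` must be a data frame.", call. = FALSE)
  }
  stage
}

df <- data.frame(id = 1:5, x = c(1.2, 0.4, -0.3, 2.1, 0.9))
check_stage_inputs(df)                        # returns "pre"
try(check_stage_inputs(df, stage = "post"))   # errors: synthetic is missing
```

Failing early on a missing synthetic data frame keeps a "post" run from silently degrading into a "pre"-style scan.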

Arguments

original

The original data frame.

synthetic

Optional; the synthetic data frame (required for stage = "post").

roles

Optional; a dataganger_roles object from detect_roles(). Recommended for pre-stage flag detection. When omitted, fallback heuristics based on column names and types are used.

stage

Character. "pre" or "post".

spec

Optional; a dataganger_spec object. When provided at stage = "post", cross-checks that synthesis parameters were applied (e.g. date coarsening, ID removal).
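To make the post-stage spec cross-check concrete, here is a minimal base R sketch of the two examples named above (ID removal and date coarsening). The spec here is a plain list with made-up fields (drop_ids, id_cols, coarsen_dates, date_cols); it is a stand-in, not the structure of a real dataganger_spec object, and cross_check_spec() is a hypothetical helper rather than the package's implementation. Month-level coarsening is assumed to mean every date falls on the 1st of its month.

```r
# Hypothetical sketch of a post-stage cross-check. The `spec` list is a
# stand-in, NOT the structure of a real dataganger_spec object.
cross_check_spec <- function(synthetic, spec) {
  findings <- character(0)
  # ID removal: an ID column still present in the synthetic data is a finding
  leftover_ids <- intersect(spec$id_cols, names(synthetic))
  if (isTRUE(spec$drop_ids) && length(leftover_ids) > 0) {
    findings <- c(findings, paste0(leftover_ids, ": ID column still present"))
  }
  # Month-level coarsening (assumed): every date should fall on the 1st
  if (identical(spec$coarsen_dates, "month")) {
    for (col in intersect(spec$date_cols, names(synthetic))) {
      days <- as.integer(format(synthetic[[col]], "%d"))
      if (any(days != 1L, na.rm = TRUE)) {
        findings <- c(findings, paste0(col, ": dates not coarsened to month"))
      }
    }
  }
  findings
}

spec <- list(drop_ids = TRUE, id_cols = "id",
             coarsen_dates = "month", date_cols = "visit_date")
syn <- data.frame(
  id = 1:3,
  visit_date = as.Date(c("2023-01-01", "2023-02-14", "2023-03-01"))
)
cross_check_spec(syn, spec)
```

Here both checks fire: the id column survived synthesis, and 2023-02-14 was not rounded to the start of its month.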

Value

An S3 object of class dataganger_privacy_check, a tibble with columns variable, flag, severity, stage, and recommendation.
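Because the result is tabular, it can be filtered and sorted like any data frame. The sketch below uses a mock data frame with the documented columns so it runs without the package; the values, the lower-case severity strings, and the "high"/"medium"/"low" level set are assumptions for illustration, not guaranteed package output.

```r
# Mock result with the documented columns; values are illustrative only.
# The lower-case severity labels are an assumption.
checks <- data.frame(
  variable = c("id", "city"),
  flag = c("ID column detected", "Geography column detected"),
  severity = c("high", "low"),
  stage = c("pre", "pre"),
  recommendation = c("Review whether this column should be excluded",
                     "Consider coarsening or aggregation"),
  stringsAsFactors = FALSE
)

# Keep only the high-severity flags
high <- checks[checks$severity == "high", c("variable", "flag")]
high

# Order variables from most to least severe (the level set is an assumption)
sev <- factor(checks$severity, levels = c("high", "medium", "low"))
checks[order(sev), "variable"]
```

Using an ordered factor rather than sorting the raw strings avoids alphabetical order, which would place "low" before "medium".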

Examples

df <- data.frame(id = 1:50, x = rnorm(50), city = rep("Toronto", 50))
roles <- detect_roles(df)
privacy_check(df, roles = roles, stage = "pre")
#> 
#> ── DataGangeR Privacy Check (pre stage) ────────────────────────────────────────
#> 
#> ── x HIGH severity (1) ──
#> 
#> id: ID column detected
#> Recommendation: Review whether this column should be excluded from synthetic
#> output
#> 
#> ── i LOW severity (1) ──
#> 
#> city: Geography column detected
#> Recommendation: Geography columns can be re-identifying; consider coarsening or
#> aggregation