
Package index
Data Input and Profiling
Read data from files and compute a statistical profile. These functions are the first step in the DataGangeR workflow.
- read_input() - Read a data file into a tibble
- profile_data() - Profile a dataset column-by-column
- looks_aggregated() - Heuristic: does this data frame look pre-aggregated (a table of counts)?
Column Role Detection
Detect two intrinsic disclosure axes for each column: whether it identifies a person (none, combination, or direct) and whether it is sensitive. Synthesis roles are detected as well. The app treats these two axes as the source of truth; the CLI still accepts disclosure_roles: as a compatibility mapping.
- detect_roles() - Detect data roles for each column
- suggest_min_rows() - Suggest a sufficient synthetic row count
Synthesis Specification
Define how the synthetic data should be generated — purpose, fidelity, row count, engine, seed, and disclosure settings.
- synth_spec() - Create a synthesis specification
- synthesize_data() - Synthesize a data double
Comparison
Compare a synthetic dataset against its original, using distribution tests and relationship/interaction tests to assess fidelity.
- compare_synthetic() - Compare original and synthetic datasets
- plot_comparison() - Plot comparison summaries
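To make "distribution tests" concrete, here is a minimal base-R sketch of a per-column marginal comparison. It assumes two data frames with overlapping column names and uses a two-sample Kolmogorov-Smirnov test for numeric columns and a chi-squared test of homogeneity for everything else. The helper compare_marginals() is hypothetical and is not part of DataGangeR; compare_synthetic() may use different tests and also covers relationships between columns, which this sketch does not.

```r
# Hypothetical helper, not part of DataGangeR: compares each shared column's
# marginal distribution between an original and a synthetic data frame.
compare_marginals <- function(original, synthetic) {
  cols <- intersect(names(original), names(synthetic))
  rows <- lapply(cols, function(col) {
    x <- original[[col]]
    y <- synthetic[[col]]
    if (is.numeric(x) && is.numeric(y)) {
      # Two-sample Kolmogorov-Smirnov test: D is the largest gap between ECDFs
      test <- suppressWarnings(stats::ks.test(x, y))
      kind <- "ks"
    } else {
      # Chi-squared test of homogeneity on a source-by-category table
      values <- c(as.character(x), as.character(y))
      source <- rep(c("original", "synthetic"), c(length(x), length(y)))
      test <- suppressWarnings(stats::chisq.test(table(source, values)))
      kind <- "chisq"
    }
    data.frame(column = col, test = kind,
               statistic = unname(test$statistic), p_value = test$p.value)
  })
  do.call(rbind, rows)
}

# Usage with small simulated data (no real records involved)
set.seed(1)
original  <- data.frame(age = rnorm(200, 50, 10),
                        sex = sample(c("F", "M"), 200, replace = TRUE))
synthetic <- data.frame(age = rnorm(200, 52, 12),
                        sex = sample(c("F", "M"), 200, replace = TRUE))
compare_marginals(original, synthetic)
```

Small p-values flag columns whose synthetic marginal distribution drifts from the original. With large samples even trivial differences become "significant", so the test statistic itself is often the more useful fidelity signal.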
Disclosure and Privacy
Assess and enforce k-anonymity and other disclosure-risk properties. Synthetic outputs reduce direct disclosure risk but do not provide a formal privacy guarantee.
- privacy_check() - Run disclosure-risk privacy checks
- assess_kanonymity() - Assess k-anonymity over a set of quasi-identifier columns
- enforce_kanon() - Enforce k-anonymity on a synthetic dataset (output guarantee)
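A dataset is k-anonymous over a set of quasi-identifier columns when every combination of their values that occurs is shared by at least k rows. The base-R sketch below assumes a data frame and a character vector of quasi-identifier column names. It measures each record's equivalence-class size, reports the dataset's k, and reaches a target k by record suppression (dropping rows in classes smaller than k). The helpers class_sizes(), kanonymity_level(), and suppress_small_classes() are hypothetical illustrations, not the internals of assess_kanonymity() or enforce_kanon(); suppression is one common technique, and enforce_kanon() may instead generalize or recode values.

```r
# Hypothetical helpers, not DataGangeR functions.

# Size of each record's equivalence class: the number of rows that share
# its exact combination of quasi-identifier values.
class_sizes <- function(df, qi) {
  # paste() turns NA into the string "NA", so missing values form their own class
  key <- do.call(paste, c(unname(as.list(df[qi])), sep = "\x1f"))
  ave(rep(1L, nrow(df)), key, FUN = length)
}

# The dataset is k-anonymous for k equal to its smallest class size.
kanonymity_level <- function(df, qi) {
  min(class_sizes(df, qi))
}

# Record suppression: drop every row whose class has fewer than k rows.
# Surviving classes are untouched, so each still has at least k rows.
suppress_small_classes <- function(df, qi, k) {
  df[class_sizes(df, qi) >= k, , drop = FALSE]
}

# Usage with a toy table of two quasi-identifiers
df <- data.frame(
  age_band = c("30-39", "30-39", "30-39", "40-49", "40-49", "50-59"),
  sex      = c("F", "F", "F", "M", "M", "F")
)
qi <- c("age_band", "sex")
class_sizes(df, qi)         # 3 3 3 2 2 1
kanonymity_level(df, qi)    # 1
safe <- suppress_small_classes(df, qi, k = 2)
nrow(safe)                  # 5
kanonymity_level(safe, qi)  # 2
```

Suppression gives the k-anonymity guarantee by construction but discards records, which can bias the released data when rare combinations are informative. k-anonymity also says nothing about sensitive-attribute disclosure within a class, which is why it is not a formal privacy guarantee on its own.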
Export and Bundles
Export synthetic data and create agent bundles for sharing with collaborators or AI programming tools.
- export_synthetic() - Export a synthetic data bundle
- export_diagnostic_package() - Export a Lens-compatible diagnostic schema for a dataset
- make_agent_bundle() - Create a one-command agent-ready bundle from a raw data file
- check_code_readiness() - Check whether synthetic data is code-compatible with the original
- report_issue() - Report a problem or share feedback
- dataganger_cli() - DataGangeR command-line interface
- run_app() - Launch the DataGangeR Shiny Application
Example Datasets
Small synthetic datasets included in the package for use in examples, tests, and learning the workflow without real data.
- example_health_survey - Example health survey dataset
- example_admin_claims - Example administrative claims dataset
- example_registry - Example disease registry dataset
- individual_sample - Individual-level synthetic sample data
- temporal_sample - Temporal synthetic sample data
- geographic_sample - Geographic synthetic sample data