Data Input and Profiling

Read data from files and compute a statistical profile. These functions are the first step in the DataGangeR workflow.

read_input()
Read a data file into a tibble
profile_data()
Profile a dataset column-by-column
looks_aggregated()
Heuristic: does this data frame look pre-aggregated (a table of counts)?

Column Role Detection

Detect each column’s two intrinsic disclosure axes — whether it identifies a person (none / combination / direct) and whether it is sensitive — plus synthesis roles. The app treats these two axes as the source of truth; the CLI still accepts disclosure_roles: as a compatibility mapping.

detect_roles()
Detect data roles for each column
suggest_min_rows()
Suggest a sufficient synthetic row count

Synthesis Specification

Define how the synthetic data should be generated — purpose, fidelity, row count, engine, seed, and disclosure settings.

synth_spec()
Create a synthesis specification

Synthesis

Synthesize a dataset from a real data frame.

synthesize_data()
Synthesize a data double

Comparison

Compare a synthetic dataset against its original using distribution and relationship-interaction tests to assess fidelity.

compare_synthetic()
Compare original and synthetic datasets
plot_comparison()
Plot comparison summaries
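
As a rough illustration of what a distribution test for fidelity can look like, the sketch below runs a two-sample Kolmogorov-Smirnov test from base R's stats package on simulated numeric columns. This is an assumption made for illustration only: this page does not state which tests compare_synthetic() runs, and the vectors original, synthetic, and shifted are made-up values, not package data.

```r
# Simulated stand-ins for one numeric column (hypothetical values).
set.seed(42)
original  <- rnorm(500, mean = 50, sd = 10)  # "real" column
synthetic <- rnorm(500, mean = 50, sd = 10)  # drawn from the same distribution
shifted   <- original + 15                   # a deliberately poor copy

# Two-sample Kolmogorov-Smirnov test: the statistic D is the largest gap
# between the two empirical cumulative distribution functions.
ks_close   <- ks.test(original, synthetic)
ks_shifted <- ks.test(original, shifted)

# A larger D (and a smaller p-value) indicates lower distributional fidelity.
d_close   <- unname(ks_close$statistic)
d_shifted <- unname(ks_shifted$statistic)
print(c(close = d_close, shifted = d_shifted))
```

A per-column test like this says nothing about relationships between columns, which is why a fidelity comparison typically pairs it with some check on pairwise associations.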

Disclosure and Privacy

Assess and enforce k-anonymity and other disclosure-risk properties. Synthetic outputs reduce direct disclosure risk but do not provide a formal privacy guarantee.

privacy_check()
Run disclosure-risk privacy checks
assess_kanonymity()
Assess k-anonymity over a set of quasi-identifier columns
enforce_kanon()
Enforce k-anonymity on a synthetic dataset (output guarantee)
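
For readers new to the term, a dataset is k-anonymous over a set of quasi-identifier columns when every combination of values in those columns is shared by at least k rows. The base R sketch below computes that property on a made-up table. It is not the implementation behind assess_kanonymity() or enforce_kanon(); the helper class_sizes() and the records data frame are hypothetical names introduced for this example.

```r
# Made-up records: age_band and region act as quasi-identifiers.
records <- data.frame(
  age_band  = c("30-39", "30-39", "30-39", "40-49", "40-49", "50-59"),
  region    = c("North", "North", "North", "South", "South", "South"),
  diagnosis = c("A", "B", "A", "C", "A", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, the number of rows that share its
# combination of quasi-identifier values (its equivalence class size).
class_sizes <- function(data, quasi_ids) {
  key <- interaction(data[quasi_ids], drop = TRUE)
  as.integer(table(key)[as.character(key)])
}

sizes <- class_sizes(records, c("age_band", "region"))

# The achieved k is the size of the smallest equivalence class.
achieved_k <- min(sizes)

# Rows that fall short of a target of k = 2.
violating_rows <- which(sizes < 2)

print(sizes)
print(achieved_k)
print(violating_rows)
```

Here the single "50-59" / "South" row forms a class of size 1, so the table is only 1-anonymous over these two columns. Reaching k = 2 would require suppressing that row or generalizing its values.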

Export and Bundles

Export synthetic data and create agent bundles for sharing with collaborators or AI programming tools.

export_synthetic()
Export a synthetic data bundle
export_diagnostic_package()
Export a Lens-compatible diagnostic schema for a dataset
make_agent_bundle()
Create a one-command agent-ready bundle from a raw data file
check_code_readiness()
Check whether synthetic data is code-compatible with the original

Feedback

Report a problem or suggest a feature via a pre-filled GitHub issue.

report_issue()
Report a problem or share feedback

CLI

Command-line interface entry point.

dataganger_cli()
DataGangeR command-line interface

Shiny App

Launch the interactive DataGangeR Shiny application.

run_app()
Launch the DataGangeR Shiny Application

Example Datasets

Small synthetic datasets included in the package for use in examples, tests, and learning the workflow without real data.

example_health_survey
Example health survey dataset
example_admin_claims
Example administrative claims dataset
example_registry
Example disease registry dataset
individual_sample
Individual-level synthetic sample data
temporal_sample
Temporal synthetic sample data
geographic_sample
Geographic synthetic sample data