What this is + why it works — Technical dialect
v1.2 · macOS local · no upload · open source

Strip the secrets.
Keep the structure.

CleanSchema detects PII and sensitive values in your .csv and .xlsx files and replaces them with realistic synthetic data. Same column names, same data types, same row count — zero real values. Runs entirely on your Mac.

cleanschema · employees_q1.csv
load "employees_q1.csv" // 847 rows · 12 cols
 
✗ sensitive first_name NAME → synthesize
✗ sensitive email EMAIL → synthesize
✗ sensitive salary FINANCIAL → synthesize
✗ sensitive ssn ID → synthesize
✓ safe department CATEGORICAL → keep as-is
✓ safe hire_year NUMERIC → keep as-is
 
✓ done "employees_q1_clean.csv" // 0 real values · safe to share
100% local · no upload, no server
13 column types auto-detected
CSV & Excel supported
Open source · read every line
// proof

Same shape. Different values.

employees_q1.csv
847 rows · 12 columns
processed in 0.34s
employees_q1.csv // before · sensitive
first_name  email                salary    ssn          dept
Jennifer    j.martinez@acme.com  $87,500   412-38-9201  Engineering
Marcus      m.chen@acme.com      $112,000  203-77-4501  Engineering
Priya       p.shah@acme.com      $95,800   596-12-8430  Design
David       d.wright@acme.com    $78,200   719-04-3322  Operations
Aisha       a.patel@acme.com     $103,400  831-66-9011  Engineering
5 of 847 rows shown // real PII · do not share
employees_q1_clean.csv // after · safe to share
first_name  email                   salary    ssn          dept
Denise      d.holloway@example.org  $91,200   738-52-1094  Engineering
Owen        o.bennett@example.org   $108,300  419-22-6783  Engineering
Camila      c.ortiz@example.org     $98,600   662-08-5142  Design
Nathan      n.briggs@example.org    $81,900   277-91-3805  Operations
Imani       i.kelley@example.org    $106,750  514-49-2278  Engineering
5 of 847 rows shown // 0 real values

Categorical fields like dept pass through untouched. Statistical distributions match the source. Joins still work.
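One way to check the "same shape" claim on your own files is a small script like the one below. It is a minimal sketch that uses only Python's standard library; the function names `same_shape` and `untouched_columns` are hypothetical and are not part of CleanSchema. It compares the header row and row count, then lists the columns whose values passed through unchanged.

```python
import csv
import io


def same_shape(original_csv: str, clean_csv: str) -> bool:
    """True when both CSV texts have the same header row and row count."""
    orig = list(csv.reader(io.StringIO(original_csv)))
    clean = list(csv.reader(io.StringIO(clean_csv)))
    return orig[:1] == clean[:1] and len(orig) == len(clean)


def untouched_columns(original_csv: str, clean_csv: str) -> list:
    """Names of columns whose values are identical in every row."""
    orig = list(csv.DictReader(io.StringIO(original_csv)))
    clean = list(csv.DictReader(io.StringIO(clean_csv)))
    if not orig or len(orig) != len(clean):
        return []
    return [
        col for col in orig[0]
        if all(o[col] == c.get(col) for o, c in zip(orig, clean))
    ]


# Two rows taken from the before/after sample above (salary and ssn omitted).
BEFORE = (
    "first_name,email,dept\n"
    "Jennifer,j.martinez@acme.com,Engineering\n"
    "Marcus,m.chen@acme.com,Engineering\n"
)
AFTER = (
    "first_name,email,dept\n"
    "Denise,d.holloway@example.org,Engineering\n"
    "Owen,o.bennett@example.org,Engineering\n"
)

print(same_shape(BEFORE, AFTER))         # True
print(untouched_columns(BEFORE, AFTER))  # ['dept']
```

To run it against real files, read each file's text with `open(path, newline="")` and pass the contents in.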

// why it ships

Built for one job. Done in seconds.

01 — local

Stays on your Mac

No upload. No API call. No cloud processing. The file enters memory, leaves as a clean copy.

02 — offline

Air-gapped friendly

Works with Wi-Fi off. Suitable for restricted environments where data can't leave the network.

03 — preserved

Structure intact

Column names, data types, row counts, joins, distributions — all preserved. Only the values change.

04 — auditable

Open source

Every classification rule and replacement function is in the repo. Read it. Fork it. Trust it on inspection, not on faith.

// flow

Three steps. One clean file.

~30s from open to download
no configuration required
step 01

Drop your file

CSV or Excel. Any size, any structure. CleanSchema reads it locally — nothing transmitted, nothing logged.

$ drop employees_q1.csv → cleanschema
step 02

Review what was detected

13 classifiers run against column names and value patterns. You see every detection, and can override any of them, before anything is replaced.

$ 4 sensitive · 8 safe · review →
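The review step can be pictured as merging your overrides into the detected plan before anything is replaced. The sketch below is illustrative only: `apply_overrides`, the action names `"synthesize"` and `"keep"`, and the dictionary layout are assumptions, not CleanSchema's actual data model.

```python
VALID_ACTIONS = {"synthesize", "keep"}


def apply_overrides(detected: dict, overrides: dict) -> dict:
    """Return a new plan with the user's overrides applied.

    The detected plan is left unmodified so the review screen can
    still show what was auto-detected.
    """
    plan = dict(detected)
    for column, action in overrides.items():
        if column not in plan:
            raise KeyError(f"unknown column: {column}")
        if action not in VALID_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        plan[column] = action
    return plan


# Detections from the employees_q1.csv demo at the top of the page.
detected = {
    "first_name": "synthesize",
    "email": "synthesize",
    "salary": "synthesize",
    "ssn": "synthesize",
    "department": "keep",
    "hire_year": "keep",
}

# The user decides hire_year is sensitive after all.
plan = apply_overrides(detected, {"hire_year": "synthesize"})
print(plan["hire_year"])      # synthesize
print(detected["hire_year"])  # keep
```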
step 03

Download the clean copy

Sensitive values become realistic synthetics. Names look like names. Salaries fall in the same range. Joins still work because IDs are replaced consistently: the same original value always maps to the same synthetic value.

$ save employees_q1_clean.csv ✓
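Consistent replacement is what keeps joins intact. The sketch below shows one common way to get it: a keyed hash (HMAC) that turns each real ID into a synthetic one deterministically. This is an assumption for illustration, not necessarily how CleanSchema does it; `synthetic_ssn`, `KEY`, and the `payroll` table are hypothetical.

```python
import hashlib
import hmac


def synthetic_ssn(real_ssn: str, key: bytes) -> str:
    """Map a real SSN to a synthetic one; same input and key, same output."""
    digest = hmac.new(key, real_ssn.encode("utf-8"), hashlib.sha256).digest()
    # Reduce to nine digits. Collisions are possible in principle;
    # an explicit mapping table would be needed to rule them out.
    number = int.from_bytes(digest[:8], "big") % 10**9
    digits = f"{number:09d}"
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


KEY = b"hypothetical-per-run-secret"

employees = [
    {"ssn": "412-38-9201", "dept": "Engineering"},
    {"ssn": "203-77-4501", "dept": "Engineering"},
]
payroll = [
    {"ssn": "412-38-9201", "salary": 87500},
    {"ssn": "203-77-4501", "salary": 112000},
]

# Clean each table independently with the same key.
clean_employees = [{**r, "ssn": synthetic_ssn(r["ssn"], KEY)} for r in employees]
clean_payroll = [{**r, "ssn": synthetic_ssn(r["ssn"], KEY)} for r in payroll]

# The join on ssn still lines up after cleaning.
dept_by_ssn = {r["ssn"]: r["dept"] for r in clean_employees}
joined = [(r["salary"], dept_by_ssn[r["ssn"]]) for r in clean_payroll]
print(joined)
```

A keyed hash has one practical advantage over a random lookup table: two files cleaned separately with the same key still join, and discarding the key afterwards makes the mapping hard to reproduce.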
// the promise

Your data never leaves your Mac.

CleanSchema runs as a local Python app — no servers, no API keys, no telemetry. The clean file lands in the same folder. The original is untouched. There is nothing for us to see, because we never built the pipe to see it.

  • no internet connection required
  • no account, no login, no email
  • no analytics, no telemetry, no logs
  • open source — every line auditable
  • works in air-gapped environments
network requests: 0
data uploaded: 0 bytes
accounts required: none
telemetry events: 0
third-party SDKs: 0
your file location: ~/wherever/it/was
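If you would rather test the offline claim than take it on trust, one rough approach is to block socket creation while a processing function runs. The guard below is a sketch using only the standard library; `no_network` is a hypothetical helper, not part of CleanSchema. It only catches Python-level socket use, and it should wrap the processing step alone, since a browser UI served on localhost itself relies on sockets.

```python
import socket
from contextlib import contextmanager


@contextmanager
def no_network():
    """Make any attempt to create a socket raise while the block runs."""
    original = socket.socket

    def blocked(*args, **kwargs):
        raise RuntimeError("network access attempted")

    socket.socket = blocked
    try:
        yield
    finally:
        # Always restore the real socket class, even if the block raised.
        socket.socket = original


# Usage: wrap the code under test. Here the "code under test" simply
# tries to open a socket, so the guard fires.
try:
    with no_network():
        socket.socket()
    attempted = False
except RuntimeError:
    attempted = True

print(attempted)  # True
```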
// detection engine

Thirteen column types. Detected on column name and value shape.

red = stripped
teal = passed through
Financial: salary, revenue, amount
Name: first_name, full_name, customer
Email: email, mail, contact
Phone: phone, mobile, cell, tel
Identity / ID: ssn, employee_id, account_no
Address: street, address, city, state
Date: dob, hire_date, created_at
ZIP / Postal: zip, postal_code, zip_code
Free text: notes, description, comments
Categorical: department, status, region
Numeric: count, units, quantity
Percentage: rate, pct, completion
Boolean: active, is_admin, enabled
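Detection on "column name and value shape" can be sketched as a two-pass check: look for a known hint in the column name first, then fall back to testing whether every non-empty value matches a pattern. The code below is a simplified illustration covering three of the types; the regular expressions, the `classify` function, and the `"UNCLASSIFIED"` label are assumptions, and the real rules live in the repo.

```python
import re

# Name hints taken from the list above (subset).
NAME_HINTS = {
    "EMAIL": {"email", "mail", "contact"},
    "PHONE": {"phone", "mobile", "cell", "tel"},
    "ID": {"ssn", "employee_id", "account_no"},
}

# Illustrative value shapes; deliberately loose.
VALUE_PATTERNS = {
    "EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "ID": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def classify(column_name: str, values: list) -> str:
    """Classify a column by its name first, then by the shape of its values."""
    name = column_name.lower()
    tokens = set(re.split(r"[^a-z0-9]+", name))
    for label, hints in NAME_HINTS.items():
        if name in hints or tokens & hints:
            return label
    non_empty = [v for v in values if v]
    for label, pattern in VALUE_PATTERNS.items():
        if non_empty and all(pattern.fullmatch(v) for v in non_empty):
            return label
    return "UNCLASSIFIED"


print(classify("email", ["j.martinez@acme.com"]))   # EMAIL (by name)
print(classify("reach_me", ["a.patel@acme.com", ""]))  # EMAIL (by value shape)
print(classify("tax_ref", ["412-38-9201"]))          # ID (by value shape)
print(classify("department", ["Engineering"]))       # UNCLASSIFIED
```

Matching on whole name tokens rather than substrings avoids false hits such as `tel` inside `hotel`.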
// download

Free. Local. Yours.

Open source. Runs on Mac. Requires Python 3.9+. Setup takes about 30 seconds.

Download on GitHub $ see the demo first
// quick install
$ git clone https://github.com/drjonesy/cleanschema
$ cd cleanschema
$ pip install -r requirements.txt
$ streamlit run app.py

or double-click run.command on macOS — opens in your browser