How to Use AI With Confidential Data Without Exposing It
Entity substitution: send fake data to AI, get real analysis back. A practical how-to for using LLMs with confidential business data.
Most enterprise AI policies boil down to a binary: block AI entirely for sensitive work, or accept the risk of data exposure. The first option creates a productivity tax on your highest-value tasks. The second is a lawsuit waiting to happen.
There’s a third option that I’ve been using with clients for a while now, and it’s surprisingly low-tech. Send fake data to AI. Receive processed output. Map results back to real data. The AI never sees your actual information.
I call it the entity substitution protocol. The concept borrows from homomorphic encryption — computation on protected data — but requires zero cryptographic expertise. Spreadsheets and find-replace. That’s the implementation.
How Entity Substitution Works
The workflow has three steps:
1. Preprocess — Replace sensitive entities with unrelated placeholders. Your real contract pricing for liquid cooling systems at 45% discount becomes “Halloween costumes at 12% discount.” Your real vendor Acme Corp becomes “Client Alpha.”
2. Process — Send the placeholder data to AI for analysis. The AI analyzes “Halloween product pricing.” It doesn’t know or care that it’s actually looking at enterprise technology contracts.
3. Postprocess — Map the AI’s output back to your real entities. The analytical insights, recommendations, and patterns all apply — because you preserved the structural relationships.
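The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the mapping entries come from the examples in this article, and the commented-out `query_ai()` call stands in for whichever AI service you use.

```python
# Minimal round-trip sketch: preprocess -> process -> postprocess.
# The mapping values and the query_ai() call are illustrative placeholders.

mapping = {
    "liquid cooling systems": "Halloween costumes",
    "45%": "12%",
    "Acme Corp": "Client Alpha",
}

def preprocess(text: str, mapping: dict[str, str]) -> str:
    """Replace every real entity with its placeholder."""
    for real, fake in mapping.items():
        text = text.replace(real, fake)
    return text

def postprocess(text: str, mapping: dict[str, str]) -> str:
    """Map placeholders in the AI's output back to real entities."""
    for real, fake in mapping.items():
        text = text.replace(fake, real)
    return text

prompt = preprocess(
    "Analyze the 45% discount we give Acme Corp on liquid cooling systems.",
    mapping,
)
# prompt now mentions only "Client Alpha", "12%", and "Halloween costumes".
# ai_reply = query_ai(prompt)              # hypothetical AI call
# report = postprocess(ai_reply, mapping)  # insights mapped back to real data
```

The structural relationships survive the round trip, which is exactly why the AI's analysis still applies.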
You care about the analytical output, not whether the AI “knows” your real data. The mathematical relationships and optimization opportunities remain valid regardless of entity names.
Building Your Mapping Table
The mapping table is the critical component. It stays local — never in cloud storage, never in email, never in a shared doc.
| Entity Type | Real Value | Placeholder | Design Rule |
|---|---|---|---|
| Product | Liquid cooling systems | Halloween costumes | Maintain relative complexity |
| Discount | 45% | 12% | Preserve ratio relationships |
| Customer | Acme Corp | Client Alpha | Anonymize identifiers |
| SKU | LCS-2024-X | HC-001 | Keep length consistent |
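A mapping table like the one above is trivially machine-readable. One way to keep it usable by scripts while it stays local is a plain CSV file loaded into a dictionary; the column names and values here are just the example rows from the table.

```python
import csv
import io

# The mapping table above, stored as local CSV text
# (in practice: a file on disk, never in cloud storage or email).
TABLE = """entity_type,real_value,placeholder
product,Liquid cooling systems,Halloween costumes
discount,45%,12%
customer,Acme Corp,Client Alpha
sku,LCS-2024-X,HC-001
"""

def load_mapping(csv_text: str) -> dict[str, str]:
    """Read the table into a {real_value: placeholder} dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["real_value"]: row["placeholder"] for row in reader}

mapping = load_mapping(TABLE)
# mapping["Acme Corp"] -> "Client Alpha"
```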
Four Design Rules That Matter
- Preserve structural relationships. If two real entities have similar characteristics, their placeholders should too.
- Maintain scale. If real discounts range 10-50%, placeholders should span similar ranges. Don’t compress everything into a narrow band.
- Consistent cardinality. 12 product categories need 12 placeholder categories. Don’t reduce or expand the set.
- Domain separation. Choose placeholder domains completely unrelated to your industry. Enterprise software company? Use food service terms.
👉 Tip: Make preprocessing deterministic and repeatable. Build a script or at minimum an Excel find-replace macro. Manual find-replace introduces error risk, and one slip puts real data in an AI prompt.
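One subtle bug in naive chained find-replace: a short entity like “Acme” can clobber a longer one like “Acme Corp”, and a second pass can rewrite output from the first. A single compiled regex with longest-first ordering avoids both; the entity names below are illustrative.

```python
import re

def make_substituter(mapping: dict[str, str]):
    """Build a deterministic, single-pass replacer.

    Sorting keys longest-first prevents a short entity ("Acme")
    from partially matching inside a longer one ("Acme Corp"),
    and one compiled pattern means earlier replacements are never
    rewritten by later ones.
    """
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return lambda text: pattern.sub(lambda m: mapping[m.group(0)], text)

substitute = make_substituter({
    "Acme Corp": "Client Alpha",
    "Acme": "Client Beta",
})
print(substitute("Acme Corp spun off Acme."))
# Client Alpha spun off Client Beta.
```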
Where This Works Best
Financial Analysis
Scale revenue by a constant factor. Replace vendor names with generic suppliers. Map cost centers to unrelated departments. AI analyzes margin trends on the transformed data. You map recommendations back to real vendors and real numbers.
Example: “Q3 revenue from Acme distribution contract: $2.4M, COGS: $1.8M” becomes “Q3 revenue from Contract-Alpha: $3,288, COGS: $2,466.” The margin analysis is identical. The actual numbers never leave your infrastructure.
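The example scales by a constant factor, which is why the margin analysis survives. A quick sketch using the article's numbers (the scale factor itself is arbitrary; keep it local alongside the mapping table):

```python
SCALE = 0.00137  # arbitrary constant factor; store it locally, never share it

def scale_dollars(amount: float) -> float:
    return round(amount * SCALE, 2)

revenue, cogs = 2_400_000, 1_800_000
fake_revenue, fake_cogs = scale_dollars(revenue), scale_dollars(cogs)
# fake_revenue == 3288.0, fake_cogs == 2466.0

real_margin = (revenue - cogs) / revenue          # 0.25
fake_margin = (fake_revenue - fake_cogs) / fake_revenue  # 0.25
# Any ratio-based insight on the fake numbers maps directly back.
```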
Contract Review
Replace party names with “Party A” and “Party B.” Map specific product terms to generic equivalents. Scale pricing figures. AI reviews terms, suggests improvements, identifies risks — all on data that’s meaningless if leaked.
HR and Compensation Analysis
“Sarah Johnson, Senior Data Engineer, $145,000, Exceeds Expectations” becomes “Employee-042, IC-Level-3, 128% of baseline, Rating-4.” AI analyzes compensation equity, identifies outliers, suggests adjustments. PII never reaches a third-party service. This helps with GDPR and CCPA compliance by design.
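The HR transformation above can be sketched as a small function. The baseline value and field names are illustrative assumptions; in practice the baseline would come from your compensation bands.

```python
# Local baseline chosen so the example salary lands at 128%.
# In practice this would come from your comp bands, kept local.
BASELINE = 113_281.25

record = {
    "name": "Sarah Johnson",
    "title": "Senior Data Engineer",
    "salary": 145_000,
    "rating": "Exceeds Expectations",
}

def pseudonymize(rec: dict, emp_id: str, level: str, rating_code: str) -> dict:
    """Strip PII and express pay as a percentage of baseline."""
    return {
        "employee": emp_id,
        "level": level,
        "pay_pct_of_baseline": round(rec["salary"] / BASELINE * 100),
        "rating": rating_code,
    }

safe = pseudonymize(record, "Employee-042", "IC-Level-3", "Rating-4")
# {"employee": "Employee-042", "level": "IC-Level-3",
#  "pay_pct_of_baseline": 128, "rating": "Rating-4"}
```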
Customer and Vendor Intelligence
Assign persistent pseudonyms — “Customer-A” maps to the same real customer across every analysis. Maintain consistency so patterns compound across multiple sessions.
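Persistence just means the mapping survives between sessions. A minimal sketch, assuming a local JSON file as the store (the file name and `Customer-A` naming scheme are illustrative, and the scheme would need extending past 26 customers):

```python
import json
from pathlib import Path

MAP_FILE = Path("customer_map.json")  # local only; never synced to cloud

def persistent_alias(real_name: str) -> str:
    """Return the same pseudonym for the same customer in every session."""
    table = json.loads(MAP_FILE.read_text()) if MAP_FILE.exists() else {}
    if real_name not in table:
        # Naive A, B, C... scheme for illustration only.
        table[real_name] = f"Customer-{chr(ord('A') + len(table))}"
        MAP_FILE.write_text(json.dumps(table, indent=2))
    return table[real_name]
```

Because the alias is looked up before it is ever generated, “Customer-A” in next month's analysis refers to the same real customer as it did in this month's.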
Benefits of entity substitution over full AI lockdown:
- Finance teams can use AI for P&L analysis without exposing real numbers
- Legal teams can leverage AI for contract review without exposing terms
- HR can automate compensation analysis without exposing PII
- Vendor data stays confidential while still getting AI-powered insights
- You comply with data residency and privacy requirements by keeping sensitive data local
Where This Doesn’t Work
I want to be honest about the limitations. This isn’t bulletproof security. It’s pragmatic risk reduction.
Don’t use it for:
- Creative work requiring industry context (AI needs to know your domain)
- Strategy development needing market knowledge
- Very small datasets where patterns are too obvious
- Real-time operations like customer service chat
- Highly interconnected relational data where structure reveals identity
Watch out for:
- Structural inference attacks — If placeholder data maintains real-world patterns, sophisticated analysis could reverse-engineer the mapping
- Volume-based correlation — 10,000 queries about “Customer-A” reveal that you have one very large customer
- Mapping table compromise — If exposed, all historical AI interactions are retroactively compromised
- Human operational error — Forgetting to preprocess, mixing real and fake data, accidentally sharing the table
Implementation in Four Weeks
Week 1: Scope and Map
Identify your first use case. List all sensitive entity types. Design placeholder schema. Create the mapping table in a spreadsheet. Define storage and access controls.
Week 2: Build the Pipeline
Create preprocessing script or template. Build postprocessing reverse-mapping. Test full round-trip with non-sensitive sample data. Identify human error points.
Week 3: Harden Security
Establish encrypted backup for mapping table. Set access controls — who can view and edit. Document recovery procedures if the table is lost. Train the team on security requirements.
Week 4: Deploy and Iterate
Run pilot with single use case and small team. Monitor for process compliance. Collect feedback on friction points. Refine automation. Expand if successful.
👉 Tip: Most failures will be operational, not technical. Someone forgets to preprocess. Someone mixes real and fake data in the same prompt. Design your workflow for human error, not just technical correctness.
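One cheap guard against those operational failures is a final check that runs on every outgoing prompt. A minimal sketch, assuming `mapping` is your local `{real: placeholder}` table:

```python
def assert_no_leaks(masked_prompt: str, mapping: dict[str, str]) -> None:
    """Last line of defense before a prompt leaves your machine:
    refuse to send anything that still contains a real entity."""
    leaked = [real for real in mapping if real in masked_prompt]
    if leaked:
        raise ValueError(f"Real data would leak into the prompt: {leaked}")
```

Wiring this in front of the AI call turns “someone forgot to preprocess” from a silent breach into a loud error.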
The Cost of Getting This Wrong vs. Right
Mapping table overhead:
- Initial creation: 2-4 hours
- Maintenance: ~30 minutes/month
- Per-use (manual): 2-5 minutes
- Per-use (automated): under 30 seconds
That’s negligible compared to the cost of a data breach — or the opportunity cost of blocking AI entirely for your most valuable analytical work.
Enterprise AI isn’t about perfect security. It’s about acceptable risk for meaningful value. Entity substitution won’t satisfy a nation-state adversary. It will satisfy most compliance frameworks, most audit requirements, and most common-sense risk assessments for mid-market companies.
Pick a single low-risk use case. Build the mapping table. Test the full workflow. Measure AI output quality. If it works, expand scope.
Continue reading:
- How to Implement AI Without Wasting Six Figures on the Wrong Vendor — the broader implementation framework
- The 5 Discovery Questions for AI — figure out where AI fits before worrying about data security
- AI Readiness Assessment Guide — evaluate whether your organization is ready
- Smart Cost Management for Business Success — the cost lens for evaluating AI investments
