How to Use AI With Confidential Data Without Exposing It
Entity substitution: send fake data to AI, get real analysis back. A practical how-to for using LLMs with confidential business data.
Most enterprise AI policies boil down to a binary: block AI entirely for sensitive work, or accept the risk of data exposure. The first option creates a productivity tax on your highest-value tasks. The second is a lawsuit waiting to happen.
There’s a third option that I’ve been using with clients for a while now, and it’s surprisingly low-tech. Send fake data to AI. Receive processed output. Map results back to real data. The AI never sees your actual information.
I call it the entity substitution protocol. The concept borrows from homomorphic encryption — computation on protected data — but requires zero cryptographic expertise. Spreadsheets and find-replace. That’s the implementation.
How Entity Substitution Works
The workflow has three steps:
1. Preprocess — Replace sensitive entities with unrelated placeholders. Your real contract pricing for liquid cooling systems at 45% discount becomes “Halloween costumes at 12% discount.” Your real vendor Acme Corp becomes “Client Alpha.”
2. Process — Send the placeholder data to AI for analysis. The AI analyzes “Halloween product pricing.” It doesn’t know or care that it’s actually looking at enterprise technology contracts.
3. Postprocess — Map the AI’s output back to your real entities. The analytical insights, recommendations, and patterns all apply — because you preserved the structural relationships.
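The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the mapping entries come from the examples in this article, and the commented-out `query_ai()` call stands in for whichever AI service you use.

```python
# Minimal round-trip sketch: preprocess -> process -> postprocess.
# The mapping values and the query_ai() call are illustrative placeholders.

mapping = {
    "liquid cooling systems": "Halloween costumes",
    "45%": "12%",
    "Acme Corp": "Client Alpha",
}

def preprocess(text: str, mapping: dict[str, str]) -> str:
    """Replace every real entity with its placeholder."""
    for real, fake in mapping.items():
        text = text.replace(real, fake)
    return text

def postprocess(text: str, mapping: dict[str, str]) -> str:
    """Map placeholders in the AI's output back to real entities."""
    for real, fake in mapping.items():
        text = text.replace(fake, real)
    return text

prompt = preprocess(
    "Analyze the 45% discount we give Acme Corp on liquid cooling systems.",
    mapping,
)
# prompt now mentions only "Client Alpha", "12%", and "Halloween costumes".
# ai_reply = query_ai(prompt)              # hypothetical AI call
# report = postprocess(ai_reply, mapping)  # insights mapped back to real data
```

The structural relationships survive the round trip, which is exactly why the AI's analysis still applies.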
You care about the analytical output, not whether the AI “knows” your real data. The mathematical relationships and optimization opportunities remain valid regardless of entity names.
Building Your Mapping Table
The mapping table is the critical component. It stays local — never in cloud storage, never in email, never in a shared doc.
| Entity Type | Real Value | Placeholder | Design Rule |
|---|---|---|---|
| Product | Liquid cooling systems | Halloween costumes | Maintain relative complexity |
| Discount | 45% | 12% | Preserve ratio relationships |
| Customer | Acme Corp | Client Alpha | Anonymize identifiers |
| SKU | LCS-2024-X | HC-001 | Keep length consistent |
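A mapping table like the one above is trivially machine-readable. One way to keep it usable by scripts while it stays local is a plain CSV file loaded into a dictionary; the column names and values here are just the example rows from the table.

```python
import csv
import io

# The mapping table above, stored as local CSV text
# (in practice: a file on disk, never in cloud storage or email).
TABLE = """entity_type,real_value,placeholder
product,Liquid cooling systems,Halloween costumes
discount,45%,12%
customer,Acme Corp,Client Alpha
sku,LCS-2024-X,HC-001
"""

def load_mapping(csv_text: str) -> dict[str, str]:
    """Read the table into a {real_value: placeholder} dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["real_value"]: row["placeholder"] for row in reader}

mapping = load_mapping(TABLE)
# mapping["Acme Corp"] -> "Client Alpha"
```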
Four Design Rules That Matter
- Preserve structural relationships. If two real entities have similar characteristics, their placeholders should too.
- Maintain scale. If real discounts range 10-50%, placeholders should span similar ranges. Don’t compress everything into a narrow band.
- Consistent cardinality. 12 product categories need 12 placeholder categories. Don’t reduce or expand the set.
- Domain separation. Choose placeholder domains completely unrelated to your industry. Enterprise software company? Use food service terms.
👉 Tip: Make preprocessing deterministic and repeatable. Build a script or at minimum an Excel find-replace macro. Manual find-replace introduces error risk, and one slip puts real data in an AI prompt.
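One subtle bug in naive chained find-replace: a short entity like “Acme” can clobber a longer one like “Acme Corp”, and a second pass can rewrite output from the first. A single compiled regex with longest-first ordering avoids both; the entity names below are illustrative.

```python
import re

def make_substituter(mapping: dict[str, str]):
    """Build a deterministic, single-pass replacer.

    Sorting keys longest-first prevents a short entity ("Acme")
    from partially matching inside a longer one ("Acme Corp"),
    and one compiled pattern means earlier replacements are never
    rewritten by later ones.
    """
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return lambda text: pattern.sub(lambda m: mapping[m.group(0)], text)

substitute = make_substituter({
    "Acme Corp": "Client Alpha",
    "Acme": "Client Beta",
})
print(substitute("Acme Corp spun off Acme."))
# Client Alpha spun off Client Beta.
```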
Where This Works Best
Financial Analysis
Scale revenue by a constant factor. Replace vendor names with generic suppliers. Map cost centers to unrelated departments. AI analyzes margin trends on the transformed data. You map recommendations back to real vendors and real numbers.
Example: “Q3 revenue from Acme distribution contract: $2.4M, COGS: $1.8M” becomes “Q3 revenue from Contract-Alpha: $3,288, COGS: $2,466.” The margin analysis is identical. The actual numbers never leave your infrastructure.
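The example scales by a constant factor, which is why the margin analysis survives. A quick sketch using the article's numbers (the scale factor itself is arbitrary; keep it local alongside the mapping table):

```python
SCALE = 0.00137  # arbitrary constant factor; store it locally, never share it

def scale_dollars(amount: float) -> float:
    return round(amount * SCALE, 2)

revenue, cogs = 2_400_000, 1_800_000
fake_revenue, fake_cogs = scale_dollars(revenue), scale_dollars(cogs)
# fake_revenue == 3288.0, fake_cogs == 2466.0

real_margin = (revenue - cogs) / revenue          # 0.25
fake_margin = (fake_revenue - fake_cogs) / fake_revenue  # 0.25
# Any ratio-based insight on the fake numbers maps directly back.
```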
Contract Review
Replace party names with “Party A” and “Party B.” Map specific product terms to generic equivalents. Scale pricing figures. AI reviews terms, suggests improvements, identifies risks — all on data that’s meaningless if leaked.
HR and Compensation Analysis
“Sarah Johnson, Senior Data Engineer, $145,000, Exceeds Expectations” becomes “Employee-042, IC-Level-3, 128% of baseline, Rating-4.” AI analyzes compensation equity, identifies outliers, suggests adjustments. PII never reaches a third-party service. This helps with GDPR and CCPA compliance by design.
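The HR transformation above can be sketched as a small function. The baseline value and field names are illustrative assumptions; in practice the baseline would come from your compensation bands.

```python
# Local baseline chosen so the example salary lands at 128%.
# In practice this would come from your comp bands, kept local.
BASELINE = 113_281.25

record = {
    "name": "Sarah Johnson",
    "title": "Senior Data Engineer",
    "salary": 145_000,
    "rating": "Exceeds Expectations",
}

def pseudonymize(rec: dict, emp_id: str, level: str, rating_code: str) -> dict:
    """Strip PII and express pay as a percentage of baseline."""
    return {
        "employee": emp_id,
        "level": level,
        "pay_pct_of_baseline": round(rec["salary"] / BASELINE * 100),
        "rating": rating_code,
    }

safe = pseudonymize(record, "Employee-042", "IC-Level-3", "Rating-4")
# {"employee": "Employee-042", "level": "IC-Level-3",
#  "pay_pct_of_baseline": 128, "rating": "Rating-4"}
```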
Customer and Vendor Intelligence
Assign persistent pseudonyms — “Customer-A” maps to the same real customer across every analysis. Maintain consistency so patterns compound across multiple sessions.
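Persistence just means the mapping survives between sessions. A minimal sketch, assuming a local JSON file as the store (the file name and `Customer-A` naming scheme are illustrative, and the scheme would need extending past 26 customers):

```python
import json
from pathlib import Path

MAP_FILE = Path("customer_map.json")  # local only; never synced to cloud

def persistent_alias(real_name: str) -> str:
    """Return the same pseudonym for the same customer in every session."""
    table = json.loads(MAP_FILE.read_text()) if MAP_FILE.exists() else {}
    if real_name not in table:
        # Naive A, B, C... scheme for illustration only.
        table[real_name] = f"Customer-{chr(ord('A') + len(table))}"
        MAP_FILE.write_text(json.dumps(table, indent=2))
    return table[real_name]
```

Because the alias is looked up before it is ever generated, “Customer-A” in next month's analysis refers to the same real customer as it did in this month's.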
Benefits of entity substitution over full AI lockdown:
- Finance teams can use AI for P&L analysis without exposing real numbers
- Legal teams can leverage AI for contract review without exposing terms
- HR can automate compensation analysis without exposing PII
- Vendor data stays confidential while still getting AI-powered insights
- You comply with data residency and privacy requirements by keeping sensitive data local
Where This Doesn’t Work
I want to be honest about the limitations. This isn’t bulletproof security. It’s pragmatic risk reduction.
Don’t use it for:
- Creative work requiring industry context (AI needs to know your domain)
- Strategy development needing market knowledge
- Very small datasets where patterns are too obvious
- Real-time operations like customer service chat
- Highly interconnected relational data where structure reveals identity
Watch out for:
- Structural inference attacks — If placeholder data maintains real-world patterns, sophisticated analysis could reverse-engineer the mapping
- Volume-based correlation — 10,000 queries about “Customer-A” reveal that you have one very large customer
- Mapping table compromise — If exposed, all historical AI interactions are retroactively compromised
- Human operational error — Forgetting to preprocess, mixing real and fake data, accidentally sharing the table
Implementation in Four Weeks
Week 1: Scope and Map
Identify your first use case. List all sensitive entity types. Design placeholder schema. Create the mapping table in a spreadsheet. Define storage and access controls.
Week 2: Build the Pipeline
Create preprocessing script or template. Build postprocessing reverse-mapping. Test full round-trip with non-sensitive sample data. Identify human error points.
Week 3: Harden Security
Establish encrypted backup for mapping table. Set access controls — who can view and edit. Document recovery procedures if the table is lost. Train the team on security requirements.
Week 4: Deploy and Iterate
Run pilot with single use case and small team. Monitor for process compliance. Collect feedback on friction points. Refine automation. Expand if successful.
👉 Tip: Most failures will be operational, not technical. Someone forgets to preprocess. Someone mixes real and fake data in the same prompt. Design your workflow for human error, not just technical correctness.
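One cheap guard against those operational failures is a final check that runs on every outgoing prompt. A minimal sketch, assuming `mapping` is your local `{real: placeholder}` table:

```python
def assert_no_leaks(masked_prompt: str, mapping: dict[str, str]) -> None:
    """Last line of defense before a prompt leaves your machine:
    refuse to send anything that still contains a real entity."""
    leaked = [real for real in mapping if real in masked_prompt]
    if leaked:
        raise ValueError(f"Real data would leak into the prompt: {leaked}")
```

Wiring this in front of the AI call turns “someone forgot to preprocess” from a silent breach into a loud error.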
The Cost of Getting This Wrong vs. Right
Mapping table overhead:
- Initial creation: 2-4 hours
- Maintenance: ~30 minutes/month
- Per-use (manual): 2-5 minutes
- Per-use (automated): under 30 seconds
That’s negligible compared to the cost of a data breach — or the opportunity cost of blocking AI entirely for your most valuable analytical work.
Enterprise AI isn’t about perfect security. It’s about acceptable risk for meaningful value. Entity substitution won’t satisfy a nation-state adversary. It will satisfy most compliance frameworks, most audit requirements, and most common-sense risk assessments for mid-market companies.
Pick a single low-risk use case. Build the mapping table. Test the full workflow. Measure AI output quality. If it works, expand scope.
Continue reading:
- How to Implement AI Without Wasting Six Figures on the Wrong Vendor — the broader implementation framework
- The 5 Discovery Questions for AI — figure out where AI fits before worrying about data security
- AI Readiness Assessment Guide — evaluate whether your organization is ready
- Smart Cost Management for Business Success — the cost lens for evaluating AI investments
