Created Jan 6, 2026

Homomorphic AI: Protecting Sensitive Data in LLM Workflows

Entity substitution protocol for using AI with confidential business data. Inspired by homomorphic encryption, practical for operations.

Implementation · General
Joshua Schultz
ChatGPT · Claude · Gemini
Tags: #security #data-protection #implementation #llm #privacy

Most enterprise AI policies are binary: either block AI entirely for sensitive work, or accept the risk of data exposure. This creates a productivity tax on high-value tasks that would benefit most from AI assistance.

There’s a third option: entity substitution protocol. Borrowed from homomorphic encryption principles, adapted for practical business use. Send fake data to AI, receive processed output, map results back to real data. The AI never sees your actual information.

The Core Concept

Homomorphic encryption allows computation on encrypted data without decryption. You send scrambled inputs, receive scrambled outputs, decrypt locally. The server never sees plaintext.

We apply the same principle to LLM workflows using simple mapping tables:

  1. Preprocess: Replace sensitive entities with unrelated placeholders
  2. Process: Send placeholder data to AI for analysis/generation
  3. Postprocess: Map AI outputs back to real entities

Example: Contract pricing negotiation.

Real data (confidential):

  • Liquid cooling systems: 45% discount
  • Edge computing hardware: 38% discount
  • Power distribution units: 42% discount

Mapped data (sent to AI):

  • Halloween costumes: 12% discount
  • Candy corn: 8% discount
  • Pumpkin decorations: 15% discount

The AI analyzes “Halloween product pricing.” You map results back to real products and discounts. Your actual pricing never leaves your infrastructure.

Implementation Architecture

Mapping Table Structure

The critical component is your local mapping table. This stays on-premise, never in cloud storage, never in email.

Minimum viable structure:

| Entity Type | Real Value | Placeholder Value | Notes |
|---|---|---|---|
| Product Category | Liquid cooling systems | Halloween costumes | Maintain relative complexity |
| Discount | 45% | 12% | Preserve ratio relationships |
| Customer | Acme Corp | Client Alpha | Anonymize identifiers |
| SKU | LCS-2024-X | HC-001 | Keep length consistent |
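In code, the mapping table can be as simple as a CSV parsed into a dictionary. A minimal sketch (the field names and loader are illustrative, not a prescribed format):

```python
import csv
import io

# Hypothetical mapping-table rows: entity type, real value, placeholder, notes.
MAPPING_CSV = """entity_type,real_value,placeholder,notes
Product Category,Liquid cooling systems,Halloween costumes,Maintain relative complexity
Discount,45%,12%,Preserve ratio relationships
Customer,Acme Corp,Client Alpha,Anonymize identifiers
SKU,LCS-2024-X,HC-001,Keep length consistent
"""

def load_mapping(csv_text: str) -> dict:
    """Parse the table into a real-value -> placeholder dictionary."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["real_value"]: row["placeholder"] for row in reader}

mapping = load_mapping(MAPPING_CSV)
```

A spreadsheet exported to CSV gives you the same structure with zero tooling; the dictionary form is what the preprocessing and postprocessing scripts consume.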

Critical design rules:

  1. Preserve structural relationships: If two real entities have similar characteristics, their placeholders should too. Don’t map complex products to simple placeholders.

  2. Maintain scale: If real discounts range 10-50%, placeholders should span similar ranges. Ratio preservation matters for AI analysis quality.

  3. Consistent cardinality: If you have 12 product categories, create 12 placeholder categories. The AI needs realistic data structure.

  4. Domain separation: Choose placeholder domains unrelated to your industry. Enterprise software company? Use food service terms. Manufacturing? Use entertainment categories.
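Rules 2 and 3 can be checked mechanically before any data leaves your machine. A sketch, with illustrative function names and an arbitrary tolerance:

```python
def check_cardinality(real_values: list, placeholders: list) -> bool:
    """Rule 3: one distinct placeholder per distinct real entity, no reuse."""
    return len(set(placeholders)) == len(set(real_values))

def check_scale(real_pcts: list, mapped_pcts: list, tolerance: float = 0.15) -> bool:
    """Rule 2: the relative spread of mapped values should roughly
    match the spread of the real values."""
    real_spread = (max(real_pcts) - min(real_pcts)) / max(real_pcts)
    mapped_spread = (max(mapped_pcts) - min(mapped_pcts)) / max(mapped_pcts)
    return abs(real_spread - mapped_spread) <= tolerance
```

Running checks like these on every mapping-table edit catches the most common quality killer: placeholders whose ratios drift away from the real data they stand in for.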

Preprocessing Pipeline

Before sending data to AI:

# Illustrative Python - actual implementation varies by use case
input_data = load_contract_draft()
mapping_table = load_mapping_table_secure()  # {real_value: placeholder}

processed_data = input_data
# Replace longest values first so "Acme Corp Ltd" is not partially
# rewritten by an "Acme Corp" rule.
for real_value in sorted(mapping_table, key=len, reverse=True):
    processed_data = processed_data.replace(real_value, mapping_table[real_value])

send_to_ai(processed_data)

Automation opportunities:

  • Excel/Google Sheets find-replace macros
  • Python scripts with pandas for bulk transformations
  • Custom CLI tools for repeated workflows
  • Pre-processing templates for common document types

The key: make preprocessing deterministic and repeatable. Manual find-replace introduces error risk.

Postprocessing Pipeline

When AI returns results:

# Illustrative Python - reverse mapping
ai_output = receive_from_ai()
mapping_table = load_mapping_table_secure()  # {real_value: placeholder}

real_output = ai_output
# The table maps real value -> placeholder, so swap direction here:
# replace each placeholder with its real value.
for real_value, placeholder in mapping_table.items():
    real_output = real_output.replace(placeholder, real_value)

save_to_secure_location(real_output)

Same principle as preprocessing, reverse direction. Deterministic transformation ensures consistency.
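A cheap way to enforce that consistency is a round-trip test: preprocess, immediately postprocess, and confirm you recover the original text. A sketch, assuming placeholders never occur naturally in the source text:

```python
def preprocess(text: str, mapping: dict) -> str:
    # Longest real values first to avoid partial-overlap replacements.
    for real in sorted(mapping, key=len, reverse=True):
        text = text.replace(real, mapping[real])
    return text

def postprocess(text: str, mapping: dict) -> str:
    for real, placeholder in mapping.items():
        text = text.replace(placeholder, real)
    return text

def round_trip_ok(text: str, mapping: dict) -> bool:
    """True if the mapping survives a full encode/decode cycle."""
    return postprocess(preprocess(text, mapping), mapping) == text

mapping = {"Liquid cooling systems": "Halloween costumes", "45%": "12%"}
```

If the round trip fails, a placeholder collides with real content (or with another placeholder) and the table needs fixing before any live use.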

Use Cases by Domain

Financial Analysis

Scenario: Using AI to analyze P&L trends, identify cost reduction opportunities, or model scenarios.

Sensitive data: Actual revenue figures, vendor names, cost centers.

Mapping approach:

  • Scale revenue by a constant factor (divide by 1000, multiply by 1.37, etc.)
  • Replace vendor names with generic suppliers (“Vendor-001”, “Vendor-002”)
  • Map cost centers to unrelated departments
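The constant-factor scaling is a one-line transform each way. The factor below mirrors the divide-by-1000, multiply-by-1.37 example; in practice you would pick your own secret factor:

```python
SCALE = 1.37 / 1000  # divide by 1000, then multiply by 1.37

def scale_currency(amount: float) -> int:
    """Scale a real dollar figure into its placeholder equivalent."""
    return round(amount * SCALE)

def unscale_currency(placeholder: float) -> float:
    """Map a placeholder figure back toward the real value."""
    return placeholder / SCALE
```

Because the factor is linear, margins and growth rates computed on placeholder figures are identical to those on the real figures; only absolute magnitudes are hidden.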

Example transformation:

Real: “Q3 2024 revenue from Acme distribution contract: $2.4M, COGS: $1.8M”

Mapped: “Q3 2019 revenue from Contract-Alpha: $3,288, COGS: $2,466”

AI analyzes margin trends, identifies optimization opportunities. You map recommendations back to real vendors and actual numbers.

Quality consideration: Preserve ratios and relationships. If Vendor A is 3x larger than Vendor B in reality, maintain that ratio in placeholder data.

Contract Review and Drafting

Scenario: Using AI to review contract terms, suggest improvements, identify risks.

Sensitive data: Company names, specific terms, pricing structures, proprietary clauses.

Mapping approach:

  • Replace all party names with “Party A”, “Party B”, “Vendor-X”
  • Map specific product terms to generic equivalents
  • Translate pricing into scaled figures
  • Substitute proprietary language with industry-standard clauses

Example transformation:

Real: “Acme Corp agrees to purchase minimum 1,000 units of Model XZ-5000 at $4,250/unit with 45% discount for volumes exceeding 2,500 units annually”

Mapped: “Party A agrees to purchase minimum 100 units of Product-001 at $425/unit with 12% discount for volumes exceeding 250 units annually”

The AI reviews structure, flags unusual terms, suggests improvements. Map results back to real terms.

Risk mitigation: Even if AI output leaks, it contains no actual business terms. Your intellectual property remains protected.

Customer and Vendor Intelligence

Scenario: Analyzing customer behavior patterns, vendor performance, or relationship dynamics.

Sensitive data: Customer names, contact information, transaction history, relationship notes.

Mapping approach:

  • Assign each customer/vendor a persistent pseudonym (“Customer-A”, “Vendor-001”)
  • Maintain pseudonym consistency across all analyses
  • Map specific product/service names to categories
  • Scale transaction values

Example transformation:

Real: “TechCorp (customer since 2019) purchased 15 software licenses in Q1, 23 in Q2, 8 in Q3. Primary contact: John Smith (john@techcorp.com). Notes: Budget approval delays typical in Q4.”

Mapped: “Customer-A (relationship since 2019) purchased 15 units in Q1, 23 in Q2, 8 in Q3. Primary contact: Contact-001. Notes: Budget approval delays typical in Q4.”

AI identifies patterns (seasonal purchasing, decision cycle timing, growth trajectory). You apply insights to real customer relationships.

Critical rule: Maintain pseudonym consistency. If “Customer-A” appears in multiple analyses, it must map to the same real customer every time.
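Pseudonym persistence is easy to get wrong with ad-hoc find-replace. A small registry that assigns each name once and reuses it forever is safer; a sketch, with an illustrative naming scheme:

```python
class PseudonymRegistry:
    """Assigns each real entity a stable pseudonym and always reuses it."""

    def __init__(self, prefix: str = "Customer") -> None:
        self.prefix = prefix
        self._assigned = {}

    def pseudonym(self, real_name: str) -> str:
        if real_name not in self._assigned:
            # Customer-A, Customer-B, ... doubling to AA, BB after Z.
            index = len(self._assigned)
            letters = chr(ord("A") + index % 26) * (index // 26 + 1)
            self._assigned[real_name] = f"{self.prefix}-{letters}"
        return self._assigned[real_name]

registry = PseudonymRegistry()
```

Persist the registry's internal dictionary alongside your mapping table (same storage, same access controls) so every analysis draws from the one source of truth.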

HR and Compensation Analysis

Scenario: Using AI to analyze compensation equity, suggest org structures, or draft job descriptions.

Sensitive data: Employee names, salaries, performance ratings, PII.

Mapping approach:

  • Replace names with “Employee-001”, “Employee-002”
  • Convert salaries to percentages of median or scale by constant
  • Map job titles to generic equivalents
  • Anonymize department names
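The salary conversion is a single reversible transform. The baseline (median, mean, or band midpoint) is whatever your comp team already uses; these function names are illustrative:

```python
def to_pct_of_baseline(salary: float, baseline: float) -> int:
    """Express a salary as a whole percentage of a chosen baseline."""
    return round(salary / baseline * 100)

def from_pct_of_baseline(pct: float, baseline: float) -> float:
    """Recover the real salary from its percentage placeholder."""
    return pct / 100 * baseline
```

Keep the baseline value itself in the mapping table, not in any prompt: without it, the percentages reveal relative equity but no absolute pay.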

Example transformation:

Real: “Sarah Johnson, Senior Data Engineer, Salary: $145,000, Performance: Exceeds Expectations, Team: Machine Learning Infrastructure”

Mapped: “Employee-042, IC-Level-3, Compensation: 128% of baseline, Performance: Rating-4, Team: Engineering-Group-C”

AI analyzes compensation equity across levels, suggests adjustments, identifies outliers. You apply recommendations to actual compensation planning.

Compliance note: Because the mapping table permits re-identification, this is pseudonymization rather than anonymization under GDPR. It still supports GDPR/CCPA compliance efforts by keeping the PII itself from reaching third-party AI services.

Security Properties and Limitations

What This Protects Against

AI vendor logging: Even if the AI service logs every prompt and response, your logs contain only placeholder data. No business intelligence exposed.

Model training contamination: If prompts are used to train future models, the model learns nothing about your actual business.

Data breach at AI provider: If the AI company’s infrastructure is compromised, attackers obtain meaningless placeholders.

Internal data exfiltration: Employees with AI access but not mapping table access cannot extract real business data through AI interactions.

Regulatory compliance gaps: For frameworks requiring on-premise data handling, this technique keeps sensitive data local while leveraging cloud AI capabilities.

What This Does NOT Protect Against

Structural inference attacks: If placeholder data maintains real-world patterns (necessary for AI quality), sophisticated analysis might reverse-engineer the mapping. Example: If “Halloween costume” consistently appears in contexts suggesting data center equipment, an industry expert might deduce the real category.

Volume-based correlation: Frequency patterns can leak information. If you send 10,000 queries about “Customer-A”, observers know you have at least one very large customer.

Mapping table compromise: If your mapping table is exposed (email, unsecured file share, laptop theft), all historical AI interactions are retroactively compromised. The mapping table is your most critical security asset.

AI quality degradation: The AI cannot apply domain-specific knowledge. It won’t suggest “liquid cooling systems typically command higher margins” when it thinks you’re selling Halloween costumes. You trade some AI capability for data protection.

Human operational errors: The biggest risk. Forgetting to preprocess data, mixing real and placeholder data, accidentally sharing the mapping table, or inconsistently applying mappings breaks the entire system.

Semantic leakage in relationships: Complex relationship patterns in data can reveal structure even when entities are masked. Be cautious with highly interconnected data.

Implementation Decision Framework

When to use this approach:

✅ Structured data analysis (tables, lists, categories)
✅ Document drafting with fill-in-the-blank sections
✅ Organizing and categorizing items
✅ Logic and consistency checking
✅ Mathematical or statistical operations
✅ Format conversion and data transformation
✅ Pattern detection in anonymizable data

When NOT to use this approach:

❌ Creative work requiring industry context
❌ Strategy development needing market knowledge
❌ Any task where semantic domain meaning is critical
❌ Very small datasets (patterns too obvious)
❌ Highly interconnected relational data
❌ Situations where a single error exposes everything
❌ Work requiring the AI to “understand” your specific business

Implementation Checklist

Phase 1: Scoping and Mapping (Week 1)

  • Identify specific use case for entity substitution
  • List all sensitive entity types requiring mapping
  • Design placeholder schema maintaining structural properties
  • Create initial mapping table (spreadsheet minimum viable product)
  • Define mapping table storage and access controls
  • Document which use cases will/won’t use this protocol

Phase 2: Process Development (Week 2)

  • Build preprocessing template or script
  • Build postprocessing template or script
  • Test full round-trip with non-sensitive sample data
  • Identify potential human error points
  • Create error-checking procedures
  • Define “what if mapping fails” contingency plan

Phase 3: Security Hardening (Week 3)

  • Establish mapping table backup procedures (encrypted, local)
  • Set access controls (who can view/edit mappings)
  • Document what happens if mapping table is lost
  • Create mapping table rotation schedule if needed
  • Train team on security requirements
  • Establish incident response for accidental real data exposure

Phase 4: Operational Deployment (Week 4)

  • Run pilot with single use case and small team
  • Monitor for process compliance
  • Collect feedback on friction points
  • Refine preprocessing/postprocessing automation
  • Document lessons learned
  • Expand to additional use cases if successful

Risk Management

Threat: Mapping table exposure

Mitigation:

  • Store in encrypted local files, never cloud storage
  • Limit access to essential personnel only
  • Never transmit via email or chat
  • Regular access audits
  • Automatic rotation schedule for high-risk mappings

Threat: Inconsistent mapping application

Mitigation:

  • Automated preprocessing scripts (remove human variance)
  • Validation checks before AI submission
  • Post-processing verification
  • Template-based workflows
  • Clear process documentation
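The "validation checks before AI submission" step can be a hard gate: scan outgoing text for any real value and refuse to send if one survived preprocessing. A sketch with illustrative function names:

```python
def find_leaks(outgoing_text: str, mapping: dict) -> list:
    """Return any real values still present in text headed to the AI."""
    return [real for real in mapping if real in outgoing_text]

def assert_safe_to_send(outgoing_text: str, mapping: dict) -> None:
    """Raise instead of sending when preprocessing missed something."""
    leaks = find_leaks(outgoing_text, mapping)
    if leaks:
        raise ValueError(f"Refusing to send: {len(leaks)} real value(s) detected")
```

Wiring this check into the same script that calls the AI makes "forgot to preprocess" a hard failure rather than a silent data exposure.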

Threat: Semantic pattern leakage

Mitigation:

  • Choose maximally unrelated placeholder domains
  • Avoid patterns that reveal industry context
  • Rotate placeholder domains periodically
  • Limit scope of any single AI interaction
  • Minimize relationship complexity in mapped data

Threat: AI quality insufficient for business use

Mitigation:

  • Test with real use cases before full deployment
  • Define minimum acceptable quality thresholds
  • Have fallback to manual process if AI quality fails
  • Accept that some tasks won’t work with this approach
  • Focus on high-volume, lower-complexity tasks first

Integration with Existing Security Stack

Homomorphic AI is not a replacement for proper enterprise security. It’s one layer in defense-in-depth.

Complementary security measures:

  • Encryption at rest and in transit: Standard requirement, applies to all data including mapping tables
  • Access controls and authentication: Who can use AI tools, who can access mappings
  • AI vendor selection: Choose providers with strong privacy policies, data residency guarantees
  • On-premise AI alternatives: For highest sensitivity work, local LLMs eliminate external data transfer
  • Data loss prevention (DLP): Automated scanning for accidental real data in AI prompts
  • Audit logging: Track all AI interactions, mapping table access, preprocessing steps
  • Security awareness training: Ensure team understands the protocol and why it matters

Where entity substitution fits:

When on-premise AI is too expensive/complex, but cloud AI with raw data is too risky, entity substitution provides a middle ground. Not perfect security, but meaningful risk reduction.

Performance Characteristics

Computational overhead:

  • Preprocessing: ~5-30 seconds for typical documents (find-replace operations)
  • AI processing: Same as normal (AI sees normal data volume)
  • Postprocessing: ~5-30 seconds for typical responses
  • Total added latency: ~10-60 seconds per interaction

Human overhead:

  • Initial mapping table creation: 2-4 hours
  • Mapping table maintenance: ~30 minutes/month
  • Per-use preprocessing (manual): 2-5 minutes
  • Per-use preprocessing (automated): less than 30 seconds
  • Per-use postprocessing: 1-3 minutes

When overhead is acceptable:

✅ High-value tasks (strategic analysis, contract negotiation)
✅ Infrequent operations (monthly financial reviews)
✅ Batch processing (process 50 contracts at once)
✅ Reusable workflows (same mapping applies repeatedly)

When overhead is prohibitive:

❌ Real-time operations (customer service chat)
❌ High-frequency, low-value tasks
❌ One-off exploratory queries
❌ Time-critical decision support

Measuring Success

Security metrics:

  • Zero incidents of real data in AI logs (auditable)
  • Mapping table access limited to authorized personnel
  • No mapping table exposures
  • 100% preprocessing compliance rate
  • Successful security audits

Operational metrics:

  • AI interaction volume (using this protocol)
  • Time saved vs. manual alternative
  • AI output quality score (human evaluation)
  • Error rate (mapping mistakes, process failures)
  • Team adoption rate

Business metrics:

  • Tasks previously blocked now completed
  • Reduction in manual processing time
  • Increase in analysis frequency/depth
  • Risk reduction (quantify exposure prevented)
  • ROI on implementation effort

The Bottom Line

Homomorphic AI via entity substitution is not perfect security. It’s pragmatic risk reduction.

Best for: Finance teams blocked from using AI for P&L analysis. Legal teams unable to leverage AI for contract review. HR teams prevented from automating compensation analysis. Operations teams restricted from AI-assisted process optimization.

Not for: Real-time customer service. Highly creative strategy work. Tasks requiring deep industry semantics. Situations where any exposure is catastrophic.

Key insight: You care about the analytical output, not whether the AI “knows” you’re analyzing data center equipment vs. Halloween products. The mathematical relationships, logical structures, and optimization opportunities remain valid regardless of what entities you call them.

This borrows from cryptography (computation without access to plaintext) but requires zero cryptographic expertise. Spreadsheets and find-replace. That’s the implementation.

Is it bulletproof? No. Does it meaningfully reduce risk while enabling AI value? Yes.

That’s the trade-off. Enterprise AI isn’t about perfect security. It’s about acceptable risk for meaningful value.


Implementation support: Start with a single low-risk use case (monthly financial summary analysis). Build the mapping table. Test the full workflow. Measure AI output quality. If it works, expand scope. If it fails, you’ve learned cheaply.

The protocol is simple. The discipline required is not. Most failures will be operational (forgot to preprocess, mixed real and fake data), not technical. Design for human error, not just technical correctness.
