Show HN: A Mutating Webhook to automatically strip PII from K8s logs
Zero-code log sanitization sidecar for Kubernetes. Prevents data leaks (GDPR/SOC2) by redacting PII from logs before they leave the pod.
"Don't let PII poison your AI models." PII-Shield ensures that sensitive data never reaches your training dataset, saving you from GDPR-forced model retraining.
Warning
Upgrading to v2.0.0? We have moved entirely to a Helm-based distribution and Distroless Native Sidecars. Kustomize deployment and /bin/sh access inside the sidecar are no longer supported. Read the Migration Guide.
Two Deployment Models
PII-Shield offers two distinct ways to integrate into your stack:
- Kubernetes Operator (Zero-code): Our flagship deployment model. A fully automated K8s Operator that injects a highly-secure Distroless Sidecar into your pods to intercept and sanitize logs on the fly.
- In-Process WASM (For core integrations): For extreme performance, the core engine can be embedded directly via WASM, providing <1ms latency without network hops.
Why PII-Shield?
Developers often forget to mask sensitive data. Traditional regex filters in Fluentd/Logstash are slow, hard to maintain, and consume expensive CPU on log aggregators.
PII-Shield sits right next to your app container:
- Production Ready: Optimized for Kubernetes sidecars with ultra-low memory allocations (zero-GC overhead on hot paths) and deterministic O(1) regex matching.
- Context-Aware Entropy Analysis: Detected high-entropy secrets even without keys (e.g. Error: ... 44saCk9...) by analyzing context keywords.
- Custom Regex Rules: Deterministic redaction for structured data (UUIDs, IDs) that overrides entropy checks, ensuring 100% compliance for known patterns.
- 100% Accuracy: Verified against "Wild" stress tests including binary garbage, JSON nesting, and multilingual logs.
- Deterministic Hashing: Replaces secrets with unique hashes (e.g., [HIDDEN:a1b2c]), allowing QA to correlate errors without seeing the raw data.
- Drop-in: No code changes required. Works with any language (Node, Python, Java, Go).
- Whitelist Support: Explicitly allow safe patterns (e.g., git hashes, system IDs) using PII_SAFE_REGEX_LIST to prevent false positives.
Managing PII-Shield across dozens of clusters?
We are building a hosted Control Plane with centralized rule management, Slack alerting, and redaction analytics.
Trusted By
GuardSpine (AI Governance Kernel) integrated PII-Shield's In-Process WASM to sanitize sensitive evidence trails directly within their Node.js and Python agents.
We chose the WASM architecture to ensure zero network overhead and <1ms latency. PII-Shield runs directly in-process, preserving the referential integrity of our hash chains while keeping logs compliant.
Performance Considerations
While PII-Shield is highly optimized, deep inspection of complex logs requires careful attention to configuration.
- Text Logs: Extremely fast (>100k lines/s).
- JSON Logs: Zero-allocation parsing (no encoding/json overhead). The scanner manually parses JSON structures to ensure high throughput (~7MB/s) without memory spikes.
- Recommendation: Usage is safe for high throughput. We use recursion safeguards to prevent stack overflows on deeply nested JSON.
Installation
Helm Chart (Kubernetes Operator)
The official and recommended way to deploy PII-Shield in Kubernetes is via our fully-automated Operator:
helm repo add pii-shield https://aragossa.github.io/pii-shield/ helm repo update helm install pii-shield-operator pii-shield/pii-shield-operator -n operator-system --create-namespaceThis deploys the PII-Shield Operator which automatically injects highly-secure, distroless sidecars into your Pods without requiring any code or Dockerfile changes.
Docker
Get the latest lightweight image from Docker Hub or GHCR:
docker pull thelisdeep/pii-shield:v2.0.0 # OR from GitHub Container Registry (Enterprise): docker pull ghcr.io/aragossa/pii-shield:v2.0.0Build from Source
You can build the binary directly from the source code:
go build -o pii-shield ./cmd/cleaner/main.goConfiguration
See CONFIGURATION.md for a full list of environment variables, including:
- PII_SALT: Custom HMAC salt (Required for production).
- PII_ADAPTIVE_THRESHOLD: Enable dynamic entropy baselines.
- PII_DISABLE_BIGRAM_CHECK: Optimize for non-English logs.
- PII_CUSTOM_REGEX_LIST: Custom regex rules for deterministic redaction.
- PII_SAFE_REGEX_LIST: Whitelist regex rules to ignore (matches are returned as-is).
Entropy Sensitivity Table (Default Threshold: 3.6)
| Entropy | Data Type | Example |
|---|---|---|
| 0.0 - 3.0 | Common words, repeats | password, admin, 111111 |
| 3.0 - 3.6 | CamelCase, partial hashes | ProgramCampaignInstanceJob, 8f3a11b2c |
| 3.6 - 4.5 | Paths, UUIDs, Weak Passwords | /opt/application/runtime, P@ssw0rd2026! |
| 4.5 - 5.0 | Medium Tokens | E8s9d_2kL1 |
| 5.0+ | High Entropy Keys | (SHA-256, API Keys) |
Quick Start
- Test Locally (CLI) You can pipe any log output through PII-Shield to see it in action immediately:
- Kubernetes (Automated Sidecar Injection) With the PII-Shield Operator installed, protecting an application is as simple as creating a PiiPolicy and labeling your Pods.
Create a Policy:
apiVersion: core.pii-shield.io/v1alpha1 kind: PiiPolicy metadata: name: strict-policy namespace: default spec: injectionMode: "file"Label your Deployment:
apiVersion: apps/v1 kind: Deployment metadata: name: secure-app spec: template: metadata: labels: pii-shield.io/inject: "true" annotations: pii-shield.io/policy: "strict-policy" # ...The Operator will automatically inject the pii-shield-agent using the Native Sidecar pattern (K8s 1.28+) and securely mask all logs!
Verification
This project is verified with a comprehensive testing suite, ensuring production-readiness for v2.0.0:
- Unit Tests: Cover edge cases, multilingual support, and JSON integrity with >85% coverage.
- Fuzzing: Native Go fuzzing ensures crash safety against invalid and random binary inputs.
- Smoke Testing: ./run_smoke.sh validates 100% detection accuracy on mixed workloads.
- End-to-End (E2E) Testing: The operator/tests/run_e2e.sh suite performs full-stack validation using Minikube and Helm. It builds local images, provisions the Operator without cert-manager, deploys target Jobs, and verifies actual log redaction by intercepting sidecar outputs.
License
Distributed under the Apache 2.0 License. See LICENSE for more information.
Схожі новини
Новая ИИ-модель по умолчанию. OpenAI выпустила GPT-5.5 Instant с новыми функциями