PII Handling & Data Privacy
PII hashed, masked, or stripped before vendors ever see it
Every customer field is classified once at the Signal Core. Email and phone are SHA-256 hashed for vendor matching. IP addresses are masked. Payment card numbers, dates of birth, and home addresses are stripped entirely. The same rules apply to every vendor pipeline. Raw PII never leaves your cloud, and the decisions are auditable per event.
0 bytes
Raw PII to vendors
Every classified field is transformed before any vendor delivery
One layer
Org-wide policy
Classifications live above pipelines so every vendor honours the same rules
Per event
Decision audit
Which field was hashed, masked, or stripped, recorded for every event
Customer cloud
Where it runs
Datafly never processes the raw values. Hashing happens in your VPC
Why PII handling at the vendor level is the wrong layer
Most teams hash PII inside each vendor integration: a Meta CAPI connector hashes email for Meta, a Google Ads connector hashes email for Google, and the warehouse loader may not hash at all. This works until it doesn't.
Per-vendor implementations drift
Different teams configure different vendors. Meta hashes email, Google hashes email and phone, the CDP hashes nothing because "it's internal". Six months later, a new TikTok integration ships without normalisation and downgrades match quality. Eighteen months later, a procurement audit finds the warehouse contains plaintext customer email because nobody told the warehouse loader to hash.
Hash format mismatches break match rates
Meta CAPI requires SHA-256 of trimmed-lowercased email. Google Customer Match requires SHA-256 of trimmed-lowercased email after specific normalisation rules for + and dots in Gmail addresses. TikTok wants SHA-256 of the raw lowercase string. When different teams each implement "hash the email" in their own way, the resulting hashes don't match the vendor's identity graph, and match rates plateau at 30-50% instead of 80%+.
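To make the mismatch concrete, here is a minimal sketch using Python's standard `hashlib`. The two function names and the sample address are illustrative assumptions, not part of any vendor SDK or of Datafly Signal. It shows only that skipping the trim-and-lowercase step yields a different digest for the same customer.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Return the lowercase hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def hash_email_raw(email: str) -> str:
    # Hypothetical "team A" implementation: hashes whatever the form captured.
    return sha256_hex(email)


def hash_email_normalised(email: str) -> str:
    # Hypothetical "team B" implementation: trim, lowercase, then hash.
    return sha256_hex(email.strip().lower())


captured = "  Jane.Doe@Example.com "
digest_a = hash_email_raw(captured)
digest_b = hash_email_normalised(captured)

# Same customer, two different match keys.
print(digest_a == digest_b)  # False
```

A vendor that expects the normalised form will find no match for `digest_a`, even though both teams believe they "hashed the email".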
No audit trail of the decision
When a regulator, a DPO, or an internal security review asks "was email PII ever sent in plaintext to vendor X?", the answer should be a queryable audit row, not an examination of every connector's source code. With per-vendor hashing scattered across the stack, that audit is genuinely hard to produce.
How the Org Data Layer enforces PII rules
Classify each customer field
The Org Data Layer holds an org-wide map of customer fields to PII classifications: email, phone, first_name, last_name, ip_address, user_id, payment_card, date_of_birth, home_address, and any custom classification you define. Classifications are version-controlled and auditable.
Apply transforms at the Signal Core
Every event arriving at the Ingestion Gateway is enriched with classifications and runs through the transform layer before any pipeline delivery. SHA-256 hashes are computed once, in the customer's cloud, so raw values are never sent anywhere in plaintext.
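The transform step can be pictured roughly as follows. This is a sketch under stated assumptions, not the Signal Core's actual code: the `ACTIONS` map, the `transform_event` function, and the choice to pass unclassified fields through unchanged are all hypothetical, and IP masking is reduced to the IPv4 case.

```python
import hashlib
from typing import Any, Dict

# Hypothetical subset of an org-wide classification map.
ACTIONS: Dict[str, str] = {
    "email": "hash",
    "ip_address": "mask",
    "payment_card": "strip",
}


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_ipv4(ip: str) -> str:
    # Zero the last octet. IPv6 handling is omitted in this sketch.
    parts = ip.split(".")
    return ".".join(parts[:3] + ["0"])


def transform_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the event with classified fields transformed."""
    out: Dict[str, Any] = {}
    for field, value in event.items():
        action = ACTIONS.get(field)
        if action == "strip":
            continue  # drop the field entirely
        if action == "hash":
            out[field] = sha256_hex(str(value).strip().lower())
        elif action == "mask":
            out[field] = mask_ipv4(str(value))
        else:
            # Assumption: unclassified fields pass through unchanged.
            out[field] = value
    return out


safe = transform_event({
    "event": "purchase",
    "email": " Jane@Example.com ",
    "ip_address": "203.0.113.42",
    "payment_card": "4111111111111111",
})
```

After the call, `safe` carries a hashed email and a masked IP, and has no `payment_card` key at all.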
Vendor-specific normalisation rules
Each vendor has known hash format expectations. The Org Data Layer encodes them: Google Customer Match's Gmail-specific normalisation, Meta CAPI's lowercase-then-hash, TikTok's raw-lowercase-hash, LinkedIn's SHA-256 with no normalisation. Match rates max out because the format matches what the vendor actually accepts.
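One way to encode per-vendor formats is a small registry of normaliser functions. The sketch below is illustrative only: the registry, the function names, and the vendor keys are assumptions, and the Gmail rule is implemented as "drop dots from the local part", which should be confirmed against the vendor's current documentation. Handling of `+` suffixes is deliberately left out.

```python
import hashlib
from typing import Callable, Dict


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def normalise_basic(email: str) -> str:
    """Trim surrounding whitespace and lowercase."""
    return email.strip().lower()


def normalise_gmail(email: str) -> str:
    """Basic normalisation, plus dropping dots from the local part of
    gmail.com / googlemail.com addresses (an assumed reading of the rule)."""
    email = normalise_basic(email)
    local, sep, domain = email.rpartition("@")
    if sep and domain in ("gmail.com", "googlemail.com"):
        local = local.replace(".", "")
    return f"{local}{sep}{domain}"


# Hypothetical registry: vendor key -> email normaliser.
EMAIL_NORMALISERS: Dict[str, Callable[[str], str]] = {
    "default": normalise_basic,
    "google_customer_match": normalise_gmail,
}


def hash_email_for(vendor: str, email: str) -> str:
    """Hash an email using the vendor's normaliser, or the default one."""
    normalise = EMAIL_NORMALISERS.get(vendor, EMAIL_NORMALISERS["default"])
    return sha256_hex(normalise(email))
```

With this shape, `hash_email_for("google_customer_match", "Jane.Doe@gmail.com")` and `hash_email_for("meta_capi", "Jane.Doe@gmail.com")` produce different digests, while a non-Gmail address hashes identically for both.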
Per-vendor delivery applies the rules
When the Delivery Worker picks an event off the queue for a specific vendor, the right hash format is selected automatically. No connector-specific code is needed in the integration template, because the policy is one layer up.
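The selection logic amounts to "vendor override first, org-wide default second". A minimal sketch, assuming the policy has already been loaded into plain dictionaries: the action names mirror the ones in the pii-handling.yaml configuration, while `resolve_action` and the dictionary layout are hypothetical.

```python
from typing import Dict, Optional

# Org-wide defaults (subset), keyed by classification.
CLASSIFICATIONS: Dict[str, str] = {
    "email": "hash_sha256_normalised",
    "phone": "hash_sha256_e164",
    "payment_card": "strip",
}

# Per-vendor overrides, keyed by vendor then field.
VENDOR_OVERRIDES: Dict[str, Dict[str, str]] = {
    "google_customer_match": {"email": "hash_sha256_normalised_google"},
    "meta_capi": {"external_id": "hash_sha256"},
}


def resolve_action(vendor: str, field: str) -> Optional[str]:
    """Return the action for a field when delivering to a vendor.

    A vendor override wins; otherwise the org-wide default applies.
    Returns None when the field has no classification.
    """
    override = VENDOR_OVERRIDES.get(vendor, {}).get(field)
    if override is not None:
        return override
    return CLASSIFICATIONS.get(field)


print(resolve_action("google_customer_match", "email"))  # hash_sha256_normalised_google
print(resolve_action("meta_capi", "email"))              # hash_sha256_normalised
```

Because the lookup lives outside the connector, adding a new vendor needs at most a new override entry rather than new hashing code.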
Audit log records the decision, not the value
Every transform is recorded in the audit log: event ID, field name, classification matched, action taken (hash / mask / strip), vendor target. The raw value is never logged. Regulator-defensible by construction.
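A decision-only audit record might look like the sketch below. The `AuditRecord` type, its field names, and `record_decision` are assumptions for illustration, not Datafly Signal's schema. The point of the design is that the function never accepts the raw value, so it cannot be logged by accident.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

ALLOWED_ACTIONS = {"hash", "mask", "strip"}


@dataclass(frozen=True)
class AuditRecord:
    event_id: str
    field_name: str
    classification: str
    action: str        # one of ALLOWED_ACTIONS
    vendor: str
    recorded_at: str   # ISO 8601 timestamp, UTC


def record_decision(event_id: str, field_name: str, classification: str,
                    action: str, vendor: str) -> AuditRecord:
    """Build an audit row describing the decision, never the value."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return AuditRecord(
        event_id=event_id,
        field_name=field_name,
        classification=classification,
        action=action,
        vendor=vendor,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )


row = record_decision("evt_123", "email", "email", "hash", "meta_capi")
print(json.dumps(asdict(row)))
```

A question such as "was email ever sent in plaintext to vendor X?" then becomes a filter over these rows by `field_name`, `action`, and `vendor`.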
PII rules in one place
Below is an org-wide PII configuration. Every pipeline inherits these rules. Per-vendor overrides handle the cases where a destination needs a slightly different hash format.
# pii-handling.yaml
# Org-wide PII rules applied at the Signal Core BEFORE any
# pipeline transformation or vendor delivery. No raw PII
# leaves the customer cloud at any point.
org_data_layer:
  classifications:
    email:
      action: hash_sha256_normalised
      # Trim, lowercase, then SHA-256.
      # Output format vendors expect for matching.
    phone:
      action: hash_sha256_e164
      # Normalise to E.164, then SHA-256.
    first_name:
      action: hash_sha256_normalised
    last_name:
      action: hash_sha256_normalised
    ip_address:
      action: mask
      # Strip last octet for IPv4, last 80 bits for IPv6.
      # Preserves geo-coarse signal without identity.
    user_id:
      action: hash_sha256
      # Used as match key for CDP and warehouse joins.
      # Hashed once, deterministic across all vendors.
    payment_card:
      action: strip
      # Never delivered to any vendor. Period.
    date_of_birth:
      action: strip
    home_address:
      action: strip
  # Per-vendor overrides where the destination has its own
  # requirements (e.g. Customer Match needs a slightly
  # different hash format).
  vendor_overrides:
    google_customer_match:
      email: hash_sha256_normalised_google
    meta_capi:
      external_id: hash_sha256
  # Audit retention for the masking decisions themselves
  # (which fields were hashed, which stripped) without
  # ever logging the raw values.
  audit_log:
    retain_decisions: true
    retain_raw_values: false
Built for procurement and DPO sign-off
Every feature exists so that a security review, a DPO audit, or a SOC 2 examiner gets a clean answer in one query.
SHA-256 with vendor-correct normalisation
Email, phone, and name fields hashed in the format each vendor actually expects. Maximises match rates without requiring per-team integration knowledge.
IP masking by default
IPv4 last octet stripped, IPv6 last 80 bits stripped. Preserves coarse geo signal without storing the full identifying address.
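The masking rule can be sketched with Python's standard `ipaddress` module. This assumes "stripped" means the trailing bits are zeroed, so the result is still a valid address; the `mask_ip` name is hypothetical.

```python
import ipaddress


def mask_ip(address: str) -> str:
    """Zero the last 8 bits of an IPv4 address or the last 80 bits
    of an IPv6 address, keeping only the coarse network prefix."""
    ip = ipaddress.ip_address(address)
    # 32 - 8 = 24 bits kept for IPv4; 128 - 80 = 48 bits kept for IPv6.
    keep_bits = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{keep_bits}", strict=False)
    return str(network.network_address)


print(mask_ip("203.0.113.42"))                          # 203.0.113.0
print(mask_ip("2001:db8:abcd:12:1234:5678:9abc:def0"))  # 2001:db8:abcd::
```

Invalid input raises `ValueError` from `ipaddress.ip_address`, which a caller could treat as a reason to strip the field instead.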
Strip on classify, no exceptions
Payment card numbers, dates of birth, and home addresses are stripped at ingestion. They never reach a pipeline transform, never reach a vendor, never reach the warehouse.
Customer-cloud only
All hashing and masking runs inside the customer's own VPC. Datafly Signal as a vendor never receives the raw values. Procurement-friendly by design.
Per-event decision audit
Audit log records which fields were hashed, masked, or stripped per event. Searchable by event ID, user ID, or classification. Retained per your audit retention policy.
Versioned policy changes
Classifications and transforms are version-controlled. Every change has an author, a timestamp, and a diff. Roll forward or back without redeploying any service.
Where PII transforms run, by setup
The layer at which PII is transformed determines whether the policy stays consistent across vendors and whether the audit trail is complete.
| Setup | Where PII is hashed | Per-event audit |
|---|---|---|
| gtag.js client-side | In the browser if at all | No |
| GTM Server-Side | Per template, varies by tag | Limited |
| Per-vendor connectors (custom) | Inside each integration, divergent | Per integration |
| Datafly Signal Org Data Layer | Once at the Signal Core, before any pipeline | Per event, per field, per vendor |
Frequently asked questions
- What PII does Datafly Signal handle?
- The default Org Data Layer recognises email, phone, first_name, last_name, ip_address, user_id, payment_card, date_of_birth, and home_address. Each maps to an action: SHA-256 hashing for matchable fields (email, phone, names), masking for IP addresses (last octet stripped on IPv4, last 80 bits on IPv6), and full strip for fields that should never reach a vendor (payment cards, DOB, home address). You can add custom classifications for industry-specific data.
- How is the PII transformed for Meta CAPI vs Google Ads?
- Each vendor has known hash format expectations. Meta CAPI requires SHA-256 of trimmed-lowercased email. Google Customer Match requires SHA-256 of trimmed-lowercased email after specific Gmail-address normalisation rules. TikTok wants SHA-256 of the raw lowercase string. The Org Data Layer encodes each vendor's exact format so match rates max out automatically, without each integration team needing to know the per-vendor rules.
- Does Datafly Signal as a vendor see raw PII?
- No. The platform is deployed as single-tenant Kubernetes inside the customer's own AWS, GCP, or Azure account. All PII transformation happens in the customer's VPC. Datafly the company never receives the raw values, never receives the hashed values, never receives the events. Procurement-friendly by design.
- Can I customise classifications for industry-specific data?
- Yes. Beyond the defaults, you define custom classifications: e.g. policy_number for insurance, mrn for healthcare, account_id for banking. Each classification has its own action (hash, mask, strip, or custom transform). Per-vendor overrides handle cases where a destination needs a different format than the default for that classification.
- Are PII transform decisions auditable per event?
- Yes. The audit log records which fields were hashed, masked, or stripped per event, the classification matched, the vendor target, and the rule that produced the decision. The raw values are never logged, only the decisions. Searchable by event ID, user ID, classification, or vendor. Retained per your audit retention policy.
- How does this compare to GTM Server-Side?
- In GTM Server-Side, PII handling lives inside individual server templates. Each template handles its own hashing, with whatever format the template author implemented, so templates from different vendors can handle the same email field differently. Datafly Signal puts PII rules above the pipeline layer so every vendor honours the same policy, with vendor-specific format rules selected automatically. The result is a consistent policy and a per-event audit trail.
Related
Consent Mode v2
The other half of compliance: server-side consent gating with per-event audit.
Meta Conversions API
PII matching for Meta CAPI EMQ optimisation, with the right hash format applied automatically.
Google Ads
Enhanced Conversions matching with vendor-correct PII normalisation.
Trust
Datafly's certifications, security posture, and procurement-ready documentation.
Make PII handling auditable in one place
Request a technical walkthrough. We'll review your current PII flow across vendors, show where the inconsistencies are, and demonstrate how a single Org Data Layer config replaces them.