PII Handling & Data Privacy

PII hashed, masked, or stripped before vendors ever see it

Every customer field is classified once at the Signal Core. Email and phone are SHA-256 hashed for vendor matching. IP addresses are masked. Payment card numbers, dates of birth, and home addresses are stripped entirely. The same rules apply to every vendor pipeline. Raw PII never leaves your cloud, and the decisions are auditable per event.

0 bytes

Raw PII to vendors

Every classified field is transformed before any vendor delivery

One layer

Org-wide policy

Classifications live above pipelines so every vendor honours the same rules

Per event

Decision audit

Which field was hashed, masked, or stripped, recorded for every event

Customer cloud

Where it runs

Datafly never processes the raw values. Hashing happens in your VPC.

Why PII handling at the vendor level is the wrong layer

Most teams hash PII inside each vendor integration: a Meta CAPI connector hashes email for Meta, a Google Ads connector hashes email for Google, and the warehouse loader may not hash at all. This works until it doesn't.

Per-vendor implementations drift

Different teams configure different vendors. Meta hashes email, Google hashes email and phone, the CDP hashes nothing because "it's internal". Six months later, a new TikTok integration ships without normalisation and downgrades match quality. Eighteen months later, a procurement audit finds the warehouse contains plaintext customer email because nobody told the warehouse loader to hash.

Hash format mismatches break match rates

Meta CAPI requires SHA-256 of trimmed-lowercased email. Google Customer Match requires SHA-256 of trimmed-lowercased email after specific normalisation rules for + and dots in Gmail addresses. TikTok wants SHA-256 of the raw lowercase string. Different teams misimplementing "hash the email" produce hashes that don't match the vendor's identity graph, and match rates plateau at 30-50% instead of 80%+.
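
A minimal sketch of how the mismatch arises, assuming the normalisation rules as summarised above. The function names are hypothetical, and the Gmail rule shown here (dropping dots in the local part for gmail.com and googlemail.com) is an assumption covering only part of what a vendor may require, so check each vendor's current documentation before relying on it.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Return the lowercase hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def normalise_basic(email: str) -> str:
    """Trim surrounding whitespace and lowercase."""
    return email.strip().lower()


def normalise_gmail_dots(email: str) -> str:
    """Basic normalisation, plus dropping dots in the local part of
    Gmail addresses (assumed rule, for illustration only)."""
    cleaned = normalise_basic(email)
    local, at, domain = cleaned.rpartition("@")
    if at and domain in ("gmail.com", "googlemail.com"):
        return local.replace(".", "") + "@" + domain
    return cleaned


raw = "  Jane.Doe@Gmail.com "
basic_hash = sha256_hex(normalise_basic(raw))        # trimmed, lowercased
gmail_hash = sha256_hex(normalise_gmail_dots(raw))   # dots removed as well
unnormalised_hash = sha256_hex(raw)                  # hashed as received
```

The three digests differ even though they describe the same person, which is why a hash built with the wrong normalisation cannot match the vendor's record.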

No audit trail of the decision

When a regulator, a DPO, or an internal security review asks "was email PII ever sent in plaintext to vendor X?", the answer should be a queryable audit row, not an examination of every connector's source code. With per-vendor hashing scattered across the stack, that audit is genuinely hard to produce.
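
As an illustration of what "a queryable audit row" could look like, here is a sketch using an in-memory SQLite table. The table name, column names, and sample rows are hypothetical and are not Datafly's actual audit schema. They only show the shape of the question being asked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pii_audit (
        event_id TEXT,
        field_name TEXT,
        classification TEXT,
        action TEXT,      -- 'hash', 'mask', or 'strip'
        vendor TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO pii_audit VALUES (?, ?, ?, ?, ?)",
    [
        ("evt-1", "email", "email", "hash", "meta_capi"),
        ("evt-1", "client_ip", "ip_address", "mask", "meta_capi"),
        ("evt-2", "email", "email", "hash", "meta_capi"),
    ],
)


def plaintext_deliveries(db, classification, vendor):
    """Count audit rows where a classified field went to a vendor
    without being hashed, masked, or stripped."""
    row = db.execute(
        """
        SELECT COUNT(*) FROM pii_audit
        WHERE classification = ? AND vendor = ?
          AND action NOT IN ('hash', 'mask', 'strip')
        """,
        (classification, vendor),
    ).fetchone()
    return row[0]
```

With a table like this, "was email ever sent in plaintext to vendor X?" becomes a single call such as `plaintext_deliveries(conn, "email", "meta_capi")` rather than a code review of every connector.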

How the Org Data Layer enforces PII rules

1

Classify each customer field

The Org Data Layer holds an org-wide map of customer fields to PII classifications: email, phone, first_name, last_name, ip_address, user_id, payment_card, date_of_birth, home_address, and any custom classification you define. Classifications are version-controlled and auditable.

2

Apply transforms at the Signal Core

Every event arriving at the Ingestion Gateway is enriched with classifications and runs through the transform layer before any pipeline delivery. SHA-256 hashes are computed once, in the customer's cloud, and the raw values are never sent in plaintext anywhere.

3

Vendor-specific normalisation rules

Each vendor has known hash format expectations. The Org Data Layer encodes them: Google Customer Match's Gmail-specific normalisation, Meta CAPI's lowercase-then-hash, TikTok's raw-lowercase-hash, LinkedIn's SHA-256 with no normalisation. Match rates max out because the format matches what the vendor actually accepts.

4

Per-vendor delivery applies the rules

When the Delivery Worker picks an event off the queue for a specific vendor, the right hash format is selected automatically. No connector-specific code is needed in the integration template, because the policy is one layer up.

5

Audit log records the decision, not the value

Every transform is recorded in the audit log: event ID, field name, classification matched, action taken (hash / mask / strip), vendor target. The raw value is never logged. Regulator-defensible by construction.
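
The five steps above can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the product's implementation: the `ACTIONS` map, the `transform_event` function, and the audit row shape are all hypothetical, the field name doubles as its classification, and the mask handles IPv4 only.

```python
import hashlib

# Hypothetical classification map: event field name -> action.
ACTIONS = {"email": "hash", "ip_address": "mask", "payment_card": "strip"}


def transform_event(event_id, payload, vendor):
    """Apply each field's classified action and return
    (safe_payload, audit_rows). Audit rows hold decisions, never values."""
    safe, audit = {}, []
    for field, value in payload.items():
        action = ACTIONS.get(field)
        if action is None:
            safe[field] = value  # unclassified fields pass through unchanged
            continue
        if action == "hash":
            normalised = value.strip().lower()
            safe[field] = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        elif action == "mask":
            # Simplified IPv4-only mask: zero the last octet.
            safe[field] = ".".join(value.split(".")[:3] + ["0"])
        # "strip": the field is not copied into the output at all.
        audit.append(
            {
                "event_id": event_id,
                "field": field,
                "classification": field,
                "action": action,
                "vendor": vendor,
            }
        )
    return safe, audit
```

Calling `transform_event("evt-1", {"email": " A@B.com ", "payment_card": "4111111111111111"}, "meta_capi")` would return a payload with a hashed email and no card number, plus two audit rows that name the fields and actions without containing either raw value.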

PII rules in one place

Below is an org-wide PII configuration. Every pipeline inherits these rules. Per-vendor overrides handle the cases where a destination needs a slightly different hash format.

pii-handling.yaml (YAML)
# pii-handling.yaml
# Org-wide PII rules applied at the Signal Core BEFORE any
# pipeline transformation or vendor delivery. No raw PII
# leaves the customer cloud at any point.

org_data_layer:
  classifications:
    email:
      action: hash_sha256_normalised
      # Trim, lowercase, then SHA-256.
      # Output format vendors expect for matching.

    phone:
      action: hash_sha256_e164
      # Normalise to E.164, then SHA-256.

    first_name:
      action: hash_sha256_normalised

    last_name:
      action: hash_sha256_normalised

    ip_address:
      action: mask
      # Strip last octet for IPv4, last 80 bits for IPv6.
      # Preserves geo-coarse signal without identity.

    user_id:
      action: hash_sha256
      # Used as match key for CDP and warehouse joins.
      # Hashed once, deterministic across all vendors.

    payment_card:
      action: strip
      # Never delivered to any vendor. Period.

    date_of_birth:
      action: strip

    home_address:
      action: strip

  # Per-vendor overrides where the destination has its own
  # requirements (e.g. Customer Match needs a slightly
  # different hash format).
  vendor_overrides:
    google_customer_match:
      email: hash_sha256_normalised_google
    meta_capi:
      external_id: hash_sha256

  # Audit retention for the masking decisions themselves
  # (which fields were hashed, which stripped) without
  # ever logging the raw values.
  audit_log:
    retain_decisions: true
    retain_raw_values: false
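
To show how a config like this might be read at delivery time, here is a hedged sketch of override resolution. The `POLICY` dict mirrors part of the YAML above as a plain Python literal, and `resolve_action` is a hypothetical helper. The precedence shown (vendor override first, org-wide default second) is an assumption drawn from the description of overrides, not a documented algorithm.

```python
# Python mirror of part of the YAML config, written as a literal
# so the example has no third-party dependencies.
POLICY = {
    "classifications": {
        "email": {"action": "hash_sha256_normalised"},
        "phone": {"action": "hash_sha256_e164"},
        "payment_card": {"action": "strip"},
    },
    "vendor_overrides": {
        "google_customer_match": {"email": "hash_sha256_normalised_google"},
        "meta_capi": {"external_id": "hash_sha256"},
    },
}


def resolve_action(policy, field, vendor):
    """Return the vendor override for a field if one exists,
    otherwise the org-wide default, otherwise None."""
    override = policy["vendor_overrides"].get(vendor, {}).get(field)
    if override is not None:
        return override
    default = policy["classifications"].get(field)
    return default["action"] if default else None
```

Under this sketch, `resolve_action(POLICY, "email", "google_customer_match")` picks the Google-specific action, while the same field for any vendor without an override falls back to the org-wide default.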

Built for procurement and DPO sign-off

Every feature exists so that a security review, a DPO audit, or a SOC 2 examiner gets a clean answer in one query.

SHA-256 with vendor-correct normalisation

Email, phone, and name fields hashed in the format each vendor actually expects. Maximises match rates without requiring per-team integration knowledge.

IP masking by default

IPv4 last octet stripped, IPv6 last 80 bits stripped. Preserves coarse geo signal without storing the full identifying address.
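
A minimal sketch of this masking using Python's standard `ipaddress` module. The `mask_ip` name is hypothetical. Zeroing the last octet of an IPv4 address is the same as keeping a /24 prefix, and zeroing the last 80 bits of an IPv6 address is the same as keeping a /48 prefix.

```python
import ipaddress


def mask_ip(address: str) -> str:
    """Zero the last octet of an IPv4 address or the last 80 bits
    of an IPv6 address, returning the masked address as a string."""
    ip = ipaddress.ip_address(address)
    prefix = 24 if ip.version == 4 else 48  # 32 - 8 bits, 128 - 80 bits
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)
```

For example, `mask_ip("203.0.113.77")` yields `203.0.113.0`, which still locates the request coarsely but no longer identifies a single host.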

Strip on classify, no exceptions

Payment card numbers, dates of birth, and home addresses are stripped at ingestion. They never reach a pipeline transform, never reach a vendor, never reach the warehouse.

Customer-cloud only

All hashing and masking runs inside the customer's own VPC. Datafly Signal as a vendor never receives the raw values. Procurement-friendly by design.

Per-event decision audit

Audit log records which fields were hashed, masked, or stripped per event. Searchable by event ID, user ID, or classification. Retained per your audit retention policy.

Versioned policy changes

Classifications and transforms are version-controlled. Every change has an author, a timestamp, and a diff. Roll forward or back without redeploying any service.

Where PII transforms run, by setup

The layer at which PII is transformed determines whether the policy stays consistent across vendors and whether the audit trail is complete.

Setup | Where PII is hashed | Per-event audit
gtag.js client-side | In the browser, if at all | No
GTM Server-Side | Per template, varies by tag | Limited
Per-vendor connectors (custom) | Inside each integration, divergent | Per integration
Datafly Signal Org Data Layer | Once at the Signal Core, before any pipeline | Per event, per field, per vendor

Frequently asked questions

What PII does Datafly Signal handle?

The default Org Data Layer recognises email, phone, first_name, last_name, ip_address, user_id, payment_card, date_of_birth, and home_address. Each maps to an action: SHA-256 hashing for matchable fields (email, phone, names, user_id), masking for IP addresses (last octet stripped on IPv4, last 80 bits on IPv6), and full strip for fields that should never reach a vendor (payment cards, DOB, home address). You can add custom classifications for industry-specific data.

How is the PII transformed for Meta CAPI vs Google Ads?

Each vendor has known hash format expectations. Meta CAPI requires SHA-256 of trimmed-lowercased email. Google Customer Match requires SHA-256 of trimmed-lowercased email after specific Gmail-address normalisation rules. TikTok wants SHA-256 of the raw lowercase string. The Org Data Layer encodes each vendor's exact format, so match rates max out automatically without each integration team needing to know the per-vendor rules.

Does Datafly Signal as a vendor see raw PII?

No. The platform is deployed as single-tenant Kubernetes inside the customer's own AWS, GCP, or Azure account. All PII transformation happens in the customer's VPC. Datafly the company never receives the raw values, never receives the hashed values, never receives the events. Procurement-friendly by design.

Can I customise classifications for industry-specific data?

Yes. Beyond the defaults, you define custom classifications: e.g. policy_number for insurance, mrn for healthcare, account_id for banking. Each classification has its own action (hash, mask, strip, or custom transform). Per-vendor overrides handle cases where a destination needs a different format than the default for that classification.

Are PII transform decisions auditable per event?

Yes. The audit log records which fields were hashed, masked, or stripped per event, the classification matched, the vendor target, and the rule that produced the decision. The raw values are never logged, only the decisions. Searchable by event ID, user ID, classification, or vendor. Retained per your audit retention policy.

How does this compare to GTM Server-Side?

In GTM Server-Side, PII handling lives inside individual server templates. Each template handles its own hashing, with whatever format the template author implemented. Different templates from different vendors handle the same email field differently. Datafly Signal puts PII rules above the pipeline layer so every vendor honours the same policy, with vendor-specific format rules selected automatically. The result is one consistent policy and a complete audit trail.

Make PII handling auditable in one place

Request a technical walkthrough. We'll review your current PII flow across vendors, show where the inconsistencies are, and demonstrate how a single Org Data Layer config replaces them.