Platform Architecture

The Datafly Signal platform

Eight independently deployable components that capture, govern, and deliver first-party data from browser to vendor API — server-side, inside your own VPC. Capture. Command. Connect.

See your data flow in real time

A single request path from browser to vendor API. Every step is observable, every transformation is auditable. This is what your dashboard looks like.

Browser: End user
Datafly.js: 4.2KB collector
Your Subdomain: Your domain
Collection Endpoint: Validation & enrichment
Event Stream: Scalable message bus
Processing Engine: Data governance
Transformation Engine: Per-vendor transforms
Delivery Queue: Per-integration routing
Delivery Layer: Guaranteed delivery
Vendor APIs: GA4, Meta, TikTok, ...

Everything you need. Nothing you don't.

Every component is purpose-built for first-party data — from collection to delivery. Scale each independently. Update without downtime. Deploy wherever your data must live.

4.2KB Collector

The lightest collector in the industry. Replaces every vendor script on your page — ad blocker-proof, first-party, zero-dependency.

First-Party Collection

Your subdomain. Your cookies. Server-set 400-day identifiers that Safari's ITP cannot touch, lasting about 57x longer than the 7-day limit ITP applies to script-set cookies.
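The 57x figure follows from dividing the 400-day lifetime by the 7-day cap that Safari's ITP applies to cookies written from JavaScript. As a minimal sketch of what a server-set identifier could look like, the Python below builds a `Set-Cookie` header with the standard library. The cookie name `_df_id`, the helper `build_identifier_cookie`, and the attribute choices are assumptions for illustration, not Datafly's actual implementation.

```python
from http.cookies import SimpleCookie

SECONDS_PER_DAY = 24 * 60 * 60
IDENTIFIER_LIFETIME_DAYS = 400   # lifetime quoted in the text
ITP_SCRIPT_COOKIE_CAP_DAYS = 7   # Safari ITP cap on cookies set via document.cookie


def build_identifier_cookie(visitor_id: str, domain: str) -> str:
    """Return a Set-Cookie header value for a server-set first-party identifier."""
    cookie = SimpleCookie()
    cookie["_df_id"] = visitor_id  # hypothetical cookie name
    morsel = cookie["_df_id"]
    morsel["domain"] = domain
    morsel["path"] = "/"
    morsel["max-age"] = IDENTIFIER_LIFETIME_DAYS * SECONDS_PER_DAY
    morsel["secure"] = True
    morsel["httponly"] = True      # sent by the server, not readable from scripts
    morsel["samesite"] = "Lax"
    return morsel.OutputString()


lifetime_ratio = IDENTIFIER_LIFETIME_DAYS / ITP_SCRIPT_COOKIE_CAP_DAYS
print(round(lifetime_ratio, 1))  # 57.1
print(build_identifier_cookie("abc123", ".example.com"))
```

The header carries `Max-Age=34560000`, which is 400 days expressed in seconds.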

Governance + Transform

Company-wide data quality rules run before per-vendor shaping. Every event validated, enriched, and PII-protected — automatically.

Guaranteed Delivery

Up to five retries over 24 hours. Intelligent rate limiting. Zero data loss — failed events are queued, not dropped.

Identity Resolution

30+ vendor IDs stored persistently and enriched on every event. Cross-domain token handoff that works in every browser, including Safari.

Full Management API

Every configuration, integration, and delivery rule is API-first. Granular permissions, complete audit trail, enterprise SSO.

Real-Time Visibility

Live event stream from collection to delivery. Inspect every stage — input, governance output, vendor payload, vendor response.

Enterprise Data Layer

High-throughput event streaming for millions of events per day. Persistent identity and configuration storage with automatic failover.

Built for enterprise throughput

Datafly Signal is an event streaming platform, not an HTTP proxy. Every component is designed to handle millions of events per day without dropping a single one.

<50ms

End-to-end delivery

From browser event to vendor API response, measured across all active integrations simultaneously.

400 days

Identity persistence

30+ vendor IDs stored persistently and re-enriched on every event — no lookups lost to cache eviction.

Retry guarantee

Up to five delivery attempts with exponential backoff over 24 hours. Failed events are queued, not discarded.
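To make the retry schedule concrete, here is a rough sketch of exponential backoff that fits five attempts into a 24-hour window. The growth factor of 4 and the function name `backoff_delays` are assumptions chosen for illustration; the platform's real schedule is not documented here.

```python
from datetime import timedelta

MAX_ATTEMPTS = 5                    # from the text: up to five delivery attempts
RETRY_WINDOW = timedelta(hours=24)  # from the text: retries span 24 hours
BACKOFF_FACTOR = 4                  # assumption: each wait is 4x the previous one


def backoff_delays(max_attempts=MAX_ATTEMPTS, window=RETRY_WINDOW, factor=BACKOFF_FACTOR):
    """Return the waits before attempts 2..max_attempts.

    The waits grow geometrically and are scaled so they sum to the window.
    """
    waits = max_attempts - 1
    if waits < 1:
        return []
    weights = [factor ** i for i in range(waits)]
    unit = window / sum(weights)
    return [unit * weight for weight in weights]


for attempt, delay in enumerate(backoff_delays(), start=2):
    print(f"attempt {attempt}: wait {delay}")
```

With these assumed values the first retry comes after roughly 17 minutes, and the last wait takes up most of the 24 hours.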

100%

Data completeness

No sampling. No ad blocker gaps. No attribution lost to browser restrictions. Every event arrives complete.

Enterprise-grade event streaming at the core

High-throughput message streaming decouples collection from delivery — your collection endpoint never blocks waiting for a slow vendor API.

  • High throughput
  • At-least-once delivery
  • Ordered per user
  • Replay on demand
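One common way to get per-user ordering from a message bus is to route every event for a given user to the same partition, so a single consumer sees that user's events in arrival order. The sketch below illustrates that general technique only. The partition count, the `user_id` field, and the function names are assumptions, and nothing here describes Datafly's actual streaming layer.

```python
import hashlib

NUM_PARTITIONS = 12  # assumption: illustrative partition count


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a user ID to a stable partition index using a SHA-256 digest."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def route(events):
    """Group events by partition, preserving arrival order inside each partition."""
    partitions = {}
    for event in events:
        partitions.setdefault(partition_for(event["user_id"]), []).append(event)
    return partitions


events = [
    {"user_id": "u1", "seq": 1},
    {"user_id": "u2", "seq": 1},
    {"user_id": "u1", "seq": 2},
]
for index, batch in sorted(route(events).items()):
    print(index, batch)
```

Because the hash is deterministic, the same user always lands on the same partition, while different users spread across partitions for throughput.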

Two layers. Total control.

No other platform separates company-wide governance from per-vendor transformation. This distinction is what lets you enforce data quality once and deliver perfectly formatted data to every destination — without writing per-integration rules for every compliance requirement.

1

Data Governance Layer

Global governance that applies to every event, regardless of destination. Runs before any vendor-specific logic touches your data.

  1. Schema validation
  2. Field standardisation
  3. Data type enforcement
  4. Value normalisation
  5. Data cleansing
  6. PII classification
  7. Global enrichments (vendor IDs, GeoIP, UA, sessions)
  8. Consent enforcement
  9. Event fan-out
  10. Field removal
  11. Audit trail

2

Transformation Engine

Per-vendor logic that shapes each event for its specific destination. Runs after governance has been applied.

  1. Pipeline global transformations
  2. Pipeline enrichments
  3. Per-integration field mapping
  4. Per-integration enrichments
  5. Per-integration PII handling (SHA-256, masking)
  6. Per-integration custom logic (expressions or sandboxed code)
  7. Output validation
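The two-layer split can be pictured as one global function that runs once per event, followed by a separate shaping function per destination. The sketch below is a simplified illustration under that assumption. The vendor names, field names, and the two governance rules shown (dropping empty fields and normalising email case) are hypothetical stand-ins for the steps listed above.

```python
def govern(event: dict) -> dict:
    """Global layer: runs once per event, before any vendor-specific logic."""
    cleaned = {key: value for key, value in event.items() if value is not None}  # data cleansing
    if isinstance(cleaned.get("email"), str):
        cleaned["email"] = cleaned["email"].strip().lower()  # value normalisation
    return cleaned


def to_vendor_a(event: dict) -> dict:
    """Hypothetical per-vendor shaping for destination A."""
    return {"event_name": event["event"], "em": event.get("email")}


def to_vendor_b(event: dict) -> dict:
    """Hypothetical per-vendor shaping for destination B."""
    return {"name": event["event"], "contact": event.get("email")}


VENDOR_TRANSFORMS = {"vendor_a": to_vendor_a, "vendor_b": to_vendor_b}


def process(event: dict) -> dict:
    """Apply governance once, then fan the governed event out to each vendor transform."""
    governed = govern(event)
    return {name: transform(dict(governed)) for name, transform in VENDOR_TRANSFORMS.items()}


print(process({"event": "purchase", "email": "  Jane@Example.COM ", "coupon": None}))
```

Both vendor payloads receive the same normalised email, because the rule lives in the governance layer and not in either vendor transform.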

Write PII rules once

Apply them to every vendor automatically — no per-integration configuration required.

Enforce consent everywhere

A single consent decision blocks delivery to all relevant destinations simultaneously.

Validate before it leaves

Malformed events are caught at the governance layer — never discovered downstream after delivery fails.
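As an illustration of a single consent decision gating several destinations at once, the sketch below maps each destination to a consent purpose and filters on the user's choices. The destination names, purpose labels, and the `allowed_destinations` helper are assumptions for illustration, not the platform's configuration model.

```python
# Hypothetical mapping from destination to the consent purpose it requires.
DESTINATION_PURPOSES = {
    "analytics_tool": "analytics",
    "ad_network": "advertising",
    "warehouse": "analytics",
}


def allowed_destinations(consent: dict, destinations: dict = DESTINATION_PURPOSES) -> list:
    """Return the destinations whose required purpose the user has consented to."""
    return sorted(
        name for name, purpose in destinations.items()
        if consent.get(purpose, False)  # a missing purpose is treated as no consent
    )


print(allowed_destinations({"analytics": True, "advertising": False}))
# ['analytics_tool', 'warehouse']
```

Declining one purpose removes every destination tied to it, without touching per-integration settings.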

Pipeline as code

Every transformation is a YAML/JSON configuration file stored in version control. No black boxes. No proprietary drag-and-drop editors that hide what your data is doing.

Define event triggers, field mappings, enrichments, PII handling rules, and output validation in a format your engineering team already understands. Review changes in pull requests. Roll back with a commit revert.

  • Version-controlled transformation configs
  • Declarative field mapping and enrichment
  • Per-integration PII handling rules
  • Output format validation before delivery
  • Review and approve changes via pull requests
ga4-pageview.yaml (YAML)
# ga4-pageview.yaml — Map page events to GA4 Measurement Protocol
name: ga4_pageview
integration: google-analytics-4
trigger:
  event: page
  conditions:
    - field: context.page.url
      operator: exists

transform:
  # Map standard fields to GA4 parameters
  mapping:
    - source: properties.title
      target: page_title
    - source: context.page.url
      target: page_location
    - source: context.page.referrer
      target: page_referrer
    - source: context.locale
      target: language

  # Enrich with session data
  enrichments:
    - type: session
      fields: [session_id, session_number]
    - type: user_property
      source: traits.plan
      target: user_properties.account_type

  # PII handling — hash user ID before sending
  pii:
    - field: user_id
      action: sha256

output:
  format: ga4_measurement_protocol
  endpoint: https://www.google-analytics.com/mp/collect
  validate: true
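To show how a declarative config like the one above might be interpreted, here is a minimal Python sketch that applies the `mapping` and `pii` sections to a sample event. The config is copied into a dict by hand to avoid a YAML dependency. The helpers `get_path` and `apply_transform`, the sample event, and the choice to skip missing source fields are assumptions, not the platform's actual engine.

```python
import hashlib

# Mirrors the mapping and pii sections of ga4-pageview.yaml.
CONFIG = {
    "mapping": [
        {"source": "properties.title", "target": "page_title"},
        {"source": "context.page.url", "target": "page_location"},
        {"source": "context.page.referrer", "target": "page_referrer"},
        {"source": "context.locale", "target": "language"},
    ],
    "pii": [{"field": "user_id", "action": "sha256"}],
}


def get_path(data, dotted):
    """Follow a dotted path such as 'context.page.url'; return None if any key is missing."""
    current = data
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def apply_transform(event, config=CONFIG):
    """Build a vendor payload from the field mappings, then hash the configured PII fields."""
    payload = {}
    for rule in config["mapping"]:
        value = get_path(event, rule["source"])
        if value is not None:  # assumption: absent source fields are skipped
            payload[rule["target"]] = value
    for rule in config["pii"]:
        raw = event.get(rule["field"])
        if raw is not None and rule["action"] == "sha256":
            payload[rule["field"]] = hashlib.sha256(str(raw).encode("utf-8")).hexdigest()
    return payload


sample_event = {
    "user_id": "user-42",
    "properties": {"title": "Home"},
    "context": {"page": {"url": "https://example.com/"}, "locale": "en-GB"},
}
print(apply_transform(sample_event))
```

The sample event has no referrer, so `page_referrer` is left out of the payload, and `user_id` is sent as a 64-character hex digest in place of the raw value.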

Your infrastructure. Your cloud. Your rules.

Every deployment is single-tenant and isolated. Deploy on any cloud, in your own VPC, or let us manage it. Docker Compose for development, Kubernetes and Helm charts for production. Your data never shares infrastructure with anyone else.

Recommended

Customer-Hosted (VPC)

Deploy in your own VPC on GCP, AWS, or Azure using Helm charts. Full operational control. Datafly has zero access to your infrastructure or data. This is the recommended deployment model for enterprise customers.

Hybrid

Management plane hosted by Datafly, data plane in your VPC. Your event data stays in your cloud account while we handle the control plane.

Datafly-Hosted

Managed Kubernetes with namespace isolation per customer. We handle infrastructure, scaling, and operations. You control configuration and data.

Runs on any infrastructure

Docker Compose for local development. Kubernetes with Helm charts for production. Deploy on any cloud or on-premise.

GCP, AWS, Azure, Kubernetes, Helm, Docker

No add-ons

Everything included. No per-connector fees.

Event-based pricing. Every integration included. No per-connector surcharges, no feature gates on core capabilities, no hidden fees.

All vendor integrations

GA4, Meta, TikTok, Google Ads, Snapchat, LinkedIn, Pinterest, Microsoft Ads, BigQuery, Snowflake, Redshift, Databricks, and 60+ more

Complete data governance

Every event is validated, standardised, and enriched before delivery

Version-controlled configurations

Review and approve all changes through your normal workflow

Real-time event inspector

See every event flowing through the platform in real time

400-day attribution

Track the full customer journey, not just the last 7 days

Automatic vendor identity

Vendor identities maintained automatically — no extra scripts needed

Cross-domain identity

Recognise customers seamlessly across all your domains and subdomains

Consent enforcement

Double-checked at collection and delivery time — no data leaks

Guaranteed event delivery

Every event is delivered reliably with automatic retry and zero data loss

Ready to take control of your first-party data?

See how Datafly Signal captures complete data, commands how it flows, and connects it to every destination — all from your own infrastructure.