Platform Architecture
The Datafly Signal platform
Eight independently-deployable components that capture, govern, and deliver first-party data from browser to vendor API, server-side, inside your own VPC. Scales from a single VPS up to 5M events per second per deployment. Capture. Command. Connect.
Everything you need. Nothing you don't.
Every component is purpose-built for first-party data, from collection to delivery. Scale each independently. Update without downtime. Deploy wherever your data must live.
5.2KB Collector
The lightest collector in the industry. Replaces every vendor script on your page. Ad blocker-proof, first-party, zero-dependency.
First-Party Collection
Your subdomain. Your cookies. Server-set 400-day identifiers that Safari's ITP cannot touch (about 57x longer than the 7-day lifetime of script-set cookies).
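As a rough illustration of what "server-set" means here, the sketch below builds a `Set-Cookie` header with a 400-day lifetime using Python's standard library. The cookie name `_dfid`, the domain, and the attribute choices are assumptions for the example, not Datafly Signal's actual implementation.

```python
from http.cookies import SimpleCookie

# 400 days expressed in seconds: 400 * 86,400 = 34,560,000
FOUR_HUNDRED_DAYS = 400 * 24 * 60 * 60


def build_identifier_cookie(name: str, value: str, domain: str) -> str:
    """Return a Set-Cookie header value for a server-set first-party identifier."""
    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    morsel["max-age"] = FOUR_HUNDRED_DAYS
    morsel["domain"] = domain      # your own (sub)domain, so the cookie is first-party
    morsel["path"] = "/"
    morsel["secure"] = True        # only sent over HTTPS
    morsel["httponly"] = True      # not readable from page scripts
    morsel["samesite"] = "Lax"
    return morsel.OutputString()


# Hypothetical identifier name and value, for illustration only.
header = build_identifier_cookie("_dfid", "a1b2c3", ".example.com")
```

The response from your collection subdomain would carry this value in a `Set-Cookie` header. Because the server sets it, the lifetime does not depend on anything a page script writes.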
Governance + Transform
Company-wide data quality rules run before per-vendor shaping. Every event validated, enriched, and PII-protected, automatically.
Guaranteed Delivery
Up to 5 delivery attempts with exponential backoff over 24 hours, then routed to a Dead Letter Queue for inspection and replay. No event is ever silently dropped.
Identity Resolution
30+ vendor IDs stored persistently and enriched on every event. Cross-domain token handoff that works in every browser, including Safari.
Full Management API
Every configuration, integration, and delivery rule is API-first. Granular permissions, complete audit trail, enterprise SSO.
Real-Time Visibility
Live event stream from collection to delivery. Inspect every stage: input, governance output, vendor payload, vendor response.
Enterprise Data Layer
High-throughput event streaming for millions of events per day. Persistent identity and configuration storage with automatic failover.
Built for enterprise throughput
Datafly Signal is an event streaming platform, not an HTTP proxy. Each deployment scales from a single VPS up to 5M events per second by adding nodes to the Kafka, processing, and delivery tiers. No event is ever silently dropped.
Up to 5M
Events per second
Per deployment. Each component scales horizontally; ingestion, processing, and delivery scale independently.
<50ms
End-to-end delivery
From browser event to vendor API response (p50), measured across all active integrations simultaneously.
400 days
Identity persistence
30+ vendor IDs stored persistently and re-enriched on every event. No lookups lost to cache eviction.
100%
Data completeness
No sampling. No ad blocker gaps. No attribution lost to browser restrictions. Every event arrives complete.
Guaranteed delivery: up to 5 attempts, then Dead Letter Queue. Zero events lost.
Each event gets up to 5 delivery attempts with exponential backoff over 24 hours. Anything still failing routes to a per-vendor Dead Letter Queue, where it can be inspected, fixed, and replayed. Events are never silently discarded.
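The retry behaviour described above can be sketched as follows. The page does not state the backoff factor or the base delay, so this example assumes a doubling factor and sizes the four waits between the five attempts to fill exactly 24 hours. `DeadLetterQueue`, `deliver`, and `send` are hypothetical names, not part of the product's API.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_ATTEMPTS = 5
RETRY_WINDOW_SECONDS = 24 * 60 * 60  # 24 hours


def backoff_delays(attempts: int = MAX_ATTEMPTS,
                   window: float = RETRY_WINDOW_SECONDS,
                   factor: float = 2.0) -> list[float]:
    """Waits between attempts, each `factor` times the last, summing to `window`."""
    weights = [factor ** i for i in range(attempts - 1)]
    base = window / sum(weights)
    return [base * w for w in weights]


@dataclass
class DeadLetterQueue:
    """In-memory stand-in for a per-vendor queue of failed events."""
    events: list[dict] = field(default_factory=list)

    def put(self, event: dict, error: str) -> None:
        self.events.append({"event": event, "error": error})


def deliver(event: dict,
            send: Callable[[dict], None],
            dlq: DeadLetterQueue,
            sleep: Callable[[float], None] = lambda seconds: None) -> bool:
    """Try `send` up to MAX_ATTEMPTS times; route to the DLQ if every attempt fails."""
    delays = backoff_delays()
    last_error = ""
    for attempt in range(MAX_ATTEMPTS):
        try:
            send(event)
            return True
        except Exception as exc:  # assumption: any exception counts as a failed attempt
            last_error = str(exc)
            if attempt < len(delays):
                sleep(delays[attempt])  # no wait after the final attempt
    dlq.put(event, last_error)
    return False
```

Under these assumptions the waits are 1.6, 3.2, 6.4, and 12.8 hours. The default `sleep` is a no-op so the sketch runs instantly; a real worker would schedule the next attempt rather than block.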
Two layers. Total control.
No other platform separates company-wide governance from per-vendor transformation. This distinction is what lets you enforce data quality once and deliver perfectly formatted data to every destination, without writing per-integration rules for every compliance requirement.
Data Governance Layer
Global governance that applies to every event, regardless of destination. Runs before any vendor-specific logic touches your data.
1. Schema validation
2. Field standardisation
3. Data type enforcement
4. Value normalisation
5. Data cleansing
6. PII classification
7. Global enrichments (vendor IDs, GeoIP, UA, sessions)
8. Consent enforcement
9. Event fan-out
10. Field removal
11. Audit trail
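One way to picture a governance layer like this is as an ordered list of stages that every event passes through before any vendor logic runs. The sketch below implements only three of the listed steps (schema validation, value normalisation, field removal) plus an audit trail. The field names and rules are invented for illustration and do not reflect the platform's internals.

```python
from typing import Callable, Optional

Event = dict
# A stage returns the (possibly modified) event, or None to reject it.
Stage = Callable[[Event], Optional[Event]]


def validate_schema(event: Event) -> Optional[Event]:
    required = {"event", "timestamp"}  # hypothetical required fields
    return event if required <= event.keys() else None


def normalise_values(event: Event) -> Optional[Event]:
    out = dict(event)
    if "email" in out:
        out["email"] = out["email"].strip().lower()
    return out


def remove_fields(event: Event) -> Optional[Event]:
    blocked = {"ip_address"}  # hypothetical global removal rule
    return {k: v for k, v in event.items() if k not in blocked}


GOVERNANCE_STAGES: list[Stage] = [validate_schema, normalise_values, remove_fields]


def govern(event: Event, audit: list[str]) -> Optional[Event]:
    """Run every stage in order, recording each outcome in the audit trail."""
    for stage in GOVERNANCE_STAGES:
        result = stage(event)
        if result is None:
            audit.append(f"{stage.__name__}: rejected")
            return None
        audit.append(f"{stage.__name__}: ok")
        event = result
    return event
```

Because the stage list is global, a rule added here applies to every destination without touching any per-vendor configuration.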
Transformation Engine
Per-vendor logic that shapes each event for its specific destination. Runs after governance has been applied.
1. Pipeline global transformations
2. Pipeline enrichments
3. Per-integration field mapping
4. Per-integration enrichments
5. Per-integration PII handling (SHA-256, masking)
6. Per-integration custom logic (expressions or sandboxed code)
7. Output validation
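To make the per-integration steps more concrete, here is a minimal sketch of field mapping combined with SHA-256 hashing and masking. The integration names, the mapping tables, and the `shape_for` helper are assumptions for the example; only the use of SHA-256 and masking comes from the list above.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Hex-encoded SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    return "*" * max(len(value) - visible, 0) + value[-visible:]


# Hypothetical per-integration configuration: source field -> vendor field,
# plus the PII rule to apply to each source field.
INTEGRATIONS = {
    "ads_vendor": {
        "mapping": {"email": "em", "order_total": "value"},
        "pii": {"email": "sha256"},
    },
    "warehouse": {
        "mapping": {"email": "email", "phone": "phone", "order_total": "order_total"},
        "pii": {"phone": "mask"},
    },
}


def shape_for(integration: str, event: dict) -> dict:
    """Build the payload for one integration from an already-governed event."""
    cfg = INTEGRATIONS[integration]
    payload = {}
    for source, target in cfg["mapping"].items():
        if source not in event:
            continue
        value = event[source]
        rule = cfg["pii"].get(source)
        if rule == "sha256":
            value = sha256_hex(value)
        elif rule == "mask":
            value = mask(value)
        payload[target] = value
    return payload
```

The same governed event produces a different payload per destination: the hypothetical `ads_vendor` receives a hashed email under `em`, while `warehouse` receives a masked phone number.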
Write PII rules once
Apply them to every vendor automatically. No per-integration configuration required.
Enforce consent everywhere
A single consent decision blocks delivery to all relevant destinations simultaneously.
Validate before it leaves
Malformed events are caught at the governance layer, never discovered downstream after delivery fails.
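As a small illustration of how a single consent decision can gate several destinations at once, the sketch below maps each destination to a consent purpose and filters the fan-out accordingly. The purpose names and the destination-to-purpose mapping are assumptions, not the platform's actual consent model.

```python
# Hypothetical mapping from destination to the consent purpose it requires.
DESTINATION_PURPOSE = {
    "ga4": "analytics",
    "meta": "marketing",
    "tiktok": "marketing",
    "bigquery": "analytics",
}


def allowed_destinations(consent: dict[str, bool]) -> list[str]:
    """Destinations whose required purpose the user has consented to."""
    return [
        destination
        for destination, purpose in DESTINATION_PURPOSE.items()
        if consent.get(purpose, False)  # a missing purpose is treated as not granted
    ]
```

With this shape, declining `marketing` removes every marketing destination from the fan-out in one step, and no per-integration rule is needed.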
Pipeline as code
Every transformation is a YAML/JSON configuration file stored in version control. No black boxes. No proprietary drag-and-drop editors that hide what your data is doing.
Define event triggers, field mappings, enrichments, PII handling rules, and output validation in a format your engineering team already understands. Review changes in pull requests. Roll back with a commit revert.
- Version-controlled transformation configs
- Declarative field mapping and enrichment
- Per-integration PII handling rules
- Output format validation before delivery
- Review and approve changes via pull requests
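A pipeline configuration of this kind might look something like the JSON below, loaded and checked here with a few lines of Python. The schema (key names such as `trigger`, `mapping`, `pii`, and `output_validation`) is a guess made for illustration; the real configuration format is not shown on this page.

```python
import json

# Hypothetical configuration file contents, as they might sit in version control.
PIPELINE_CONFIG = """
{
  "integration": "example_vendor",
  "trigger": {"event": "purchase"},
  "mapping": {"order_id": "transaction_id", "order_total": "value"},
  "pii": {"email": "sha256"},
  "output_validation": {"required": ["transaction_id", "value"]}
}
"""

REQUIRED_KEYS = {"integration", "trigger", "mapping", "output_validation"}


def load_pipeline(text: str) -> dict:
    """Parse a pipeline config and fail fast if a top-level key is missing."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return config


def validate_output(config: dict, payload: dict) -> list[str]:
    """Return the required output fields that are absent from `payload`."""
    return [f for f in config["output_validation"]["required"] if f not in payload]


config = load_pipeline(PIPELINE_CONFIG)
```

Since the file is plain text, a change to a mapping or a PII rule shows up as an ordinary diff in a pull request and can be reverted like any other commit.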
Your infrastructure. Your cloud. Your rules.
Every deployment is single-tenant and isolated. Deploy on any cloud, in your own VPC, or let us manage it. Docker Compose for single-node and development setups; Kubernetes (installed via Helm charts) for production scale. Your data never shares infrastructure with anyone else.
Customer-Hosted (VPC)
Deploy in your own VPC on GCP, AWS, or Azure. Production deployments run on Kubernetes (installed via Helm charts); single-node deployments run on Docker Compose on a virtual server. Full operational control. Datafly has zero access to your infrastructure or data. Recommended for enterprise customers.
Hybrid
Management plane hosted by Datafly, data plane in your VPC. Your event data stays in your cloud account while we handle the control plane.
Datafly-Hosted
Managed Kubernetes with namespace isolation per customer. We handle infrastructure, scaling, and operations. You control configuration and data.
Runs on any infrastructure
Docker Compose for single-node and development setups. Kubernetes (installed via Helm charts) for production. Deploy on any cloud or on-premise.
Everything included. No per-connector fees.
Event-based pricing. Every integration included. No per-connector surcharges, no feature gates on core capabilities, no hidden fees.
All vendor integrations
GA4, Meta, TikTok, Google Ads, Snapchat, LinkedIn, Pinterest, Microsoft Ads, BigQuery, Snowflake, Redshift, Databricks, and 60+ more
Complete data governance
Every event is validated, standardised, and enriched before delivery
Version-controlled configurations
Review and approve all changes through your normal workflow
Real-time event inspector
See every event flowing through the platform in real time
400-day attribution
Track the full customer journey, not just the last 7 days
Automatic vendor identity
Vendor identities maintained automatically. No extra scripts needed
Cross-domain identity
Recognise customers seamlessly across all your domains and subdomains
Consent enforcement
Double-checked at collection and delivery time. No data leaks.
Guaranteed event delivery
Every event is delivered reliably with automatic retry and zero data loss
Ready to take control of your first-party data?
See how Datafly Signal captures complete data, commands how it flows, and connects it to every destination, all from your own infrastructure.