Enriched event data in your warehouse in real time
Stream complete, governed, schema-validated event data to BigQuery, Snowflake, Redshift, Databricks, and object stores. The same enriched events that reach your marketing vendors also feed your data warehouse — no separate ETL pipeline required.
<50ms
Streaming delivery
Events arrive in near real time, not in batch windows
100%
Event completeness
Every event — no sampling, no loss
0
ETL pipelines
No separate ingestion, transformation, or loading
30+
Enrichment fields
Geo, device, session, identity — pre-enriched
The problem with current data warehouse pipelines
Most analytics data reaches your warehouse through a chain of lossy, delayed steps: client-side tags capture a subset of events, a CDP or ETL tool batches them, and hours later a partial picture arrives in your tables.
Incomplete data in, incomplete insights out
If 20-40% of visitors are invisible to client-side tags because of ad blockers, your warehouse tables are missing the same 20-40%. Every dashboard, ML model, and report built on that data inherits the same blind spot.
Batch delays make data stale
Traditional ETL runs on hourly or daily schedules. By the time events land in your warehouse, the moment has passed. Real-time personalisation, fraud detection, and operational dashboards need data now — not data from six hours ago.
Duplicated pipelines, duplicated cost
One pipeline feeds your marketing vendors. A second feeds your warehouse. A third feeds your ML feature store. Each has its own ingestion, transformation, and delivery logic — with different schemas, different enrichments, and different data quality.
Supported destinations
Deliver enriched events to any combination of data warehouses, lakes, and object stores simultaneously.
| Destination | Delivery Method | Latency |
|---|---|---|
| Google BigQuery | Streaming Insert API | Real-time |
| Snowflake | Snowpipe | Near real-time |
| Amazon Redshift | Data API / COPY | Near real-time |
| Databricks | Delta Lake / REST API | Near real-time |
| ClickHouse | HTTP Interface | Real-time |
| Amazon S3 | PUT Object (Parquet/JSON) | Micro-batch |
| Google Cloud Storage | JSON / Parquet files | Micro-batch |
| Azure Blob Storage | Block Blob upload | Micro-batch |
| Amazon Kinesis | PutRecords | Real-time |
| Apache Kafka | Produce | Real-time |
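As an illustrative sketch of a micro-batch destination, the same event stream could be delivered to S3 as Parquet. The integration name and parameter keys below are assumptions for illustration, not documented values:

```yaml
# s3-all-events.yaml (illustrative; integration name and parameter keys are assumptions)
name: s3_all_events
integration: amazon-s3

trigger:
  event: "*"  # All event types

parameters:
  bucket: "your-events-bucket"
  prefix: "datafly/events/"
  format: parquet        # Parquet or JSON, per the destinations table
  batch_interval: "60s"  # micro-batch flush window
```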
Pipeline configuration
The same pipeline that delivers events to GA4 and Meta also streams them to your warehouse. Configure what fields to include, how to transform them, and which events to send.
```yaml
# bigquery-all-events.yaml
name: bigquery_all_events
integration: google-bigquery

trigger:
  event: "*"  # All event types

parameters:
  project_id: "your-gcp-project"
  dataset: "datafly_events"
  table: "events"

global:
  anonymous_id:
    source: anonymous_id
    mode: direct
  user_id:
    source: user_id
    mode: direct
  session_id:
    source: context.session.id
    mode: direct

events:
  "*":
    mappings:
      event_name:
        source: event
        mode: direct
      event_type:
        source: type
        mode: direct
      event_timestamp:
        source: timestamp
        mode: direct
      page_url:
        source: context.page.url
        mode: direct
      page_title:
        source: context.page.title
        mode: direct
      referrer:
        source: context.page.referrer
        mode: direct
      properties:
        source: properties
        mode: direct
      geo_country:
        source: context.geo.country
        mode: direct
      geo_city:
        source: context.geo.city
        mode: direct
      device_type:
        source: context.device.type
        mode: direct
      browser:
        source: context.device.browser
        mode: direct
```

Data warehouse delivery that works
Single source of truth
The same enriched events reach your warehouse and your marketing vendors. One pipeline, one schema, one version of the data.
Real-time streaming
Events stream to your warehouse as they happen. No batch windows, no hourly ETL jobs, no stale data.
Pre-governed data
PII handling, consent enforcement, and bot filtering are applied before events reach your warehouse. Clean data by default.
Schema-validated
Every event is validated against your defined schema before delivery. No malformed rows, no type mismatches, no missing fields.
Guaranteed delivery
At-least-once delivery with exponential backoff retry. Failed events enter the dead letter queue for inspection and replay.
Pre-enriched
Events arrive with geolocation, device parsing, session data, and 30+ vendor IDs already attached. No post-load enrichment needed.
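As a sketch of how schema validation and dead-letter handling could be expressed in config form (the keywords and field names here are illustrative assumptions, not the product's documented syntax):

```yaml
# events.schema.yaml (illustrative; syntax and keywords are assumptions)
event: "*"
required:
  - anonymous_id
  - event_timestamp
  - page_url
fields:
  anonymous_id: string
  event_timestamp: timestamp
  page_url: string
  geo_country: string   # e.g. ISO 3166-1 alpha-2 code
on_violation: dead_letter   # route malformed events to the DLQ instead of loading them
```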
What teams build with warehouse data
BI and reporting
Build dashboards in Looker, Tableau, or Power BI on complete, real-time event data. No gaps from ad blockers, no sampling, no batch delays.
ML and AI models
Train recommendation engines, demand forecasters, and personalisation models on unsampled behavioural data with full user journeys and complete identity.
Attribution modelling
Run custom multi-touch attribution models on raw event data with 400-day identity resolution. See the full path to conversion, not just the last 7 days.
Customer 360
Combine web and app behavioural data with CRM and transaction data in your warehouse. Complete event history with consistent identity across all touchpoints.
Real-time activation
Stream events to Kinesis or Kafka for real-time feature stores, fraud detection, personalisation engines, and operational monitoring.
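Streaming destinations can use the same pipeline format as the warehouse configs above. A minimal sketch for Kafka, where the integration name and parameter keys are assumptions for illustration:

```yaml
# kafka-all-events.yaml (illustrative; integration name and parameter keys are assumptions)
name: kafka_all_events
integration: apache-kafka

trigger:
  event: "*"

parameters:
  brokers: "broker-1:9092,broker-2:9092"
  topic: "datafly.events"

events:
  "*":
    mappings:
      event_name:
        source: event
        mode: direct
      event_timestamp:
        source: timestamp
        mode: direct
```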
Complete data in your warehouse, in real time
See how Datafly Signal streams enriched, governed event data to your data warehouse alongside every marketing destination.