Data Warehouses

Enriched event data in your warehouse in real time

Stream complete, governed, schema-validated event data to BigQuery, Snowflake, Redshift, Databricks, and object stores. The same enriched events that reach your marketing vendors also feed your data warehouse — no separate ETL pipeline required.

<50ms

Streaming delivery

Events arrive in near real time, not batch windows

100%

Event completeness

Every event — no sampling, no loss

0

ETL pipelines

No separate ingestion, transformation, or loading

30+

Enrichment fields

Geo, device, session, identity — pre-enriched

The problem with current data warehouse pipelines

Most analytics data reaches your warehouse through a chain of lossy, delayed steps: client-side tags capture a subset of events, a CDP or ETL tool batches them, and hours later a partial picture arrives in your tables.

Incomplete data in, incomplete insights out

If 20-40% of visitors are invisible to client-side tags because of ad blockers, your warehouse tables are missing the same 20-40%. Every dashboard, ML model, and report built on that data inherits the same blind spot.

Batch delays make data stale

Traditional ETL runs on hourly or daily schedules. By the time events land in your warehouse, the moment has passed. Real-time personalisation, fraud detection, and operational dashboards need data now — not data from six hours ago.

Duplicated pipelines, duplicated cost

One pipeline feeds your marketing vendors. A second feeds your warehouse. A third feeds your ML feature store. Each has its own ingestion, transformation, and delivery logic — with different schemas, different enrichments, and different data quality.

Supported destinations

Deliver enriched events to any combination of data warehouses, lakes, and object stores simultaneously.

| Destination          | Delivery Method           | Latency        |
|----------------------|---------------------------|----------------|
| Google BigQuery      | Streaming Insert API      | Real-time      |
| Snowflake            | Snowpipe                  | Near real-time |
| Amazon Redshift      | Data API / COPY           | Near real-time |
| Databricks           | Delta Lake / REST API     | Near real-time |
| ClickHouse           | HTTP Interface            | Real-time      |
| Amazon S3            | PUT Object (Parquet/JSON) | Micro-batch    |
| Google Cloud Storage | JSON / Parquet files      | Micro-batch    |
| Azure Blob Storage   | Block Blob upload         | Micro-batch    |
| Amazon Kinesis       | PutRecords                | Real-time      |
| Apache Kafka         | Produce                   | Real-time      |
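The object-store destinations (S3, GCS, Azure Blob) deliver in micro-batches rather than per-event. A minimal sketch of that pattern — accumulate events, then flush one newline-delimited JSON object when a size or age threshold is hit. The class name, thresholds, and the upload hook are illustrative assumptions, not Datafly Signal's implementation:

```python
# Illustrative micro-batch buffer for object-store destinations.
# Events accumulate in memory and flush as one newline-delimited JSON
# object per batch; thresholds and the upload hook are assumptions.

import json
import time

class MicroBatcher:
    def __init__(self, upload, max_events=500, max_age_s=60.0):
        self.upload = upload            # callable(key, body_bytes)
        self.max_events = max_events
        self.max_age_s = max_age_s
        self.buffer = []
        self.opened_at = time.monotonic()

    def add(self, event: dict):
        self.buffer.append(event)
        if (len(self.buffer) >= self.max_events
                or time.monotonic() - self.opened_at >= self.max_age_s):
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        body = "\n".join(json.dumps(e) for e in self.buffer).encode()
        key = f"events/{int(time.time())}-{len(self.buffer)}.jsonl"
        self.upload(key, body)          # e.g. an S3 PUT Object call
        self.buffer = []
        self.opened_at = time.monotonic()

uploads = []
batcher = MicroBatcher(lambda key, body: uploads.append((key, body)),
                       max_events=2)
batcher.add({"event": "page_view"})
batcher.add({"event": "purchase"})     # hits max_events, triggers flush
# uploads now holds one (key, body) object containing both events as JSONL
```

The streaming destinations (BigQuery, Kinesis, Kafka) skip the buffer and deliver each event as it arrives.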

Pipeline configuration

The same pipeline that delivers events to GA4 and Meta also streams them to your warehouse. Configure what fields to include, how to transform them, and which events to send.

# bigquery-all-events.yaml
name: bigquery_all_events
integration: google-bigquery
trigger:
  event: "*"  # All event types

parameters:
  project_id: "your-gcp-project"
  dataset: "datafly_events"
  table: "events"

global:
  anonymous_id:
    source: anonymous_id
    mode: direct
  user_id:
    source: user_id
    mode: direct
  session_id:
    source: context.session.id
    mode: direct

events:
  "*":
    mappings:
      event_name:
        source: event
        mode: direct
      event_type:
        source: type
        mode: direct
      event_timestamp:
        source: timestamp
        mode: direct
      page_url:
        source: context.page.url
        mode: direct
      page_title:
        source: context.page.title
        mode: direct
      referrer:
        source: context.page.referrer
        mode: direct
      properties:
        source: properties
        mode: direct
      geo_country:
        source: context.geo.country
        mode: direct
      geo_city:
        source: context.geo.city
        mode: direct
      device_type:
        source: context.device.type
        mode: direct
      browser:
        source: context.device.browser
        mode: direct
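
Conceptually, each `mode: direct` mapping copies a value from a dotted source path in the enriched event into a named warehouse column. A minimal sketch of that transform, using a subset of the mappings above — the helper names and the event shape are illustrative assumptions, not Datafly Signal's actual code:

```python
# Illustrative sketch: applying "mode: direct" mappings to one enriched
# event to produce a flat warehouse row. Helper names and the event
# shape are assumptions for demonstration.

def extract(event: dict, dotted_path: str):
    """Walk a dotted source path like 'context.page.url' into a nested dict."""
    value = event
    for key in dotted_path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value

# A subset of the mappings from bigquery-all-events.yaml
MAPPINGS = {
    "event_name": "event",
    "page_url": "context.page.url",
    "geo_country": "context.geo.country",
    "device_type": "context.device.type",
}

def to_row(event: dict) -> dict:
    """Flatten one enriched event into a warehouse-ready row."""
    return {column: extract(event, path) for column, path in MAPPINGS.items()}

row = to_row({
    "event": "page_view",
    "context": {
        "page": {"url": "https://example.com/pricing"},
        "geo": {"country": "GB"},
        "device": {"type": "mobile"},
    },
})
# row -> {"event_name": "page_view",
#         "page_url": "https://example.com/pricing",
#         "geo_country": "GB", "device_type": "mobile"}
```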

Data warehouse delivery that works

Single source of truth

The same enriched events reach your warehouse and your marketing vendors. One pipeline, one schema, one version of the data.

Real-time streaming

Events stream to your warehouse as they happen. No batch windows, no hourly ETL jobs, no stale data.

Pre-governed data

PII handling, consent enforcement, and bot filtering are applied before events reach your warehouse. Clean data by default.

Schema-validated

Every event is validated against your defined schema before delivery. No malformed rows, no type mismatches, no missing fields.
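A rough sketch of what a pre-delivery schema check can look like — reject events with missing fields or wrong types before they reach the warehouse. The schema format and field names here are assumptions, not Datafly Signal's validation engine:

```python
# Illustrative pre-delivery schema check: flag missing fields and type
# mismatches before an event is written to the warehouse. The schema
# format and field names are assumptions for demonstration.

SCHEMA = {
    "event": str,        # event name
    "timestamp": str,    # ISO-8601 string
    "properties": dict,  # free-form event properties
}

def validate(event: dict) -> list[str]:
    """Return a list of violations; an empty list means the event passes."""
    errors = []
    for field, expected in SCHEMA.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors

good = {"event": "page_view", "timestamp": "2024-05-01T12:00:00Z",
        "properties": {}}
bad = {"event": "page_view", "timestamp": 1714564800}  # int, no properties

assert validate(good) == []
assert len(validate(bad)) == 2
```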

Guaranteed delivery

At-least-once delivery with exponential backoff retry. Failed events enter the dead letter queue for inspection and replay.
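The retry behaviour described above can be sketched as a simple delivery loop: retry with exponential backoff, then route the event to a dead letter queue once attempts are exhausted. Retry counts and delays here are illustrative assumptions:

```python
# Illustrative at-least-once delivery loop: exponential backoff between
# retries, then a dead letter queue after the final attempt fails.
# Attempt counts and delays are assumptions for demonstration.

import time

dead_letter_queue = []

def deliver(event: dict, send, max_attempts=4, base_delay_s=0.5):
    """Try `send(event)`; back off exponentially; DLQ on exhaustion."""
    for attempt in range(max_attempts):
        try:
            send(event)
            return True
        except Exception:
            if attempt < max_attempts - 1:
                time.sleep(base_delay_s * 2 ** attempt)  # 0.5s, 1s, 2s...
    dead_letter_queue.append(event)     # kept for inspection and replay
    return False

# A destination that fails twice, then succeeds: at-least-once delivery
calls = {"n": 0}
def flaky_send(event):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("destination unavailable")

ok = deliver({"event": "purchase"}, flaky_send, base_delay_s=0.0)
# ok is True and the DLQ stays empty; a permanently failing send
# would land in dead_letter_queue instead
```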

Pre-enriched

Events arrive with geolocation, device parsing, session data, and 30+ vendor IDs already attached. No post-load enrichment needed.

What teams build with warehouse data

BI and reporting

Build dashboards in Looker, Tableau, or Power BI on complete, real-time event data. No gaps from ad blockers, no sampling, no batch delays.

ML and AI models

Train recommendation engines, demand forecasters, and personalisation models on unsampled behavioural data with full user journeys and complete identity.

Attribution modelling

Run custom multi-touch attribution models on raw event data with 400-day identity resolution. See the full path to conversion, not just the last 7 days.
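As one illustration of what becomes possible with complete journeys in the warehouse, here is a minimal linear multi-touch attribution sketch: conversion credit is split equally across every touchpoint in a journey rather than assigned to the last click. The journey data and the equal-split rule are assumptions for demonstration, not a prescribed model:

```python
# Illustrative linear multi-touch attribution over complete journeys.
# Credit for each conversion is split equally across its touchpoints;
# the journeys and the credit rule are assumptions for demonstration.

from collections import defaultdict

def linear_attribution(journeys):
    """Split 1.0 conversion credit equally across each journey's channels."""
    credit = defaultdict(float)
    for touchpoints in journeys:
        share = 1.0 / len(touchpoints)
        for channel in touchpoints:
            credit[channel] += share
    return dict(credit)

# Two converting journeys reconstructed from raw warehouse events
journeys = [
    ["paid_search", "email", "direct"],
    ["social", "email"],
]
credit = linear_attribution(journeys)
# paid_search and direct each get ~0.33, social 0.5, email ~0.83
```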

Customer 360

Combine web and app behavioural data with CRM and transaction data in your warehouse. Complete event history with consistent identity across all touchpoints.

Real-time activation

Stream events to Kinesis or Kafka for real-time feature stores, fraud detection, personalisation engines, and operational monitoring.

Complete data in your warehouse, in real time

See how Datafly Signal streams enriched, governed event data to your data warehouse alongside every marketing destination.