Invalid Traffic

Stop bots before they hit Meta CAPI, Google Ads, and your warehouse

Industry estimates put 20-50% of programmatic ad traffic as invalid. When bot conversions reach Meta CAPI or Google Ads server-side, they corrupt Smart Bidding, inflate ROAS, and waste delivery quota. Datafly Signal filters bots at the Signal Core, before any vendor delivery, with rules you control per pipeline.

20-50%

Of programmatic traffic is bots

Industry estimates from HUMAN Security and Pixalate IVT reports

$0

CAPI quota spent on bots

Bot events stripped before vendor delivery, not after

6 layers

Of detection

Signature, ASN, headless, rate-limit, honeypot, behavioural

Per pipeline

Configurable rules

Web, mobile, and server pipelines tuned independently

Why client-side bot detection isn't enough

Most bot filtering happens after the fact, in vendor dashboards or analytics tools, after the bot conversion has already polluted Smart Bidding signals and burned your delivery budget. By the time you spot it, the damage is done.

Vendor APIs don't reject bots well

Meta CAPI, Google Ads, TikTok Events API, and LinkedIn CAPI accept whatever events you send them. They have their own IVT detection downstream, but a bot conversion arriving with a valid event ID and matching hashed PII will be ingested. Smart Bidding learns from it. Lookalike audiences use it. The pollution happens upstream of any vendor-side filtering.

gtag.js and Meta Pixel run inside the bot

If a headless Chromium scraper visits your checkout page, it executes gtag.js, fires the purchase event, and the conversion is recorded. Client-side tags can't reliably distinguish bots because they run in the same JavaScript context the bot is driving. The browser fingerprint looks legitimate because it is a real browser, just one being remote-controlled.

Bot CAPI events still cost you quota

Vendor APIs are rate-limited and metered. Every bot event pushed to Meta CAPI counts against your quota whether the conversion is later flagged invalid or not. For high-traffic pipelines this means burning real budget on traffic you should never have delivered, and risking legitimate events being throttled at peak.

How Datafly Signal filters invalid traffic

1

Capture every event server-side

Datafly.js sends every event to your own first-party subdomain. The Ingestion Gateway captures the raw HTTP request, including headers, IP, user agent, TLS fingerprint, and timing data that a client-side tag wouldn't have access to.

2

Run the bot filter at the Signal Core

Before any pipeline transformation, the Signal Core evaluates the event against the bot filter ruleset. Six layers of detection run in parallel: signature match, ASN/IP reputation, headless browser fingerprinting, rate-limit anomalies, honeypot tripwires, and behavioural patterns.

3

Drop, quarantine, or tag

Each rule has a configurable action. Drop removes the event entirely. Quarantine routes it to a separate warehouse table for analysis without delivering to vendors. Tag passes the event through with an is_bot flag for downstream filtering.
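
As a rough illustration of the three actions, the sketch below routes an event that a rule has already matched. The `Sinks` class and `apply_action` function are hypothetical names used only for this example, not part of the Datafly Signal API, and the in-memory lists stand in for vendor delivery and the quarantine table.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    DROP = "drop"
    QUARANTINE = "quarantine"
    TAG = "tag"


@dataclass
class Sinks:
    """Hypothetical in-memory stand-ins for vendor delivery and the quarantine table."""
    delivered: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)


def apply_action(event: dict, action: Action, sinks: Sinks) -> None:
    """Route one event that a bot rule has already matched."""
    if action is Action.DROP:
        return  # discarded: nothing is delivered or stored
    if action is Action.QUARANTINE:
        sinks.quarantined.append(event)  # kept for analysis, never sent to vendors
        return
    # Action.TAG: pass the event through, flagged for downstream filtering
    sinks.delivered.append({**event, "is_bot": True})
```

With this shape, a tagged event still reaches downstream consumers, so anything reading the delivered stream needs to check `is_bot` itself.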

4

Real-time metrics and audit

Every filtered event is logged with its match reason. The management UI shows the bot-filtered rate per pipeline in real time, alongside delivered and consent-filtered counts. Drill into any specific event to see which rule matched.
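
To make the per-pipeline rate concrete, here is a minimal sketch that computes the share of bot-filtered events from a list of outcome records. The record fields (`pipeline`, `outcome`) and the outcome labels are assumptions made for this example. The actual log schema is not shown in this document.

```python
from collections import Counter


def bot_filtered_rate(outcomes: list[dict]) -> dict[str, float]:
    """Return the fraction of events per pipeline whose outcome is 'bot_filtered'."""
    totals = Counter()
    bot_filtered = Counter()
    for row in outcomes:
        totals[row["pipeline"]] += 1
        if row["outcome"] == "bot_filtered":
            bot_filtered[row["pipeline"]] += 1
    return {pipeline: bot_filtered[pipeline] / total for pipeline, total in totals.items()}


# Hypothetical sample: outcomes are delivered, bot_filtered, or consent_filtered
sample = [
    {"pipeline": "web", "outcome": "delivered"},
    {"pipeline": "web", "outcome": "delivered"},
    {"pipeline": "web", "outcome": "delivered"},
    {"pipeline": "web", "outcome": "bot_filtered", "matched_rule": "asn_blocklist"},
    {"pipeline": "mobile", "outcome": "delivered"},
    {"pipeline": "mobile", "outcome": "consent_filtered"},
]
rates = bot_filtered_rate(sample)
```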

5

Updated weekly without a deploy

The bot signature database, ASN blocklist, and headless browser detection rules update server-side without requiring a redeploy. The scheduled refresh is weekly, and new scraper signatures are picked up within hours of identification, across every customer environment.

Bot filter configuration

Rules are versioned alongside the rest of your pipeline configuration. Change them via the management UI or by committing to your blueprint repo. Every rule change is auditable.

bot-filter.yaml (YAML)
# bot-filter.yaml
# Org-wide bot filtering rules applied at the Signal Core
# before any vendor delivery. Bot events are dropped
# (or routed to a quarantine warehouse for analysis).

bot_filter:
  enabled: true
  action: drop  # or: quarantine, tag

  rules:
    # Known bot user-agents (curated list, updated weekly)
    - type: user_agent_signature
      mode: deny

    # Datacenter / hosting provider IP ranges
    - type: asn_blocklist
      mode: deny
      blocklist: datacenter

    # Headless browser signals
    - type: headless_browser
      detect:
        - webdriver_property
        - missing_plugin_array
        - puppeteer_signature

    # Rate-limit anomalies (server-side)
    - type: rate_limit
      threshold: 200
      window: 60s
      scope: ip

    # Honeypot field tripwires
    - type: honeypot_form_field
      field_name: phone_alt

  exceptions:
    # Allow declared good bots (uptime checks, partners)
    - user_agent_contains: "DataflyHealthCheck"
    - ip_in_allowlist: ["203.0.113.45/32"]

  reporting:
    metrics_endpoint: /metrics/bots
    audit_log: true
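
The `exceptions` block can be read as a check that runs before the deny rules. The sketch below shows one way such a check might work, using the two exception types from the sample config. The function name `is_exempt` and the assumption that exceptions are evaluated first are illustrative only.

```python
import ipaddress

# Values copied from the sample config above
EXCEPTIONS = {
    "user_agent_contains": ["DataflyHealthCheck"],
    "ip_in_allowlist": ["203.0.113.45/32"],
}


def is_exempt(user_agent: str, ip: str, exceptions: dict = EXCEPTIONS) -> bool:
    """Return True if the request matches a declared exception (assumed to bypass deny rules)."""
    if any(token in user_agent for token in exceptions["user_agent_contains"]):
        return True
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in exceptions["ip_in_allowlist"])
```

A user-agent substring is easy to spoof, so pairing it with an IP allowlist entry is the safer pattern for partner traffic.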

Six layers of detection, one config

No single signal catches every bot. Datafly Signal layers detection so each rule covers what the others miss.

User-agent signatures

Curated database of known scraper, crawler, and automation tool user-agents. Updated server-side on a weekly cadence without needing a redeploy.
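
A signature check of this kind typically reduces to pattern matching on the User-Agent header. The patterns below are a few well-known automation identifiers chosen for illustration. They are not the curated database, and `matches_signature` is a hypothetical helper.

```python
import re

# Illustrative patterns only; the curated signature database is not shown here.
SIGNATURES = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"python-requests", r"\bcurl/", r"HeadlessChrome", r"scrapy")
]


def matches_signature(user_agent: str) -> bool:
    """Return True if the User-Agent string matches any known automation pattern."""
    return any(pattern.search(user_agent) for pattern in SIGNATURES)
```

Signature matching only catches clients that identify themselves honestly, which is why it is one layer among several.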

ASN and datacenter blocklists

IP ranges from AWS, GCP, Azure, OVH, and other hosting providers flagged by default. Configurable allowlist for legitimate server-to-server integrations.

Headless browser fingerprinting

Detects Puppeteer, Playwright, Selenium, and headless Chromium signatures from JavaScript fingerprint signals that are sent with each event and evaluated server-side.

Rate-limit anomaly detection

Flags IPs and sessions exceeding a configurable event threshold within a time window (for example, 200 events per 60 seconds per IP). Catches volumetric scraping and stress-test traffic that signatures miss.
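
One common way to implement this kind of check is a sliding-window counter keyed by IP or session. The sketch below uses the `threshold: 200` and `window: 60s` values from the sample configuration as defaults. The class itself is an illustrative assumption, not the product's implementation.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Flags a key (for example an IP) once it exceeds `threshold` events per window."""

    def __init__(self, threshold: int = 200, window_seconds: float = 60.0):
        self.threshold = threshold
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of recent events

    def exceeds(self, key: str, now: float) -> bool:
        """Record one event at time `now` and report whether the key is over the threshold."""
        hits = self._hits[key]
        hits.append(now)
        # Evict timestamps that have fallen out of the window
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        return len(hits) > self.threshold
```

Passing the timestamp in explicitly keeps the limiter deterministic in tests. In a live service it would usually come from a monotonic clock.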

Honeypot tripwires

Hidden form fields that legitimate users never fill in. Bots filling them are flagged immediately without affecting real conversions.
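
Server-side, the tripwire amounts to checking whether the hidden field arrived with a value. A minimal sketch, assuming the submitted form data is available as a dictionary and using the `phone_alt` field name from the sample configuration (the function name is hypothetical):

```python
def tripped_honeypot(form_fields: dict, field_name: str = "phone_alt") -> bool:
    """Return True if the hidden honeypot field was submitted with a non-empty value."""
    value = form_fields.get(field_name)
    return value is not None and str(value).strip() != ""
```

An empty or missing field is treated as a real user, since browsers submit hidden inputs with empty values by default.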

Quarantine + observability

Route filtered events to a separate warehouse table for analysis instead of dropping them. Compare delivered vs filtered rates per pipeline in real time.

What gets filtered, by detection layer

Each detection layer targets a different category of invalid traffic. They run in parallel at the Signal Core, with the first match determining the outcome.

Layer | What it catches | Default action
User-agent signature | Known scrapers, crawlers, headless tools | Drop
ASN / datacenter blocklist | Traffic from cloud / hosting IP ranges | Drop
Headless browser fingerprint | Puppeteer, Playwright, Selenium, headless Chromium | Drop
Rate-limit anomaly | Volumetric scraping, stress tests, attack traffic | Drop
Honeypot tripwire | Bots filling hidden form fields | Drop
Behavioural pattern | No mouse movement, instant form fill, no scroll | Tag (is_bot)
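
The first-match behaviour in the table above can be sketched as an ordered list of checks, each carrying its default action. This is a simplification under stated assumptions: the layers are evaluated in sequence in table order rather than in parallel, each check is reduced to a single toy predicate, and the event field names (`asn_category`, `events_last_60s`, `mouse_moves`, and so on) are hypothetical.

```python
from typing import Callable, Optional

Event = dict
Layer = tuple[str, Callable[[Event], bool], str]  # (layer name, predicate, default action)

# Toy predicates standing in for the real detection logic of each layer
LAYERS: list[Layer] = [
    ("user_agent_signature", lambda e: "HeadlessChrome" in e.get("user_agent", ""), "drop"),
    ("asn_blocklist", lambda e: e.get("asn_category") == "datacenter", "drop"),
    ("headless_browser", lambda e: bool(e.get("webdriver_property")), "drop"),
    ("rate_limit", lambda e: e.get("events_last_60s", 0) > 200, "drop"),
    ("honeypot", lambda e: bool(e.get("phone_alt")), "drop"),
    ("behavioural",
     lambda e: e.get("mouse_moves", 1) == 0 and e.get("scroll_events", 1) == 0,
     "tag"),
]


def evaluate(event: Event) -> Optional[tuple[str, str]]:
    """Return (layer, action) for the first layer that matches, or None for clean traffic."""
    for name, predicate, action in LAYERS:
        if predicate(event):
            return name, action
    return None
```

Because the first match wins, an event from a datacenter IP that also shows no mouse movement is dropped by the ASN layer rather than tagged by the behavioural one.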

Frequently asked questions

What is invalid traffic and why does it matter for paid media?

Invalid traffic (IVT) is bot, scraper, datacenter, and automation traffic that reaches your site without a real user behind it. Industry estimates from HUMAN Security and Pixalate put IVT at 20-50% of programmatic ad traffic. When bot conversions reach Meta CAPI, Google Ads, or TikTok server-side, they corrupt Smart Bidding, inflate ROAS calculations, and burn delivery quota that should be available for real customers.

How is server-side bot filtering different from gtag.js or Meta Pixel?

Client-side tags run inside whatever browser context loads them, including headless Chromium scrapers and Selenium-driven bots. They cannot reliably distinguish a bot from a real user because the browser is genuine, just remote-controlled. Datafly Signal filters at the Signal Core, after events arrive at your gateway, using server-observed signals (IP / ASN, TLS fingerprint, request timing, headers) that client-side tags do not have access to.

Will filtering bots reduce my reported conversions?

In dashboards yes, in true performance no. Reported conversion counts may drop because bot conversions stop being counted. The conversions that remain are real, which is what Smart Bidding and lookalike audiences should be optimising on. Most teams see CAPI match rates and ROAS calculations improve once bots are filtered upstream.

Can I see which events were filtered and why?

Yes. Every filtered event is logged with the rule that matched, the action taken (drop, quarantine, or tag), and the original event payload. The management UI shows the bot-filtered rate per pipeline in real time and lets you drill into specific events for investigation. You can route filtered events to a quarantine warehouse table for offline analysis instead of dropping them entirely.

Does Datafly Signal replace IVT detection at the vendor level?

No, it complements it. Vendors like Meta and Google have their own downstream IVT detection. Datafly Signal filters upstream so the bot conversion never reaches the vendor in the first place, saving your CAPI quota and preventing the bot signal from polluting Smart Bidding before vendor-side IVT catches it. The two layers stack.

Can I bring my own bot signature list?

Yes. The default ruleset (curated user-agent signatures, ASN blocklists, headless browser fingerprints) is updated weekly server-side without requiring a redeploy. You can add custom rules per pipeline: specific IP allowlists, custom honeypot field names, additional user-agent patterns, and custom rate-limit thresholds. Every rule is version-controlled alongside your blueprint.

Stop paying for traffic that doesn't convert

Request a technical walkthrough. We'll show you the bot rate on a sample of your current event stream and what filtering it server-side would recover in CAPI quota and Smart Bidding signal quality.