Invalid Traffic
Stop bots before they hit Meta CAPI, Google Ads, and your warehouse
Up to half of programmatic ad traffic is invalid. When bot conversions reach Meta CAPI or Google Ads server-side, they corrupt Smart Bidding, inflate ROAS, and waste delivery quota. Datafly Signal filters bots at the Signal Core, before any vendor delivery, with rules you control per pipeline.
20-50%
Of programmatic traffic is bots
Industry estimates from HUMAN Security and Pixalate IVT reports
$0
CAPI quota spent on bots
Bot events stripped before vendor delivery, not after
6 layers
Of detection
Signature, ASN, headless, rate-limit, honeypot, behavioural
Per pipeline
Configurable rules
Web, mobile, and server pipelines tuned independently
Why client-side bot detection isn't enough
Most bot filtering happens after the fact, in vendor dashboards or analytics tools, after the bot conversion has already polluted Smart Bidding signals and burned your delivery budget. By the time you spot it, the damage is done.
Vendor APIs don't reject bots well
Meta CAPI, Google Ads, TikTok Events API, and LinkedIn CAPI accept whatever events you send them. They have their own IVT detection downstream, but a bot conversion arriving with a valid event ID and matching hashed PII will be ingested. Smart Bidding learns from it. Lookalike audiences use it. The pollution happens upstream of any vendor-side filtering.
gtag.js and Meta Pixel run inside the bot
If a headless Chromium scraper visits your checkout page, it executes gtag.js, fires the purchase event, and the conversion is recorded. Client-side tags can't reliably distinguish bots because they run in the same JavaScript context the bot is driving. The browser fingerprint looks legitimate because it is a real browser, just one being remote-controlled.
Bot CAPI events still cost you quota
Vendor APIs are rate-limited and metered. Every bot event pushed to Meta CAPI counts against your quota whether the conversion is later flagged invalid or not. For high-traffic pipelines this means burning real budget on traffic you should never have delivered, and risking legitimate events being throttled at peak.
How Datafly Signal filters invalid traffic
Capture every event server-side
Datafly.js sends every event to your own first-party subdomain. The Ingestion Gateway captures the raw HTTP request, including headers, IP, user agent, TLS fingerprint, and timing data that a client-side tag wouldn't have access to.
Run the bot filter at the Signal Core
Before any pipeline transformation, the Signal Core evaluates the event against the bot filter ruleset. Six layers of detection run in parallel: signature match, ASN/IP reputation, headless browser fingerprinting, rate-limit anomalies, honeypot tripwires, and behavioural patterns.
Drop, quarantine, or tag
Each rule has a configurable action. Drop removes the event entirely. Quarantine routes it to a separate warehouse table for analysis without delivering to vendors. Tag passes the event through with an is_bot flag for downstream filtering.
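The three actions can be pictured as a small routing step. The sketch below is illustrative only: `Sinks`, `apply_action`, and the in-memory lists are hypothetical names standing in for downstream delivery and the quarantine warehouse table, not Datafly Signal's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Sinks:
    """Hypothetical destinations; the names are illustrative assumptions."""
    downstream: list = field(default_factory=list)   # events passed on for delivery
    quarantine: list = field(default_factory=list)   # events held for analysis


def apply_action(event: dict, action: str, sinks: Sinks) -> None:
    """Route an event that matched a bot rule according to that rule's action."""
    if action == "drop":
        return  # the event is removed entirely
    if action == "quarantine":
        sinks.quarantine.append(event)  # kept for analysis, never delivered
        return
    if action == "tag":
        # passed through with an is_bot flag for downstream filtering
        sinks.downstream.append({**event, "is_bot": True})
        return
    raise ValueError(f"unknown action: {action}")
```

Raising on an unknown action is a design choice in this sketch, so that a typo in a rule cannot silently let bot events through.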
Real-time metrics and audit
Every filtered event is logged with its match reason. The management UI shows the bot-filtered rate per pipeline in real time, alongside delivered and consent-filtered counts. Drill into any specific event to see which rule matched.
Updated weekly without a deploy
The bot signature database, ASN blocklist, and headless browser detection rules update server-side without requiring a redeploy. New scraper signatures are picked up within hours of identification, across every customer environment.
Bot filter configuration
Rules are versioned alongside the rest of your pipeline configuration. Change them via the management UI or by committing to your blueprint repo. Every rule change is auditable.
# bot-filter.yaml
# Org-wide bot filtering rules applied at the Signal Core
# before any vendor delivery. Bot events are dropped
# (or routed to a quarantine warehouse for analysis).
bot_filter:
  enabled: true
  action: drop  # or: quarantine, tag
  rules:
    # Known bot user-agents (curated list, updated weekly)
    - type: user_agent_signature
      mode: deny
    # Datacenter / hosting provider IP ranges
    - type: asn_blocklist
      mode: deny
      blocklist: datacenter
    # Headless browser signals
    - type: headless_browser
      detect:
        - webdriver_property
        - missing_plugin_array
        - puppeteer_signature
    # Behavioural anomalies (server-side)
    - type: rate_limit
      threshold: 200
      window: 60s
      scope: ip
    # Honeypot field tripwires
    - type: honeypot_form_field
      field_name: phone_alt
  exceptions:
    # Allow declared good bots (uptime checks, partners)
    - user_agent_contains: "DataflyHealthCheck"
    - ip_in_allowlist: ["203.0.113.45/32"]
  reporting:
    metrics_endpoint: /metrics/bots
    audit_log: true

Six layers of detection, one config
No single signal catches every bot. Datafly Signal layers detection so each rule covers what the others miss.
User-agent signatures
Curated database of known scraper, crawler, and automation tool user-agents. Updated server-side on a weekly cadence without needing a redeploy.
ASN and datacenter blocklists
IP ranges from AWS, GCP, Azure, OVH, and other hosting providers flagged by default. Configurable allowlist for legitimate server-to-server integrations.
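As a rough sketch of how a blocklist with an allowlist override might behave, the example below checks an address against CIDR ranges using Python's standard `ipaddress` module. The ranges are RFC 5737 documentation blocks chosen for illustration, not real provider ranges, and `is_blocked_ip` is a hypothetical helper. ASN lookup itself is not modelled here.

```python
import ipaddress

# Illustrative ranges only (RFC 5737 documentation blocks), not real hosting ranges.
DATACENTER_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]
# Allowlisted server-to-server integration, as in the sample configuration.
ALLOWLIST = [ipaddress.ip_network("203.0.113.45/32")]


def is_blocked_ip(ip: str) -> bool:
    """Return True if the IP is in a blocklisted range and not allowlisted."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in ALLOWLIST):
        return False  # the allowlist takes precedence over the blocklist
    return any(addr in net for net in DATACENTER_RANGES)
```

Checking the allowlist first means a legitimate integration hosted in a cloud range is never caught by the broader datacenter rule.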
Headless browser fingerprinting
Detects Puppeteer, Playwright, Selenium, and headless Chromium signatures from server-observed JavaScript fingerprints sent with each event.
Rate-limit anomaly detection
Flags IPs and sessions exceeding a configurable event threshold within a time window (for example, 200 events per 60 seconds per IP). Catches volumetric scraping and stress-test traffic that signatures miss.
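One common way to implement this kind of check is a sliding-window counter per key. The sketch below assumes a threshold over a time window, like the `threshold: 200` and `window: 60s` values in the sample configuration. The class and method names are hypothetical, and this is not necessarily how the Signal Core implements it.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Counts events per key (for example, per IP) inside a moving time window."""

    def __init__(self, threshold: int = 200, window_seconds: float = 60.0):
        self.threshold = threshold
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of recent events

    def exceeds(self, key: str, now: float) -> bool:
        """Record an event at time `now`; return True if the key is over threshold."""
        hits = self._hits[key]
        hits.append(now)
        # Discard timestamps that have fallen out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        return len(hits) > self.threshold
```

Passing the timestamp in explicitly, rather than reading the clock inside the method, keeps the sketch deterministic and easy to test.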
Honeypot tripwires
Hidden form fields that legitimate users never fill in. Bots filling them are flagged immediately without affecting real conversions.
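A honeypot check reduces to testing whether the hidden field arrived with a value. The minimal sketch below assumes the submitted form is available as a dictionary and uses the `phone_alt` field name from the sample configuration. `honeypot_tripped` is a hypothetical helper name.

```python
def honeypot_tripped(form: dict, field_name: str = "phone_alt") -> bool:
    """Return True if the hidden honeypot field was filled in."""
    value = form.get(field_name)
    # Missing, empty, or whitespace-only values count as "not filled in".
    return bool(str(value or "").strip())
```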
Quarantine + observability
Route filtered events to a separate warehouse table for analysis instead of dropping them. Compare delivered vs filtered rates per pipeline in real time.
What gets filtered, by detection layer
Each detection layer targets a different category of invalid traffic. They run in parallel at the Signal Core, with the first match determining the outcome.
| Layer | What it catches | Default action |
|---|---|---|
| User-agent signature | Known scrapers, crawlers, headless tools | Drop |
| ASN / datacenter blocklist | Traffic from cloud / hosting IP ranges | Drop |
| Headless browser fingerprint | Puppeteer, Playwright, Selenium, headless Chromium | Drop |
| Rate-limit anomaly | Volumetric scraping, stress tests, attack traffic | Drop |
| Honeypot tripwire | Bots filling hidden form fields | Drop |
| Behavioural pattern | No mouse movement, instant form fill, no scroll | Tag (is_bot) |
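The first-match behaviour in the table can be sketched as follows. This is an illustration under stated assumptions: the layers are checked in table order (the source says they run in parallel, so the real precedence may differ), detection results are represented as a precomputed set of layer names, and `DEFAULT_ACTIONS` and `first_match` are hypothetical names.

```python
from typing import Optional

# Layer names and default actions, in the order of the table above.
DEFAULT_ACTIONS = [
    ("user_agent_signature", "drop"),
    ("asn_blocklist", "drop"),
    ("headless_browser", "drop"),
    ("rate_limit", "drop"),
    ("honeypot", "drop"),
    ("behavioural_pattern", "tag"),
]


def first_match(signals: set) -> Optional[tuple]:
    """Return (layer, action) for the first layer that fired, or None if clean.

    `signals` is the set of layer names whose detectors matched the event.
    """
    for layer, action in DEFAULT_ACTIONS:
        if layer in signals:
            return layer, action
    return None
```

Under this ordering, an event that trips both the rate-limit and behavioural layers is dropped rather than tagged, because the rate-limit layer is checked first.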
Frequently asked questions
- What is invalid traffic and why does it matter for paid media?
- Invalid traffic (IVT) is bot, scraper, datacenter, and automation traffic that reaches your site without a real user behind it. Industry estimates from HUMAN Security and Pixalate put IVT at 20-50% of programmatic ad traffic. When bot conversions reach Meta CAPI, Google Ads, or TikTok server-side, they corrupt Smart Bidding, inflate ROAS calculations, and burn delivery quota that should be available for real customers.
- How is server-side bot filtering different from gtag.js or Meta Pixel?
- Client-side tags run inside whatever browser context loads them, including headless Chromium scrapers and Selenium-driven bots. They cannot reliably distinguish a bot from a real user because the browser is genuine, just remote-controlled, so its fingerprint looks legitimate. Datafly Signal filters at the Signal Core, once events have arrived at your gateway and before any vendor delivery, using server-observed signals (IP / ASN, TLS fingerprint, request timing, headers) that client-side tags do not have access to.
- Will filtering bots reduce my reported conversions?
- In dashboards, yes; in true performance, no. Reported conversion counts may drop because bot conversions stop being counted. The conversions that remain are real, which is what Smart Bidding and lookalike audiences should be optimising on. Most teams see CAPI match rates and ROAS calculations improve once bots are filtered upstream.
- Can I see which events were filtered and why?
- Yes. Every filtered event is logged with the rule that matched, the action taken (drop, quarantine, or tag), and the original event payload. The management UI shows bot-filtered rate per pipeline in real time and lets you drill into specific events for investigation. You can route filtered events to a quarantine warehouse table for offline analysis instead of dropping them entirely.
- Does Datafly Signal replace IVT detection at the vendor level?
- No, it complements it. Vendors like Meta and Google have their own downstream IVT detection. Datafly Signal filters upstream so the bot conversion never reaches the vendor in the first place, saving your CAPI quota and preventing the bot signal from polluting Smart Bidding before vendor-side IVT catches it. The two layers stack.
- Can I bring my own bot signature list?
- Yes. The default ruleset (curated user-agent signatures, ASN blocklists, headless browser fingerprints) is updated weekly server-side without requiring a redeploy. You can add custom rules per pipeline: specific IP allowlists, custom honeypot field names, additional user-agent patterns, custom rate-limit thresholds. Every rule is version-controlled alongside your blueprint.
Related
Consent Mode v2
Filter events on consent state, evaluated server-side per vendor.
PII Handling
Hash, mask, or strip personal data at the same Signal Core layer.
Attribution
Eliminate duplicate conversions and clean attribution signal end to end.
Meta Conversions API
Server-to-server delivery to Meta with pre-filtered events.
Stop paying for traffic that doesn't convert
Request a technical walkthrough. We'll show you the bot rate on a sample of your current event stream and what filtering it server-side would recover in CAPI quota and Smart Bidding signal quality.